Singular value decomposition (SVD) is a powerful linear algebra and data science tool. It is commonly used for matrix factorization, dimensionality reduction, and data compression. SVD has many important properties, including the fact that it is always possible to decompose a matrix into its singular values and vectors. However, one issue that can arise with SVD is the sign ambiguity of the singular vectors. This blog will explore the sign ambiguity of SVD and how it can affect the interpretation of the results.
The rest of the blog is organized as follows:

- SVD and its NumPy implementation
- Economic SVD
- The sign ambiguity of SVD
- Implications of the sign ambiguity
- Handling the sign ambiguity
- Computing SVD through eigendecomposition
SVD involves factorizing an $m \times n$ matrix $A$ into the product of three matrices: $A = USV^T$. Here, the $m \times m$ matrix $U$ and the transpose of the $n \times n$ matrix $V$, denoted as $V^T$, are both orthogonal.
Note: An orthogonal matrix is a square matrix whose transpose is its inverse, that is, $Q^T Q = Q Q^T = I$.
The $m \times n$ matrix $S$ is a generalized diagonal matrix.
Note: A rectangular matrix is known as a generalized diagonal matrix if all of its off-diagonal entries are zero, so that nonzero elements can appear only on its diagonal. A diagonal matrix, in turn, is a square matrix that also qualifies as a generalized diagonal matrix.
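For instance, the following $2 \times 3$ matrix is a generalized diagonal matrix:

$$\begin{bmatrix} 3 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}$$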
To compute the singular value decomposition of a matrix `A` using Python's NumPy library, use the command `U, D, Vt = numpy.linalg.svd(A)`. The resulting `D` array contains the diagonal entries of the generalized diagonal matrix $S$, while the transpose of the matrix $V$, i.e., $V^T$, is returned as `Vt`. In the implementation provided below, the `diag` function is utilized to transform the `D` array into the desired generalized diagonal matrix $S$, which is then stored in the variable `S`.
```python
import numpy as np
from numpy.linalg import svd

# Function to construct generalized diagonal matrix
def diag(D, m, n):
    S = np.zeros((m, n))
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

m, n = 2, 3
A = np.random.rand(m, n)

# Compute SVD using NumPy
U, D, Vt = svd(A)
S = diag(D, m, n)

# Compute Frobenius norm
frobNorm = np.linalg.norm(U.dot(S).dot(Vt) - A)
R = [0, frobNorm]

# Use 1e-10 as a threshold for zero
print(f'The Frobenius norm of USV^T - A is {R[frobNorm > 1e-10]}')
print(f'U has orthonormal columns\nU^T U:\n{abs(np.round(U.T.dot(U), 1))}')
print(f'Vt has orthonormal rows\nVt Vt^T:\n{abs(np.round(Vt.dot(Vt.T), 1))}')
```
It is assumed that the singular values in the generalized diagonal matrix $S$ are arranged in descending order. This is also the default order returned by the `numpy.linalg.svd` function, which stores the singular values in the `D` array. However, the singular values can be arranged in a different order, provided the columns of matrix $U$ and the rows of matrix $V^T$ are rearranged correspondingly, as the sketch below illustrates.
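Here is a minimal sketch of this fact (assuming a square matrix for simplicity): permuting the entries of `D` together with the matching columns of `U` and rows of `Vt` still reconstructs `A`.

```python
import numpy as np
from numpy.linalg import svd

n = 3
A = np.random.rand(n, n)
U, D, Vt = svd(A)

# Choose an arbitrary new order for the singular values
perm = np.random.permutation(n)

# Apply the same permutation to D, the columns of U, and the rows of Vt
S_perm = np.diag(D[perm])
U_perm = U[:, perm]
Vt_perm = Vt[perm, :]

# The product is unchanged: the norm below is ~0
print(np.linalg.norm(U_perm.dot(S_perm).dot(Vt_perm) - A))
```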
If the generalized diagonal matrix $S$ has some zeros on its diagonal, the corresponding columns of $U$ and rows of $V^T$ can be eliminated. The resulting matrices $\hat{U}$ and $\hat{V}^T$ correspond to the nonzero singular values collected in the diagonal matrix $\hat{S}$. This allows the original matrix $A$ to be expressed as the product of the three matrices $\hat{U}$, $\hat{S}$, and $\hat{V}^T$, as follows:

$$A = \hat{U} \hat{S} \hat{V}^T$$
This is known as the economic SVD of matrix $A$, where the matrices $\hat{U}$ and $\hat{V}^T$ are not necessarily square and the matrix $\hat{S}$ is diagonal. When the matrix $A$ has $r$ nonzero singular values, where $r$ is also the rank of $A$, the relationship between the SVD and economic SVD of $A$ can be illustrated in the figure below:
The `economicSVD` function below implements the economic SVD for a matrix, `A`. It's worth noticing that the number of nonzero singular values of `A` is equal to the rank of `A`.
```python
import numpy as np
from numpy.linalg import svd, matrix_rank as rank

# Convert array to diagonal matrix
def diag(D, m, n):
    S = np.zeros((m, n))
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

# Function for economic SVD
def economicSVD(A):
    U, D, Vt = svd(A)
    # Use 1e-10 as a threshold for zero
    idx = np.where(D > 1e-10)[0]
    U_hat, D_hat, Vt_hat = U[:, idx], D[idx], Vt[idx, :]
    return U_hat, np.diag(D_hat), Vt_hat

m, n = 8, 5
A = np.random.rand(2, n)
# Keep the rank small by repeating rows
A = np.vstack((A, A, A, A))

U, D, Vt = svd(A)
S = diag(D, m, n)
U_hat, S_hat, Vt_hat = economicSVD(A)

print('Shapes of U, S and Vt are:', U.shape, S.shape, Vt.shape)
print('Shapes of U_hat, S_hat and Vt_hat are:', U_hat.shape, S_hat.shape, Vt_hat.shape)

frobNorm = np.linalg.norm(U.dot(S).dot(Vt) - U_hat.dot(S_hat).dot(Vt_hat))
R = [0, frobNorm]
print(f'The Frobenius norm of U S V^T - hats(U S V^T) is {R[frobNorm > 1e-10]}\nRank(A)={rank(A)}')
```
The sign ambiguity of SVD refers to the fact that the left and right singular vectors can have arbitrary signs, while the singular values are always nonnegative. This means that if there is a matrix $A$ with the SVD:

$$A = U S V^T$$
where the columns of $U$ and the rows of $V^T$ are $A$'s left and right singular vectors, respectively, and $S$ is a generalized diagonal matrix containing the singular values of $A$, then the signs of the columns of $U$ and the corresponding rows of $V^T$ can be flipped and still yield a valid SVD of $A$.
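To see why, write $A$ as a sum of rank-one terms; flipping the signs of a matching pair $u_i$, $v_i$ leaves each term unchanged:

$$A = \sum_{i} \sigma_i u_i v_i^T = \sum_{i} \sigma_i (-u_i)(-v_i)^T$$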
For example, consider the following matrix:

$$A = \begin{bmatrix} 1 & 2 \\ 2 & 1 \end{bmatrix}$$
Its SVD can be computed as:

$$A = U S V^T = \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}$$
However, a valid SVD can also be obtained by multiplying the second column of $U$ and the second row of $V^T$ by $-1$:

$$A = U_2 S V_2^T = \begin{bmatrix} -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} -\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \\ -\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \end{bmatrix}$$
Both of these decompositions are valid SVDs of $A$, but the signs of the singular vectors are different. To verify that both are valid SVDs of $A$, the following four conditions must hold:

$$U S V^T = A, \qquad U_2 S V_2^T = A, \qquad U^T U = U_2^T U_2 = I, \qquad V^T V = V_2^T V_2 = I$$
The following code verifies the four properties above for the matrix pairs $(U, V^T)$ and $(U_2, V_2^T)$:
```python
# Import NumPy library
import numpy as np

# Function to clip values to zero below threshold th
def clip(a, th=1e-12):
    # If input is a scalar value
    if a.size == 1:
        # If input is less than the threshold, set it to zero
        if a < th:
            a = 0
    # If input is an array or matrix
    else:
        # Clip values that are less than the threshold to zero
        a[a < th] = 0
    return a

# Define matrices
A = np.array([[1, 2], [2, 1]], dtype=np.float64)
S = np.array([[3, 0], [0, 1]], dtype=np.float64)
U = np.array([[-0.70710678, -0.70710678],
              [-0.70710678,  0.70710678]])
Vt = np.array([[-0.70710678, -0.70710678],
               [ 0.70710678, -0.70710678]])

# Flip the signs of the second column of U and the second row of Vt
U2 = U.copy()
U2[:, 1] *= -1
V2t = Vt.copy()
V2t[1, :] *= -1

# Results
print('A=\n', A)
print('U S V^T=\n', np.round(U.dot(S).dot(Vt)))
print('U2 S V2^T=\n', np.round(U2.dot(S).dot(V2t)))
print('U^T U=\n', clip(U.T.dot(U)))
print('V^T V=\n', clip(Vt.dot(Vt.T)))
print('U2^T U2=\n', clip(U2.T.dot(U2)))
print('V2^T V2=\n', clip(V2t.dot(V2t.T)))
```
The sign ambiguity of SVD can have several implications in data science and other fields. Here are a few examples:
Interpretation of singular vectors: The singular vectors obtained from SVD can be interpreted as representing the principal components of the data. However, the sign ambiguity means that the direction of the principal components is not unique. This can make it more challenging to interpret the results of SVD.
Data compression: In some applications of SVD, such as data compression, sign ambiguity can affect the quality of the compressed data. For example, if an image is compressed using SVD and the sign of a singular vector on one side is flipped without flipping its counterpart on the other side, the reconstructed image may have artifacts or distortions (see the sketch after this list).
Numerical stability: The sign ambiguity of SVD can also affect the numerical stability of the algorithm. If the signs of the singular vectors are not carefully controlled, rounding errors in the computation can lead to large errors in the results.
Comparing results: Finally, the sign ambiguity of SVD can make it more difficult to compare the results of different SVD decompositions. If two SVDs of the same matrix are obtained and the signs of the singular vectors are different, then it can be challenging to compare the two decompositions and understand how they differ.
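As a minimal sketch of the compression point above (a small random matrix stands in for image data): flipping only a left singular vector, without flipping the matching right one, breaks the reconstruction.

```python
import numpy as np

A = np.random.rand(5, 5)   # stand-in for image data
U, D, Vt = np.linalg.svd(A)

# Flip only the first left singular vector
U_bad = U.copy()
U_bad[:, 0] *= -1

# The reconstruction error is now clearly nonzero
print(np.linalg.norm((U_bad * D).dot(Vt) - A))
```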
There are several ways to handle the sign ambiguity of SVD, depending on the specific application and goals. Here are a few possible approaches:
Consistent sign convention: One common approach is to choose a consistent sign convention for the singular vectors. For example, it can be specified that the first element of each singular vector should be positive. This can help ensure that the singular vectors are consistent across different SVDs, as in the sketch below.
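Below is a minimal sketch of one such convention (the `fix_signs` helper is hypothetical, not a library routine). It uses a common variant of the rule above: instead of the first element, the entry of largest magnitude in each column of `U` is made positive, since the first element may be near zero. The matching rows of `Vt` are flipped so that the product is preserved; a square matrix is assumed for simplicity.

```python
import numpy as np

def fix_signs(U, Vt):
    # Sign of the largest-magnitude entry in each column of U
    idx = np.argmax(np.abs(U), axis=0)
    signs = np.sign(U[idx, np.arange(U.shape[1])])
    signs[signs == 0] = 1          # leave all-zero columns untouched
    # Flip columns of U and the matching rows of Vt
    return U * signs, Vt * signs[:, None]

A = np.random.rand(4, 4)
U, D, Vt = np.linalg.svd(A)
U, Vt = fix_signs(U, Vt)

# The convention is applied, and A is still reconstructed: the norm is ~0
print(np.linalg.norm((U * D).dot(Vt) - A))
```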
In order to compute the matrices $U$, $S$, and $V$ for an $m \times n$ matrix $A$, begin by observing that both $A A^T$ and $A^T A$ are symmetric matrices. This is because, for any matrices $P$ and $Q$, we have $(PQ)^T = Q^T P^T$ and $(A^T)^T = A$, so, for instance, $(A A^T)^T = (A^T)^T A^T = A A^T$. Additionally, the eigenvalues of $A A^T$ and $A^T A$ are nonnegative. Suppose $\lambda$ is an eigenvalue of $A^T A$ with a corresponding unit eigenvector $v$; then:

$$\lambda = \lambda v^T v = v^T (\lambda v) = v^T A^T A v = (Av)^T (Av) = \|Av\|^2 \geq 0$$
The proof for $A A^T$ is similar.
Note: A symmetric matrix with nonnegative eigenvalues is called a positive semi-definite matrix.
The matrices $U$, $S$, and $V$ need to be found so that:

$$A = U S V^T$$
If $A = U S V^T$, then:

$$A^T A = (U S V^T)^T (U S V^T) = V S^T U^T U S V^T = V (S^T S) V^T$$
The expression above represents an orthogonal diagonalization of $A^T A$.
Note: Every symmetric matrix is orthogonally diagonalizable, that is, it can be written as $Q \Lambda Q^T$, where the matrix $Q$ is orthogonal and the matrix $\Lambda$ is diagonal.
This implies that $V$ is a matrix of orthonormal eigenvectors of $A^T A$, while $D_V = S^T S$ is the diagonal matrix of corresponding eigenvalues. It can similarly be derived that $A A^T = U (S S^T) U^T$, where $U$ is a matrix of orthonormal eigenvectors of $A A^T$ and $D_U = S S^T$ is the diagonal matrix of corresponding eigenvalues. Since the nonzero diagonal values of $D_U$ and $D_V$ are identical (both contain the squared singular values), the subscript can be omitted and the matrix denoted as $D$. Finally, $S$ is a generalized diagonal matrix containing the square roots of the diagonal entries of $D$. Since both $A A^T$ and $A^T A$ are positive semi-definite, all diagonal entries of $D$ are nonnegative, which allows real square roots to be obtained.
Given a matrix `A`, the naive recipe is:

```python
D, U = np.linalg.eigh(A.dot(A.T))
D, V = np.linalg.eigh(A.T.dot(A))
S = diag(D, A.shape[0], A.shape[1]) ** 0.5
```
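As a quick check, here is a sketch reusing the `diag` helper from above (eigenvalues are clipped at zero before the square root, to avoid NaNs from tiny negative round-off). The naive recipe typically fails to reconstruct `A`, because the signs and order of the eigenvectors in `U` and `V` are computed independently of each other:

```python
import numpy as np

def diag(D, m, n):
    S = np.zeros((m, n))
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

A = np.random.randn(4, 4)
D, U = np.linalg.eigh(A.dot(A.T))
D, V = np.linalg.eigh(A.T.dot(A))
S = diag(np.clip(D, 0, None), A.shape[0], A.shape[1]) ** 0.5

# Typically far from zero: the reconstruction constraint was never enforced
print(np.linalg.norm(U.dot(S).dot(V.T) - A))
```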
There's a potential problem with the naive SVD algorithm: the resulting matrices $U$, $S$, and $V$ may not satisfy the reconstruction condition of SVD, that is, $A = U S V^T$. This is because the reconstruction constraint was never exploited while computing the SVD. To incorporate this missing constraint, compute $V$ and $S$ by an eigendecomposition of $A^T A$, and then compute the matrix $U$ by forcing the constraint as follows:

$$A = U S V^T \implies A V = U S \implies U = A V S^{+}$$

where $S^{+}$ denotes the pseudo-inverse of $S$.
Given a matrix `A`:

```python
D, V = np.linalg.eigh(A.T.dot(A))
S = diag(D, A.shape[0], A.shape[1]) ** 0.5
S_inv = np.linalg.pinv(S)
```

Notice that $S$ may be singular, so the pseudo-inverse is used to be on the safe side.

```python
U = A.dot(V).dot(S_inv)
```
The following code implements the SVD algorithm:
```python
# Importing NumPy library for scientific computing with Python
import numpy as np

# Defining a function to create a diagonal matrix
def diag(D, m, n):
    # Creating a matrix of zeros with the dimensions (m, n)
    S = np.zeros((m, n))
    # Setting the first k diagonal elements of S to the values of D
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

# Defining a function to clip small values in an array to zero
def clip(a, th=1e-12):
    # If the size of the array is 1 and its value is less than th, set it to zero
    if a.size == 1:
        if a < th:
            a = 0
    # Otherwise, set all elements less than th to zero
    else:
        a[a < th] = 0
    return a

# Generating random values for the dimensions of matrix A
m = np.random.randint(1, 20)
n = np.random.randint(1, 20)

# Generating a random matrix of shape (m, n) using NumPy's random.randn() function
A = np.random.randn(m, n)

# Calculating the eigenvalues (D) and eigenvectors (V) of A^T.A using NumPy's linalg.eigh() function
D, V = np.linalg.eigh(A.T.dot(A))

# Creating a diagonal matrix S using the clipped eigenvalues and taking the square root of each value
S = diag(clip(D), A.shape[0], A.shape[1]) ** 0.5

# Calculating the pseudo-inverse of S using NumPy's linalg.pinv() function
S_inv = np.linalg.pinv(S)

# Calculating the matrix U by multiplying A with V and S_inv
U = A.dot(V).dot(S_inv)

# Reconstructing the matrix A using U, S, and V^T
A_hat = U.dot(S).dot(V.T)

# Counting the number of sign differences between A and A_hat
sign_dif = np.sum(np.sign(A) != np.sign(A_hat))

# Calculating the total value difference between A and A_hat using NumPy's linalg.norm() function
value_dif = np.linalg.norm(A - A_hat)

# Printing the results
print('Number of sign differences =', sign_dif, '\nTotal value differences =', clip(value_dif))
```
SVD is one of the most useful decompositions in data science. (For detailed insights, please head over to the Educative course Linear Algebra for Data Science Using Python.) The sign ambiguity of SVD is an important issue to consider when working with SVD in data science and other fields. The fact that the singular vectors can have arbitrary signs means that the interpretation of the results can be more challenging, and the numerical stability of the algorithm can be affected. However, there are several ways to handle the sign ambiguity, such as choosing a consistent sign convention. By understanding the sign ambiguity and how to handle it, developers can make the most of the power and flexibility of SVD in their data analysis and modeling.