
Sign Ambiguity in Singular Value Decomposition (SVD)

12 min read
Jun 02, 2023
Contents
What is SVD?
SVD in numpy
Singular values and vectors
Economic SVD
Implementation of economic SVD
What is the sign ambiguity of SVD?
Derivation of SVD
Naive SVD algorithm
Improved SVD algorithm
Conclusion


Singular value decomposition (SVD) is a powerful linear algebra and data science tool. It is commonly used for matrix factorization, dimensionality reduction, and data compression. SVD has many important properties, including the fact that it is always possible to decompose a matrix into its singular values and vectors. However, one issue that can arise with SVD is the sign ambiguity of the singular vectors. This blog will explore the sign ambiguity of SVD and how it can affect the interpretation of the results.

The rest of the blog is organized as follows:

  1. An introduction to SVD and its economic variant, along with the NumPy implementation.
  2. Sign ambiguity, its implications in data science, and its resolution.
  3. How to derive SVD from scratch and implement it in numpy.

What is SVD?#

SVD involves factorizing an $n\times d$ matrix $A$ into the product of three matrices: $A = U\Sigma V^T$. Here, the matrix $U_{n\times n}$ and the matrix $V_{d\times d}$, whose transpose is denoted $V^T$, are both orthogonal.

Note: An orthogonal matrix is a square matrix whose transpose is its inverse, that is, $Q^TQ=QQ^T=I$.

The matrix $\Sigma_{n\times d}$ is a generalized diagonal matrix.

Note: A rectangular matrix is known as a generalized diagonal matrix if its only nonzero entries lie on its main diagonal. A diagonal matrix is simply a square matrix that also qualifies as a generalized diagonal matrix.
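For example, the following $2\times 3$ matrix is generalized diagonal, but it is not a diagonal matrix because it is not square:

$$\Sigma = \begin{bmatrix}3 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}$$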

SVD in numpy#

To compute the singular value decomposition of a matrix A using Python’s NumPy library, use the command U, D, Vt = numpy.linalg.svd(A). The resulting D array contains the diagonal entries of the generalized diagonal matrix $\Sigma$, and Vt holds the transpose of the matrix $V$. In the implementation below, the diag function transforms the D array into the desired generalized diagonal matrix $\Sigma$, which is stored in the variable S.

import numpy as np
from numpy.linalg import svd

# Function to construct generalized diagonal matrix
def diag(D, m, n):
    S = np.zeros((m, n))
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

m, n = 2, 3
A = np.random.rand(m, n)
# Compute SVD using NumPy
U, D, Vt = svd(A)
S = diag(D, m, n)
# Compute Frobenius norm of the reconstruction error
frobNorm = np.linalg.norm(U.dot(S).dot(Vt) - A)
R = [0, frobNorm]
# Use 1e-10 as a threshold for zero
print(f'The Frobenius norm of USV^T - A is {R[frobNorm > 1e-10]}')
print(f'U has Orthonormal Columns \n U^T U: \n{abs(np.round(U.T.dot(U), 1))}')
print(f'Vt has Orthonormal Rows\n Vt Vt^T: \n{abs(np.round(Vt.dot(Vt.T), 1))}')

Singular values and vectors#

  • The diagonal values of $\Sigma$ are the singular values of $A$.
  • The columns of $U$ are the left singular vectors of $A$.
  • The rows of $V^T$ are the right singular vectors of $A$.

It is assumed that the singular values in the generalized diagonal matrix $\Sigma$ are arranged in descending order. This is also the default order returned by the numpy.linalg.svd function, which stores the singular values in the D array. However, the singular values can be arranged in any other order, provided the columns of $U$ and the rows of $V^T$ are permuted accordingly.
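As a minimal sketch of this point (assuming NumPy), reordering the singular values together with the matching columns of U and rows of Vt leaves the reconstructed product unchanged:

import numpy as np

A = np.random.randn(3, 3)
U, D, Vt = np.linalg.svd(A)
# Reorder the singular values (here, ascending instead of the default descending)
perm = np.argsort(D)
# Apply the same permutation to the columns of U and the rows of Vt
U2, D2, Vt2 = U[:, perm], D[perm], Vt[perm, :]
# The permuted triple still reconstructs A
print(np.allclose(U2.dot(np.diag(D2)).dot(Vt2), A))  # True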

Economic SVD#

If the generalized diagonal matrix $\Sigma$ has some zeros on its diagonal, the corresponding columns of $U$ and rows of $V^T$ can be eliminated. The resulting matrices $\hat U$ and $\hat V$ correspond to the nonzero singular values in the diagonal matrix $\hat\Sigma$. This allows the original matrix $A$ to be expressed as the product of the three matrices $\hat U$, $\hat\Sigma$, and $\hat V^T$, as follows:

$$A=U\Sigma V^T=\hat U \hat \Sigma \hat V^T$$

This is known as the economic SVD of matrix $A$, where the matrices $\hat U$ and $\hat V$ are not necessarily square and the matrix $\hat\Sigma$ is diagonal. When the matrix $A$ has $r$ nonzero singular values, where $r$ is also the rank of $A$, the relationship between the SVD and the economic SVD of $A$ is illustrated in the figure below:

Economic SVD vs SVD

Implementation of economic SVD#

The economicSVD function below implements the economic SVD for a matrix, A. It’s worth noting that the number of nonzero singular values of A is equal to the rank of A.

import numpy as np
from numpy.linalg import svd, matrix_rank as rank

# Convert array to diagonal matrix
def diag(D, m, n):
    S = np.zeros((m, n))
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

# Function for economic SVD
def economicSVD(A):
    U, D, Vt = svd(A)
    # Use 1e-10 as a threshold for zero
    idx = np.where(D > 1e-10)[0]
    U_hat, D_hat, Vt_hat = U[:, idx], D[idx], Vt[idx, :]
    return U_hat, np.diag(D_hat), Vt_hat

m, n = 8, 5
A = np.random.rand(2, n)
# Keep the rank small by repeating rows
A = np.vstack((A, A, A, A))
U, D, Vt = svd(A)
S = diag(D, m, n)
U_hat, S_hat, Vt_hat = economicSVD(A)
print(f'Shapes of U, S and Vt are :', U.shape, S.shape, Vt.shape)
print(f'Shapes of U_hat, S_hat and Vt_hat are :', U_hat.shape, S_hat.shape, Vt_hat.shape)
frobNorm = np.linalg.norm(U.dot(S).dot(Vt) - U_hat.dot(S_hat).dot(Vt_hat))
R = [0, frobNorm]
print(f'The Frobenius norm of U S V^T - hats(U S V^T) is {R[frobNorm > 1e-10]}\nRank(A)={rank(A)}')

What is the sign ambiguity of SVD?#

The sign ambiguity of SVD refers to the fact that the left and right singular vectors can have arbitrary signs, while the singular values are always nonnegative. This means that if there is a matrix $A$ and its SVD decomposition is:

$$A = U\Sigma V^T$$

where the columns of $U$ and the rows of $V^T$ are $A$'s left and right singular vectors, respectively, and $\Sigma$ is a generalized diagonal matrix containing the singular values of $A$, then the signs of any columns of $U$ and the corresponding rows of $V^T$ can be flipped, and the result is still a valid SVD of $A$.
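To see why this works, let $E$ be a diagonal matrix whose diagonal entries are $\pm1$, with $-1$ exactly at the positions of the flipped pairs (shown here for the square case; the rectangular case is analogous). Since $E\Sigma E=\Sigma$ (each diagonal entry of $\Sigma$ picks up a factor $(\pm1)^2=1$) and $(UE)^T(UE)=EU^TUE=E^2=I$, the flipped matrices $UE$ and $EV^T$ form another valid SVD:

$$(UE)\,\Sigma\,(EV^T)=U(E\Sigma E)V^T=U\Sigma V^T=A$$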

For example, consider the following matrix:

$$A = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$$

Its SVD can be computed as:

$$\begin{align*} \Sigma &= \begin{bmatrix}3& 0\\ 0& 1\end{bmatrix} \\ U &= \begin{bmatrix}-0.70710678& -0.70710678\\ -0.70710678& 0.70710678\end{bmatrix} \\ V^T &= \begin{bmatrix}-0.70710678& -0.70710678\\ 0.70710678& -0.70710678\end{bmatrix} \end{align*}$$

However, a valid SVD can also be obtained by multiplying the second column of $U$ and the second row of $V^T$ by $-1$:

$$\begin{align*} \Sigma &= \begin{bmatrix}3& 0\\ 0& 1\end{bmatrix} \\ U_2 &= \begin{bmatrix}-0.70710678& 0.70710678\\ -0.70710678& -0.70710678\end{bmatrix} \\ V_2^T &= \begin{bmatrix}-0.70710678& -0.70710678\\ -0.70710678& 0.70710678\end{bmatrix} \end{align*}$$

Both of these decompositions are valid SVDs of $A$, but the signs of the singular vectors differ. To verify that both are valid SVDs of $A$, the following conditions must hold:

  1. $\Sigma$ is a generalized diagonal matrix with nonnegative entries on the main diagonal.
  2. $U^TU=I, \quad U_2^TU_2=I$.
  3. $V^TV=I, \quad V_2^TV_2=I$.
  4. $A=U\Sigma V^T=U_2\Sigma V_2^T$.

The following code verifies the four properties above for the matrices $A, U, V, U_2, V_2$, and $\Sigma$:

# Import NumPy library
import numpy as np

# Function to clip values to zero below threshold th
def clip(a, th=1e-12):
    # If input is a scalar value
    if a.size == 1:
        # If input is less than the threshold, set it to zero
        if a < th:
            a = 0
    # If input is an array or matrix
    else:
        # Clip values that are less than the threshold to zero
        a[a < th] = 0
    return a

# Define matrices
A = np.array([[1, 2], [2, 1]], dtype=np.float64)
S = np.array([[3, 0], [0, 1]], dtype=np.float64)
U = np.array([[-0.70710678, -0.70710678], [-0.70710678, 0.70710678]])
Vt = np.array([[-0.70710678, -0.70710678], [0.70710678, -0.70710678]])
U2 = U.copy()
U2[:, 1] *= -1
V2t = Vt.copy()
V2t[1, :] *= -1

# Results
print('A=\n', A)
print('U S V^T=\n', np.round(U.dot(S).dot(Vt)))
print('U2 S V2^T=\n', np.round(U2.dot(S).dot(V2t)))
print('U^T U=\n', clip(U.T.dot(U)))
print('V^T V=\n', clip(Vt.dot(Vt.T)))
print('U2^T U2=\n', clip(U2.T.dot(U2)))
print('V2^T V2=\n', clip(V2t.dot(V2t.T)))

The sign ambiguity of SVD can have several implications in data science and other fields. Here are a few examples:

Interpretation of singular vectors: The singular vectors obtained from SVD can be interpreted as representing the principal components of the data. However, the sign ambiguity means that the direction of the principal components is not unique. This can make it more challenging to interpret the results of SVD.

Data compression: In some applications of SVD, such as data compression, sign ambiguity can affect the quality of the compressed data. For example, if an image is compressed using SVD and the sign of a left singular vector is flipped without flipping the matching right singular vector, the resulting compressed image may have artifacts or distortions.

Numerical stability: The sign ambiguity of SVD can also affect the numerical stability of the algorithm. If the signs of the singular vectors are not carefully controlled, rounding errors in the computation can lead to large errors in the results.

Comparing results: Finally, the sign ambiguity of SVD can make it more difficult to compare the results of different SVD decompositions. If two SVDs of the same matrix are obtained and the signs of the singular vectors are different, then it can be challenging to compare the two decompositions and understand how they differ.

There are several ways to handle the sign ambiguity of SVD, depending on the specific application and goals. Here are a few possible approaches:

Consistent sign convention: One common approach is to choose a consistent sign convention for the singular vectors. For example, it can be specified that the first element of each singular vector should be positive. This can help to ensure that the singular vectors are consistent across different SVDs.
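The sketch below applies one such convention; the helper name fix_signs is illustrative, not a NumPy API. It flips each left singular vector whose first entry is negative, together with the matching right singular vector, so the product is unchanged:

import numpy as np

# Illustrative helper (not a NumPy API): make the first entry of each
# column of U nonnegative, flipping the matching row of Vt to compensate
def fix_signs(U, Vt):
    signs = np.sign(U[0, :])
    # Guard against an exactly zero first entry
    signs[signs == 0] = 1
    return U * signs, Vt * signs[:, np.newaxis]

A = np.random.randn(4, 4)
U, D, Vt = np.linalg.svd(A)
U2, Vt2 = fix_signs(U, Vt)
# The convention changes the signs, not the decomposition
print(np.allclose(U2.dot(np.diag(D)).dot(Vt2), A))  # True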

Derivation of SVD#

In order to compute the matrices $U, \Sigma$, and $V$ for an $n \times d$ matrix $A$, begin by observing that both $A^TA$ and $AA^T$ are symmetric matrices. This is because, for any matrices $A$ and $B$, we have $(AB)^T = B^TA^T$ and $(A^T)^T = A$. Additionally, the eigenvalues of $A^TA$ and $AA^T$ are nonnegative. Suppose $\lambda$ is an eigenvalue of $A^TA$ with a corresponding eigenvector $\mathbf{x}$, then:

$$A^TA\mathbf{x}=\lambda \mathbf{x}\implies \mathbf{x}^TA^TA\mathbf{x}=\lambda \mathbf{x}^T\mathbf{x} \implies \lambda=\frac{\|A\mathbf{x}\|^2}{\|\mathbf{x}\|^2}\ge0$$

The proof for $AA^T$ is similar.
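For completeness, applying the same steps to $AA^T$ with an eigenpair $(\lambda, \mathbf{x})$:

$$AA^T\mathbf{x}=\lambda \mathbf{x}\implies \mathbf{x}^TAA^T\mathbf{x}=\lambda \mathbf{x}^T\mathbf{x} \implies \lambda=\frac{\|A^T\mathbf{x}\|^2}{\|\mathbf{x}\|^2}\ge0$$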

Note: A symmetric matrix with nonnegative eigenvalues is called a positive semi-definite matrix.

The matrices $U, \Sigma$, and $V$ need to be found so that:

  1. $U^TU=UU^T=I$.
  2. $V^TV=VV^T=I$.
  3. $\Sigma$ is a generalized diagonal matrix; hence, $\Sigma^T\Sigma$ and $\Sigma\Sigma^T$ are diagonal matrices that share the same nonzero diagonal entries.
  4. $A=U\Sigma V^T$.

If $A=U\Sigma V^T$, then:

$$\begin{align*} A^TA&=(U\Sigma V^T)^T(U\Sigma V^T) \\ &=V\Sigma^TU^TU\Sigma V^T \\ &=V\Sigma^TI\Sigma V^T \\ &=V\Sigma^T\Sigma V^T \\ &=VD_1V^T \end{align*}$$

The expression $VD_1V^T$ above represents an orthogonal diagonalization of $A^TA$.

Note: Every symmetric matrix is orthogonally diagonalizable, that is, the matrix $V$ is an orthogonal matrix and the matrix $D_1$ is a diagonal matrix.

This implies that $V$ is a matrix of orthonormal eigenvectors of $A^TA$, while $D_1$ is the diagonal matrix of corresponding eigenvalues. It can be derived in the same way that $AA^T=UD_2U^T$, where $U$ is a matrix of orthonormal eigenvectors of $AA^T$ and $D_2$ is the diagonal matrix of corresponding eigenvalues. Since $D_1$ and $D_2$ share the same nonzero diagonal values, the subscript can be omitted and both can be denoted as $D$. Finally, $\Sigma$ is a generalized diagonal matrix containing the square roots of the diagonal entries of $D$. Since both $A^TA$ and $AA^T$ are positive semi-definite, all diagonal entries of $D$ are nonnegative, so real square roots exist.
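A small numerical sketch of these facts, using NumPy: the nonzero eigenvalues of $A^TA$ and $AA^T$ coincide, and their square roots are exactly the singular values of $A$.

import numpy as np

A = np.random.randn(4, 6)
# Eigenvalues of A^T A (6 values) and A A^T (4 values), both ascending
D1 = np.linalg.eigh(A.T.dot(A))[0]
D2 = np.linalg.eigh(A.dot(A.T))[0]
# Singular values of A (4 values, descending)
sv = np.linalg.svd(A, compute_uv=False)
# The largest four eigenvalues agree across both products...
print(np.allclose(np.sort(D1)[::-1][:4], np.sort(D2)[::-1]))  # True
# ...and their square roots are the singular values of A
print(np.allclose(np.sqrt(np.maximum(D2[::-1], 0)), sv))      # True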

Naive SVD algorithm#

Given a matrix $A$:

  1. To compute the eigenvalues ($D$) and eigenvectors ($U$) of $AA^T$, use the code below:
D, U = np.linalg.eigh(A.dot(A.T))
  2. The eigenvectors ($V$) of $A^TA$ can be computed by using the following code:
D, V = np.linalg.eigh(A.T.dot(A))
  3. Finally, make a generalized diagonal matrix ($\Sigma$) with the square roots of $D$ by using this code:
S = diag(D, A.shape[0], A.shape[1])**0.5
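As a quick check, here is a minimal sketch of this naive assembly (restating the diag helper from earlier so it runs on its own). The reconstruction error it prints is typically far from zero:

import numpy as np

# Helper from earlier: place D on the leading diagonal of an m-by-n zero matrix
def diag(D, m, n):
    S = np.zeros((m, n))
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

A = np.random.randn(5, 5)
# Independent eigendecompositions of A A^T and A^T A
_, U = np.linalg.eigh(A.dot(A.T))
D, V = np.linalg.eigh(A.T.dot(A))
# Square roots of the (clipped) eigenvalues on a generalized diagonal
S = diag(np.maximum(D, 0), 5, 5) ** 0.5
# The error is typically large: the sign choices in U and V were never tied together
print('Reconstruction error:', np.linalg.norm(U.dot(S).dot(V.T) - A))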

There’s a potential problem with the naive SVD algorithm: the resulting matrices $U, \Sigma$, and $V^T$ may not satisfy the reconstruction condition of $A$, that is, $A=U\Sigma V^T$. This is because the reconstruction constraint was never exploited while computing the SVD; the two eigendecompositions choose the signs of their eigenvectors independently, so the columns of $U$ need not be paired consistently with the columns of $V$. To incorporate this missing constraint, compute $V$ and $\Sigma$ by an eigendecomposition of $A^TA$, and then compute the matrix $U$ by forcing the constraint as follows:

$$\begin{align*} A &= U\Sigma V^T \\ AV &= U\Sigma \\ U &= AV\Sigma^{-1} \end{align*}$$

Improved SVD algorithm#

Given a matrix $A$:

  1. To compute the eigenvalues ($D$) and eigenvectors ($V$) of $A^TA$, use the code below:
D, V = np.linalg.eigh(A.T.dot(A))
  2. Make a generalized diagonal matrix, $\Sigma$, with the square roots of $D$ by using this code:
S = diag(D, A.shape[0], A.shape[1])**0.5
  3. Then, compute the inverse of $\Sigma$ using the following code:
S_inv = np.linalg.pinv(S)

Notice that $\Sigma$ may be singular, so the pseudoinverse is used to be on the safe side.

  4. Finally, compute $U$ by using this code:
U = A.dot(V).dot(S_inv)

The following code implements the improved SVD algorithm:

# Importing NumPy library for scientific computing with Python
import numpy as np

# Defining a function to create a generalized diagonal matrix
def diag(D, m, n):
    # Creating a matrix of zeros with the dimensions (m, n)
    S = np.zeros((m, n))
    # Setting at most min(m, n) diagonal elements of S to the values of D
    k = min(len(D), m, n)
    S[:k, :k] = np.diag(D[:k])
    return S

# Defining a function to clip small values in an array to zero
def clip(a, th=1e-12):
    # If the size of the array is 1 and its value is less than th, set it to zero
    if a.size == 1:
        if a < th:
            a = 0
    # Otherwise, set all elements less than th to zero
    else:
        a[a < th] = 0
    return a

# Generating random values for the dimensions of matrix A
m = np.random.randint(1, 20)
n = np.random.randint(1, 20)
# Generating a random matrix of shape (m, n) using NumPy's random.randn() function
A = np.random.randn(m, n)
# Calculating the eigenvalues (D) and eigenvectors (V) of A^T.A using NumPy's linalg.eigh() function
D, V = np.linalg.eigh(A.T.dot(A))
# eigh returns eigenvalues in ascending order; reorder them (and the
# matching eigenvectors) in descending order so the largest ones are kept
idx = np.argsort(D)[::-1]
D, V = D[idx], V[:, idx]
# Creating a diagonal matrix S using the clipped eigenvalues and taking the square root of each value
S = diag(clip(D), A.shape[0], A.shape[1]) ** 0.5
# Calculating the pseudoinverse of S using NumPy's linalg.pinv() function
S_inv = np.linalg.pinv(S)
# Calculating the matrix U from the reconstruction constraint U = A V S^+
U = A.dot(V).dot(S_inv)
# Reconstructing the matrix A using U, S, and V^T
A_hat = U.dot(S).dot(V.T)
# Counting the number of sign differences between A and A_hat
sign_dif = np.sum(np.sign(A) != np.sign(A_hat))
# Calculating the total value difference between A and A_hat using NumPy's linalg.norm() function
value_dif = np.linalg.norm(A - A_hat)
# Printing the results
print('Number of sign differences = ', sign_dif, '\nTotal Value differences = ', clip(value_dif))

Conclusion#

SVD is one of the most useful decompositions in data science. (For detailed insights, head over to the Educative course Linear Algebra for Data Science Using Python.) The sign ambiguity of SVD is an important issue to consider when working with SVD in data science and other fields. The fact that the singular vectors can have arbitrary signs means that the interpretation of the results can be more challenging, and the numerical stability of the algorithm can be affected. However, there are several ways to handle the sign ambiguity, such as choosing a consistent sign convention. By understanding the sign ambiguity and how to handle it, developers can make the most of the power and flexibility of SVD in their data analysis and modeling.

