
Sign Ambiguity in Singular Value Decomposition (SVD)

12 min read
Jun 02, 2023
Contents
What is SVD?
SVD in numpy
Singular values and vectors
Economic SVD
Implementation of economic SVD
What is the sign ambiguity of SVD?
Derivation of SVD
Naive SVD algorithm
Improved SVD algorithm
Conclusion


Singular value decomposition (SVD) is a powerful linear algebra and data science tool. It is commonly used for matrix factorization, dimensionality reduction, and data compression. SVD has many important properties, including the fact that it is always possible to decompose a matrix into its singular values and vectors. However, one issue that can arise with SVD is the sign ambiguity of the singular vectors. This blog will explore the sign ambiguity of SVD and how it can affect the interpretation of the results.

The rest of the blog is organized as follows:

  1. An introduction to SVD and its economic variant, along with the NumPy implementation.
  2. Sign ambiguity, its implications in data science, and its resolution.
  3. How to derive SVD from scratch and implement it in numpy.

What is SVD?#

SVD involves factorizing an $n\times d$ matrix $A$ into the product of three matrices: $A = U\Sigma V^T$. Here, the matrix $U_{n\times n}$ and the matrix $V_{d\times d}$, whose transpose is denoted $V^T$, are both orthogonal.

Note: An orthogonal matrix is a square matrix whose transpose is its inverse, that is, $Q^TQ=QQ^T=I$.

The matrix $\Sigma_{n\times d}$ is a generalized diagonal matrix.

Note: A rectangular matrix is known as a generalized diagonal matrix if its only nonzero entries lie on its main diagonal. A diagonal matrix is simply a square matrix that also qualifies as a generalized diagonal matrix.
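For example, the following $2\times 3$ matrix is generalized diagonal, but it is not a diagonal matrix because it is not square:

$$\Sigma = \begin{bmatrix}3 & 0 & 0\\ 0 & 1 & 0\end{bmatrix}$$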

SVD in numpy#

To compute the singular value decomposition of a matrix A using Python’s NumPy library, use the command U, D, Vt = numpy.linalg.svd(A). The resulting D array contains the diagonal entries of the generalized diagonal matrix $\Sigma$, and Vt holds the transpose of the matrix $V$. In the implementation below, the diag function transforms the D array into the desired generalized diagonal matrix $\Sigma$, which is stored in the variable S.

import numpy as np
from numpy.linalg import svd

# Function to construct generalized diagonal matrix
def diag(D, m, n):
    S = np.zeros((m, n))
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

m, n = 2, 3
A = np.random.rand(m, n)
# Compute SVD using NumPy
U, D, Vt = svd(A)
S = diag(D, m, n)
# Compute Frobenius norm of the reconstruction error
frobNorm = np.linalg.norm(U.dot(S).dot(Vt) - A)
R = [0, frobNorm]
# Use 1e-10 as a threshold for zero
print(f'The Frobenius norm of USV^T - A is {R[frobNorm > 1e-10]}')
print(f'U has Orthonormal Columns \n U^T U: \n{abs(np.round(U.T.dot(U), 1))}')
print(f'Vt has Orthonormal Rows\n Vt Vt^T: \n{abs(np.round(Vt.dot(Vt.T), 1))}')

Singular values and vectors#

  • The diagonal values of $\Sigma$ are the singular values of $A$.
  • The columns of $U$ are the left singular vectors of $A$.
  • The rows of $V^T$ are the right singular vectors of $A$.

It is assumed that the singular values in the generalized diagonal matrix $\Sigma$ are arranged in descending order. This is also the default order returned by the numpy.linalg.svd function, which stores the singular values in the D array. However, the singular values can be arranged in any other order, provided the columns of $U$ and the rows of $V^T$ are permuted accordingly.
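As a minimal sketch of this point (assuming NumPy), reordering the singular values together with the matching columns of U and rows of Vt leaves the reconstructed product unchanged:

import numpy as np

A = np.random.randn(3, 3)
U, D, Vt = np.linalg.svd(A)
# Reorder the singular values (here, ascending instead of the default descending)
perm = np.argsort(D)
# Apply the same permutation to the columns of U and the rows of Vt
U2, D2, Vt2 = U[:, perm], D[perm], Vt[perm, :]
# The permuted triple still reconstructs A
print(np.allclose(U2.dot(np.diag(D2)).dot(Vt2), A))  # True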

Economic SVD#

If the generalized diagonal matrix $\Sigma$ has some zeros on its diagonal, the corresponding columns of $U$ and rows of $V^T$ can be eliminated. The resulting matrices $\hat U$ and $\hat V$ correspond to the nonzero singular values in the diagonal matrix $\hat\Sigma$. This allows the original matrix $A$ to be expressed as the product of the three matrices $\hat U$, $\hat\Sigma$, and $\hat V^T$, as follows:

$$A=U\Sigma V^T=\hat U \hat \Sigma \hat V^T$$

This is known as the economic SVD of matrix $A$, where the matrices $\hat U$ and $\hat V$ are not necessarily square and the matrix $\hat\Sigma$ is diagonal. When the matrix $A$ has $r$ nonzero singular values, where $r$ is also the rank of $A$, the relationship between the SVD and the economic SVD of $A$ is illustrated in the figure below:

Economic SVD vs SVD

Implementation of economic SVD#

The economicSVD function below implements the economic SVD for a matrix, A. It’s worth noting that the number of nonzero singular values of A is equal to the rank of A.

import numpy as np
from numpy.linalg import svd, matrix_rank as rank

# Convert array to diagonal matrix
def diag(D, m, n):
    S = np.zeros((m, n))
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

# Function for economic SVD
def economicSVD(A):
    U, D, Vt = svd(A)
    # Use 1e-10 as a threshold for zero
    idx = np.where(D > 1e-10)[0]
    U_hat, D_hat, Vt_hat = U[:, idx], D[idx], Vt[idx, :]
    return U_hat, np.diag(D_hat), Vt_hat

m, n = 8, 5
A = np.random.rand(2, n)
# Keep the rank small by repeating rows
A = np.vstack((A, A, A, A))
U, D, Vt = svd(A)
S = diag(D, m, n)
U_hat, S_hat, Vt_hat = economicSVD(A)
print(f'Shapes of U, S and Vt are :', U.shape, S.shape, Vt.shape)
print(f'Shapes of U_hat, S_hat and Vt_hat are :', U_hat.shape, S_hat.shape, Vt_hat.shape)
frobNorm = np.linalg.norm(U.dot(S).dot(Vt) - U_hat.dot(S_hat).dot(Vt_hat))
R = [0, frobNorm]
print(f'The Frobenius norm of U S V^T - hats(U S V^T) is {R[frobNorm > 1e-10]}\nRank(A)={rank(A)}')

What is the sign ambiguity of SVD?#

The sign ambiguity of SVD refers to the fact that the left and right singular vectors can have arbitrary signs, while the singular values are always nonnegative. This means that if there is a matrix $A$ and its SVD decomposition is:

$$A = U\Sigma V^T$$

where the columns of $U$ and the rows of $V^T$ are $A$'s left and right singular vectors, respectively, and $\Sigma$ is a generalized diagonal matrix containing the singular values of $A$, then the signs of any columns of $U$ and the corresponding rows of $V^T$ can be flipped, and the result is still a valid SVD of $A$.
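To see why this works, let $E$ be a diagonal matrix whose diagonal entries are $\pm1$, with $-1$ exactly at the positions of the flipped pairs (shown here for the square case; the rectangular case is analogous). Since $E\Sigma E=\Sigma$ (each diagonal entry of $\Sigma$ picks up a factor $(\pm1)^2=1$) and $(UE)^T(UE)=EU^TUE=E^2=I$, the flipped matrices $UE$ and $EV^T$ form another valid SVD:

$$(UE)\,\Sigma\,(EV^T)=U(E\Sigma E)V^T=U\Sigma V^T=A$$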

For example, consider the following matrix:

$$A = \begin{bmatrix}1 & 2\\ 2 & 1\end{bmatrix}$$

Its SVD can be computed as:

$$\begin{align*} \Sigma &= \begin{bmatrix}3& 0\\ 0& 1\end{bmatrix} \\ U &= \begin{bmatrix}-0.70710678& -0.70710678\\ -0.70710678& 0.70710678\end{bmatrix} \\ V^T &= \begin{bmatrix}-0.70710678& -0.70710678\\ 0.70710678& -0.70710678\end{bmatrix} \end{align*}$$

However, a valid SVD can also be obtained by multiplying the second column of $U$ and the second row of $V^T$ by $-1$:

$$\begin{align*} \Sigma &= \begin{bmatrix}3& 0\\ 0& 1\end{bmatrix} \\ U_2 &= \begin{bmatrix}-0.70710678& 0.70710678\\ -0.70710678& -0.70710678\end{bmatrix} \\ V_2^T &= \begin{bmatrix}-0.70710678& -0.70710678\\ -0.70710678& 0.70710678\end{bmatrix} \end{align*}$$

Both of these decompositions are valid SVDs of $A$, but the signs of the singular vectors differ. To verify that both are valid SVDs of $A$, the following conditions must hold:

  1. $\Sigma$ is a generalized diagonal matrix with nonnegative entries on the main diagonal.
  2. $U^TU=I, \quad U_2^TU_2=I$.
  3. $V^TV=I, \quad V_2^TV_2=I$.
  4. $A=U\Sigma V^T=U_2\Sigma V_2^T$.

The following code verifies the four properties above for the matrices $A, U, V, U_2, V_2$, and $\Sigma$:

# Import NumPy library
import numpy as np

# Function to clip values to zero below threshold th
def clip(a, th=1e-12):
    # If input is a scalar value
    if a.size == 1:
        # If input is less than the threshold, set it to zero
        if a < th:
            a = 0
    # If input is an array or matrix
    else:
        # Clip values that are less than the threshold to zero
        a[a < th] = 0
    return a

# Define matrices
A = np.array([[1, 2], [2, 1]], dtype=np.float64)
S = np.array([[3, 0], [0, 1]], dtype=np.float64)
U = np.array([[-0.70710678, -0.70710678], [-0.70710678, 0.70710678]])
Vt = np.array([[-0.70710678, -0.70710678], [0.70710678, -0.70710678]])
U2 = U.copy()
U2[:, 1] *= -1
V2t = Vt.copy()
V2t[1, :] *= -1

# Results
print('A=\n', A)
print('U S V^T=\n', np.round(U.dot(S).dot(Vt)))
print('U2 S V2^T=\n', np.round(U2.dot(S).dot(V2t)))
print('U^T U=\n', clip(U.T.dot(U)))
print('V^T V=\n', clip(Vt.dot(Vt.T)))
print('U2^T U2=\n', clip(U2.T.dot(U2)))
print('V2^T V2=\n', clip(V2t.dot(V2t.T)))

The sign ambiguity of SVD can have several implications in data science and other fields. Here are a few examples:

Interpretation of singular vectors: The singular vectors obtained from SVD can be interpreted as representing the principal components of the data. However, the sign ambiguity means that the direction of the principal components is not unique. This can make it more challenging to interpret the results of SVD.

Data compression: In some applications of SVD, such as data compression, sign ambiguity can affect the quality of the compressed data. For example, if an image is compressed using SVD and the sign of a left singular vector is flipped without flipping the matching right singular vector, the resulting compressed image may have artifacts or distortions.

Numerical stability: The sign ambiguity of SVD can also affect the numerical stability of the algorithm. If the signs of the singular vectors are not carefully controlled, rounding errors in the computation can lead to large errors in the results.

Comparing results: Finally, the sign ambiguity of SVD can make it more difficult to compare the results of different SVD decompositions. If two SVDs of the same matrix are obtained and the signs of the singular vectors are different, then it can be challenging to compare the two decompositions and understand how they differ.

There are several ways to handle the sign ambiguity of SVD, depending on the specific application and goals. Here are a few possible approaches:

Consistent sign convention: One common approach is to choose a consistent sign convention for the singular vectors. For example, it can be specified that the first element of each singular vector should be positive. This can help to ensure that the singular vectors are consistent across different SVDs.
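The sketch below applies one such convention; the helper name fix_signs is illustrative, not a NumPy API. It flips each left singular vector whose first entry is negative, together with the matching right singular vector, so the product is unchanged:

import numpy as np

# Illustrative helper (not a NumPy API): make the first entry of each
# column of U nonnegative, flipping the matching row of Vt to compensate
def fix_signs(U, Vt):
    signs = np.sign(U[0, :])
    # Guard against an exactly zero first entry
    signs[signs == 0] = 1
    return U * signs, Vt * signs[:, np.newaxis]

A = np.random.randn(4, 4)
U, D, Vt = np.linalg.svd(A)
U2, Vt2 = fix_signs(U, Vt)
# The convention changes the signs, not the decomposition
print(np.allclose(U2.dot(np.diag(D)).dot(Vt2), A))  # True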

Derivation of SVD#

In order to compute the matrices $U, \Sigma$, and $V$ for an $n \times d$ matrix $A$, begin by observing that both $A^TA$ and $AA^T$ are symmetric matrices. This is because, for any matrices $A$ and $B$, we have $(AB)^T = B^TA^T$ and $(A^T)^T = A$. Additionally, the eigenvalues of $A^TA$ and $AA^T$ are nonnegative. Suppose $\lambda$ is an eigenvalue of $A^TA$ with a corresponding eigenvector $\mathbf{x}$, then:

$$A^TA\mathbf{x}=\lambda \mathbf{x}\implies \mathbf{x}^TA^TA\mathbf{x}=\lambda \mathbf{x}^T\mathbf{x} \implies \lambda=\frac{\|A\mathbf{x}\|^2}{\|\mathbf{x}\|^2}\ge0$$

The proof for $AA^T$ is similar.
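For completeness, applying the same steps to $AA^T$ with an eigenpair $(\lambda, \mathbf{x})$:

$$AA^T\mathbf{x}=\lambda \mathbf{x}\implies \mathbf{x}^TAA^T\mathbf{x}=\lambda \mathbf{x}^T\mathbf{x} \implies \lambda=\frac{\|A^T\mathbf{x}\|^2}{\|\mathbf{x}\|^2}\ge0$$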

Note: A symmetric matrix with nonnegative eigenvalues is called a positive semi-definite matrix.

The matrices $U, \Sigma$, and $V$ need to be found so that:

  1. $U^TU=UU^T=I$.
  2. $V^TV=VV^T=I$.
  3. $\Sigma$ is a generalized diagonal matrix; hence, $\Sigma^T\Sigma$ and $\Sigma\Sigma^T$ are diagonal matrices that share the same nonzero diagonal entries.
  4. $A=U\Sigma V^T$.

If $A=U\Sigma V^T$, then:

$$\begin{align*} A^TA&=(U\Sigma V^T)^T(U\Sigma V^T) \\ &=V\Sigma^TU^TU\Sigma V^T \\ &=V\Sigma^TI\Sigma V^T \\ &=V\Sigma^T\Sigma V^T \\ &=VD_1V^T \end{align*}$$

The expression $VD_1V^T$ above represents an orthogonal diagonalization of $A^TA$.

Note: Every symmetric matrix is orthogonally diagonalizable, that is, the matrix $V$ is an orthogonal matrix and the matrix $D_1$ is a diagonal matrix.

This implies that $V$ is a matrix of orthonormal eigenvectors of $A^TA$, while $D_1$ is the diagonal matrix of corresponding eigenvalues. It can be derived in the same way that $AA^T=UD_2U^T$, where $U$ is a matrix of orthonormal eigenvectors of $AA^T$ and $D_2$ is the diagonal matrix of corresponding eigenvalues. Since $D_1$ and $D_2$ share the same nonzero diagonal values, the subscript can be omitted and both can be denoted as $D$. Finally, $\Sigma$ is a generalized diagonal matrix containing the square roots of the diagonal entries of $D$. Since both $A^TA$ and $AA^T$ are positive semi-definite, all diagonal entries of $D$ are nonnegative, so real square roots exist.
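A small numerical sketch of these facts, using NumPy: the nonzero eigenvalues of $A^TA$ and $AA^T$ coincide, and their square roots are exactly the singular values of $A$.

import numpy as np

A = np.random.randn(4, 6)
# Eigenvalues of A^T A (6 values) and A A^T (4 values), both ascending
D1 = np.linalg.eigh(A.T.dot(A))[0]
D2 = np.linalg.eigh(A.dot(A.T))[0]
# Singular values of A (4 values, descending)
sv = np.linalg.svd(A, compute_uv=False)
# The largest four eigenvalues agree across both products...
print(np.allclose(np.sort(D1)[::-1][:4], np.sort(D2)[::-1]))  # True
# ...and their square roots are the singular values of A
print(np.allclose(np.sqrt(np.maximum(D2[::-1], 0)), sv))      # True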

Naive SVD algorithm#

Given a matrix $A$:

  1. To compute the eigenvalues ($D$) and eigenvectors ($U$) of $AA^T$, use the code below:
D, U = np.linalg.eigh(A.dot(A.T))
  2. The eigenvectors ($V$) of $A^TA$ can be computed by using the following code:
D, V = np.linalg.eigh(A.T.dot(A))
  3. Finally, make a generalized diagonal matrix ($\Sigma$) with the square roots of $D$ by using this code:
S = diag(D, A.shape[0], A.shape[1])**0.5
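As a quick check, here is a minimal sketch of this naive assembly (restating the diag helper from earlier so it runs on its own). The reconstruction error it prints is typically far from zero:

import numpy as np

# Helper from earlier: place D on the leading diagonal of an m-by-n zero matrix
def diag(D, m, n):
    S = np.zeros((m, n))
    k = len(D)
    S[:k, :k] = np.diag(D)
    return S

A = np.random.randn(5, 5)
# Independent eigendecompositions of A A^T and A^T A
_, U = np.linalg.eigh(A.dot(A.T))
D, V = np.linalg.eigh(A.T.dot(A))
# Square roots of the (clipped) eigenvalues on a generalized diagonal
S = diag(np.maximum(D, 0), 5, 5) ** 0.5
# The error is typically large: the sign choices in U and V were never tied together
print('Reconstruction error:', np.linalg.norm(U.dot(S).dot(V.T) - A))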

There’s a potential problem with the naive SVD algorithm: the resulting matrices $U, \Sigma$, and $V^T$ may not satisfy the reconstruction condition of $A$, that is, $A=U\Sigma V^T$. This is because the reconstruction constraint was never exploited while computing the SVD; the two eigendecompositions choose the signs of their eigenvectors independently, so the columns of $U$ need not be paired consistently with the columns of $V$. To incorporate this missing constraint, compute $V$ and $\Sigma$ by an eigendecomposition of $A^TA$, and then compute the matrix $U$ by forcing the constraint as follows:

$$\begin{align*} A &= U\Sigma V^T \\ AV &= U\Sigma \\ U &= AV\Sigma^{-1} \end{align*}$$

Improved SVD algorithm#

Given a matrix $A$:

  1. To compute the eigenvalues ($D$) and eigenvectors ($V$) of $A^TA$, use the code below:
D, V = np.linalg.eigh(A.T.dot(A))
  2. Make a generalized diagonal matrix, $\Sigma$, with the square roots of $D$ by using this code:
S = diag(D, A.shape[0], A.shape[1])**0.5
  3. Then, compute the inverse of $\Sigma$ using the following code:
S_inv = np.linalg.pinv(S)

Notice that $\Sigma$ may be singular, so the pseudoinverse is used to be on the safe side.

  4. Finally, compute $U$ by using this code:
U = A.dot(V).dot(S_inv)

The following code implements the improved SVD algorithm:

# Importing NumPy library for scientific computing with Python
import numpy as np

# Defining a function to create a generalized diagonal matrix
def diag(D, m, n):
    # Creating a matrix of zeros with the dimensions (m, n)
    S = np.zeros((m, n))
    # Setting at most min(m, n) diagonal elements of S to the values of D
    k = min(len(D), m, n)
    S[:k, :k] = np.diag(D[:k])
    return S

# Defining a function to clip small values in an array to zero
def clip(a, th=1e-12):
    # If the size of the array is 1 and its value is less than th, set it to zero
    if a.size == 1:
        if a < th:
            a = 0
    # Otherwise, set all elements less than th to zero
    else:
        a[a < th] = 0
    return a

# Generating random values for the dimensions of matrix A
m = np.random.randint(1, 20)
n = np.random.randint(1, 20)
# Generating a random matrix of shape (m, n) using NumPy's random.randn() function
A = np.random.randn(m, n)
# Calculating the eigenvalues (D) and eigenvectors (V) of A^T.A using NumPy's linalg.eigh() function
D, V = np.linalg.eigh(A.T.dot(A))
# eigh returns eigenvalues in ascending order; reorder them (and the
# matching eigenvectors) in descending order so the largest ones are kept
idx = np.argsort(D)[::-1]
D, V = D[idx], V[:, idx]
# Creating a diagonal matrix S using the clipped eigenvalues and taking the square root of each value
S = diag(clip(D), A.shape[0], A.shape[1]) ** 0.5
# Calculating the pseudoinverse of S using NumPy's linalg.pinv() function
S_inv = np.linalg.pinv(S)
# Calculating the matrix U from the reconstruction constraint U = A V S^+
U = A.dot(V).dot(S_inv)
# Reconstructing the matrix A using U, S, and V^T
A_hat = U.dot(S).dot(V.T)
# Counting the number of sign differences between A and A_hat
sign_dif = np.sum(np.sign(A) != np.sign(A_hat))
# Calculating the total value difference between A and A_hat using NumPy's linalg.norm() function
value_dif = np.linalg.norm(A - A_hat)
# Printing the results
print('Number of sign differences = ', sign_dif, '\nTotal Value differences = ', clip(value_dif))

Conclusion#

SVD is one of the most useful decompositions in data science. (For detailed insights, head over to the Educative course Linear Algebra for Data Science Using Python.) The sign ambiguity of SVD is an important issue to consider when working with SVD in data science and other fields. The fact that the singular vectors can have arbitrary signs means that the interpretation of the results can be more challenging, and the numerical stability of the algorithm can be affected. However, there are several ways to handle the sign ambiguity, such as choosing a consistent sign convention. By understanding the sign ambiguity and how to handle it, developers can make the most of the power and flexibility of SVD in their data analysis and modeling.

