Automatic Differentiation (AD) of the Singular Value Decomposition (SVD)

Problem Definition

We have the standard SVD \(A(\textbf{q}) = USV^T\), where \(\textbf{q} = [q_0, q_1, ..., q_{Q-1}]\). Find the differentials \(dU\), \(dS\) and \(dV\) in terms of \(dA(\textbf{q})\), \(U\), \(S\) and \(V\). The dimensions are listed below:

\(A: m \times n\)

\(U: m \times k\)

\(S: k \times k\)

\(V: n \times k\)

\(k: \min(m, n)\)

\(Q\) is the number of independent variables.

The derivation of \(dU\) is shown below; \(dS\) and \(dV\) are similar. In the second part, we compare the AD result with a conventional numerical method to verify its correctness.

Analytical/AD solution

This method is implemented in reverse mode, which is more efficient since all free variables \(q_i\) share the same intermediate SVD results U_star, S_star and V_star, as explained in [0].

Recall that our goal is to obtain

\[\begin{split}\begin{align} & dU(\textbf{q}) = \frac{\partial U(\textbf{q})}{\partial q_0}\Delta q_0 + \frac{\partial U(\textbf{q})}{\partial q_1}\Delta q_1 + \cdots + \frac{\partial U(\textbf{q})}{\partial q_{Q-1}}\Delta q_{Q-1} & (1)\\ \end{align}\end{split}\]

Based on [1], we have the results

\[\begin{split}\begin{align} & dU = U(F\circ[U^T(dA)VS + SV^T(dA)^TU]) + (I_m - UU^T)(dA)VS^{-1} & (2)\\ & dS = I_k\circ [U^T(dA)V] & (3)\\ & dV = V(F\circ[SU^T(dA)V + V^T(dA)^TUS]) + (I_n - VV^T)(dA)^TUS^{-1} & (4)\\ \end{align}\end{split}\]

where

\[\begin{split}\begin{equation} F_{ij} =\begin{cases} \frac{1}{s_j^2 - s_i^2}, & i \neq j,\\ 0, & i = j. \end{cases} \end{equation}\end{split}\]

\(\circ\) denotes the Hadamard product, i.e., element-wise matrix multiplication.

\(dU\), \(dS\), \(dV\), \(U\), \(S\) and \(V\) are all implicit functions of \(\textbf{q}\). When we consider a specific \(\textbf{q} = \textbf{q}_*\), the SVD results of \(A(\textbf{q}_*)\), namely \(U_*\), \(S_*\), \(V_*^T\) and \(F_*\), are all constant. Substituting them into (2), we get

\[\begin{split}\begin{align} & \frac{\partial U(\textbf{q})}{\partial q_i} = U_*(F_*\circ[U_*^T(\frac{\partial A(\textbf{q})}{\partial q_i})V_*S_* + S_*V_*^T(\frac{\partial A(\textbf{q})}{\partial q_i})^TU_*]) + (I_m - U_*U_*^T)(\frac{\partial A(\textbf{q})}{\partial q_i})V_*S_*^{-1} & (5)\\ \end{align}\end{split}\]

where \(i = 0, 1, ..., Q-1\).

\(\frac{\partial A(\textbf{q})} {\partial q_i}\) is taken element-wise, as shown below:

\[\begin{split}\begin{equation*} \frac{\partial A(\textbf{q})} {\partial q_i} = \begin{bmatrix} \frac{\partial A_{0, 0}} {\partial q_i} & \frac{\partial A_{0, 1}} {\partial q_i} & \cdots & \frac{\partial A_{0, n-1}} {\partial q_i} \\ \frac{\partial A_{1, 0}} {\partial q_i} & \frac{\partial A_{1, 1}} {\partial q_i} & \cdots & \frac{\partial A_{1, n-1}} {\partial q_i} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial A_{m-1, 0}} {\partial q_i} & \frac{\partial A_{m-1, 1}} {\partial q_i} & \cdots & \frac{\partial A_{m-1, n-1}} {\partial q_i} \\ \end{bmatrix} \qquad (6) \end{equation*}\end{split}\]


\(\frac{\partial U(\textbf{q})}{\partial q_i}\) is also an element-wise derivative, so its shape is the same as that of \(U\): \(m \times k\).

In summary, the Jacobian of the matrix \(U\) with respect to \(\textbf{q}\) is a tensor whose \(i\)-th layer is the \(m \times k\) matrix \(\frac{\partial U(\textbf{q})} {\partial q_i}\). Equation (1) can then be written compactly as

\[dU(\textbf{q}) = \nabla U(\textbf{q})\Delta{\textbf{q}}\]

where \(\nabla U(\textbf{q})\) is a \(Q \times m \times k\) tensor.

References

[0] MATLAB Automatic Differentiation Background.

[1] James Townsend. Differentiating the Singular Value Decomposition.

[2] Mike Giles. An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation.

[3] ZhouQuan Wang. Automatic Differentiation for Complex Valued SVD.

[1]:
import numpy as np


def svd_flip_normal(u, v, u_based_decision=True):
    '''
    Sign correction to ensure deterministic output from SVD.
    Adjusts the sign of the last column of u and the last row of v based on
    a fixed reference entry, so repeated runs produce the same orientation.
    Parameters
    ----------
    u : ndarray
        u and v are the output of `linalg.svd` or
        :func:`~sklearn.utils.extmath.randomized_svd`, with matching inner
        dimensions so one can compute `np.dot(u * s, v)`.
    v : ndarray
        u and v are the output of `linalg.svd` or
        :func:`~sklearn.utils.extmath.randomized_svd`, with matching inner
        dimensions so one can compute `np.dot(u * s, v)`.
        The input v should really be called vt to be consistent with scipy's
        output.
    u_based_decision : bool, default=True
        If True, use the columns of u as the basis for sign flipping.
        Otherwise, use the rows of v. The choice of which variable to base the
        decision on is generally algorithm dependent.
    Returns
    -------
    u_adjusted, v_adjusted : arrays with the same dimensions as the input.
    '''
    if u_based_decision:
        sign = np.sign(u[-1, 2])  # fixed reference entry in the last row of u
    else:
        sign = np.sign(v[-1, 2])  # fixed reference entry in the last row of v

    # Flip the last singular vector pair together so U @ diag(S) @ V is unchanged.
    # NOTE: the hard-coded column index 2 assumes k >= 3, as in this notebook.
    u[:, -1] *= sign
    v[-1, :] *= sign
    return u, v



def svd_without_sign_ambiguity(X):
    """
    SVD with corrected signs

    Bro, R., Acar, E., & Kolda, T. G. (2008). Resolving the sign ambiguity in the singular value decomposition.
    Journal of Chemometrics: A Journal of the Chemometrics Society, 22(2), 135-140.
    URL: https://prod-ng.sandia.gov/techlib-noauth/access-control.cgi/2007/076422.pdf
    """
    # SVD dimensions (np.linalg.svd returns the right factor already
    # transposed, so V here actually holds V^T):
    # U, S, V = np.linalg.svd(X, full_matrices=False)
    # X = U @ diag(S) @ V
    # (I,J) = (I,K) @ (K,K) @ (K,J)

    U, S, V = np.linalg.svd(X, full_matrices=False)

    I = U.shape[0]
    J = V.shape[1]
    K = S.shape[0]

    assert U.shape == (I,K)
    assert V.shape == (K,J)
    assert X.shape == (I,J)

    s = {'left': np.zeros(K), 'right': np.zeros(K)}

    for k in range(K):
        mask = np.ones(K).astype(bool)
        mask[k] = False
        # (I,J) = (I,K-1) @ (K-1,K-1) @ (K-1,J)
        Y = X - (U[:, mask] @ np.diag(S[mask]) @ V[mask, :])

        # Accumulate Bro et al.'s sign scores; d is a scalar projection.
        for j in range(J):
            d = np.dot(U[:, k], Y[:, j])
            s['left'][k] += np.sign(d) * d**2
        for i in range(I):
            d = np.dot(V[k, :], Y[i, :])
            s['right'][k] += np.sign(d) * d**2

    for k in range(K):
        # If the scores disagree in sign, flip the smaller-magnitude one so that
        # the product U @ diag(S) @ V is preserved after the flips below.
        if (s['left'][k] * s['right'][k]) < 0:
            if abs(s['left'][k]) < abs(s['right'][k]):
                s['left'][k] = -s['left'][k]
            else:
                s['right'][k] = -s['right'][k]
        U[:, k] = U[:, k] * np.sign(s['left'][k])
        V[k, :] = V[k, :] * np.sign(s['right'][k])

    return U, S, V

def svd_regulated(X):
    """Sign-regulated SVD: resolve the sign ambiguity, then apply a deterministic flip."""
    u, s, v = svd_without_sign_ambiguity(X)
    if X.shape[0] < X.shape[1]:  # wide matrix (m < n), the "bus" case
        u, v = svd_flip_normal(u, v)
    else:  # tall matrix (m >= n), the "rocket" case
        u, v = svd_flip_normal(u, v, u_based_decision=False)
    return u, s, v
[2]:
from sympy import Matrix, Array
import sympy as sp
import numpy as np


def autodiff_svd(A, q_star, *, symbolic=False):
    """
    Automatic differentiation of the SVD.

    Takes a symbolic matrix A and returns the derivatives of U, S and V
    with respect to the variables in q_star, where U, S and VT (V.T) come
    from the standard SVD. The method is implemented in reverse mode,
    which is more efficient since all free variables q share the same
    intermediate SVD results U_star, S_star and V_star. It returns
    economy-size results, i.e. full_matrices=False in the corresponding
    np.linalg.svd.

    Parameters
    ----------
    A : (m, n) Matrix
        A sympy symbolic 2D Matrix, which must be a valid input for SVD.
    q_star : dict
        A dict mapping each sympy symbolic variable to its value.
    symbolic: bool, optional
        Control return type. Return numerical result, numpy.ndarray, if
        true(default), otherwise return symbolic result, sympy.Array.

    Returns
    -------
    dU : (Q, m, k) array
        Partial derivatives/gradient of U with respect to each variables in

        q_star. Each variable matches a (..., m, k) matrix. Q is number of
        variables. Return sympy.Array if symbolic=True.
    dS : (Q, k, k) array
        Partial derivatives/gradient of S with respect to each variables in
        q_star. Each variable matches a (..., m, k) matrix. k=min(m, n).
        (..., k, k) is a strict diagonal matrix which is guaranteed by
        Hadamard operation. check examples for details.
        Return sympy.Array if symbolic=True.
    dV : (Q, n, k) array
        Partial derivatives/gradient of V with respect to each variables in
        q_star. Each variable matches a (..., n, k) matrix. Return
        sympy.Array if symbolic=True.

    See Also
    --------
    numpy.linalg.svd : Similar function in numpy.
    sympy.Array : 3D tensor in sympy.
    sympy.Matrix.jacobian : jacobian calculation in sympy.

    References
    ----------
    .. [1] James Townsend. Differentiating the Singular Value Decomposition.
    .. [2] Mike Giles. An extended collection of matrix derivative results
           for forward and reverse mode algorithmic differentiation.

    Examples
    --------
    >>> from sympy import sin, cos, symbols, Matrix
    >>> q0, q1 = symbols('q0, q1')
    >>> q_star = {q0: 1, q1: 2}
    >>> A = Matrix([[2*q0,          1,    q0**3 + 2*q1, sin(q0) + q1],
    ...             [q0**2 - q1**2, 2*q1, 2*q0 + q1**3, cos(q0) + q1],
    ...             [q0**2 + q1**2, 2,    3,            sin(q0) + cos(q1)]])

    >>> dU, dS, dV = autodiff_svd(A, q_star)
    >>> dU.shape
    (2, 3, 3)
    >>> dS.shape
    (2, 3, 3)
    >>> dV.shape
    (2, 4, 3)
    >>> dU_sym, dS_sym, dV_sym = autodiff_svd(A, q_star, symbolic=True)

    """

    m, n = A.shape
    k = min(m, n)
    Q = len(q_star)

    # For a specific q0 and q1, get U_star, S_star and V_star_T.
    A_star = np.array(A.subs(q_star), dtype=float)
    U_star, S_star_1d, V_star_T = svd_regulated(A_star)  # U_star (m, k), S_star_1d (k,), V_star_T (k, n)
    U_star = U_star[:, :k]      # Use economy-size U (m, k)
    V_star_T = V_star_T[:k, :]  # Use economy-size VT (k, n)

    S_star = np.diag(S_star_1d)                     # Construct S (k, k)
    S_star_inv = np.diag(np.reciprocal(S_star_1d))  # Construct S^-1 (k, k)

    Ik = sp.eye(k)
    U_star_T = U_star.T
    V_star = V_star_T.T

    # F_star: the k x k matrix F of eq. (5), evaluated at q_star
    F_star = Matrix([[0 if i == j else 1 / (S_star_1d[j]**2 - S_star_1d[i]**2) for j in range(k)] for i in range(k)])

    # dA
    # Reshape A to 1d vector for convenience of jacobian calculation, will reshape back to 2d later
    A_element_wise = A.reshape(m * n, 1)

    q_var = Matrix(list(q_star.keys())) # [q_0, q_1, q_i..., q_Q-1]
    dA_element_wise = A_element_wise.jacobian(q_var) # in shape (m*n, Q)

    # Reserve lists of dA, dS, dU and dV for each q variable, e.g. dA_lst[i] is dA with respect to qi.
    dA_lst = [None] * Q
    dS_lst = [None] * Q
    dU_lst = [None] * Q
    dV_lst = [None] * Q

    Im = sp.eye(m)
    In = sp.eye(n)
    for i in range(Q):
        # Get matrix dA of eq(6) wrt qi, each element in dA_lst is dA_wrt_qi. dA_wrt_qi is element-wise.
        dA_lst[i] = dA_element_wise.T[i, :].reshape(m, n)
        # Use @ instead of * to force every binary operation between np.ndarray(s) and sp.Matrix(s) to be a matrix product.
        dU_lst[i] = U_star @ F_star.multiply_elementwise(U_star_T @ dA_lst[i] @ V_star @ S_star \
                                                       + S_star @ V_star_T @ dA_lst[i].T @ U_star) \
                    + (Im - U_star @ U_star_T) @ dA_lst[i] @ V_star @ S_star_inv
        dS_lst[i] = Ik.multiply_elementwise(U_star_T @ dA_lst[i] @ V_star)
        dV_lst[i] = V_star @ F_star.multiply_elementwise(S_star @ U_star_T @ dA_lst[i] @ V_star \
                                                       + V_star_T @ dA_lst[i].T @ U_star @ S_star) \
                    + (In - V_star @ V_star_T) @ dA_lst[i].T @ U_star @ S_star_inv

    # Build a 3D tensor for all q variables
    dU_tensor = Array([e.tolist() for e in dU_lst]) # (Q, m, k)
    dS_tensor = Array([e.tolist() for e in dS_lst]) # (Q, k, k)
    dV_tensor = Array([e.tolist() for e in dV_lst]) # (Q, n, k)

    if symbolic:
        return dU_tensor, dS_tensor, dV_tensor
    else:
        return (np.array(dU_tensor.subs(q_star), dtype=float),
                np.array(dS_tensor.subs(q_star), dtype=float),
                np.array(dV_tensor.subs(q_star), dtype=float))
[3]:
# Test input
from sympy import sin, cos, symbols, Matrix
q0, q1, q2, q3 = symbols('q0, q1, q2, q3')
q_star = {q0: 1, q1: 2, q2: 3, q3:4}
A = Matrix([[2*q0,          1,    q2**3 + 2*q3, sin(q0) + q1],
            [q0**2 - q1**2, 2*q1, 2*q0 + q1**3, cos(q0) + q1],
            [q0**2 + q1**2, 2,    3,            sin(q0) + cos(q1)]])

dU_of_MatrixA, dS_of_MatrixA, dV_of_MatrixA = autodiff_svd(A, q_star)

# Extract the Jacobian of the normal vector: for a 3 x N matrix, the last
# column of U (smallest singular value) serves as the normal direction.
j_ori_n_of_A = dU_of_MatrixA[:, :, -1].T
[4]:
j_ori_n_of_A
[4]:
array([[-0.14231918, -0.21932216,  0.19752725,  0.01463165],
       [ 0.51855359, -0.01168378,  0.02158681,  0.00159902],
       [-0.53767579, -0.06999999,  0.05269224,  0.00390313]])
[5]:
B = A.T
dU_of_MatrixB, dS_of_MatrixB, dV_of_MatrixB = autodiff_svd(B, q_star)
j_ori_n_of_B = dV_of_MatrixB[:, :, -1].T

# Check consistency: transposing A swaps the roles of U and V, so the
# Jacobian extracted from dV of B must match the one from dU of A.
np.allclose(j_ori_n_of_A, j_ori_n_of_B)
[5]:
True

Compare with numerical solution

Take \(U\) as an example:

\[\frac{\partial U(\textbf{q})} {\partial q_i} \approx \frac {U(\textbf{q} + \Delta \textbf{q}_i) - U(\textbf{q})}{\epsilon}\]

where \(\Delta \textbf{q}_i\) is the vector whose only nonzero entry, at index \(i\), equals \(\epsilon\):

\[\Delta \textbf{q}_i = [0, ..., 0, \epsilon, 0, ..., 0]^T\]

\(U(\textbf{q} + \Delta \textbf{q}_i)\) is obtained from the standard SVD of \(A(\textbf{q} + \Delta \textbf{q}_i)\); \(S\) and \(V\) are handled similarly.

[6]:
# Numerical method / pytest control group
# Here q = [q0, q1, q2, q3]^T, so Q = 4.
m, n = A.shape
k = min(m, n)
Q = len(q_star)



# For the specific q_star, get U_star, S_star and V_star_T.
A_star = np.array(A.subs(q_star), dtype=float)
U_star, S_star_1d, V_star_T = svd_regulated(A_star)  # U_star (m, k), S_star_1d (k,), V_star_T (k, n)

S_star = np.diag(S_star_1d)  # Construct S (k, k)
U_star = U_star[:, :k]       # Use economy-size U (m, k)
V_star_T = V_star_T[:k, :]   # Use economy-size VT (k, n)

dq = 1e-7
U_star_plus_dq = np.empty((Q, m, k))
S_star_plus_dq = np.empty((Q, k, k))
V_star_plus_dq_T = np.empty((Q, k, n))

dS_numerical_q = np.empty((Q, k, k))
dU_numerical_q = np.empty((Q, m, k))
dVT_numerical_q = np.empty((Q, k, n))
dV_numerical_q = np.empty((Q, n, k))

q_vars = list(q_star.keys())  # same variable order autodiff_svd iterates over
for i in range(Q):
    qi_star_plus_dq = q_star.copy()
    qi_star_plus_dq[q_vars[i]] += dq  # perturb only q_i

    tmp_U, S_star_plus_dqi_1d, tmp_VT = svd_regulated(np.array(A.subs(qi_star_plus_dq), dtype=float))

    U_star_plus_dq[i] = tmp_U[:, :k]
    S_star_plus_dq[i] = np.diag(S_star_plus_dqi_1d)
    V_star_plus_dq_T[i] = tmp_VT[:k, :]
    dS_numerical_q[i] = (S_star_plus_dq[i] - S_star) / dq
    dU_numerical_q[i] = (U_star_plus_dq[i] - U_star) / dq
    dVT_numerical_q[i] = (V_star_plus_dq_T[i] - V_star_T) / dq  # economy-size VT (k, n)
    dV_numerical_q[i] = dVT_numerical_q[i].T
[7]:
# Error Check
np.allclose(dU_of_MatrixA, dU_numerical_q) \
and np.allclose(dS_of_MatrixA, dS_numerical_q) \
and np.allclose(dV_of_MatrixA, dV_numerical_q)
[7]:
True