Automatic Differentiating(AD) of Singular Value Decomposition(SVD)
Problem Define
we have standard \(A(\textbf{q}) = USV^T\), where \(\textbf{q} = [q_0, q_1, ..., q_{Q-1}]\). Find the the differentials \(dU\), \(dS\) and \(dV\) in terms of \(dA(\textbf{q})\), \(U\), \(S\) and \(V\). The dimension details show below,
\(A: m \times n\)
\(U: m \times k\)
\(S: k \times k\)
\(V: n \times k\)
\(k:\) min\((m, n)\)
\(Q\) is the number of independent variables.
Below derivation of \(dU\) is shown. \(dS\) and \(dV\) are similar. In second part, we compare AD result with conventional numerical method to prove the correctness.
Analytical/AD solution
This method is implemented in reverse mode which is more effeciency since all free variables q take advantages of same intermedia SVD result, U_star, S_star and V_star. As explained in [0].
Remember our goal is to get
Based on [1], we have result
where
\(\circ\) is Hadamard product, aka element-wise matrix multiplication.
The \(dU\), \(dS\), \(dV\), \(U\), \(S\) and \(V\) are all implicit function of \(\textbf{q}\). When we considering a specific \(\textbf{q} = \textbf{q}_*\), \(U_*, S_*, V_*^T\) and \(F_*\) which are SVD results of \(A(\textbf{q}_*)\) are all actually constant. Put those in (2), we get
where \(i = 0, 1, ..., Q-1\)
\(\frac{\partial A(\textbf{q})} {\partial q_i}\) is element-wise as shown below.
.
\(\frac{\partial U(\textbf{q})}{\partial q_i}\) is element-wise differentiation so its shape is as same as \(U\), \(m \times k\).
In summary, the jacobian of matrix \(U\) relative to \(\textbf{q}\) is actually a tensor. The \(i\)-th layer is \(m \times k\) matrix of \(\frac{\partial U(\textbf{q})} {\partial q_i}\). The (1) could be simplified to
where \(\nabla U(\textbf{q})\) is a \(Q \times m \times k\) tensor.
Reference
[0] Matlab Automatic Differentiation Background
[1] James Townsend. Differentiating the Singular Value Decomposition
[2] Mike Giles. An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation
[3] ZhouQuan Wang Automatic Differentiation for Complex Valued SVD
[1]:
def svd_flip_normal(u, v, u_based_decision=True):
'''
Sign correction to ensure deterministic output from SVD.
Adjusts the columns of u and the rows of v such that the loadings in the
columns in u that are largest in absolute value are always positive.
Parameters
----------
u : ndarray
u and v are the output of `linalg.svd` or
:func:`~sklearn.utils.extmath.randomized_svd`, with matching inner
dimensions so one can compute `np.dot(u * s, v)`.
v : ndarray
u and v are the output of `linalg.svd` or
:func:`~sklearn.utils.extmath.randomized_svd`, with matching inner
dimensions so one can compute `np.dot(u * s, v)`.
The input v should really be called vt to be consistent with scipy's
output.
u_based_decision : bool, default=True
If True, use the columns of u as the basis for sign flipping.
Otherwise, use the rows of v. The choice of which variable to base the
decision on is generally algorithm dependent.
Returns
-------
u_adjusted, v_adjusted : arrays with the same dimensions as the input.
'''
if u_based_decision:
sign = np.sign(u[-1][2])
else:
sign = np.sign(v[-1, :][2])
u[:, -1] *= sign
v[-1, :] *= sign
return u, v
def svd_without_sign_ambiguity(X):
"""
SVD with corrected signs
Bro, R., Acar, E., & Kolda, T. G. (2008). Resolving the sign ambiguity in the singular value decomposition.
Journal of Chemometrics: A Journal of the Chemometrics Society, 22(2), 135-140.
URL: https://prod-ng.sandia.gov/techlib-noauth/access-control.cgi/2007/076422.pdf
"""
# SDV dimensions:
# U, S, V = np.linalg.svd(X, full_matrices=False)
# X = U @ diag(S) @ V
# (I,J) = (I,K) @ (K,K) @ (K,J)
U, S, V = np.linalg.svd(X, full_matrices=False)
I = U.shape[0]
J = V.shape[1]
K = S.shape[0]
assert U.shape == (I,K)
assert V.shape == (K,J)
assert X.shape == (I,J)
s = {'left': np.zeros(K), 'right': np.zeros(K)}
for k in range(K):
mask = np.ones(K).astype(bool)
mask[k] = False
# (I,J) = (I,K-1) @ (K-1,K-1) @ (K-1,J)
Y = X - (U[:, mask] @ np.diag(S[mask]) @ V[mask, :])
for j in range(J):
d = np.dot(U[:, k], Y[:, j])
s['left'][k] += np.sum(np.sign(d) * d**2)
for i in range(I):
d = np.dot(V[k, :], Y[i, :])
s['right'][k] += np.sum(np.sign(d) * d**2)
for k in range(K):
if (s['left'][k] * s['right'][k]) < 0:
if abs(s['left'][k]) < abs(s['right'][k]):
s['left'][k] = -s['left'][k]
else:
s['right'][k] = -s['right'][k]
U[:, k] = U[:, k] * np.sign(s['left'][k])
V[k, :] = V[k, :] * np.sign(s['right'][k])
return U, S, V
def svd_regulated(X):
u, s, v = svd_without_sign_ambiguity(X)
if X.shape[0] < X.shape[1]: # bus
u, v = svd_flip_normal(u, v)
else: # rocket
u, v = svd_flip_normal(u, v, u_based_decision=False)
return u, s, v
[2]:
from sympy import Matrix, Array
import sympy as sp
import numpy as np
def autodiff_svd(A, q_star, *, symbolic=False):
"""
Automatic Differentiating of SVD
It takes matrix A, return differentiation of U, S and V with respect
to q_star. U, S and VT(V.T) are result of standard SVD. This method
is implemented in reverse mode which is more effeciency since all
free variables q take advantages of same intermedia SVD result,
U_star, S_star and V_star. It's return ecomonic results which means
full_matrices=False in corresponding np.linalg.svd
Parameters
----------
A : (m, n) Matrix
A sympy symbolic 2D Matrix, which should be legal for SVD later.
q_star : dict
A dict from sympy symbolic variable to value.
symbolic: bool, optional
Control return type. Return numerical result, numpy.ndarray, if
true(default), otherwise return symbolic result, sympy.Array.
Returns
-------
dU : (Q, m, k) array
Partial derivatives/gradient of U with respect to each variables in
q_star. Each variable matches a (..., m, k) matrix. Q is number of
variables. Return sympy.Array if symbolic=True.
dS : (Q, k, k) array
Partial derivatives/gradient of S with respect to each variables in
q_star. Each variable matches a (..., m, k) matrix. k=min(m, n).
(..., k, k) is a strict diagonal matrix which is guaranteed by
Hadamard operation. check examples for details.
Return sympy.Array if symbolic=True.
dV : (Q, n, k) array
Partial derivatives/gradient of V with respect to each variables in
q_star. Each variable matches a (..., n, k) matrix. Return
sympy.Array if symbolic=True.
See Also
--------
numpy.linalg.svd : Similar function in numpy.
sympy.Array : 3D tensor in sympy.
sympy.Matrix.jacobian : jacobian calculation in sympy.
References
----------
.. [1] James Townsend. Differentiating the Singular Value Decomposition.
.. [2] Mike Giles. An extended collection of matrix derivative results
for forward and reverse mode algorithmic differentiation.
Examples
--------
>>> from sympy import sin, cos, symbols, Matrix
>>> q0, q1 = symbols('q0, q1')
... q_star = {q0: 1, q1: 2}
... A = Matrix([[2*q0, 1, q0**3 + 2*q1, sin(q0) + q1],
[q0**2 - q1**2, 2*q1, 2*q0 + q1**3, cos(q0) + q1],
[q0**2 + q1**2, 2, 3, sin(q0) + cos(q1)]])
>>> dU, dS, dV = autodiff_svd(A, q_star)
>>> dU.shape
(2, 3, 3)
>>> dS.shape
(2, 3, 3)
>>> dV.shape
(2, 4, 3)
>>> dU_sym, dS_sym, dV_sym = autodiff_svd(A, q_star, symbolic=True)
"""
m, n = A.shape
k = min(m, n)
Q = len(q_star)
# For a specific q0 and q1, get U_star, S_star and V_star_T.
A_star = np.array(A.subs(q_star), dtype = float)
U_star, S_star_1d, V_star_T = svd_regulated(A_star) # U_star (m, k), S_star_1d (k, 1), V_star_T (n, n)
U_star = U_star[:, :k] # Use economic U (m, k)
V_star_T = V_star_T[:k, :] # Use economic VT (k, n)
S_star = np.diag(S_star_1d) # Construct S (k, k)
S_star_inv = np.diag(np.reciprocal(S_star_1d))
Ik = sp.eye(k)
U_star_T = U_star.T
V_star = V_star_T.T
# F
F_star = Matrix([[0 if i == j else 1 / (S_star_1d[j]**2 - S_star_1d[i]**2) for j in range(k)] for i in range(k)])
# dA
# Reshape A to 1d vector for convenience of jacobian calculation, will reshape back to 2d later
A_element_wise = A.reshape(m * n, 1)
q_var = Matrix(list(q_star.keys())) # [q_0, q_1, q_i..., q_Q-1]
dA_element_wise = A_element_wise.jacobian(q_var) # in shape (m*n, Q)
# Reserve list of dA, dS, dU and dV for each q variables. E.X. dA_lst[i] is dA with respect to qi
dA_lst = [None] * Q
dS_lst = [None] * Q
dU_lst = [None] * Q
dV_lst = [None] * Q
Im = sp.eye(m)
In = sp.eye(n)
for i in range(Q):
# Get matrix dA of eq(6) wrt qi, each element in dA_lst is dA_wrt_qi. dA_wrt_qi is element-wise.
dA_lst[i] = dA_element_wise.T[i, :].reshape(m, n)
# Use @ instead of * force all binary operation between np.array(s) and sp.Matrix(s) using matrix product.
dU_lst[i] = U_star @ F_star.multiply_elementwise(U_star_T @ dA_lst[i] @ V_star @ S_star \
+ S_star @ V_star_T @ dA_lst[i].T @ U_star) \
+ (Im - U_star @ U_star_T) @ dA_lst[i] @ V_star @ S_star_inv
dS_lst[i] = Ik.multiply_elementwise(U_star_T @ dA_lst[i] @ V_star)
dV_lst[i] = V_star @ F_star.multiply_elementwise(S_star @ U_star_T @ dA_lst[i] @ V_star \
+ V_star_T @ dA_lst[i].T @ U_star @ S_star) \
+ (In - V_star @ V_star_T) @ dA_lst[i].T @ U_star @ S_star_inv
# Build a 3D tensor for all q variables
dU_tensor = Array([e.tolist() for e in dU_lst]) # (Q, m, k)
dS_tensor = Array([e.tolist() for e in dS_lst]) # (Q, k, k)
dV_tensor = Array([e.tolist() for e in dV_lst]) # (Q, n, k)
if symbolic:
return dU_tensor, dS_tensor, dV_tensor
else:
return (np.array(dU_tensor.subs(q_star), dtype=float),
np.array(dS_tensor.subs(q_star), dtype=float),
np.array(dV_tensor.subs(q_star), dtype=float))
[3]:
# Test input
from sympy import sin, cos, symbols, Matrix
q0, q1, q2, q3 = symbols('q0, q1, q2, q3')
q_star = {q0: 1, q1: 2, q2: 3, q3:4}
A = Matrix([[2*q0, 1, q2**3 + 2*q3, sin(q0) + q1],
[q0**2 - q1**2, 2*q1, 2*q0 + q1**3, cos(q0) + q1],
[q0**2 + q1**2, 2, 3, sin(q0) + cos(q1)]])
dU_of_MatrixA, dS_of_MatrixA, dV_of_MatrixA = autodiff_svd(A, q_star)
# Extract jacobian of normal vector
# if 3 x N
j_ori_n_of_A = dU_of_MatrixA[:, :, -1].T
[4]:
j_ori_n_of_A
[4]:
array([[-0.14231918, -0.21932216, 0.19752725, 0.01463165],
[ 0.51855359, -0.01168378, 0.02158681, 0.00159902],
[-0.53767579, -0.06999999, 0.05269224, 0.00390313]])
[5]:
B = A.T
dU_of_MatrixB, dS_of_MatrixB, dV_of_MatrixB = autodiff_svd(B, q_star)
j_ori_n_of_B = dV_of_MatrixB[:, :, -1].T
# Check uniqueness
np.allclose(j_ori_n_of_A, j_ori_n_of_B)
[5]:
True
Compare with numerical solution
Take \(U\) as a example
where \(\Delta \textbf{q}_i\) is vector that only index \(i\) has value \(\epsilon\)
\(U(\textbf{q} + \Delta \textbf{q}_i)\) is derived from standard SVD from \(A(\textbf{q} + \Delta \textbf{q}_i)\). \(S\) and \(V\) are similar.
[6]:
# Numerical method /pytest control group
# when Q = 2, aka q = [q0, q1]^T
# For a specific q0 and q1, get U_star, S_star and V_star_T.
m, n = A.shape
k = min(m, n)
Q = len(q_star)
# For a specific q0 and q1, get U_star, S_star and V_star_T.
A_star = np.array(A.subs(q_star), dtype = float)
U_star, S_star_1d, V_star_T = svd_regulated(A_star) # U_star (m, k), S_star_1d (k, 1), V_star_T (n, n)
S_star = np.diag(S_star_1d) # Construct S (k, k)
U_star = U_star[:, :k] # Use economic U (m, k)
V_star_T = V_star_T[:k, :] # Use economic VT (k, n)
dq = 1e-7
U_star_plus_dq = np.empty((Q, m, k))
S_star_plus_dq = np.empty((Q, k, k))
V_star_plus_dq_T = np.empty((Q, k, n))
dS_numerical_q = np.empty((Q, k, k))
dU_numerical_q = np.empty((Q, m, k))
dVT_numerical_q = np.empty((Q, k, n))
dV_numerical_q = np.empty((Q, n, k))
for i in range(Q):
qi_star_plus_dq = q_star.copy()
qi_star_plus_dq[eval('q' + str(i))] += dq
tmp_U, S_star_plus_dqi_1d, tmp_VT = svd_regulated(np.array(A.subs(qi_star_plus_dq), dtype=float))
U_star_plus_dq[i] = tmp_U[:, :k]
S_star_plus_dq[i] = np.diag(S_star_plus_dqi_1d)
V_star_plus_dq_T[i] = tmp_VT[:k, :]
dS_numerical_q[i] = (S_star_plus_dq[i] - S_star) / dq
dU_numerical_q[i] = (U_star_plus_dq[i] - U_star) / dq
dVT_numerical_q[i] = (V_star_plus_dq_T[i] - V_star_T) / dq # Use economic VT (k, n)
dV_numerical_q[i] = dVT_numerical_q[i].T
[7]:
# Error Check
np.allclose(dU_of_MatrixA, dU_numerical_q) \
and np.allclose(dS_of_MatrixA, dS_numerical_q) \
and np.allclose(dV_of_MatrixA, dV_numerical_q)
[7]:
True
[ ]:
[ ]: