{ "cells": [ { "cell_type": "markdown", "id": "41260a66", "metadata": {}, "source": [ "## Automatic Differentiating(AD) of Singular Value Decomposition(SVD)\n", "\n", "### Problem Define\n", "we have standard $A(\\textbf{q}) = USV^T$, where $\\textbf{q} = [q_0, q_1, ..., q_{Q-1}]$. Find the the differentials $dU$, $dS$ and $dV$ in terms of $dA(\\textbf{q})$, $U$, $S$ and $V$. The dimension details show below,\n", "\n", "$A: m \\times n$\n", "\n", "$U: m \\times k$\n", "\n", "$S: k \\times k$\n", "\n", "$V: n \\times k$\n", "\n", "$k:$ min$(m, n)$\n", "\n", "$Q$ is the number of independent variables.\n", "\n", "Below derivation of $dU$ is shown. $dS$ and $dV$ are similar. In second part, we compare AD result with conventional numerical method to prove the correctness.\n", "\n", "### Analytical/AD solution\n", "\n", "This method is implemented in reverse mode which is more effeciency since all free variables q take advantages of same intermedia SVD result, U_star, S_star and V_star. As explained in [0].\n", " \n", "Remember our goal is to get\n", "$$\n", "\\begin{align}\n", "& dU(\\textbf{q}) = \\frac{\\partial U(\\textbf{q})}{\\partial q_0}\\Delta q_0 + \\frac{\\partial U(\\textbf{q})}{\\partial q_1}\\Delta q_1 ...\\frac{\\partial U(\\textbf{q})}{\\partial q_{Q-1}}\\Delta q_{Q-1} & (1)\\\\\n", "\\end{align}\n", "$$\n", "\n", "Based on [1], we have result\n", "$$\n", "\\begin{align} \n", "& dU = U(F\\circ[U^T(dA)VS + SV^T(dA)^TU]) + (I_m - UU^T)(dA)VS^{-1} & (2)\\\\\n", "& dS = I_k\\circ [U^T(dA)V] & (3)\\\\\n", "& dV = V(F\\circ[SU^T(dA)V + V^T(dA)^TUS]) + (I_n - VV^T)(dA)^TUS^{-1} & (4)\\\\\n", "\\end{align}\n", "$$\n", "\n", "where\n", "$$\n", "\\begin{equation}\n", " F =\\begin{cases}\n", " \\frac{1}{s_j^2 - s_i^2}, & i \\neq j.\\\\\n", " 0, & i = j.\n", " \\end{cases}\n", "\\end{equation}\n", "$$\n", "$\\circ$ is [*Hadamard product*](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)), aka element-wise matrix multiplication.\n", "\n", "The $dU$, $dS$, $dV$, $U$, $S$ and $V$ are all implicit function of $\\textbf{q}$. When we considering a specific $\\textbf{q} = \\textbf{q}_*$, $U_*, S_*, V_*^T$ and $F_*$ which are **SVD** results of $A(\\textbf{q}_*)$ are all actually constant. Put those in (2), we get\n", "\n", "$$\n", "\\begin{align}\n", "& \\frac{\\partial U(\\textbf{q})}{\\partial q_i} = U_*(F_*\\circ[U_*^T(\\frac{\\partial A(\\textbf{q})}{\\partial q_i})V_*S_* + S_*V_*^T(\\frac{\\partial A(\\textbf{q})}{\\partial q_i})^TU_*]) + (I_m - U_*U_*^T)(\\frac{\\partial A(\\textbf{q})}{\\partial q_i})V_*S_*^{-1} & (5)\\\\\n", "\\end{align}\n", "$$\n", "\n", "where $i = 0, 1, ..., Q-1$\n", "\n", "$\\frac{\\partial A(\\textbf{q})} {\\partial q_i}$ is element-wise as shown below.\n", "$$\n", "\\begin{equation*}\n", "\\frac{\\partial A(\\textbf{q})} {\\partial q_i} =\n", "\\begin{bmatrix}\n", "\\frac{\\partial A_{0, 0}} {\\partial q_i} & \\frac{\\partial A_{0, 1}} {\\partial q_i} & ... & \\frac{\\partial A_{0, n-1}} {\\partial q_i} \\\\\n", "\\frac{\\partial A_{1, 0}} {\\partial q_i} & \\frac{\\partial A_{1, 1}} {\\partial q_i} & ... & \\frac{\\partial A_{1, n-1}} {\\partial q_i} \\\\\n", "... & ... & ... & ... \\\\\n", "\\frac{\\partial A_{m-1, 0}} {\\partial q_i} & \\frac{\\partial A_{m-1, 1}} {\\partial q_i} & ... 
 "\n",
 "#### Reference\n",
 "[0] MATLAB. [Automatic Differentiation Background](https://www.mathworks.com/help/optim/ug/autodiff-background.html)\n",
 "\n",
 "[1] James Townsend. [Differentiating the Singular Value Decomposition](https://j-towns.github.io/papers/svd-derivative.pdf)\n",
 "\n",
 "[2] Mike Giles. [An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation](https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf)\n",
 "\n",
 "[3] ZhouQuan Wang. [Automatic Differentiation for Complex Valued SVD](https://arxiv.org/pdf/1909.02659.pdf)" ] }, { "cell_type": "code", "execution_count": 1, "id": "75bfe186", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n",
 "\n",
 "\n",
 "def svd_flip_normal(u, v, u_based_decision=True):\n",
 "    '''\n",
 "    Sign correction to ensure deterministic output from SVD.\n",
 "    A variant of sklearn's svd_flip: only the last column of u and the last\n",
 "    row of v are flipped, using the sign of a fixed reference entry, so that\n",
 "    the last singular vector (used later as a normal vector) has a\n",
 "    deterministic orientation.\n",
 "    Parameters\n",
 "    ----------\n",
 "    u : ndarray\n",
 "        u and v are the output of `linalg.svd` or\n",
 "        :func:`~sklearn.utils.extmath.randomized_svd`, with matching inner\n",
 "        dimensions so one can compute `np.dot(u * s, v)`.\n",
 "    v : ndarray\n",
 "        u and v are the output of `linalg.svd` or\n",
 "        :func:`~sklearn.utils.extmath.randomized_svd`, with matching inner\n",
 "        dimensions so one can compute `np.dot(u * s, v)`.\n",
 "        The input v should really be called vt to be consistent with scipy's\n",
 "        output.\n",
 "    u_based_decision : bool, default=True\n",
 "        If True, base the sign decision on an entry of u; otherwise on an\n",
 "        entry of v. The choice of which variable to base the decision on is\n",
 "        generally algorithm dependent.\n",
 "    Returns\n",
 "    -------\n",
 "    u_adjusted, v_adjusted : arrays with the same dimensions as the input.\n",
 "    '''\n",
 "    if u_based_decision:\n",
 "        sign = np.sign(u[-1][2])     # sign of a fixed reference entry in the last row of u\n",
 "    else:\n",
 "        sign = np.sign(v[-1, :][2])  # sign of a fixed reference entry in the last row of v\n",
 "\n",
 "    u[:, -1] *= sign\n",
 "    v[-1, :] *= sign\n",
 "    return u, v\n",
 "\n",
 "\n",
 "\n",
 "def svd_without_sign_ambiguity(X):\n",
 "    \"\"\"\n",
 "    SVD with corrected signs.\n",
 "    \n",
 "    Bro, R., Acar, E., & Kolda, T. G. (2008). Resolving the sign ambiguity in the singular value decomposition.\n",
 "    Journal of Chemometrics: A Journal of the Chemometrics Society, 22(2), 135-140.\n",
 "    URL: https://prod-ng.sandia.gov/techlib-noauth/access-control.cgi/2007/076422.pdf\n",
 "    \"\"\"\n",
 "    # SVD dimensions:\n",
 "    # U, S, V = np.linalg.svd(X, full_matrices=False)\n",
 "    # X = U @ diag(S) @ V\n",
 "    # (I,J) = (I,K) @ (K,K) @ (K,J)\n",
 "    \n",
 "    U, S, V = np.linalg.svd(X, full_matrices=False)\n",
 "    \n",
 "    I = U.shape[0]\n",
 "    J = V.shape[1]\n",
 "    K = S.shape[0]\n",
 "    \n",
 "    assert U.shape == (I,K)\n",
 "    assert V.shape == (K,J)\n",
 "    assert X.shape == (I,J)\n",
 "    \n",
 "    s = {'left': np.zeros(K), 'right': np.zeros(K)}\n",
 "\n",
 "    for k in range(K):\n",
 "        mask = np.ones(K).astype(bool)\n",
 "        mask[k] = False\n",
 "        # Residual of X without the k-th component: (I,J) = (I,K-1) @ (K-1,K-1) @ (K-1,J)\n",
 "        Y = X - (U[:, mask] @ np.diag(S[mask]) @ V[mask, :])\n",
 "        \n",
 "        for j in range(J):\n",
 "            d = np.dot(U[:, k], Y[:, j])\n",
 "            s['left'][k] += np.sum(np.sign(d) * d**2)\n",
 "        for i in range(I):\n",
 "            d = np.dot(V[k, :], Y[i, :])\n",
 "            s['right'][k] += np.sum(np.sign(d) * d**2)\n",
 "\n",
 "    for k in range(K):\n",
 "        if (s['left'][k] * s['right'][k]) < 0:\n",
 "            if abs(s['left'][k]) < abs(s['right'][k]):\n",
 "                s['left'][k] = -s['left'][k]\n",
 "            else:\n",
 "                s['right'][k] = -s['right'][k]\n",
 "        U[:, k] = U[:, k] * np.sign(s['left'][k])\n",
 "        V[k, :] = V[k, :] * np.sign(s['right'][k])\n",
 "    \n",
 "    return U, S, V\n",
 "\n",
 "def svd_regulated(X):\n",
 "    u, s, v = svd_without_sign_ambiguity(X)\n",
 "    if X.shape[0] < X.shape[1]:  # wide matrix (m < n): \"bus\" case\n",
 "        u, v = svd_flip_normal(u, v)\n",
 "    else:  # tall matrix (m >= n): \"rocket\" case\n",
 "        u, v = svd_flip_normal(u, v, u_based_decision=False)\n",
 "    return u, s, v" ] },
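{ "cell_type": "markdown", "id": "7f3a9c21", "metadata": {}, "source": [ "A quick illustrative sanity check of `svd_regulated` (a sketch only, not part of the original pipeline; it assumes the functions defined in the previous cell): the sign-corrected factors should still reconstruct the input, and the signs should stay stable under a tiny perturbation.\n",
 "\n",
 "```python\n",
 "import numpy as np\n",
 "\n",
 "X = np.array([[ 2.0, 1.0, 5.0, 3.0],\n",
 "              [-3.0, 4.0, 3.0, 2.5],\n",
 "              [ 5.0, 2.0, 3.0, 1.4]])\n",
 "u, s, v = svd_regulated(X)\n",
 "print(np.allclose(u @ np.diag(s) @ v, X))   # still a valid decomposition\n",
 "u2, _, _ = svd_regulated(X + 1e-9)\n",
 "print(np.allclose(u, u2, atol=1e-6))        # no sign flips for a tiny perturbation\n",
 "```" ] },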
{ "cell_type": "code", "execution_count": 2, "id": "5821d52a", "metadata": {}, "outputs": [], "source": [ "from sympy import Matrix, Array\n",
 "import sympy as sp\n",
 "import numpy as np\n",
 "\n",
 "\n",
 "def autodiff_svd(A, q_star, *, symbolic=False):\n",
 "    \"\"\"\n",
 "    Automatic Differentiation of SVD.\n",
 "    \n",
 "    Takes a matrix A and returns the derivatives of U, S and V with respect\n",
 "    to the variables in q_star, where U, S and VT (V.T) are the results of a\n",
 "    standard SVD. This method is implemented in reverse mode, which is more\n",
 "    efficient because all free variables q share the same intermediate SVD\n",
 "    results U_star, S_star and V_star. It returns economy-size results,\n",
 "    i.e. full_matrices=False in the corresponding np.linalg.svd.\n",
 "\n",
 "    Parameters\n",
 "    ----------\n",
 "    A : (m, n) Matrix\n",
 "        A sympy symbolic 2D Matrix, which should be valid for SVD later.\n",
 "    q_star : dict\n",
 "        A dict mapping each sympy symbolic variable to its value.\n",
 "    symbolic : bool, optional\n",
 "        Controls the return type. If False (default), return numerical\n",
 "        results as numpy.ndarray; if True, return symbolic results as\n",
 "        sympy.Array.\n",
 "    \n",
 "    Returns\n",
 "    -------\n",
 "    dU : (Q, m, k) array\n",
 "        Partial derivatives/gradient of U with respect to each variable in\n",
 "        q_star. Each variable matches a (..., m, k) matrix. Q is the number\n",
 "        of variables. Returns sympy.Array if symbolic=True.\n",
 "    dS : (Q, k, k) array\n",
 "        Partial derivatives/gradient of S with respect to each variable in\n",
 "        q_star. Each variable matches a (..., k, k) matrix, k = min(m, n).\n",
 "        Each (..., k, k) slice is strictly diagonal, which is guaranteed by\n",
 "        the Hadamard operation in eq. (3). Check the examples for details.\n",
 "        Returns sympy.Array if symbolic=True.\n",
 "    dV : (Q, n, k) array\n",
 "        Partial derivatives/gradient of V with respect to each variable in\n",
 "        q_star. Each variable matches a (..., n, k) matrix. Returns\n",
 "        sympy.Array if symbolic=True.\n",
 "    \n",
 "    See Also\n",
 "    --------\n",
 "    numpy.linalg.svd : Similar function in numpy.\n",
 "    sympy.Array : 3D tensor in sympy.\n",
 "    sympy.Matrix.jacobian : jacobian calculation in sympy.\n",
 "    \n",
 "    References\n",
 "    ----------\n",
 "    .. [1] James Townsend. Differentiating the Singular Value Decomposition.\n",
 "    .. [2] Mike Giles. An extended collection of matrix derivative results\n",
 "       for forward and reverse mode algorithmic differentiation.\n",
 "    \n",
 "    Examples\n",
 "    --------\n",
 "    >>> from sympy import sin, cos, symbols, Matrix\n",
 "    >>> q0, q1 = symbols('q0, q1')\n",
 "    >>> q_star = {q0: 1, q1: 2}\n",
 "    >>> A = Matrix([[2*q0, 1, q0**3 + 2*q1, sin(q0) + q1], \n",
 "    ...             [q0**2 - q1**2, 2*q1, 2*q0 + q1**3, cos(q0) + q1], \n",
 "    ...             [q0**2 + q1**2, 2, 3, sin(q0) + cos(q1)]])\n",
 "    >>> dU, dS, dV = autodiff_svd(A, q_star)\n",
 "    >>> dU.shape\n",
 "    (2, 3, 3)\n",
 "    >>> dS.shape\n",
 "    (2, 3, 3)\n",
 "    >>> dV.shape\n",
 "    (2, 4, 3)\n",
 "    >>> dU_sym, dS_sym, dV_sym = autodiff_svd(A, q_star, symbolic=True)\n",
 "\n",
 "    \"\"\"\n",
 "\n",
 "    m, n = A.shape\n",
 "    k = min(m, n)\n",
 "    Q = len(q_star)\n",
 "\n",
 "    # For the specific point q_star, get U_star, S_star and V_star_T.\n",
 "    A_star = np.array(A.subs(q_star), dtype=float)\n",
 "    U_star, S_star_1d, V_star_T = svd_regulated(A_star)  # U_star (m, k), S_star_1d (k,), V_star_T (k, n)\n",
 "    U_star = U_star[:, :k]  # Use economic U (m, k)\n",
 "    V_star_T = V_star_T[:k, :]  # Use economic VT (k, n)\n",
 "\n",
 "    S_star = np.diag(S_star_1d)  # Construct S (k, k)\n",
 "    S_star_inv = np.diag(np.reciprocal(S_star_1d))\n",
 "\n",
 "    Ik = sp.eye(k)\n",
 "    U_star_T = U_star.T\n",
 "    V_star = V_star_T.T\n",
 "\n",
 "    # F matrix of eq. (5): F_ij = 1 / (s_j^2 - s_i^2) off the diagonal, 0 on it.\n",
 "    F_star = Matrix([[0 if i == j else 1 / (S_star_1d[j]**2 - S_star_1d[i]**2) for j in range(k)] for i in range(k)])\n",
 "    \n",
 "    # dA\n",
 "    # Reshape A to a 1d vector for convenience of jacobian calculation; reshape back to 2d later.\n",
 "    A_element_wise = A.reshape(m * n, 1)\n",
 "    \n",
 "    q_var = Matrix(list(q_star.keys()))  # [q_0, q_1, ..., q_{Q-1}]\n",
 "    dA_element_wise = A_element_wise.jacobian(q_var)  # in shape (m*n, Q)\n",
 "    \n",
 "    # Reserve lists of dA, dS, dU and dV for each q variable, e.g. dA_lst[i] is dA with respect to qi.\n",
 "    dA_lst = [None] * Q\n",
 "    dS_lst = [None] * Q\n",
 "    dU_lst = [None] * Q\n",
 "    dV_lst = [None] * Q\n",
 "    \n",
 "    Im = sp.eye(m)\n",
 "    In = sp.eye(n)\n",
 "    for i in range(Q):\n",
 "        # Get the matrix dA of eq. (6) w.r.t. qi; each entry of dA_lst is the element-wise dA for one variable.\n",
 "        dA_lst[i] = dA_element_wise.T[i, :].reshape(m, n)\n",
 "        # Use @ instead of * to force matrix products for the mixed np.array / sp.Matrix operations.\n",
 "        dU_lst[i] = U_star @ F_star.multiply_elementwise(U_star_T @ dA_lst[i] @ V_star @ S_star \\\n",
 "                    + S_star @ V_star_T @ dA_lst[i].T @ U_star) \\\n",
 "                    + (Im - U_star @ U_star_T) @ dA_lst[i] @ V_star @ S_star_inv\n",
 "        dS_lst[i] = Ik.multiply_elementwise(U_star_T @ dA_lst[i] @ V_star)\n",
 "        dV_lst[i] = V_star @ F_star.multiply_elementwise(S_star @ U_star_T @ dA_lst[i] @ V_star \\\n",
 "                    + V_star_T @ dA_lst[i].T @ U_star @ S_star) \\\n",
 "                    + (In - V_star @ V_star_T) @ dA_lst[i].T @ U_star @ S_star_inv\n",
 "\n",
 "    # Build a 3D tensor over all q variables.\n",
 "    dU_tensor = Array([e.tolist() for e in dU_lst])  # (Q, m, k)\n",
 "    dS_tensor = Array([e.tolist() for e in dS_lst])  # (Q, k, k)\n",
 "    dV_tensor = Array([e.tolist() for e in dV_lst])  # (Q, n, k)\n",
 "    \n",
 "    if symbolic:\n",
 "        return dU_tensor, dS_tensor, dV_tensor\n",
 "    else:\n",
 "        return (np.array(dU_tensor.subs(q_star), dtype=float),\n",
 "                np.array(dS_tensor.subs(q_star), dtype=float),\n",
 "                np.array(dV_tensor.subs(q_star), dtype=float))" ] }, { "cell_type": "code", "execution_count": 3, "id": "0cf91aac", "metadata": {}, "outputs": [], "source": [ "# Test input\n",
 "from sympy import sin, cos, symbols, Matrix\n",
 "q0, q1, q2, q3 = symbols('q0, q1, q2, q3')\n",
 "q_star = {q0: 1, q1: 2, q2: 3, q3: 4}\n",
 "A = Matrix([[2*q0, 1, q2**3 + 2*q3, sin(q0) + q1], \n",
 "            [q0**2 - q1**2, 2*q1, 2*q0 + q1**3, cos(q0) + q1], \n",
 "            [q0**2 + q1**2, 2, 3, sin(q0) + cos(q1)]])\n",
 "\n",
 "dU_of_MatrixA, dS_of_MatrixA, dV_of_MatrixA = autodiff_svd(A, q_star)\n",
 "\n",
 "# Extract the jacobian of the normal vector:\n",
 "# A is 3 x N here, so the normal direction is the last column of U.\n",
 "j_ori_n_of_A = dU_of_MatrixA[:, :, -1].T" ] }, { "cell_type": "code", "execution_count": 4, "id": "e9132bc1", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([[-0.14231918, -0.21932216,  0.19752725,  0.01463165],\n", "       [ 0.51855359, -0.01168378,  0.02158681,  0.00159902],\n", "       [-0.53767579, -0.06999999,  0.05269224,  0.00390313]])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "j_ori_n_of_A" ] }, { "cell_type": "code", "execution_count": 5, "id": "a90da873", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "B = A.T\n",
 "dU_of_MatrixB, dS_of_MatrixB, dV_of_MatrixB = autodiff_svd(B, q_star)\n",
 "j_ori_n_of_B = dV_of_MatrixB[:, :, -1].T\n",
 "\n",
 "# Check uniqueness: the transposed problem must give the same normal-vector jacobian (via dV).\n",
 "np.allclose(j_ori_n_of_A, j_ori_n_of_B)" ] }, { "cell_type": "markdown", "id": "4dab0103", "metadata": {}, "source": [ "### Compare with numerical solution\n",
 "\n",
 "Take $U$ as an example:\n",
 "$$\\frac{\\partial U(\\textbf{q})} {\\partial q_i} \\approx \\frac {U(\\textbf{q} + \\Delta \\textbf{q}_i) - U(\\textbf{q})}{\\epsilon}$$\n",
 "\n",
 "where $\\Delta \\textbf{q}_i$ is a vector whose only nonzero entry is $\\epsilon$ at index $i$,\n",
 "\n",
 "$$\\Delta \\textbf{q}_i = [0, ..., 0, \\epsilon, 0, ..., 0]^T$$\n",
 "\n",
 "$U(\\textbf{q} + \\Delta \\textbf{q}_i)$ is obtained from a standard **SVD** of $A(\\textbf{q} + \\Delta \\textbf{q}_i)$. $S$ and $V$ are handled similarly."
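, "\n",
 "A compact sketch of this forward-difference check for $U$ (it assumes `autodiff_svd`, `svd_regulated`, `A` and `q_star` as defined above); the next cell carries out the same comparison step by step for $U$, $S$ and $V$.\n",
 "\n",
 "```python\n",
 "eps = 1e-7\n",
 "U_base, _, _ = svd_regulated(np.array(A.subs(q_star), dtype=float))\n",
 "dU_fd = np.empty_like(dU_of_MatrixA)\n",
 "for i, qi in enumerate(q_star):\n",
 "    q_plus = {**q_star, qi: q_star[qi] + eps}            # q + Delta q_i\n",
 "    U_plus, _, _ = svd_regulated(np.array(A.subs(q_plus), dtype=float))\n",
 "    dU_fd[i] = (U_plus - U_base) / eps                   # forward difference\n",
 "np.allclose(dU_of_MatrixA, dU_fd)\n",
 "```"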
] }, { "cell_type": "code", "execution_count": 6, "id": "0bbf5403", "metadata": {}, "outputs": [], "source": [ "# Numerical method /pytest control group\n", "# when Q = 2, aka q = [q0, q1]^T\n", "\n", "# For a specific q0 and q1, get U_star, S_star and V_star_T.\n", "m, n = A.shape\n", "k = min(m, n)\n", "Q = len(q_star)\n", "\n", "\n", "\n", "# For a specific q0 and q1, get U_star, S_star and V_star_T.\n", "A_star = np.array(A.subs(q_star), dtype = float)\n", "U_star, S_star_1d, V_star_T = svd_regulated(A_star) # U_star (m, k), S_star_1d (k, 1), V_star_T (n, n)\n", "\n", "S_star = np.diag(S_star_1d) # Construct S (k, k)\n", "U_star = U_star[:, :k] # Use economic U (m, k)\n", "V_star_T = V_star_T[:k, :] # Use economic VT (k, n)\n", "\n", "dq = 1e-7\n", "U_star_plus_dq = np.empty((Q, m, k))\n", "S_star_plus_dq = np.empty((Q, k, k))\n", "V_star_plus_dq_T = np.empty((Q, k, n))\n", "\n", "dS_numerical_q = np.empty((Q, k, k)) \n", "dU_numerical_q = np.empty((Q, m, k))\n", "dVT_numerical_q = np.empty((Q, k, n))\n", "dV_numerical_q = np.empty((Q, n, k))\n", "\n", "for i in range(Q):\n", " qi_star_plus_dq = q_star.copy()\n", " qi_star_plus_dq[eval('q' + str(i))] += dq\n", " \n", " tmp_U, S_star_plus_dqi_1d, tmp_VT = svd_regulated(np.array(A.subs(qi_star_plus_dq), dtype=float))\n", "\n", " U_star_plus_dq[i] = tmp_U[:, :k]\n", " S_star_plus_dq[i] = np.diag(S_star_plus_dqi_1d)\n", " V_star_plus_dq_T[i] = tmp_VT[:k, :]\n", " dS_numerical_q[i] = (S_star_plus_dq[i] - S_star) / dq\n", " dU_numerical_q[i] = (U_star_plus_dq[i] - U_star) / dq\n", " dVT_numerical_q[i] = (V_star_plus_dq_T[i] - V_star_T) / dq # Use economic VT (k, n)\n", " dV_numerical_q[i] = dVT_numerical_q[i].T" ] }, { "cell_type": "code", "execution_count": 7, "id": "b84e1d70", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "True" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Error Check\n", "np.allclose(dU_of_MatrixA, dU_numerical_q) \\\n", "and np.allclose(dS_of_MatrixA, dS_numerical_q) \\\n", "and np.allclose(dV_of_MatrixA, dV_numerical_q)" ] }, { "cell_type": "code", "execution_count": null, "id": "e71d74a8", "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "id": "81227b18", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.0" } }, "nbformat": 4, "nbformat_minor": 5 }