Coding TruncatedSVD with scikit-learn
Using scikit-learn can help us understand m, A, U, Σ, and Vᵀ. The main class is TruncatedSVD. Let's assume A is a 5 x 10 matrix. In LSI, this means there are 5 documents and 10 unique words. Let's fill it with random integers drawn with low=1 and high=50; because the upper bound of np.random.randint is exclusive, the values range from 1 to 49:
import numpy as np
A = np.random.randint(low=1, high=50, size=(5, 10))
print(A)
The output looks like this:
[[ 2, 23, 38, 24, 32, 20, 22, 38,  4,  6],
 [35, 20, 47, 49, 29, 39, 15, 15,  8, 28],
 [35,  8, 47,  2, 40, 24, 21, 37, 12, 25],
 [43, 41, 22, 41, 27, 45, 41, 31, 36, 28],
 [19, 17,  8, 39, 40, 24, 43, 16, 33, 22]]
We will decompose matrix A using TruncatedSVD.
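Before truncating anything, it can help to see the shapes of U, Σ, and Vᵀ for a 5 x 10 matrix. The following is a minimal sketch using NumPy's np.linalg.svd rather than scikit-learn; the seed value and the variable names U, s, Vt, and A_rebuilt are assumptions made for illustration, and the random matrix will not match the one printed above.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable
A = np.random.randint(low=1, high=50, size=(5, 10))  # 5 documents x 10 words

# Thin SVD: U is 5x5, s holds the 5 singular values, Vt is 5x10
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)

# Multiplying the three factors back together recovers A (up to rounding)
A_rebuilt = U @ np.diag(s) @ Vt
print(np.allclose(A, A_rebuilt))
```

NumPy returns the singular values as a one-dimensional array sorted in descending order, so np.diag(s) is used here to build the diagonal matrix Σ.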
Using TruncatedSVD
The TruncatedSVD class in sklearn.decomposition takes an input parameter, n_components. Let's declare a TruncatedSVD object (named svd) and assume there are three topics (n_components=3). I will explain n_components later:
from sklearn.decomposition...
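As a hedged illustration of the idea, here is a minimal sketch of declaring a TruncatedSVD object with three components and fitting it to a 5 x 10 matrix. It is not necessarily the author's exact code: the seed, the random_state value, the cast to float, and the name A_reduced are assumptions added for this example.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

np.random.seed(0)  # arbitrary seed, only so the run is repeatable
A = np.random.randint(low=1, high=50, size=(5, 10)).astype(float)

# Three topics, as assumed in the text; random_state is an illustrative choice
svd = TruncatedSVD(n_components=3, random_state=0)

# fit_transform returns the documents projected onto the 3 components
A_reduced = svd.fit_transform(A)

print(A_reduced.shape)          # one row per document, one column per topic
print(svd.components_.shape)    # one row per topic, one column per word
print(svd.singular_values_)     # the 3 retained singular values
```

Here svd.components_ plays the role of the truncated Vᵀ, and A_reduced corresponds to the truncated U multiplied by Σ.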