Before proceeding, depending on your system, you may need to free some memory for the machine learning models by discarding data structures that are no longer needed. To do this, delete the variables that are no longer required, call gc.collect, and then check how much memory is available with the psutil.virtual_memory function:
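import gc
import psutil

# Drop the references to the intermediate objects that are no longer needed
del (tfv_q1, tfv_q2, tfv, q1q2,
     question1_vectors, question2_vectors, svd_q1,
     svd_q2, q1_tfidf, q2_tfidf)
del w2v_q1, w2v_q2
del model

# Run a full garbage collection pass, then report system memory usage
gc.collect()
psutil.virtual_memory()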
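The effect of del followed by gc.collect can be checked on a small scale. The following is a minimal sketch, not part of the original workflow: the Payload class and the release_demo function are hypothetical names introduced only for illustration, and the behavior described assumes the standard CPython interpreter. A weak reference is used to observe whether the object is still alive.

```python
import gc
import weakref


class Payload:
    """Hypothetical stand-in for a large feature matrix or model."""

    def __init__(self, n):
        self.data = list(range(n))
        # A self-reference creates a reference cycle, so reference counting
        # alone cannot free the object once the variable is deleted.
        self.me = self


def release_demo():
    big = Payload(100_000)
    probe = weakref.ref(big)  # observes the object without keeping it alive
    alive_before = probe() is not None

    del big       # drop the only named reference
    gc.collect()  # the cycle collector reclaims the unreachable object

    alive_after = probe() is not None
    return alive_before, alive_after


print(release_demo())
```

Objects that are not part of a reference cycle are usually released as soon as the last reference is deleted, so the explicit gc.collect call mainly matters for objects that refer to each other.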
At this point, we recap the feature sets created so far and what each one contains:
- fs_1: List of basic features
- fs_2: List of fuzzy features
- fs3_1: Sparse data matrix of TFIDF for separated questions
- fs3_2: Sparse data matrix of TFIDF for combined questions...