PV-DM
The word pairs in PV-DM are arranged as shown in Figure 8.4. The paragraph ID is added to the text of each paragraph, and a sliding window then forms the word pairs. For example, paragraph “1” contains “Jupiter overtakes Saturn...”, so the word pairs are (“1”, Saturn), (Jupiter, Saturn), and (overtakes, Saturn).
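The pairing step can be sketched in a few lines of Python. This is only an illustration: the function name `pv_dm_pairs`, the window size of two preceding words, and the choice to pair the paragraph ID once with each target word are assumptions made to reproduce the example above, not details confirmed by the text.

```python
def pv_dm_pairs(paragraph_id, words, window=2):
    """Form (context, target) pairs for one paragraph.

    Assumption: each target word is paired with the paragraph ID and
    with the `window` words that precede it.
    """
    pairs = []
    for i in range(window, len(words)):
        target = words[i]
        # Assumed: the paragraph ID is paired once with each target word.
        pairs.append((paragraph_id, target))
        # The preceding words in the window supply the remaining context.
        for context in words[i - window:i]:
            pairs.append((context, target))
    return pairs


pairs = pv_dm_pairs("1", ["Jupiter", "overtakes", "Saturn"])
print(pairs)
```

For the three-word example, this produces the same three pairs listed above, each with Saturn as the target.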
Figure 8.4 – An overview of data preparation for PV-DM
Figure 8.5 shows the neural network for PV-DM.
Figure 8.5 – PV-DM
The word pairs in Figure 8.4 feed the input and output layers. Unlike the word pairs of PV-DBOW in Figure 8.3, the word pairs in Figure 8.4 have only one instance of the paragraph ID. Each paragraph ID is one-hot encoded as a 1 x 500 vector. For example, paragraph “123” becomes a 1 x 500 vector in which the 123rd element is 1 and the rest are zeros. All the words are one-hot encoded as 1 x 10,000 vectors...
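A minimal sketch of this one-hot encoding follows. The sizes 500 and 10,000 come from the text. The helper `one_hot`, the use of plain Python lists to stand in for 1 x N vectors, and the vocabulary position 42 are hypothetical choices for illustration.

```python
NUM_PARAGRAPHS = 500   # length of a paragraph-ID vector, from the text
VOCAB_SIZE = 10_000    # length of a word vector, from the text


def one_hot(position, size):
    """Return a list of `size` zeros with a 1 at the 1-based `position`."""
    if not 1 <= position <= size:
        raise ValueError("position must be between 1 and size")
    vector = [0] * size
    vector[position - 1] = 1  # the 123rd element sits at index 122
    return vector


paragraph_vec = one_hot(123, NUM_PARAGRAPHS)
word_vec = one_hot(42, VOCAB_SIZE)  # 42 is a hypothetical vocabulary position

print(len(paragraph_vec), paragraph_vec.index(1))
```

Counting positions from 1 matches the "123rd element" wording. A library that counts from 0 would place the same entry at index 122.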