LASER maps a sentence in any language to a point in a high-dimensional space so that the same sentence, whatever its language, ends up in the same neighborhood. This representation can be seen as a kind of universal language in a semantic vector space. The Facebook post reads, “We have observed that the distance in that space correlates very well to the semantic closeness of the sentences.” The sentence embeddings are used to initialize the decoder LSTM through a linear transformation and are also concatenated to its input embeddings at every time step.
The project builds on neural machine translation, using the encoder/decoder architecture also known as sequence-to-sequence processing. LASER uses one shared encoder for all input languages and a shared decoder for generating the output language.
LASER represents the input sentence as a fixed-size, 1,024-dimensional vector. The decoder is told which language to generate. Because the encoder receives no explicit signal indicating the input language, this setup encourages it to learn language-independent representations.
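The way the decoder consumes the sentence embedding can be illustrated with a small sketch in plain Python. Only the 1,024-dimensional sentence vector, the linear transformation for the initial state, and the per-step concatenation come from the description above. The toy sizes, the random weights, the helper names `linear` and `decoder_inputs`, and the choice to signal the target language with an extra concatenated language vector are assumptions made for illustration, not details confirmed by the article.

```python
import random

random.seed(0)

SENT_DIM = 1024  # sentence embedding size stated in the article
TOK_DIM = 4      # toy token embedding size (assumption)
LANG_DIM = 2     # toy target-language embedding size (assumption)
HIDDEN = 3       # toy decoder LSTM state size (assumption)


def linear(weights, vec):
    """Apply a linear transformation: one dot product per weight row."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def decoder_inputs(token_embs, sent_emb, lang_emb):
    """Concatenate the sentence and language vectors to every time step."""
    return [tok + sent_emb + lang_emb for tok in token_embs]


def rand_vec(n):
    return [random.gauss(0.0, 1.0) for _ in range(n)]


sent_emb = rand_vec(SENT_DIM)                  # stand-in for the encoder output
lang_emb = rand_vec(LANG_DIM)                  # hypothetical target-language vector
token_embs = [rand_vec(TOK_DIM) for _ in range(5)]
w_init = [rand_vec(SENT_DIM) for _ in range(HIDDEN)]

h0 = linear(w_init, sent_emb)                  # initial decoder state
x = decoder_inputs(token_embs, sent_emb, lang_emb)

print(len(h0), len(x), len(x[0]))
```

Note that the source language appears nowhere in this sketch: the decoder sees only the sentence vector and the requested output language, which is the property that pushes the encoder toward language-independent representations.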
The team at Facebook AI Research trained its systems on 223 million sentences of public parallel data, aligned with either English or Spanish. A shared byte-pair encoding (BPE) vocabulary, trained on the concatenation of all languages, lets low-resource languages benefit from high-resource languages of the same family.
LASER achieves excellent results in cross-lingual natural language inference (NLI). Facebook’s AI research team considers the zero-shot setting: they train the NLI classifier on English and then apply it to all target languages with no fine-tuning or target-language resources.
The same sentence encoder can also be used to mine parallel sentences from large collections of monolingual text. The distances between all sentence pairs are calculated and the closest ones are selected. For more precision, the margin between the closest sentence and the other nearest neighbors is considered. This search is performed using Facebook’s FAISS library.
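A minimal sketch of such a margin-based search follows, under stated assumptions. It uses brute-force cosine similarity in plain Python in place of FAISS, and it scores each candidate as its similarity divided by the average similarity of both sentences to their k nearest neighbors. That ratio is one possible way to formalize "the margin", not necessarily the exact criterion the team used. The function `mine_pairs`, the toy 3-dimensional vectors, and the threshold of 1.2 are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mine_pairs(src, tgt, k=2, threshold=1.2):
    """For each source sentence, keep its best target if the margin score is high enough."""
    sim = [[cosine(s, t) for t in tgt] for s in src]
    # Average similarity of each sentence to its k nearest neighbors on the other side.
    src_avg = [sum(sorted(row, reverse=True)[:k]) / k for row in sim]
    tgt_avg = [
        sum(sorted((row[j] for row in sim), reverse=True)[:k]) / k
        for j in range(len(tgt))
    ]
    pairs = []
    for i, row in enumerate(sim):
        # Ratio margin: similarity relative to the neighborhood average.
        scores = [row[j] / ((src_avg[i] + tgt_avg[j]) / 2) for j in range(len(tgt))]
        best = max(range(len(tgt)), key=lambda j: scores[j])
        if scores[best] >= threshold:
            pairs.append((i, best, scores[best]))
    return pairs


# Toy embeddings: the first two source vectors have clear counterparts, the third does not.
src = [[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.4, 0.4, 0.4]]
tgt = [[0.9, 0.2, 0.1], [0.2, 0.9, 0.1], [0.1, 0.1, 1.0]]

pairs = mine_pairs(src, tgt)
for i, j, score in pairs:
    print(i, j, round(score, 2))
```

The third source vector sits roughly equally close to every target, so no candidate stands out from its neighborhood and it is dropped. With real 1,024-dimensional embeddings and millions of sentences, the all-pairs loop would be replaced by an approximate nearest-neighbor index such as FAISS.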
The team outperformed the state of the art on the BUCC shared task by a large margin, improving the F1 score from 85.5 to 96.2 for German/English, from 81.5 to 93.9 for French/English, from 81.3 to 93.3 for Russian/English, and from 77.5 to 92.3 for Chinese/English.
To learn more about LASER, check out the official post by Facebook.