A pair of researchers, John Wieting of Carnegie Mellon University and Douwe Kiela of Facebook AI Research, published a paper titled “No training required: Exploring random encoders for sentence classification” earlier this week.
A sentence embedding is a vector representation of the meaning of a sentence. It is most often created by transforming word embeddings with a composition function, which is typically nonlinear and recurrent in nature, and the word embeddings themselves are usually initialized from pre-trained embeddings. The resulting sentence embeddings are then used as features for a collection of downstream tasks, such as sentence classification.
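As an illustration, one of the simplest composition functions just averages the word vectors of a sentence. The sketch below is a minimal example, not the paper's method: the lookup table, the function name sentence_embedding, and the random vectors standing in for real pre-trained embeddings are all placeholders.

```python
import numpy as np

# Hypothetical lookup table: random vectors stand in for real pre-trained
# word embeddings (e.g. 300-dimensional GloVe vectors).
rng = np.random.default_rng(0)
word_embeddings = {w: rng.standard_normal(300) for w in ["the", "movie", "was", "great"]}

def sentence_embedding(tokens, embeddings):
    """Compose word embeddings into one sentence vector by mean pooling."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return np.mean(vectors, axis=0)

print(sentence_embedding(["the", "movie", "was", "great"], word_embeddings).shape)  # (300,)
```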
The paper explores three different approaches for computing sentence representations from pre-trained word embeddings, using nothing but random parameterizations for the composition. It compares these against two well-known sentence embedding methods: SkipThought (presented by Ryan Kiros et al. in Advances in Neural Information Processing Systems) and InferSent (presented in “Supervised learning of universal sentence representations from natural language inference data” by Alexis Conneau et al.). As mentioned in the paper, SkipThought took around one month to train, while InferSent requires large amounts of annotated data.
“We examine to what extent we can match the performance of these systems by exploring different ways for combining nothing but the pre-trained word embeddings. Our goal is not to obtain a new state of the art but to put the current state of the art methods on more solid footing”, state the researchers.
The three approaches the paper describes for computing sentence representations from pre-trained word embeddings are as follows:

1. Bag of random embedding projections (BOREP), where each word embedding is passed through a single, fixed random projection matrix and the projected vectors are pooled into a sentence vector.
2. Random LSTMs, i.e. bidirectional LSTMs whose weights are randomly initialized and never trained, with the hidden states pooled over the sentence.
3. Echo state networks (ESNs), recurrent networks whose input and recurrent weights are random and fixed (see below).
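The first of these can be sketched in a few lines of NumPy. The sketch below is illustrative only: the projection size, initialization range, pooling choice, and the helper name borep_encode are assumptions for the example, not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
embedding_dim, projection_dim = 300, 4096   # illustrative sizes

# Random projection matrix: sampled once and kept frozen (never trained).
bound = 1.0 / np.sqrt(embedding_dim)
W = rng.uniform(-bound, bound, size=(projection_dim, embedding_dim))

def borep_encode(word_vectors, pooling="max"):
    """Project every word embedding with the fixed random matrix W, then pool over time."""
    projected = np.stack([W @ v for v in word_vectors])   # (seq_len, projection_dim)
    return projected.max(axis=0) if pooling == "max" else projected.mean(axis=0)

# Toy sentence: random vectors stand in for pre-trained word embeddings.
sentence = [rng.standard_normal(embedding_dim) for _ in range(5)]
print(borep_encode(sentence).shape)   # (4096,)
```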
In the paper, the researchers diverge from the typical ESN setting, where a readout is trained on the per-timestep reservoir states, and instead use the ESN to produce a random representation of the entire sentence. A bidirectional ESN is used, with the reservoir states from both directions concatenated. These states are then pooled over to generate a sentence representation.
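A rough sketch of this idea is shown below, assuming a plain tanh reservoir with the usual spectral-radius rescaling and mean pooling. The dimensions, hyperparameters, and the helper names random_reservoir, esn_states, and bidirectional_esn_encode are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

rng = np.random.default_rng(1)
embedding_dim, reservoir_dim = 300, 512   # illustrative sizes

def random_reservoir(dim, spectral_radius=0.9, sparsity=0.5):
    """Random recurrent matrix rescaled to a target spectral radius (a standard ESN recipe)."""
    W = rng.standard_normal((dim, dim)) * (rng.random((dim, dim)) < sparsity)
    return W * (spectral_radius / np.max(np.abs(np.linalg.eigvals(W))))

W_in = rng.uniform(-0.1, 0.1, size=(reservoir_dim, embedding_dim))
W_res = random_reservoir(reservoir_dim)

def esn_states(word_vectors):
    """Run the untrained reservoir over a word-vector sequence and collect all hidden states."""
    h, states = np.zeros(reservoir_dim), []
    for x in word_vectors:
        h = np.tanh(W_in @ x + W_res @ h)
        states.append(h)
    return np.stack(states)                                # (seq_len, reservoir_dim)

def bidirectional_esn_encode(word_vectors):
    """Concatenate forward and backward reservoir states, then mean-pool over time."""
    forward = esn_states(word_vectors)
    backward = esn_states(word_vectors[::-1])[::-1]
    return np.concatenate([forward, backward], axis=1).mean(axis=0)

# Toy sentence: random vectors stand in for pre-trained word embeddings.
sentence = [rng.standard_normal(embedding_dim) for _ in range(7)]
print(bidirectional_esn_encode(sentence).shape)            # (1024,)
```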
For evaluation purposes, the following set of downstream tasks is used: sentiment analysis (MR, SST), question-type classification (TREC), product reviews (CR), subjectivity (SUBJ), opinion polarity (MPQA), paraphrasing (MRPC), entailment (SICK-E, SNLI), and semantic relatedness (SICK-R, STSB). The three random sentence encoders are evaluated against the InferSent and SkipThought models.
As per the results:
“The point of these results is not that random methods are better than these other encoders, but rather that we can get very close and sometimes even outperform those methods without any training at all, from just using the pre-trained word embeddings,” state the researchers.
For more information, check out the official research paper.