In a paper published earlier this month, a team of AI researchers at Facebook has been looking closely at how AI agents ‘understand’ images and the extent to which they can be said to develop a shared conceptual language. Building on earlier research indicating that “(AI) agents are now developing conceptual symbol meanings,” the Facebook team dove deeper into how AI agents develop representations of visual inputs.
What they found was intriguing: the conceptual ‘language’ the AI agents seemed to share wasn’t related to the input data at all, but was instead what the researchers describe as a ‘shared conceptual pact’.
This research is significant because it lifts the lid on how agents in deep learning systems communicate, and opens up new possibilities for understanding how they work.
The researchers take their cue from current work on AI agents that play visual ‘games’. “This… allows us to address the exciting issue of whether the needs of goal-directed communication will lead agents to associate visually-grounded conceptual representations to discrete symbols, developing natural language-like word meanings,” reads the paper.
However, most existing studies analyze only the agents’ symbol usage; very little attention is given to the representations of the visual input that the agents develop during the interaction. The researchers made use of the referential games of Angeliki Lazaridou, a research scientist at DeepMind, in which a pair of agents communicates about images using a fixed-size vocabulary.
Unlike previous studies, which suggested that the agents developed a shared understanding of what the images represented, the researchers found that the agents extracted no concept-level information. The paired AI agents arrived at an image-based decision based only on low-level feature similarities.
The researchers implemented Lazaridou’s same-image game and different-image game. In the same-image game, the Sender and Receiver are shown the same two images (always of different concepts). In the different-image game, the Receiver is shown images different from the Sender’s every time. The experiments were repeated using 100 random initialization seeds. The researchers first looked at how playing the game affects the way agents “see” the input data, which involves checking whether the agents’ image embeddings diverge from the input image representations, and from each other.
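For readers unfamiliar with this kind of setup, below is a minimal sketch of a Lazaridou-style referential game in PyTorch. The architecture, dimensions, and the use of Gumbel-softmax here are illustrative assumptions for exposition, not the paper’s exact implementation.

```python
# Minimal sketch of a referential game: a Sender sees a target image and emits a
# discrete symbol; a Receiver uses the symbol to point at the target among candidates.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, EMB_DIM, VOCAB = 4096, 256, 100  # e.g. pre-extracted CNN features (assumed sizes)

class Sender(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(FEAT_DIM, EMB_DIM)   # Sender's image embedding
        self.to_vocab = nn.Linear(EMB_DIM, VOCAB)   # scores over discrete symbols

    def forward(self, target_feats):
        h = torch.tanh(self.embed(target_feats))
        # Sample a one-hot symbol; Gumbel-softmax keeps the choice differentiable.
        return F.gumbel_softmax(self.to_vocab(h), tau=1.0, hard=True)

class Receiver(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Linear(FEAT_DIM, EMB_DIM)   # Receiver's image embedding
        self.sym_embed = nn.Linear(VOCAB, EMB_DIM)  # embedding of the received symbol

    def forward(self, symbol, candidate_feats):
        # candidate_feats: (batch, n_candidates, FEAT_DIM)
        msg = self.sym_embed(symbol).unsqueeze(2)         # (batch, EMB_DIM, 1)
        cands = torch.tanh(self.embed(candidate_feats))   # (batch, n_cand, EMB_DIM)
        return torch.bmm(cands, msg).squeeze(2)           # "pointing" scores per candidate

sender, receiver = Sender(), Receiver()
target = torch.randn(32, FEAT_DIM)             # Sender's target image features
candidates = torch.randn(32, 2, FEAT_DIM)      # Receiver's two candidate images
candidates[:, 0] = target                      # target at index 0 (would be shuffled in practice)
scores = receiver(sender(target), candidates)
loss = F.cross_entropy(scores, torch.zeros(32, dtype=torch.long))  # reward correct pointing
```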
The researchers then predicted that, as training continues, the Sender and Receiver representations would become increasingly similar to each other, as well as to the input representations. To compare the similarity structure of the input, Sender, and Receiver spaces, they used representational similarity analysis (RSA), a technique from computational neuroscience.
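As a rough illustration of the RSA idea (the function below is an assumption for exposition, not the researchers’ code), one can correlate the pairwise-similarity structures of two embedding spaces:

```python
# Sketch of representational similarity analysis (RSA): two spaces are compared
# not point-by-point but by how similarly they arrange the same set of images.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rsa(space_a, space_b):
    """space_a, space_b: (n_images, dim) embeddings of the same images."""
    # Pairwise cosine similarities capture each space's similarity structure.
    sims_a = 1.0 - pdist(space_a, metric="cosine")
    sims_b = 1.0 - pdist(space_b, metric="cosine")
    # The Spearman correlation of the two similarity vectors is the RSA score.
    rho, _ = spearmanr(sims_a, sims_b)
    return rho

input_space = np.random.randn(50, 4096)      # e.g. raw CNN features of 50 images
sender_space = np.random.randn(50, 256)      # Sender's learned embeddings of the same images
receiver_space = np.random.randn(50, 256)    # Receiver's learned embeddings
print(rsa(input_space, sender_space), rsa(sender_space, receiver_space))
```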
The paired agents in the game arrived at an ‘image-based consensus’ depending solely on low-level feature similarities, without determining, for instance, that pictures of a Boston terrier and a Chihuahua both represent dogs. In fact, the agents were able to reach this consensus even when presented with images of visual noise containing no recognizable objects. This confirmed the hypothesis that the Sender and Receiver can communicate successfully about input data with no conceptual content at all.
This, in turn, suggests that no concept-level information (e.g., features that would allow them to identify instances of the dog or chair category) has been extracted by the agents during the training process.
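One hedged way to picture such a check (a diagnostic probe, not the paper’s exact protocol) is to train a simple classifier to predict the concept category from an agent’s image embeddings; near-chance accuracy would indicate that no concept-level information survived training.

```python
# Illustrative concept probe (assumed setup, not the researchers' procedure):
# if a linear classifier cannot recover the concept label from the agent's
# embeddings, the embeddings likely carry no concept-level information.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

embeddings = np.random.randn(500, 256)          # agent's learned image embeddings (placeholder)
concept_labels = np.random.randint(0, 10, 500)  # ground-truth concept per image (placeholder)

probe = LogisticRegression(max_iter=1000)
acc = cross_val_score(probe, embeddings, concept_labels, cv=5).mean()
print(f"probe accuracy: {acc:.2f} (chance ≈ {1/10:.2f})")
```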
For more information, check out the official research paper.