MIT researchers have developed an unsupervised language translation model that could create fast, computer-based language translation software.
Translation systems from Google, Facebook, and Amazon rely on models trained to find patterns in millions of documents that have already been translated by humans. Given new text, these models then look for matching words and phrases in the other language.
This technique works reasonably well, but the systems can be inaccurate, because not every word or phrase has an exact counterpart in another language. Gathering enough correctly translated parallel data is also time-consuming and difficult.
The researchers set out to create a more efficient and accurate translation system. Their new model uses a statistical metric called the Gromov-Wasserstein distance.
"The model sees the words in the two languages as sets of vectors, and maps [those vectors] from one set to the other by essentially preserving relationships," said the paper's co-author Tommi Jaakkola, a CSAIL researcher and the Thomas Siebel Professor in the Department of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society. "The approach could help translate low-resource languages or dialects, so long as they come with enough monolingual content."
"The model represents a step toward one of the major goals of machine translation, which is fully unsupervised word alignment," said first author David Alvarez-Melis, a CSAIL Ph.D. student. "If you don't have any data that matches two languages ... you can map two languages and, using these distance measurements, align them."
In tests, the model performed as accurately as current state-of-the-art monolingual models while running faster and using less computing power.
Traditionally, aligning word embeddings has required supervision or repeated adjustment to reach good accuracy, which is inefficient and time-consuming. The team found that matching vectors based on their relational distances is more efficient: the distances between word vectors within a language stay fixed no matter how the space is oriented, which makes it easier for the system to find a match.
"Those distances are invariant," Alvarez-Melis said. "By looking at distance, and not the absolute positions of vectors, then you can skip the alignment and go directly to matching the correspondences between vectors."
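The invariance Alvarez-Melis describes can be checked with a short sketch. The vectors below are hypothetical toy values, not the paper's actual embeddings; the point is only that rotating an entire embedding space changes every absolute position but leaves every pairwise distance intact:

```python
import numpy as np

# Toy "embedding space": four word vectors in 2-D (hypothetical values).
words = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 2.0],
                  [3.0, 3.0]])

def pairwise_distances(X):
    """Matrix of Euclidean distances between every pair of rows."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Rotate the whole space by 90 degrees: every absolute position changes...
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
rotated = words @ R.T

# ...but the matrix of pairwise distances does not.
assert np.allclose(pairwise_distances(words), pairwise_distances(rotated))
```

Because the distance matrix is unaffected by how the space happens to be oriented, two independently trained embedding spaces can be compared through their distance matrices alone.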
This is where the Gromov-Wasserstein metric comes in.
According to Alvarez-Melis, "If there are points, or words, that are close together in one space, Gromov-Wasserstein is automatically going to try to find the corresponding cluster of points in the other space. It's kind of like a ‘soft translation,’ because instead of just returning a single word translation, it tells you 'this vector, or word, has a strong correspondence with this word, or words, in the other language.'"
For example, in the embeddings for both languages, the months of the year cluster together as a group of 12 vectors.
"The model doesn't know these are months," Alvarez-Melis said. "It just knows there is a cluster of 12 points that align with a cluster of 12 points in the other language, but they're different to the rest of the words, so they probably go together well. By finding these correspondences for each word, it then aligns the whole space simultaneously."
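The relational matching idea can be illustrated with a simplified stand-in. This is not the Gromov-Wasserstein computation itself, which solves an optimal-transport problem; instead, this toy sketch matches each point in one space to the point in another space whose profile of distances to its neighbors looks most similar. The vectors and the hidden correspondence are invented for illustration:

```python
import numpy as np

# Two toy embedding spaces. Space B is space A rotated and with its
# points listed in a different order -- as if it were another language
# whose embeddings were trained independently.
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]])
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
perm = [2, 0, 3, 1]              # hidden ground-truth correspondence
B = (A @ R.T)[perm]

def distance_profile(X):
    """Each row: sorted distances from one point to all the others."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return np.sort(D, axis=1)

# Match each point in A to the point in B with the most similar profile.
pa, pb = distance_profile(A), distance_profile(B)
cost = ((pa[:, None, :] - pb[None, :, :]) ** 2).sum(axis=-1)
match = cost.argmin(axis=1)      # A[i] corresponds to B[match[i]]
```

Here `match` recovers the hidden permutation purely from relational structure, without ever comparing absolute positions across the two spaces. The Gromov-Wasserstein metric does something analogous but softer, returning a distribution of correspondence strengths rather than one hard match per word.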
The paper on the new model will be presented at the Conference on Empirical Methods in Natural Language Processing.