Researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) have developed a system that lets robots inspect objects they have never encountered before and understand them visually well enough to accomplish specific tasks.
The method may mark a significant breakthrough in computer vision, a field that has been hampered by robots’ inability to truly understand the shape of objects even after making basic distinctions between them.
The method, called Dense Object Nets (DON), represents objects as collections of points that serve as visual roadmaps for robots. This lets robots better understand and manipulate items, and even pick a specific object out of a clutter of similar ones. MIT CSAIL said the method could make robots in industrial warehouses at Amazon or Walmart smarter.
DON allows a robot to grab a specific spot on an object, such as the tongue of a shoe. It then can look at a shoe it has never seen before and successfully grab its tongue.
"Many approaches to manipulation can’t identify specific parts of an object across the many orientations that object may encounter,” said Lucas Manuelli, an MIT CSAIL student who led the research. “For example, existing algorithms would be unable to grasp a mug by its handle, especially if the mug could be in multiple orientations, like upright, or on its side."
While this technology would obviously be useful in industrial environments, it could also be used in homes, where robots could clean a house after seeing a photo of a clean house, or put away dishes once they see where the dishes go.
How They Did It
The DON system creates a series of coordinates on a given object to serve as a visual roadmap of that object, giving the robot a better understanding of what it needs to grasp and where.
The team trained the DON system to look at objects as a series of points that make up a larger coordinate system. The robot can then map different points to visualize an object’s 3D shape, similar to how panoramic photos are stitched together from multiple photos, MIT said.
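The correspondence idea described above can be sketched in a few lines. In DON-style systems, a network maps each pixel of an image to a descriptor vector, and the same physical point on an object (say, a shoe’s tongue) lands near the same descriptor even in a new image. A minimal, illustrative version of the matching step, assuming the descriptor maps already exist (the function name and array shapes here are hypothetical, not MIT’s actual API):

```python
import numpy as np

def find_correspondence(ref_descriptors, ref_pixel, new_descriptors):
    """Find the pixel in a new image whose descriptor best matches the
    descriptor at ref_pixel in a reference image.

    ref_descriptors, new_descriptors: (H, W, D) arrays of per-pixel
        descriptor vectors, as a DON-like network would produce.
    ref_pixel: (row, col) of the annotated point, e.g. a shoe's tongue.
    Returns the (row, col) of the closest-matching pixel.
    """
    target = ref_descriptors[ref_pixel]  # the D-dim descriptor to match
    # Euclidean distance from every pixel's descriptor to the target.
    dists = np.linalg.norm(new_descriptors - target, axis=-1)  # (H, W)
    # The minimum-distance pixel is the predicted corresponding point.
    return np.unravel_index(np.argmin(dists), dists.shape)
```

Because matching happens in descriptor space rather than raw pixel space, the same reference point can be located on an object the robot has never seen, in any orientation, which is what allows a single annotated grasp point to transfer to new instances.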
In one set of tests on a soft caterpillar toy, a Kuka robotic arm using DON could grasp the toy’s right ear from a range of different configurations, showing that the system can distinguish left from right on an object, MIT said. In tests on baseball hats, DON could pick out a specific target hat even though all of the hats were very similar in design, and despite never having been trained on images of those hats.
“In factories robots often need complex part feeders to work reliably,” Manuelli said. “But a system like this that can understand objects’ orientations could just take a picture and be able to grasp and adjust the object accordingly.”
The next steps include improving the system to the point where it can perform specific tasks with a deeper understanding of the corresponding objects, such as grasping and moving objects in service of a more complex goal, like cleaning a desk.