Just as people watch tutorial videos to learn new skills, Cornell University researchers are teaching robots to watch instructional videos and then derive step-by-step instructions for performing a particular task. According to Cornell, the method should eventually work well enough that you won’t even have to turn on the DVD player; the robot can look up what it needs on YouTube.
Robots watch videos on how-to topics and a computer finds instructions they have in common. (Source: Cornell University)
The Cornell researchers’ project, RoboWatch, anticipates a future in which personal robots perform daily household chores such as cooking, doing laundry, and feeding pets. The approach is feasible because most how-to videos share a common underlying structure and there is plenty of source material available. For example, YouTube offers 180,000 videos on “How to make an omelet” and 281,000 on “How to tie a bowtie.” A computer that scans numerous videos on the same task can find what they all have in common and reduce it to step-by-step instructions.
According to graduate student Ozan Sener, a key aspect of the system is that it is “unsupervised.” In most previous work, robot learning depends on humans explaining what the robot observes, for example by showing a robot pictures of objects while a human labels them by name. Here, a robot with a job to do can look up the instructions and figure them out for itself.
How it works
If the robot doesn’t know how to perform a task, its computer brain sends a query to YouTube to find a collection of how-to videos on the topic. The algorithm includes routines to omit “outliers,” or videos that fit the keywords but are not instructional. The computer then scans the videos frame by frame, looking for objects that appear often, and reads the accompanying narration (using subtitles) in search of frequently repeated words. Using these markers, it matches similar segments across the videos and orders them into a single sequence.
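To make the idea concrete, here is a toy sketch of the text half of that pipeline. It is not the researchers' algorithm: it ignores the video frames entirely, and the function names, stopword list, thresholds, and sample subtitles are all made up for illustration. It only shows how words shared across many subtitle tracks can flag outliers and suggest an ordering of steps.

```python
from collections import Counter, defaultdict

# Hypothetical stopword list, just enough for the sample data below.
STOPWORDS = {"the", "a", "an", "and", "in", "to", "of", "it", "then", "with"}


def tokenize(text):
    """Lowercase a subtitle line and strip basic punctuation."""
    return [w.strip(".,!?;:\"'()").lower() for w in text.split()]


def words_in(video):
    """Set of non-stopword tokens across all subtitle segments of one video."""
    return {w for seg in video for w in tokenize(seg) if w and w not in STOPWORDS}


def salient_words(videos, min_fraction=0.6):
    """Words mentioned in at least min_fraction of the videos."""
    counts = Counter(w for v in videos for w in words_in(v))
    return {w for w, c in counts.items() if c >= min_fraction * len(videos)}


def drop_outliers(videos, salient, min_overlap=0.5):
    """Keep videos that mention at least min_overlap of the salient words."""
    return [v for v in videos
            if len(words_in(v) & salient) >= min_overlap * len(salient)]


def ordered_steps(videos, salient):
    """Order salient words by the mean relative position of their first mention."""
    positions = defaultdict(list)
    for v in videos:
        seen = set()
        for i, seg in enumerate(v):
            for w in tokenize(seg):
                if w in salient and w not in seen:
                    seen.add(w)
                    positions[w].append(i / max(len(v) - 1, 1))
    return sorted(positions,
                  key=lambda w: (sum(positions[w]) / len(positions[w]), w))


# Made-up subtitle tracks: three omelet tutorials and one unrelated video.
videos = [
    ["Crack the eggs", "Whisk the eggs", "Pour into the pan", "Fold the omelet"],
    ["First crack two eggs", "Whisk well", "Pour in a hot pan", "Fold and serve"],
    ["Crack eggs", "Whisk", "Pour mixture", "Fold"],
    ["My cat knocked the pan over", "Funny cat compilation"],
]

salient = salient_words(videos)
kept = drop_outliers(videos, salient)
steps = ordered_steps(kept, salient)
print(len(kept), "videos kept; step keywords in order:", steps)
```

On this sample, the cat video shares only one frequent word with the others and is dropped, while the remaining tutorials agree that cracking comes before whisking, pouring, and folding. A real system would also have to use the visual objects the article mentions and cope with far noisier narration.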
Using the subtitles, it can even create written instructions. In the future, information from other sources such as Wikipedia might be added. The knowledge learned from the YouTube videos is made available via RoboBrain, an online knowledge base that robots can consult to help them do their jobs.