Researchers at MIT and the Qatar Computing Research Institute (QCRI) have developed a system that automatically converts 2-D video into 3-D video by exploiting the graphics-rendering software that powers popular sports video games.
The converted video can be played on any 3-D device, such as a 3-D TV, Google Cardboard, or the Oculus Rift.
“Any TV these days is capable of 3-D,” says Wojciech Matusik, an associate professor of electrical engineering and computer science at MIT. “There’s just no content. So we see that the production of high-quality content is the main thing that should happen. But sports is very hard. With movies, you have artists who paint the depth map. Here, there is no luxury of hiring 100 artists to do the conversion. This has to happen in real time.”
Researchers have long tried to build general-purpose systems that convert 2-D video into 3-D, but with limited success: these systems tend to introduce visual artifacts that detract from the 3-D viewing experience.
“Our advantage is that we can develop it for a very specific problem domain,” says Matusik. “We are developing a conversion pipeline for a specific sport. We would like to do it at broadcast quality, and we would like to do it in real time. What we have noticed is that we can leverage video games.”
Today’s video games contain detailed 3-D maps of the environment in which the game is played. When the player moves, the game uses those maps to adjust the view, quickly rendering a 2-D image of the 3-D scene from the player’s point of view.
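The article does not describe how FIFA 13’s renderer works internally, so the following is only a generic sketch of the idea: a pinhole-camera projection of 3-D points onto a pixel grid, with a z-buffer that keeps the nearest surface at each pixel. The function name `render_depth` and the point-list scene format are hypothetical. The relevant point is that rendering produces a depth value for every pixel as a by-product, which is why each 2-D game image can come with a matching depth map.

```python
import math
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]  # (x, y, z) in camera space, z = distance ahead


def render_depth(points: List[Point3D], focal: float, width: int, height: int):
    """Hypothetical sketch: project camera-space points onto a width x height grid.

    Returns (image, depth): image[v][u] is the index of the nearest point that
    lands on pixel (u, v), or None; depth[v][u] is that point's z value.
    """
    inf = float("inf")
    depth = [[inf] * width for _ in range(height)]
    image: List[List[Optional[int]]] = [[None] * width for _ in range(height)]
    cx, cy = width / 2, height / 2  # assume the optical axis hits the image centre
    for i, (x, y, z) in enumerate(points):
        if z <= 0:
            continue  # point is behind the camera
        # Perspective divide: farther points land closer to the centre.
        u = math.floor(focal * x / z + cx)
        v = math.floor(focal * y / z + cy)
        # Z-buffer test: keep only the closest surface seen at each pixel.
        if 0 <= u < width and 0 <= v < height and z < depth[v][u]:
            depth[v][u] = z
            image[v][u] = i
    return image, depth
```

For example, if two points lie on the same line of sight, only the nearer one is recorded at that pixel, and its distance ends up in the depth map.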
The MIT and QCRI researchers used this video-game technology in reverse: rather than rendering 2-D images from a 3-D map, they used the game’s 3-D maps to infer depth for ordinary 2-D footage.
They worked with FIFA 13, a realistic soccer game published by EA Sports, and set it to play over and over again. A Microsoft video-game analysis tool then stored screen shots of the action, and for each screen shot it also captured the corresponding 3-D map.
Using a standard algorithm for measuring the difference between two images, they then winnowed out most of the screen shots, keeping only those that best captured the range of viewing angles and player configurations the game presented.
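The article does not name the image-difference algorithm or the selection procedure, so the sketch below swaps in two simple stand-ins: mean absolute pixel difference as the distance, and greedy farthest-point selection to keep a small, diverse subset. Images are flattened grayscale pixel lists, and `image_distance` and `select_diverse` are hypothetical names. Each round keeps the screen shot that differs most from everything already kept, so near-duplicates are discarded first.

```python
from typing import List, Sequence


def image_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Stand-in metric: mean absolute difference of two equal-sized grayscale images."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_diverse(shots: List[Sequence[float]], k: int) -> List[int]:
    """Hypothetical pruning step: greedily keep up to k mutually dissimilar shots.

    Returns the indices of the kept screen shots, in the order they were chosen.
    """
    if not shots or k <= 0:
        return []
    chosen = [0]  # seed with the first screen shot
    # nearest[i] = distance from shot i to its closest already-chosen shot
    nearest = [image_distance(s, shots[0]) for s in shots]
    while len(chosen) < min(k, len(shots)):
        best = max(range(len(shots)), key=lambda i: nearest[i])
        if nearest[best] == 0:
            break  # every remaining shot duplicates one already kept
        chosen.append(best)
        # Update each shot's distance to the (now larger) kept set.
        for i, s in enumerate(shots):
            nearest[i] = min(nearest[i], image_distance(s, shots[best]))
    return chosen
```

Farthest-point selection is one common way to cover a space of examples evenly. The real system may well have used a different criterion.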
For every frame of 2-D video from an actual soccer game, the system looks for the roughly 10 screen shots in the database that best correspond to it. It then looks for the best matches between smaller regions of the video frame and smaller regions of those screen shots. Once it finds them, it overlays depth information from the screen shots onto the corresponding sections of the video frame and stitches the pieces back together, producing the 3-D effect.
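Here is a toy sketch of the two-stage matching described above. It assumes grayscale images stored as lists of rows, a database of (screen shot, depth map) pairs the same size as the frame, sum of absolute differences as the matching cost, and fixed square patches whose size divides the frame dimensions. All of these are illustrative choices, not details from the article. The names `transfer_depth`, `image_distance` and `sad` are hypothetical, and the sketch simply tiles copied depth patches where the real system would blend them smoothly.

```python
from typing import List, Tuple

Grid = List[List[float]]


def image_distance(a: Grid, b: Grid) -> float:
    """Whole-image cost: sum of absolute pixel differences."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def sad(a: Grid, b: Grid, ay: int, ax: int, by: int, bx: int, p: int) -> float:
    """Sum of absolute differences between p x p patches of a at (ay, ax) and b at (by, bx)."""
    return sum(abs(a[ay + dy][ax + dx] - b[by + dy][bx + dx])
               for dy in range(p) for dx in range(p))


def transfer_depth(frame: Grid, database: List[Tuple[Grid, Grid]],
                   k: int = 10, p: int = 2) -> Grid:
    """Hypothetical sketch: estimate a depth map for frame from (shot, depth) pairs.

    Assumes a non-empty database, same-sized images, and p dividing both sides.
    """
    h, w = len(frame), len(frame[0])
    # Stage 1: keep the k screen shots most similar to the whole frame.
    candidates = sorted(database, key=lambda pair: image_distance(frame, pair[0]))[:k]
    out: Grid = [[0.0] * w for _ in range(h)]
    # Stage 2: for each p x p block of the frame, search every patch position
    # in the candidates and copy the depth of the best-matching patch.
    for y in range(0, h, p):
        for x in range(0, w, p):
            best = None  # (cost, depth map, row, col) of the best match so far
            for shot, depth in candidates:
                for sy in range(h - p + 1):
                    for sx in range(w - p + 1):
                        cost = sad(frame, shot, y, x, sy, sx, p)
                        if best is None or cost < best[0]:
                            best = (cost, depth, sy, sx)
            _, depth, sy, sx = best
            for dy in range(p):
                for dx in range(p):
                    out[y + dy][x + dx] = depth[sy + dy][sx + dx]
    return out
```

Because stage 2 searches every patch position, a region of the frame can take its depth from a different part of a screen shot. For instance, a player on the left of the frame can match a similar-looking player on the right of a stored image.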
According to the team, the system currently takes about a third of a second to process each frame of video, and the researchers are working to reduce that time.
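To put that figure in context, here is some back-of-the-envelope arithmetic. It assumes common broadcast frame rates, which the article does not specify. A third of a second per frame works out to about 3 frames per second. At an assumed 30 frames per second, each frame has a budget of 1/30 of a second, so real-time conversion would need roughly a tenfold speedup. The helper `required_speedup` is a hypothetical name.

```python
def required_speedup(seconds_per_frame: float, target_fps: float) -> float:
    """How many times faster per-frame processing must become to keep pace
    with a live feed running at target_fps frames per second."""
    frame_budget = 1.0 / target_fps  # seconds available for each frame
    return seconds_per_frame / frame_budget


# Figure reported by the team: about a third of a second per frame.
current = 1 / 3
print(f"current throughput: {1 / current:.1f} frames per second")
for fps in (25, 30, 60):  # assumed broadcast frame rates; real feeds vary
    print(f"{fps} fps would need about {required_speedup(current, fps):.1f}x speedup")
```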
For more information, visit MIT’s website.