Using deep learning techniques, a group of researchers has trained a computer to recognise events in YouTube videos, including events the software has never seen before, such as riding a horse, baking cookies or eating at a restaurant.
Researchers from Disney Research and Shanghai’s Fudan University used both scene and object features from the videos, and let a machine-learning architecture known as a neural network automatically determine the links between these visual elements and each type of event.
“Notably, this approach not only works better than other methods in recognising events in videos, but is significantly better at identifying events that the computer programme has never or rarely encountered previously,” said Leonid Sigal, senior research scientist at Disney Research.
Automated techniques are essential for indexing, searching and analysing the incredible amount of video being created and uploaded daily to the Internet.
“With multiple hours of video being uploaded to YouTube every second, there is no way to describe all of that content manually. If we don’t know what’s in all those videos, we can’t find things we need and much of the videos’ potential value is lost,” noted Jessica Hodgins, vice president at Disney Research.
When presented with an event it has not encountered before, the model can identify objects and scenes that it has already associated with similar events, and use them to help classify the new event.
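To make the idea of reusing known objects and scenes more concrete, here is a minimal sketch of concept-based matching. It is not the neural network the Disney Research and Fudan University team describe. It swaps in a much simpler technique: each event is described by how strongly it is associated with a set of object and scene concepts, and a video is assigned to the event whose profile is most similar (by cosine similarity) to the concepts detected in it. The concept list, the event profiles, the detector scores and the helper names (`to_vector`, `cosine`, `classify`) are all hypothetical and exist only for illustration.

```python
import math

# Hypothetical vocabulary of object and scene concepts that a pretrained
# detector might report for a video (illustrative, not from the study).
CONCEPTS = ["horse", "saddle", "field", "oven", "dough", "kitchen",
            "plate", "table", "dining room"]

# Hypothetical event profiles: how strongly each event is associated with
# each concept. A new event needs only such a profile, no example videos.
EVENT_PROFILES = {
    "riding a horse": {"horse": 1.0, "saddle": 0.8, "field": 0.6},
    "baking cookies": {"oven": 1.0, "dough": 0.9, "kitchen": 0.7},
    "eating at a restaurant": {"plate": 0.9, "table": 0.8, "dining room": 1.0},
}


def to_vector(profile):
    """Align a {concept: weight} dict with CONCEPTS; missing concepts get 0."""
    return [profile.get(c, 0.0) for c in CONCEPTS]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def classify(concept_scores, profiles=EVENT_PROFILES):
    """Return (event, similarity) for the profile that best matches the video."""
    scored = {event: cosine(concept_scores, to_vector(p))
              for event, p in profiles.items()}
    best = max(scored, key=scored.get)
    return best, scored[best]


if __name__ == "__main__":
    # Hypothetical detector output for a clip showing an oven, dough and a kitchen.
    clip = to_vector({"oven": 0.9, "dough": 0.7, "kitchen": 0.8, "table": 0.2})
    print(classify(clip)[0])  # baking cookies

    # An event with no training videos, described only through known concepts.
    profiles = {**EVENT_PROFILES,
                "picnic in a park": {"field": 0.9, "plate": 0.6, "table": 0.3}}
    picnic_clip = to_vector({"field": 0.8, "plate": 0.7, "table": 0.4})
    print(classify(picnic_clip, profiles)[0])  # picnic in a park
```

The point of the sketch is that the "picnic in a park" event is recognised without any example videos of picnics, because it is described through concepts (a field, plates, a table) the system can already detect. In the approach the article reports, these associations are learned by the neural network rather than written by hand.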