Project Video At Your Fingertips (N7)
The explosive growth of multimedia means ever more data to search through to find the right information. This creates a growing need for methods and techniques that automatically index, organize, and structure audiovisual data using keywords or key phrases. These methods and techniques ('tools') add intelligence to the existing functionality of multimedia equipment such as VCRs and video cameras. The tools rely on underlying algorithms that can extract semantic meaning from signals.
Within MultimediaN, the project Video At Your Fingertips focuses on such algorithms for Video Content Analysis (VCA). The project distinguishes four types of VCA algorithms.
Parsing
First of all, an algorithm must be able to distinguish the different scenes in a data stream before it can analyze it. These can be scenes in a movie or items in a news broadcast. This first type of VCA algorithm is called parsing.
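As a minimal sketch of what a parsing algorithm might look like, the Python fragment below segments a frame sequence by thresholding color-histogram differences between consecutive frames. The histogram representation and the threshold value are illustrative assumptions, not the project's actual method.

```python
import numpy as np

def histogram(frame, bins=32):
    """Coarse gray-level histogram, normalized so frames of any size compare."""
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255))
    return hist / hist.sum()

def find_shot_boundaries(frames, threshold=0.4):
    """Return indices of frames that start a new shot/scene.

    `frames` is an iterable of 2-D uint8 arrays (grayscale images).
    The 0.4 threshold on the L1 histogram distance is an illustrative
    guess that would need tuning on real footage.
    """
    boundaries = []
    prev_hist = None
    for i, frame in enumerate(frames):
        h = histogram(frame)
        if prev_hist is not None and np.abs(h - prev_hist).sum() > threshold:
            boundaries.append(i)  # frame i opens a new segment
        prev_hist = h
    return boundaries
```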
Object and event detection
The second type of algorithm detects a specific object or event in video images, for example the faces of hooligans in a security recording (object) or the goals scored in a televised soccer match (event).
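A sketch of the object-detection case, using OpenCV's stock Haar-cascade face detector rather than whatever detector the project itself used; the video file name is hypothetical.

```python
import cv2

# Load OpenCV's bundled frontal-face Haar cascade (ships with opencv-python).
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def detect_faces(frame_bgr):
    """Return bounding boxes (x, y, w, h) of faces found in one video frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    return cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Example: scan a security recording and log the frames containing faces.
video = cv2.VideoCapture("security_footage.mp4")  # hypothetical file name
frame_no = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if len(detect_faces(frame)) > 0:
        print(f"faces detected in frame {frame_no}")
    frame_no += 1
video.release()
```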
Semantic scene interpretation
This third type is created by combining the first two types of algorithms and can draw conclusions about the general content of a video. For example, a sequence of shots that alternately show two faces indicates a dialogue. Likewise, detecting people in a security recording and monitoring their number, their individual actions (walking, running), and their group actions (fighting) within a certain time frame supports a conclusion about how suspicious they are. Using this conclusion, a security camera running this type of VCA algorithm can automatically alert the police.
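The dialogue example can be sketched as a simple rule over per-shot labels: given the face identity detected in each shot, a sufficiently long run of shots alternating between two identities is flagged as a dialogue. The labels and the minimum run length below are illustrative.

```python
def find_dialogues(shot_faces, min_shots=4):
    """Flag runs of shots that alternate between two face identities.

    `shot_faces` holds one face label per shot, e.g.
    ["A", "B", "A", "B", "A", "crowd"]; a run of at least `min_shots`
    shots bouncing between exactly two labels is taken as a dialogue.
    """
    dialogues = []
    start = 0
    while start < len(shot_faces) - 1:
        if shot_faces[start] == shot_faces[start + 1]:
            start += 1
            continue
        end = start + 2
        # Extend the run while the strict A-B-A-B alternation holds.
        while end < len(shot_faces) and shot_faces[end] == shot_faces[end - 2]:
            end += 1
        if end - start >= min_shots:
            dialogues.append((start, end - 1))  # inclusive shot range
        start = end
    return dialogues

print(find_dialogues(["A", "B", "A", "B", "A", "crowd", "A"]))
# -> [(0, 4)]
```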
Affective analysis
The fourth type of VCA algorithm adds mood and emotion to the cognitive criteria of the first three (objects, scenes, events, stories, themes, and so on). An affective VCA algorithm estimates the emotional intensity conveyed by a signal. A good practical example is the search for highlights (and low points) in sports programs (goals, penalties, yellow and red cards) or pop concerts.
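A minimal sketch of such an affective analysis, built around an "excitement" curve: per-frame motion and audio energy are normalized, combined, and smoothed, and the frames where the curve peaks become candidate highlights. The equal weighting and the percentile cut-off are simplifying assumptions.

```python
import numpy as np

def arousal_curve(motion_energy, audio_energy, smooth=25):
    """Combine per-frame motion and sound energy into one smoothed excitement curve.

    Both inputs are 1-D arrays on the same frame grid; the 50/50 weighting
    and moving-average smoothing are simplifying assumptions.
    """
    def normalize(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-9)

    arousal = 0.5 * normalize(motion_energy) + 0.5 * normalize(audio_energy)
    kernel = np.ones(smooth) / smooth
    return np.convolve(arousal, kernel, mode="same")

def find_highlights(arousal, percentile=95):
    """Return frame indices whose arousal lands in the top few percent."""
    return np.flatnonzero(arousal >= np.percentile(arousal, percentile))
```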
Another example is tuning images, video, and music to personal preferences. In the future, as a person watches video, intuitive personalization mechanisms in the VCR will record the viewer's preferences; for the next movie, the VCR will then indicate whether it too fits those preferences. Linking these local mechanisms across VCRs creates peer-to-peer networks.
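One way such a personalization mechanism could work, sketched under entirely made-up assumptions: the device keeps a running profile over a few content features and scores each new title by its similarity to that profile. The feature axes and the update rule below are illustrative only.

```python
import numpy as np

# Hypothetical feature axes a device might extract per title (all made up here).
FEATURES = ["action", "dialogue", "music", "sports", "news"]

class PreferenceProfile:
    """Running average of the feature vectors of everything the viewer watched."""

    def __init__(self):
        self.profile = np.zeros(len(FEATURES))
        self.watched = 0

    def record(self, features):
        """Fold one watched title's feature vector into the profile."""
        self.watched += 1
        self.profile += (np.asarray(features) - self.profile) / self.watched

    def score(self, features):
        """Cosine similarity between a candidate title and the profile (1 = perfect fit)."""
        f = np.asarray(features)
        denom = np.linalg.norm(f) * np.linalg.norm(self.profile) + 1e-9
        return float(f @ self.profile / denom)

viewer = PreferenceProfile()
viewer.record([0.9, 0.2, 0.1, 0.8, 0.0])        # an action-heavy sports broadcast
viewer.record([0.8, 0.3, 0.2, 0.9, 0.1])        # another one
print(viewer.score([0.7, 0.1, 0.2, 0.9, 0.0]))  # high: fits the profile
print(viewer.score([0.0, 0.9, 0.1, 0.0, 0.8]))  # low: a talky news program
```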
Consumers can expect a flourishing market for video devices equipped with intelligence to unlock content: the latest generation of personal VCRs that can make a pre-selection from the explosively growing supply of television content, but also mobile video devices that can show a personal recap of the most recent news programs, or the highlights of soccer matches and pop concerts.
For the professional market, applications are possible in the area of security and people monitoring, using smart cameras equipped with VCA algorithms. The business market can profit from the same type of cameras: one commercial application is monitoring consumer behavior in shopping centers, which can provide valuable information for optimizing the product offering.
The researchers in the MultimediaN project Video At Your Fingertips (Philips and TU Delft) work together with Fabchannel and Auxilium. These two first users develop new products based on this intelligence and test them in practice.
Project leader: Dr. Alan Hanjalic, Technische Universiteit Delft