The recognition of events in videos is a relevant and challenging task in automatic semantic video analysis. One of the most successful frameworks currently used for object recognition is the bag-of-words (BoW) approach; however, it does not model the temporal information of the video stream. We are working on a novel method that introduces temporal information into the BoW approach by modeling a video clip as a sequence of histograms of visual features, one per frame, each computed with the traditional BoW model.
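As a minimal sketch of this per-frame representation, the Python code below quantizes each frame's local descriptors against a visual codebook and turns a clip into a list of normalized histograms. The function names, the hard nearest-codeword assignment, and the L1 normalization are illustrative assumptions; the codebook is assumed to have been learned beforehand (for example with k-means), and the descriptor type is left open.

```python
import numpy as np

def frame_histogram(descriptors, codebook):
    """Quantize one frame's local descriptors (n x d) against a codebook (k x d)
    and return an L1-normalized histogram of visual-word counts (hypothetical helper)."""
    # Squared Euclidean distance from every descriptor to every codeword.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment: each descriptor votes for its nearest visual word.
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def clip_to_sequence(frames, codebook):
    """Map a clip (list of per-frame descriptor arrays) to a list of histograms,
    one 'character' per frame (hypothetical helper)."""
    return [frame_histogram(f, codebook) for f in frames]
```

Because every frame yields its own histogram, the length of the resulting sequence follows the length of the clip, which is what motivates treating it as a string rather than as a fixed-size vector.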
Each sequence is treated as a string in which every histogram is a character. Since the sequences vary in length with the duration of the video clip, events are classified using SVM classifiers with a string kernel (e.g., one based on the Needleman-Wunsch edit distance). Experimental results on two domains, soccer video and TRECVID 2005, demonstrate the validity of the proposed approach.
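The sketch below shows one way a string kernel over histogram sequences could be built from Needleman-Wunsch global alignment, written as a minimum-cost edit distance. The substitution cost (half the L1 distance between histograms), the constant gap penalty, and the mapping K = exp(-γ·d) are hypothetical choices for illustration, not the parameters of the published method. Note that an exponentiated edit distance is not guaranteed to yield a positive semidefinite kernel.

```python
import numpy as np

def hist_distance(a, b):
    # Half L1 distance between two L1-normalized histograms, in [0, 1] (assumed cost).
    return 0.5 * np.abs(np.asarray(a) - np.asarray(b)).sum()

def nw_distance(seq_a, seq_b, gap=0.5):
    """Needleman-Wunsch global alignment cast as an edit distance: minimum total
    cost of substitutions (histogram distance) and gaps (constant `gap`, assumed)."""
    n, m = len(seq_a), len(seq_b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = gap * np.arange(n + 1)  # align a prefix of seq_a with nothing
    D[0, :] = gap * np.arange(m + 1)  # align a prefix of seq_b with nothing
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(
                D[i - 1, j - 1] + hist_distance(seq_a[i - 1], seq_b[j - 1]),  # match frames
                D[i - 1, j] + gap,  # skip a frame of seq_a
                D[i, j - 1] + gap,  # skip a frame of seq_b
            )
    return D[n, m]

def string_kernel_matrix(seqs_a, seqs_b, gamma=1.0, gap=0.5):
    """Gram matrix K[i, j] = exp(-gamma * nw_distance(a_i, b_j)) (hypothetical mapping)."""
    K = np.empty((len(seqs_a), len(seqs_b)))
    for i, a in enumerate(seqs_a):
        for j, b in enumerate(seqs_b):
            K[i, j] = np.exp(-gamma * nw_distance(a, b, gap))
    return K
```

A Gram matrix computed this way can be passed to scikit-learn's `SVC(kernel="precomputed")`, with the training matrix shaped (n_train, n_train) and the test matrix shaped (n_test, n_train). Sequences of different lengths are handled naturally, since the gaps absorb the extra frames.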
- Multimedia Tools and Applications, vol. 48, pp. 69–87, May 2010.
- Multimedia Tools and Applications, vol. 51, pp. 279–302, January 2011.
- Proc. of IEEE International Workshop on Content-Based Multimedia Indexing (CBMI), Chania, Crete, June 2009.
- Proc. of International Conference on Image Analysis and Processing (ICIAP), Salerno, Italy, September 2009.