The Bag of Words paradigm has been the baseline from which several successful image classification solutions were developed in the last decade. These represent images by quantizing local descriptors and summarizing their distribution. The quantization step introduces a dependency on the dataset, that even if in some contexts significantly boosts the performance, severely limits its generalization capabilities. Differently, in this paper, we propose to model the local features distribution with a multivariate Gaussian, without any quantization.
Social interactions are so natural that we rarely stop wondering who is interacting with whom or which people are gathering into a group and who are not. Nevertheless, humans naturally do that neglecting that the complexity of this task increases when only visual cues are available. Different situations need different behaviors: while we accept to stand in close proximity to strangers when we at- tend some kind of public event, we would feel uncomfortable in having people we do not know close to us when we have a coffee. In fact, we rarely exchange mutual gaze with people we are not interacting with, an important clue when trying to discern different social clusters.
Augmented Reality and Humanity present the opportunity for more customization of the museum experience, such as new varieties of self-guided tours or real-time translation of interpretive. At the end of this year several companies will release wearable computers with a head-mounted display (such as Google or Vuzix). We’d like to investigate the usage of these devices for Cultural Heritage applications.
Object class recognition in images has been steadily gaining importance in the computer vision research community. In this paper we present a novel method to improve the flexibility of descriptor matching for image recognition by using local multiresolution pyramids in feature space.
The availability of measures of appearance of trademarks and logos in a video is important in fields of marketing and sponsoring. These statistics can, in fact, be used by the sponsors to estimate the number TV viewers that noticed them and then evaluate the effects of the sponsorship.
In many application scenarios digital images play a basic role and often it is important to assess if their content is realistic or has been manipulated to mislead watcher’s opinion. Image forensics tools provide answers to similar questions. We are working on a novel method that focuses in particular on the problem of detecting if a feigned image has been created by cloning an area of the image onto another zone to make a duplication or to cancel something awkward.
Building a general human activity recognition and classification system is a challenging problem, because of the variations in environment, people and actions. In fact environment variation can be caused by cluttered or moving background, camera motion, illumination changes. People may have different size, shape and posture appearance. Recently, interest-points based models have been successfully applied to the human action classification problem, because they overcome some limitations of holistic models such as the necessity of performing background subtraction and tracking. We are working at a novel method based on the visual bag-of-words model and on a new spatio-temporal descriptor.
The recognition of events in videos is a relevant and challenging task of automatic semantic video analysis. At present one of the most successful frameworks, used for object recognition tasks, is the bag-of-words (BoW) approach. However it does not model the temporal information of the video stream. We are working at a novel method to introduce temporal information within the BoW approach by modeling a video clip as a sequence of histograms of visual features, computed from each frame using the traditional BoW model.
Video is vital to society and economy. It plays a key role in the news, cultural heritage documentaries and surveillance, and it will soon be the natural form of communication for the Internet and mobile phones. Digital video will bring more formats and opportunities and it is certain the the consumer and the professional need advanced storage and search technology for the management of large-scale video assets. This project takes on the challenge of creating a substantially enhanced semantic access to video, implemented in a search engine.
In the context of visual surveillance one of the most important problem is the observation of human activity. This problem is greatly simplified when metric information can be computed. The goal of this project is to development and test new algorithms to determine metric information automatically by observing the scene.