ImageCLEF 2013: Scalable Concept Image Annotation. We got the best mAP!!!

“In this task, the objective is to develop systems that can easily change or scale the list of concepts used for image annotation. In other words, the list of concepts can also be considered to be an input to the system. Thus the system when given an input image and a list of concepts, its job is to give a score to each of the concepts in the list and decide how many of them assign as annotations. To observe this scalable characteristic of the systems, the list of concepts will be different for the development and test sets”.

About 40 research groups registered, and 13 of them submitted results.

A few days ago the ImageCLEF organizers posted the results of the participants’ submissions. The UNIMORE group (G. Serra, C. Grana, M. Manfredi and R. Cucchiara) obtained the best result in terms of mAP (UNIMORE_5 run: 45.6%)!! We worked hard and we are very happy with the results!! The working notes paper will soon be available.

2nd International Workshop on Multimedia for Cultural Heritage

I am a co-organizer, with Costantino Grana and Johan Oomen, of the 2nd International Workshop on Multimedia for Cultural Heritage, which will be held on September 9-10, 2013, in conjunction with the 17th International Conference on Image Analysis and Processing (ICIAP), Naples, Italy.

Multimedia technologies have recently created the conditions for a true revolution in the Cultural Heritage area, with reference to the study, promotion, and enjoyment of artistic works. These technologies make it possible to create new digital cultural experiences through personalized and engaging interaction.
New multimedia technologies can be used to design new approaches to the understanding and enjoyment of artistic heritage, for example through smart, context-aware artifacts and enhanced interfaces supporting features such as story-telling, gaming and learning. To this end, open and flexible platforms are needed that allow building services supporting the use of cultural resources for research and education. A likely outcome is the involvement of a wider range of users of cultural resources in diverse contexts, and considerably altered ways of experiencing and sharing cultural knowledge among participants.

Copy-Move Forgery Detection and Localization by Means of Robust Clustering with J-Linkage

Our paper “Copy-Move Forgery Detection and Localization by Means of Robust Clustering with J-Linkage” by I. Amerini, L. Ballan, A. Del Bimbo, L. Del Tongo and G. Serra has been accepted for publication in Signal Processing: Image Communication.

Understanding whether a digital image is authentic is a key purpose of image forensics. There are several different tampering attacks, but surely one of the most common and immediate is copy-move. A recent and effective approach for detecting copy-move forgeries is to use local visual features such as SIFT. In this kind of method, SIFT matching is often followed by a clustering procedure to group keypoints that are spatially close. This procedure can be unsatisfactory, in particular when the copied patch contains pixels that are spatially far apart, and when the pasted area is close to the original source. In such cases, a better estimation of the cloned area is necessary to obtain accurate forgery localization. In this paper a novel approach is presented for copy-move forgery detection and localization based on the J-Linkage algorithm, which performs robust clustering in the space of geometric transformations. Experimental results, carried out on different datasets, show that the proposed method outperforms similar state-of-the-art techniques both in terms of copy-move forgery detection reliability and precision in the manipulated patch localization.
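The clustering step can be illustrated with a minimal, self-contained sketch of J-Linkage (the names and simplifications here are ours, not the paper’s implementation): each matched keypoint votes for the transformation hypotheses it is consistent with, and points are agglomeratively merged by the Jaccard distance between their preference sets.

```python
import numpy as np

def jaccard_distance(a, b):
    # a, b: boolean preference vectors (which hypotheses each point agrees with)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0
    return 1.0 - np.logical_and(a, b).sum() / union

def j_linkage(preferences, max_dist=1.0):
    """Agglomeratively merge points whose preference sets overlap
    (Jaccard distance < max_dist); a merged cluster's preference set
    is the intersection of its members' sets."""
    clusters = [[i] for i in range(len(preferences))]
    prefs = [p.copy() for p in preferences]
    while True:
        best = (max_dist, None, None)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = jaccard_distance(prefs[i], prefs[j])
                if d < best[0]:
                    best = (d, i, j)
        if best[1] is None:
            break  # no overlapping pair left
        _, i, j = best
        prefs[i] = np.logical_and(prefs[i], prefs[j])
        clusters[i] += clusters[j]
        del clusters[j], prefs[j]
    return clusters
```

In the forgery-detection setting, each hypothesis would be an affine transformation estimated from a subset of SIFT matches; points cloned together end up in the same cluster because they prefer the same transformation.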

Context-Dependent Logo Matching and Recognition

Our paper “Context-Dependent Logo Matching and Recognition” by H. Sahbi, L. Ballan, G. Serra, and A. Del Bimbo has been accepted for publication by the IEEE Transactions on Image Processing.

In this work we contribute a novel variational framework able to match and recognize multiple instances of multiple reference logos in image archives.
Reference logos, as well as test images, are seen as constellations of local features (interest points, regions, etc.) and matched by minimizing an energy function mixing (i) a fidelity term that measures the quality of feature matching, (ii) a neighborhood criterion that captures feature co-occurrence/geometry, and (iii) a regularization term that controls the smoothness of the matching solution. We also introduce a detection/recognition procedure and study its theoretical consistency. Finally, we show the validity of our method through extensive experiments on the challenging MICC-Logos dataset, outperforming baseline as well as state-of-the-art matching/recognition procedures by 20%. We also present results on another public dataset, the FlickrLogos-27 image collection, to demonstrate the generality of our method.
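Schematically, and with notation that is ours rather than the paper’s, the three terms combine as:

```latex
\min_{f}\;
\underbrace{\sum_{i} d\bigl(x_i, f(x_i)\bigr)}_{\text{(i) fidelity}}
\;+\;
\alpha \underbrace{\sum_{i \sim j} g\bigl(f(x_i), f(x_j)\bigr)}_{\text{(ii) co-occurrence/geometry}}
\;+\;
\beta \underbrace{\Omega(f)}_{\text{(iii) regularization}}
```

where f assigns each local feature x_i of the reference logo to a candidate feature in the test image, i ∼ j ranges over neighboring features, and α, β balance the terms.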

Generative & discriminative models for classifying social images on the MICC-Flickr101 dataset

Our paper “Combining Generative and Discriminative Models for Classifying Social Images from 101 Object Categories” has been accepted at ICPR’12. We use a hybrid generative-discriminative approach (LDA + SVM with non-linear kernels) over several visual descriptors (SIFT, GIST, colorSIFT).

We also present a novel dataset, called MICC-Flickr101, based on the popular Caltech 101 and collected from Flickr. This new collection was conceived with the idea of cloning a reference dataset such as Caltech 101, but with realistic images and accompanying textual descriptions.
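The hybrid pipeline can be sketched as follows (an illustrative simplification using scikit-learn and synthetic bag-of-visual-words data, not our original implementation): LDA acts as the generative stage, producing a compact topic representation of each image, and a non-linear SVM acts as the discriminative classifier on top.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)

def sample(cls, n, vocab=500):
    """Synthetic bag-of-visual-words histograms: each class favors
    one half of the visual vocabulary."""
    base = np.ones(vocab)
    base[cls * 250:(cls + 1) * 250] += 4.0  # class-biased visual words
    p = base / base.sum()
    return rng.multinomial(200, p, size=n)

X = np.vstack([sample(0, 60), sample(1, 60)])
y = np.array([0] * 60 + [1] * 60)

# generative step: LDA topic proportions; discriminative step: RBF-kernel SVM
model = make_pipeline(
    LatentDirichletAllocation(n_components=10, random_state=0),
    SVC(kernel="rbf", gamma="scale"),
)
model.fit(X[::2], y[::2])            # train on every other image
acc = model.score(X[1::2], y[1::2])  # evaluate on the held-out half
```

In the paper the counts come from quantized SIFT/GIST/colorSIFT descriptors rather than synthetic draws, but the topic-then-kernel structure is the same.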

MATLAB implementation of the SIFT-based forensic method for copy-move detection

We release the MATLAB implementation of the copy-move detection approach presented in Amerini et al., TIFS 2011. We provide scripts to replicate the detection experiments reported in our paper, as well as functions for copy-move detection in a single image. Please note that our code uses several public functions and libraries developed by other authors; for any problem or license information regarding these files, please refer to the respective authors. – released May 8, 2012 (tested on Linux Ubuntu 10.04)

If you use our software or these datasets, please cite the paper: I. Amerini, L. Ballan, R. Caldelli, A. Del Bimbo, G. Serra. “A SIFT-based forensic method for copy-move attack detection and transformation recovery”, IEEE Transactions on Information Forensics and Security, vol. 6, iss. 3, pp. 1099-1110, 2011.

Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos

Our paper “Effective Codebooks for Human Action Representation and Classification in Unconstrained Videos” by L. Ballan, M. Bertini, A. Del Bimbo, L. Seidenari and G. Serra has been accepted for publication in the IEEE Transactions on Multimedia.

Recognition and classification of human actions for annotation of unconstrained video sequences has proven to be challenging because of variations in the environment, the appearance of actors, the modalities in which the same action is performed by different persons, speed and duration, and the point of view from which the event is observed. This variability is reflected in the difficulty of defining effective descriptors and deriving appropriate and effective codebooks for action categorization. In this paper we propose a novel and effective solution to classify human actions in unconstrained videos. It improves on previous contributions through the definition of a novel local descriptor that uses image gradient and optic flow to model, respectively, the appearance and motion of human actions at interest point regions. In the formation of the codebook we employ radius-based clustering with soft assignment in order to create a rich vocabulary that accounts for the high variability of human actions. We show that our solution achieves very good performance with no need for parameter tuning. We also show that a strong reduction in computation time can be obtained by applying codebook size reduction with Deep Belief Networks, with little loss of accuracy.
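A minimal sketch of the two codebook ingredients (our illustrative simplification, not the paper’s exact algorithm): greedy radius-based clustering to pick codewords, and Gaussian soft assignment of a descriptor to the resulting vocabulary.

```python
import numpy as np

def radius_codebook(descriptors, radius):
    """Greedy radius-based clustering: each descriptor joins an existing
    codeword if one lies within `radius`, otherwise it seeds a new
    codeword, so vocabulary size adapts to data variability."""
    codebook = []
    for d in descriptors:
        if codebook:
            dist = np.linalg.norm(np.asarray(codebook) - d, axis=1)
            if dist.min() <= radius:
                continue  # already covered by an existing codeword
        codebook.append(d)
    return np.asarray(codebook)

def soft_assign(descriptor, codebook, sigma):
    """Soft assignment: Gaussian kernel weights over all codewords,
    normalized to sum to 1 (instead of a hard nearest-neighbor vote)."""
    d2 = ((codebook - descriptor) ** 2).sum(axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return w / w.sum()
```

Soft assignment means a descriptor near a cluster boundary contributes to several histogram bins, which makes the bag-of-features representation less sensitive to the exact codeword positions.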

IRCDL 2012 – 8th Italian Research Conference on Digital Libraries

IRCDL is an annual meeting for Italian researchers working on Digital Libraries and related topics.

This year the focus of IRCDL is on legacy and cultural heritage material. Indeed, Digital Library Systems are becoming more and more mature and are largely deployed. Not only do they have to ensure users effective and personalized access to information, but it is also time to face the need to smoothly process, and include in DL repositories, the available legacy and cultural heritage documents in addition to born-digital ones. This calls for the ability to deal with compound objects in different media, to provide uniform solutions and methodologies across different cultural heritage institutions, and to take into account preservation, restoration, and curation.

The IRCDL conferences were launched and initially sponsored by DELOS, an EU FP6 Network of Excellence on digital libraries, together with the Department of Information Engineering of the University of Padua. Over the years IRCDL has become a self-sustaining event sponsored and supported by the Italian Digital Library Research Community.

- Submission Deadline: December 18, 2011

Enriching and Localizing Semantic Tags in Internet Videos – ACM Multimedia 2011

The paper “Enriching and Localizing Semantic Tags in Internet Videos” has been accepted by ACM Multimedia 2011.

Tagging of multimedia content is becoming more and more widespread as web 2.0 sites, like Flickr and Facebook for images and YouTube and Vimeo for videos, have popularized tagging functionalities among their users. These user-generated tags are used to retrieve multimedia content and to ease browsing and exploration of media collections, e.g. using tag clouds. However, not all media are equally tagged by users: with current browsers it is easy to tag a single photo, and even tagging a part of a photo, like a face, has become common on sites like Flickr and Facebook; on the other hand, tagging a video sequence is more complicated and time consuming, so users tend to tag only the overall content of a video. In this paper we present a system for automatic video annotation that increases the number of tags originally provided by users and localizes them temporally, associating tags to shots. This approach exploits collective knowledge embedded in tags and Wikipedia, and visual similarity of keyframes and images uploaded to social sites like YouTube and Flickr.

Space-time Zernike Moments and Pyramid Kernel Descriptors for Action Classification – ICIAP 2011

Our ICIAP paper “Space-time Zernike Moments and Pyramid Kernel Descriptors for Action Classification” is available online.

Action recognition in videos is a relevant and challenging task in automatic semantic video analysis. Most successful approaches exploit local space-time descriptors. These descriptors are usually carefully engineered to obtain invariance to photometric and geometric variations. The main drawback of space-time descriptors is their high dimensionality and the resulting computational cost. In this paper we propose a novel descriptor based on 3D Zernike moments computed for space-time patches. Moments are by construction not redundant and therefore optimal for compactness. Given the hierarchical structure of our descriptor, we propose a novel similarity procedure that exploits this structure, comparing features as pyramids. The approach is tested on a public dataset and compared with state-of-the-art descriptors.
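The pyramid-style comparison can be sketched for 1-D features (a simplified variant in the spirit of Grauman and Darrell’s pyramid match kernel, with weights and details that are ours, not the exact procedure of the paper): two feature sets are histogram-intersected at increasingly coarse resolutions, and matches first found at a coarser level are discounted.

```python
import numpy as np

def histogram_intersection(h1, h2):
    # number of co-occurring items in matching bins
    return np.minimum(h1, h2).sum()

def pyramid_match(x, y, levels=3, lo=0.0, hi=1.0):
    """Match two 1-D feature sets over a pyramid of histograms,
    from finest to coarsest grid; new matches appearing at a coarser
    level l are down-weighted by 1 / 2**l."""
    score, prev = 0.0, 0.0
    for level in range(levels):
        bins = 2 ** (levels - 1 - level)  # fine -> coarse
        h1, _ = np.histogram(x, bins=bins, range=(lo, hi))
        h2, _ = np.histogram(y, bins=bins, range=(lo, hi))
        inter = histogram_intersection(h1, h2)
        score += (inter - prev) / (2 ** level)  # count only new matches
        prev = inter
    return score
```

Identical sets obtain the maximum score (one full-weight match per feature), and the kernel is symmetric by construction.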

