VISIIR
VIsual Seek for Interactive Image Retrieval



The project


VIsual Seek for Interactive Image Retrieval (VISIIR) is a project exploring new methods for semantic image annotation, i.e. the ability to predict a semantic concept from the visual content of an image. The topic has been studied extensively for more than a decade, owing to its many applications in areas as diverse as Information Retrieval, Computer Vision, Image Processing, and Artificial Intelligence.

Scientific website

About the demo


This demo uses a deep convolutional neural network (CNN) to recognize food images across 101 categories. The CNN follows the architecture of OverFeat (Sermanet et al. 2013). It was initialized with the weights of a network trained on the ImageNet dataset and then re-trained (fine-tuned) on our own food recognition dataset.
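As a rough illustration of the fine-tuning step, the sketch below trains a fresh 101-way softmax layer on top of frozen features. The features are simulated with random vectors rather than real CNN activations, and all dimensions and names are illustrative assumptions, not the actual OverFeat pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the pretrained network's penultimate features:
# in the real system these would come from an OverFeat-style CNN initialized
# on ImageNet; here we draw random 64-d features for 200 "food images".
n_samples, n_features, n_classes = 200, 64, 101
features = rng.normal(size=(n_samples, n_features))
labels = rng.integers(0, n_classes, size=n_samples)

# Replace the final layer: a fresh 101-way softmax classifier trained on
# top of the frozen features (the first stage of fine-tuning).
W = np.zeros((n_features, n_classes))
b = np.zeros(n_classes)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, y):
    return -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()

lr = 0.5
losses = []
for _ in range(100):
    probs = softmax(features @ W + b)
    losses.append(cross_entropy(probs, labels))
    # Gradient of cross-entropy w.r.t. the logits: probs - one_hot(labels)
    grad = probs.copy()
    grad[np.arange(n_samples), labels] -= 1.0
    grad /= n_samples
    W -= lr * (features.T @ grad)
    b -= lr * grad.sum(axis=0)
```

With zero-initialized weights, the first loss equals log(101) (a uniform guess over the 101 categories) and decreases as the new layer adapts.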

Dataset UPMC Food-101


For this project, we created the UPMC Food-101 dataset. This dataset contains 101 food categories. For each category, we gathered around 800 to 950 images from a Google Image search of the category title. Because of this, the dataset may contain some noise. Below are 6 randomly chosen images from the dataset. Feel free to explore the dataset further and give us some feedback about the images.

Sample categories: Bruschetta, Prime Rib, Croque Madame, French Fries, Beet Salad, Spring Rolls
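A hypothetical sketch of how such a dataset might be organized on disk, with one folder per category named after its title. The folder-naming scheme and file extension are assumptions for illustration, not the project's actual layout:

```python
from pathlib import Path

# Hypothetical on-disk layout for UPMC Food-101: one folder per category,
# named after the category title (naming scheme is an assumption).
categories = ["Bruschetta", "Prime Rib", "Croque Madame",
              "French Fries", "Beet Salad", "Spring Rolls"]

def folder_name(title: str) -> str:
    """'Croque Madame' -> 'croque_madame'."""
    return title.lower().replace(" ", "_")

def count_images(root: Path) -> dict:
    """Count .jpg files per category folder under the dataset root."""
    return {d.name: sum(1 for _ in d.glob("*.jpg"))
            for d in root.iterdir() if d.is_dir()}

folders = [folder_name(c) for c in categories]
```

For the real dataset, `count_images` would report roughly 800 to 950 entries per category folder.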

About the project


VIsual Seek for Interactive Image Retrieval (VISIIR) is a project exploring new methods for semantic image annotation, a topic studied extensively for more than a decade owing to its many applications in areas as diverse as Information Retrieval, Computer Vision, Image Processing, and Artificial Intelligence. Semantic annotation refers to the ability to predict a semantic concept from the visual content of an image; bridging the semantic gap between visual data and concepts is the main goal pursued by researchers in the field. In supervised learning, a large amount of labeled data is required to build effective semantic annotation tools. In interactive content-based image retrieval (CBIR) systems, the user formulates the query with an example, i.e. an image. Relevance feedback is commonly used to refine the query concept interactively, by asking the user whether selected images are relevant or not. To be effective, a major challenge in interactive CBIR is to minimize the number of feedback loops required to grasp the user's semantic query.
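The feedback loop described above can be illustrated with the classic Rocchio update, used here purely as a stand-in for VISIIR's own learning scheme: the query vector moves toward images the user marked relevant and away from those marked irrelevant:

```python
import numpy as np

def rocchio(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """One relevance-feedback step on a query vector (classic Rocchio)."""
    q = alpha * query
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(irrelevant):
        q = q - gamma * np.mean(irrelevant, axis=0)
    return q

def cos(a, b):
    """Cosine similarity between two feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(1)
query = rng.normal(size=16)               # initial example image
target = rng.normal(size=16)              # the concept the user has in mind
relevant = target + 0.1 * rng.normal(size=(5, 16))   # images marked relevant
irrelevant = rng.normal(size=(5, 16))                # images marked irrelevant

refined = rocchio(query, relevant, irrelevant)
```

After one feedback round the refined query is closer (in cosine similarity) to the user's target concept than the initial example was, which is exactly why minimizing the number of such rounds matters.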

The VISIIR project proposes new interactive methods for building powerful semantic annotation systems. The originality of the proposal is three-fold:

  • Eye-tracker-driven system. A distinctive feature of the project is the use of the latest eye-tracking techniques to validate and improve the vision and learning models developed by the academic partners.
  • New paradigm for visual representation and learning. We introduce a novel learning scheme combining supervised and interactive methods.
  • Web filtering for food annotation. The new methods developed in the project will be validated in a specific application dedicated to retrieving, filtering and classifying images of recipes.

In terms of methodology, the first challenge for semantic annotation lies in the representation of visual content. To go a step beyond current state-of-the-art methods, we want to develop new bio-inspired representations. One key idea is to provide a hybrid representation combining visual saliency models and unsupervised deep networks.

In the second part of VISIIR, we design new interactive learning schemes. We exploit the additional source of information provided by the eye-tracker to improve learning quality (i.e. active learning convergence) at two levels. First, eye-tracker features are used in conjunction with the user's annotations to jointly optimize the classification function and the visual representations learned off-line in task 1. Beyond this gaze-analysis purpose, we propose to use the eye-tracker to drive the learning process and to develop new Human-Computer Interactions (HCI); typically, eye-tracking statistics will act as user feedback.

Finally, one strong axis of VISIIR is the rigorous evaluation of the proposed semantic annotation methods in a web filtering application dedicated to food retrieval. A complete database will be provided through the project, with the goal of finding images of recipes. This fine-grained classification task will serve as a use case to validate the visual representations and interactive learning methods of tasks 1 and 2. A methodological aspect addressed in this task is the scalability of interactive search over the huge number of images harvested from the web. We plan to tackle this scalability bottleneck by combining efficient hashing structures for indexing and search with exploration techniques.
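The hashing idea can be sketched with random-hyperplane locality-sensitive hashing, a standard scheme used here only to illustrate the indexing-for-search principle; VISIIR's actual hashing structures may differ:

```python
import numpy as np

class HyperplaneHasher:
    """Random-hyperplane LSH: similar vectors tend to share binary codes."""

    def __init__(self, dim, n_bits, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))

    def code(self, v):
        """Binary code: one bit per hyperplane (sign of the dot product)."""
        return tuple((self.planes @ v > 0).astype(int))

hasher = HyperplaneHasher(dim=128, n_bits=16)
rng = np.random.default_rng(2)
v = rng.normal(size=128)

# Bucket index: at query time, only images whose code matches the query's
# are re-ranked, avoiding a linear scan over the whole web-scale collection.
index = {}
index.setdefault(hasher.code(v), []).append("image_0042.jpg")
```

Because each bit is the sign of a dot product, the code is invariant to positive scaling of a feature vector, and nearby vectors differ in only a few bits, which is what makes bucket lookup an effective coarse filter before exact search.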

To carry out VISIIR, the required skills are provided by the consortium partners: UPMC brings expertise in image classification and statistical learning, I3S in CBIR and scalability, L3I in visual saliency and attentional models, and Tobii in eye-tracking technology.

Consortium


LIP6

Université Pierre et Marie Curie

I3S

Université Nice Sophia Antipolis

L3I

Université de La Rochelle

Tobii

Industrial Partner

ANR

Funding Agency