
Finding photographs is now a major activity for computer users, but the enormous and increasing number of photographs frustrates their attempts to find the ones of interest, leaving them with information overload. Useful metadata describing photographs bridges the semantic gap between what a photograph means to a user and what it means to a computer. This enables search engines to perform the hard work of finding photographs for the user and thereby alleviate the information overload. To enable the machine to retrieve photographs for the user, we must first examine how users themselves recall photographs. Research indicates that users recall photographs primarily by the following cues:
(i) who is depicted in the photograph;
(ii) where the photograph was taken; and
(iii) what event the photograph covers.
A key challenge is how to create this useful descriptive metadata about photographs. Manual annotation of photographs is tedious and time-consuming. Automated content-based techniques such as face recognition rely on large training sets, are dependent on the illumination conditions at the scene of photograph capture and fail to recognise many of the abstract cues that people use when recalling photographs. Complementary context-based approaches provide a lightweight, robust and scalable solution that supports the abstract way in which users actually think about photographs and complements content-based approaches such as face recognition.
The Annotation CReatiON for Your Media (ACRONYM) framework takes just such a robust, scalable, context-based approach. Instead of using content-recognition techniques, the ACRONYM approach leverages context information available both at the scene of photograph capture and in the user's information space. Low-level data such as photographer, time, location and the set of devices detected nearby are captured cheaply and automatically by a camera phone via its User Interface (UI), system clock, GPS receiver and Bluetooth transceiver respectively. Higher-level information on people, photographs, events and places is indexed from the user's Online Social Network, albums, calendar and the GeoNames online geographical feature database respectively.
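As an illustration of the low-level context described above, the following is a minimal Python sketch of a record holding the four captured values. The class name, field names and sample values are assumptions made for this example only; they are not taken from the ACRONYM implementation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import FrozenSet


@dataclass(frozen=True)
class CaptureContext:
    """Low-level context recorded at photograph capture (hypothetical layout)."""

    photographer: str        # identified via the phone's UI
    captured_at: datetime    # read from the system clock
    latitude: float          # from the GPS receiver, decimal degrees
    longitude: float         # from the GPS receiver, decimal degrees
    # Addresses of devices seen by the Bluetooth transceiver
    nearby_devices: FrozenSet[str] = field(default_factory=frozenset)


# Made-up sample values, for illustration only.
example = CaptureContext(
    photographer="alice",
    captured_at=datetime(2009, 6, 1, 14, 30, tzinfo=timezone.utc),
    latitude=53.2707,
    longitude=-9.0568,
    nearby_devices=frozenset({"00:11:22:33:44:55"}),
)
```

Making the record immutable reflects the idea that these values are observed once at capture time and are not edited afterwards.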
Armed with ground truth data about the photograph on one hand and with access to a dataset of people, photographs, events and places on the other, algorithms suggest which people, places and events are represented in the photograph, in an attempt to bridge the semantic gap. The user then selects the correct suggestions, which are attached to the photograph as annotations. In keeping with the DataPortability and Linked Data efforts, the annotations are exported as portable Semantic Web metadata, both for storage inside the photographs and for reuse by other applications such as search engines. This accelerates photograph annotation dramatically, which in turn aids a wide range of information retrieval and knowledge management tools that currently trawl the billions of photographs stored on the Web, local networks and private machines.
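The suggestion step can be pictured with a simple sketch. The abstract does not specify the algorithms, so the matching rules below (Bluetooth address lookup for people, a distance threshold for places, time-span containment for events) are assumptions chosen for illustration, as are all function names, parameters and sample data.

```python
from datetime import datetime
from math import asin, cos, radians, sin, sqrt


def suggest_people(nearby_devices, device_owners):
    """Return names of known people whose device was detected nearby."""
    return sorted({device_owners[d] for d in nearby_devices if d in device_owners})


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two coordinates."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def suggest_places(lat, lon, places, radius_km=5.0):
    """Return names of places within radius_km of the capture point, nearest first."""
    scored = [
        (haversine_km(lat, lon, p_lat, p_lon), name)
        for name, (p_lat, p_lon) in places.items()
    ]
    return [name for dist, name in sorted(scored) if dist <= radius_km]


def suggest_events(captured_at, events):
    """Return names of events whose time span contains the capture time."""
    return [name for name, (start, end) in events.items() if start <= captured_at <= end]


# Made-up sample data standing in for the social network, GeoNames and calendar.
device_owners = {"00:11:22:33:44:55": "Bob"}
places = {"Galway": (53.2707, -9.0568), "Dublin": (53.3498, -6.2603)}
events = {"Conference": (datetime(2009, 6, 1, 9, 0), datetime(2009, 6, 1, 17, 0))}

people = suggest_people({"00:11:22:33:44:55", "AA:BB:CC:DD:EE:FF"}, device_owners)
nearby = suggest_places(53.27, -9.05, places)
current = suggest_events(datetime(2009, 6, 1, 14, 30), events)
```

Each function only proposes candidates; in the workflow described above, the user still confirms which suggestions become annotations.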
Link: http://acronym.deri.org