Special Session: A System View of Information Retrieval

Third International Symposium on Image and Signal Processing and Analysis

September 18-20, 2003, Rome, Italy


Aims and Scope

While significant progress has been made in several disciplines that relate to efficient information retrieval, questions remain about overall system design and the multidimensional performance metrics that are pertinent to important application classes. The goal of this special session is to focus on a system view of the topic, albeit from the particular vantage points of the cross-disciplinary team that has agreed to contribute.

Session Organizer and Chair

Dr. Nikil Jayant, Georgia Institute of Technology, USA

Invited Speakers

Abstracts

Bill Grosky, Emergent Document Semantics

It is well known that interpretation depends on context, whether for a work of art, a piece of literature, or a natural language utterance. This paper addresses the dynamic context of a collection of linked multimedia documents, of which the web is a perfect example. Contextual document semantics emerge through the identification of various users' browsing paths through this multimedia collection. In this paper, we present techniques that use multimedia information as part of this determination. Some implications of our approach are that the author of a webpage cannot completely define that document's semantics, and that semantics can emerge through use.
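The abstract does not say how browsing paths are turned into semantics. One simple way to make the idea concrete is to give each page an "emergent profile": the terms of the pages that users visit next to it. This is a minimal sketch under that assumption and is not the author's technique. `emergent_profiles`, `window`, and the toy page names are hypothetical. `page_terms` could equally hold labels of clustered visual descriptors instead of keywords.

```python
from collections import Counter, defaultdict

def emergent_profiles(paths, page_terms, window=1):
    """For every page, count the terms of the pages visited within `window`
    steps of it on any browsing path (hypothetical co-occurrence model)."""
    profiles = defaultdict(Counter)
    for path in paths:
        for i, page in enumerate(path):
            lo, hi = max(0, i - window), min(len(path), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    # Neighbouring pages contribute their terms to this page's profile.
                    profiles[page].update(page_terms.get(path[j], ()))
    return profiles

# Toy data: the author only labels the page "jaguar", but most users reach it from a zoo page.
paths = [["zoo", "jaguar_page"], ["zoo", "jaguar_page"],
         ["jaguar_page", "zoo"], ["cars", "jaguar_page"]]
page_terms = {"zoo": ["animal", "cat"], "cars": ["car", "vehicle"], "jaguar_page": ["jaguar"]}
profiles = emergent_profiles(paths, page_terms)
```

In this toy run, "cat" and "animal" outweigh "car" in the profile of `jaguar_page`. The page's meaning is shaped by how it is used, not only by what its author wrote.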

Ramesh Jain, Emerging Requirements for Information Access

Information systems still formulate the retrieval and access problems around the requirements of applications that were current more than two decades ago. The nature and amount of information have changed radically over this period. We need to look at information access from the perspective of the applications evolving in this century. These applications use spatio-temporal, live, multimedia data and information. Semantics for this type of data requires different models and system architectures than traditional ones. In this paper, we will first discuss a few emerging classes of applications, including situation monitoring and telepresence, and identify their information modeling and access requirements. We will then present an approach to these emerging information access problems and illustrate it with examples from some of these applications.

B. S. Manjunath, Issues Concerning Dimensionality and Similarity Search

It is generally acknowledged that the high dimensionality of multimedia descriptors poses challenges to efficient search and indexing. By indexing, we mean efficient access to database items without a complete scan of the entire database. High-dimensional indexing has been an active research area within the database community for over five years now, and new indexing structures have been developed to address this "curse of dimensionality".
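One symptom of the curse of dimensionality is that distances concentrate. As dimension grows, the nearest and farthest points from a query become almost equally far away, so index structures lose the distance gaps they rely on for pruning. The illustration below is not from the talk. It measures the relative contrast (d_max - d_min) / d_min for uniformly random points and a random query. `relative_contrast` is a hypothetical helper.

```python
import math
import random

def relative_contrast(n_points, dim, seed=0):
    """(farthest - nearest) / nearest distance from a random query
    to n_points uniform random points in the unit hypercube."""
    rng = random.Random(seed)
    q = [rng.random() for _ in range(dim)]
    dists = [math.dist(q, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

low_dim = relative_contrast(1000, 2)      # large: a clear nearest neighbour exists
high_dim = relative_contrast(1000, 1000)  # small: all points are at similar distances
```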

An equally important issue is the effectiveness of the multimedia descriptors, which the content-based retrieval community has addressed extensively. Approaches include new similarity metrics, learned similarity metrics, and online learning systems that use relevance feedback mechanisms to modify the query and/or the similarity computations.
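Two classic ways for relevance feedback to modify the query and the similarity computation are the following:

* Rocchio-style query-point movement.
* Feature reweighting, where features that vary little across the user's relevant examples receive larger weights.

The sketch below shows textbook versions of both as an illustration. They are not claimed to be the methods discussed in this talk. The parameter values `alpha`, `beta`, `gamma`, and `eps` are common illustrative defaults, not prescribed ones. The diagonal matrix produced by `reweight` plays the role of the weight matrix in a quadratic distance.

```python
import numpy as np

def rocchio(q, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query point toward relevant examples and away from non-relevant ones."""
    q_new = alpha * np.asarray(q, dtype=float)
    if len(relevant):
        q_new = q_new + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q_new = q_new - gamma * np.mean(nonrelevant, axis=0)
    return q_new

def reweight(relevant, eps=1e-6):
    """Diagonal weight matrix: features that are consistent across the relevant
    examples (small standard deviation) get large weights."""
    sigma = np.std(relevant, axis=0)
    w = 1.0 / (sigma + eps)
    w = w * len(w) / w.sum()  # normalise so the weights average to 1
    return np.diag(w)

q_new = rocchio([0.0, 0.0], [[1.0, 0.0], [1.0, 2.0]], [[0.0, 2.0]])
W = reweight(np.array([[1.0, 0.0], [1.0, 2.0]]))
```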

A practical system should address both efficiency and effectiveness concerns. However, little has been done to date that would enable, for example, relevance feedback computations using high-dimensional index structures. In this context, we have recently developed new methods that enable fast nearest-neighbor computations by making use of the information available during each iteration of a relevance feedback step. For example, one can consider the problem of computing the nearest neighbors of a query feature vector given the current set of nearest neighbors and the modified weight matrix (assuming a quadratic distance measure). These adaptive nearest-neighbor computations will enable retrieval systems to scale to large databases (a few hundred thousand items and more) without compromising effectiveness, and would enable applications such as data mining and knowledge discovery.
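The example problem above can be made concrete with a minimal sketch. It assumes a symmetric positive-definite weight matrix W and the quadratic distance (x - q)^T W (x - q). It is one simple way to reuse the previous iteration's neighbours and is not the authors' adaptive method. The pruning rests on two facts:

* The old neighbours are k real database points, so their distances under the new (q, W) bound the new k-th nearest-neighbour distance from above.
* Since (x - q)^T W (x - q) >= lambda_min(W) * ||x - q||^2, a cheap Euclidean bound can discard points before the full quadratic form is evaluated.

All function names are hypothetical.

```python
import numpy as np

def quadratic_dist(X, q, W):
    """Quadratic-form distance (x - q)^T W (x - q) for every row x of X."""
    D = X - q
    return np.einsum("ij,jk,ik->i", D, W, D)

def knn_quadratic(X, q, W, k):
    """Brute-force k nearest neighbours under the quadratic distance."""
    d = quadratic_dist(X, q, W)
    idx = np.argsort(d, kind="stable")[:k]
    return idx, d[idx]

def adaptive_knn(X, q, W, k, prev_idx):
    """k-NN under an updated (q, W), pruned with the previous iteration's neighbours.

    Assumes W is symmetric positive definite and prev_idx holds k row indices.
    """
    lam_min = np.linalg.eigvalsh(W)[0]                 # smallest eigenvalue of W
    r = quadratic_dist(X[prev_idx], q, W).max()        # upper bound on new k-th NN distance
    lower = lam_min * np.sum((X - q) ** 2, axis=1)     # cheap lower bound per point
    cand = np.flatnonzero(lower <= r)                  # points that may still qualify
    d = quadratic_dist(X[cand], q, W)
    order = np.argsort(d, kind="stable")[:k]
    return cand[order], d[order], len(cand)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))
q0 = rng.normal(size=8)
prev_idx, _ = knn_quadratic(X, q0, np.eye(8), 10)      # iteration 1: Euclidean

A = rng.normal(size=(8, 8))
W1 = A @ A.T + 8 * np.eye(8)                           # iteration 2: new weight matrix
q1 = q0 + 0.1                                          # and a slightly moved query
idx, d, n_evaluated = adaptive_knn(X, q1, W1, 10, prev_idx)
```

The result matches a brute-force scan under (q1, W1). However, only the `n_evaluated` candidates need the full quadratic form, which costs O(d^2) per point versus O(d) for the Euclidean bound.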

Other related items that we are investigating include:

Howard D. Wactlar, A System of Video Information Capture, Indexing and Retrieval for Interpreting Human Activity

This system creates a manageable information resource that enables more complete and accurate interpretation, assessment and diagnosis of human behavior in constrained physical spaces. Through activity and environmental monitoring, a continuous, voluminous audio and video record is captured. Through work in information extraction, behavior analysis and synthesis, this record is transformed into an information asset whose efficient, secure presentation empowers specialists with greater insights into problems, effectiveness of treatments, and determination of environmental and social influences. Application environments range from nursery schools to nursing homes. The foundation for this work, the Informedia Digital Video Library, has demonstrated the successful application of speech, image, and natural language processing in automatically creating a rich, indexed, searchable multimedia information resource for broadcast-quality video. This new work builds from these technologies, moving into new information spaces composed of unedited personal experience video augmented with additional sensory and position data.

Hong-Jiang Zhang, Semantic Learning in Content-Based Image Retrieval

Visual information retrieval remains a difficult problem despite decades of research. A major bottleneck is automated indexing, which requires an understanding of the visual semantics of images and videos. The content-based image retrieval (CBIR) approach is an attempt to remove this bottleneck. However, the myth about the power of indexing based on visual features was quickly dispelled, because such features are far from representing semantic visual content or producing meaningful indexes. One solution is to apply relevance feedback to refine the query or the similarity measures during the search process, and to apply machine learning techniques to learn semantic annotations. In this paper, we address the key issues involved in relevance feedback in CBIR systems and present a brief overview of a set of commonly used relevance feedback algorithms from a system viewpoint. We then present a framework for relevance feedback and semantic learning in CBIR. In this framework, low-level features and keyword annotations are integrated in both the image retrieval and the feedback processes to improve retrieval performance. We have also extended the framework to a content-based web image search engine. In that engine, the hosting web pages are used to collect relevant annotations for images, and users' feedback logs are used to refine those annotations. A prototype system has been developed to evaluate the proposed schemes, and our experimental results indicate that our approach outperforms traditional CBIR systems and relevance feedback approaches.
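The following is a minimal sketch of integrating keyword annotations with low-level features in one ranking and one feedback loop. It is not the framework from this paper. Its assumptions are:

* Each image has a feature vector and weighted keyword links.
* The ranking score is a linear blend controlled by a hypothetical `lam`.
* Feedback simply strengthens or weakens keyword-image links.

`annotate` stands in for annotations harvested from hosting web pages. Repeated calls to `feedback` stand in for mining feedback logs.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class HybridIndex:
    """Ranks images by a blend of keyword-link scores and low-level visual similarity."""

    def __init__(self, features, lam=0.7):
        self.features = features          # image_id -> low-level feature vector
        self.lam = lam                    # weight of the semantic (keyword) score
        self.links = defaultdict(dict)    # keyword -> {image_id: link weight}

    def annotate(self, keyword, image_id, weight=1.0):
        """Add or strengthen a keyword-image link (e.g. a term from the hosting page)."""
        self.links[keyword][image_id] = self.links[keyword].get(image_id, 0.0) + weight

    def feedback(self, keywords, relevant, nonrelevant, step=1.0):
        """Strengthen links to images marked relevant, weaken links to non-relevant ones."""
        for kw in keywords:
            for img in relevant:
                self.annotate(kw, img, step)
            for img in nonrelevant:
                if img in self.links[kw]:
                    self.links[kw][img] = max(0.0, self.links[kw][img] - step)

    def rank(self, keywords, query_vec):
        """Image ids sorted by blended score (ties broken by id)."""
        scores = {}
        for img, feat in self.features.items():
            sem = sum(self.links[kw].get(img, 0.0) for kw in keywords)
            sem = sem / (1.0 + sem)       # squash the link score into [0, 1)
            scores[img] = self.lam * sem + (1.0 - self.lam) * cosine(query_vec, feat)
        return sorted(scores, key=lambda img: (-scores[img], img))

index = HybridIndex({"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]})
before = index.rank(["beach"], [1.0, 0.0])
index.feedback(["beach"], relevant=["c"], nonrelevant=[])
after = index.rank(["beach"], [1.0, 0.0])
```

After a single round of feedback, image "c" moves to the top. Its low-level features are far from the query, but it has acquired a semantic link to the query keyword.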

Tat-Seng Chua, Question-Answering of News Video on the Web

The retrieval of news is one of the most frequent search tasks when users surf the Web. Typically, users are interested in browsing the latest news or in tracking certain news stories. The news can come in the form of text and video, with supplementary images and audio tracks. Until recently, users needed to visit specific news web sites to retrieve the latest news. Research has been done on segmenting news video into story units to facilitate the organization and browsing of news video. Recent advances in question-answering focus on adaptive information retrieval that returns precise pieces of information in response to users' short queries. This paper aims to integrate both approaches to support question answering on news video.

This paper consists of three parts. The first part discusses techniques for segmenting news video into meaningful (semantic) story units. The technique uses multi-modal features to characterize the content of news video shots and employs a decision tree to classify the shots into predefined categories. It then uses a hidden Markov model (HMM) to locate story boundaries.
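The HMM step can be sketched under simplifying assumptions. The decision-tree stage is taken as already done, so each shot arrives with a category label. The HMM has two hidden states: "B" (a shot that starts a new story) and "I" (a shot that continues one). Viterbi decoding then marks the story boundaries. The states, categories, and all probabilities below are hypothetical placeholders, not values from the paper.

```python
import math

STATES = ("B", "I")  # hypothetical: B = first shot of a story, I = continuing shot
START = {"B": 0.9, "I": 0.1}
TRANS = {"B": {"B": 0.1, "I": 0.9}, "I": {"B": 0.3, "I": 0.7}}
EMIT = {
    "B": {"anchor": 0.8, "report": 0.15, "weather": 0.05},
    "I": {"anchor": 0.1, "report": 0.7, "weather": 0.2},
}

def viterbi(obs):
    """Most likely hidden-state sequence for a list of shot categories (log space)."""
    V = [{s: math.log(START[s]) + math.log(EMIT[s][obs[0]]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, col, ptr = V[-1], {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            col[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][o])
            ptr[s] = best
        V.append(col)
        back.append(ptr)
    state = max(STATES, key=lambda s: V[-1][s])
    path = [state]
    for ptr in reversed(back):  # follow back-pointers from the last shot
        state = ptr[state]
        path.append(state)
    return path[::-1]

def story_boundaries(shot_categories):
    """Indices of shots decoded as the start of a new story."""
    return [i for i, s in enumerate(viterbi(shot_categories)) if s == "B"]

shots = ["anchor", "report", "report", "anchor", "report"]
```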

The second part describes recent research on question-answering (QA). In QA, precise answers, rather than documents, are required in response to questions. One of the major problems in QA is that queries are often too brief or do not contain the most relevant terms in the target corpus. To overcome this semantic gap, modern QA systems combine the strengths of traditional IR, natural language processing (NLP), and information extraction (IE) to retrieve concise answers to natural language questions. In particular, some recent approaches integrate external knowledge extracted from the Web and WordNet to supplement the queries in order to return precise answers.
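To illustrate the difference between returning a precise answer and returning a document, here is a minimal sketch. It selects the single sentence that shares the most content words with the question. Real QA systems add NLP and information extraction on top of this. The stopword list, `tokens`, and `best_sentence` are hypothetical simplifications.

```python
import re

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "was", "of", "on", "in", "to", "by", "and",
             "who", "what", "when"}

def tokens(text):
    """Lower-cased alphanumeric words with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def best_sentence(question, documents):
    """Return the sentence sharing the most content words with the question."""
    q = set(tokens(question))
    best, best_score = None, -1
    for doc in documents:
        for sent in re.split(r"(?<=[.!?])\s+", doc.strip()):
            score = len(q & set(tokens(sent)))
            if score > best_score:
                best, best_score = sent, score
    return best

docs = ["Talks resumed on Monday. The weapon inspection team is headed by the chief inspector.",
        "The weather was mild."]
answer = best_sentence("Who heads the weapon inspection team?", docs)
```

Note that "heads" and "headed" do not match here. That is exactly the kind of vocabulary gap that query expansion with external knowledge is meant to close.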

The third part of this paper combines both lines of research to support personalized retrieval of news, including video. Given a short and imprecise query from a user, such as "latest news on weapon inspection", the system searches the Web for the latest related news documents. It then uses the evidence extracted from these documents to expand the query and to infer different facets of the stories. This additional evidence is used to extract the matching video and related text news precisely. Finally, the system presents the retrieved news stories, including both video and relevant text, to the user.
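The expand-then-retrieve step can be sketched as follows. Terms that co-occur with the query in recent web news documents are added to the query. The expanded query is then used to rank segmented news stories, represented here by their transcript or caption text. This is a simple co-occurrence scheme chosen for illustration, not the system's actual method. The stopwords, `n_terms`, the helper names, and the toy documents are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopwords; "latest" and "news" are treated as generic query noise.
STOPWORDS = {"a", "an", "the", "on", "of", "to", "in", "and", "is", "was", "latest", "news"}

def tokens(text):
    """Lower-cased alphanumeric words with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def expand_query(query, web_docs, n_terms=3):
    """Add the n_terms words that co-occur most often with the query in web_docs."""
    q = tokens(query)
    qset = set(q)
    counts = Counter()
    for doc in web_docs:
        words = set(tokens(doc))
        if qset & words:                  # only documents mentioning a query term
            counts.update(words - qset)   # count each new word once per document
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return q + [w for w, _ in ranked[:n_terms]]

def rank_stories(expanded, stories):
    """Story ids ranked by overlap with the expanded query (ties broken by id)."""
    terms = set(expanded)
    scores = {sid: len(terms & set(tokens(text))) for sid, text in stories.items()}
    return sorted(scores, key=lambda sid: (-scores[sid], sid))

web_docs = ["UN weapon inspectors visit suspected site",
            "Weapon inspection report due to Security Council",
            "Inspection team submits report to Security Council",
            "Stock markets rise"]
stories = {"s1": "Anchor: the inspection report was delivered to the Security Council today",
           "s2": "Anchor: football results",
           "s3": "Reporter: weapon sites visited"}
expanded = expand_query("latest news on weapon inspection", web_docs)
ranking = rank_stories(expanded, stories)
```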

This paper discusses the architecture of the system, and the implementation issues in deploying QA techniques in conjunction with news video research to support web searches of news stories.
