VEIL: Combining Semantic Knowledge with Image Understanding
Image understanding is the process of taking a photographic image and
producing an interpretation of its contents. Developing such an
interpretation involves assigning semantics to
particular regions of the image. VEIL is an experiment directed at
extending the upper levels of semantics used in image interpretation.
A one-page summary of the work in PDF
is also available.
Basic Approach
Most previous work on image understanding involves developing
algorithms that operate at the pixel level of the image and produce
geometric objects (such as lines, cubes, and ribbons) as output. This
abstraction process reduces the amount of raw data by substituting a
smaller number of geometric shapes for a large number of pixels. The
VEIL project explores the next level of semantic abstraction, namely
going from geometric objects to domain-level concepts. This
abstraction process is illustrated below:
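The two abstraction steps can be sketched with a few toy data structures. This is purely illustrative: the class names, fields, and the grouping rule are invented here and are not taken from the VEIL system.

```python
from dataclasses import dataclass

# Hypothetical structures for the levels of abstraction described above.
# Pixel-level processing yields geometric objects; a further step maps
# geometric objects to domain-level concepts.

@dataclass
class GeometricObject:
    """Output of pixel-level processing: a shape with a location."""
    shape: str        # e.g. "line", "cube", "ribbon"
    centroid: tuple   # (x, y) in image coordinates

@dataclass
class DomainObject:
    """A domain-level concept grounded in one or more geometric objects."""
    concept: str      # e.g. "building", "headquarters"
    geometry: list    # the GeometricObjects it abstracts

def abstract(geoms):
    """Toy second abstraction step: treat the cubes as one building."""
    cubes = [g for g in geoms if g.shape == "cube"]
    return DomainObject(concept="building", geometry=cubes)

geoms = [GeometricObject("cube", (10, 20)),
         GeometricObject("line", (0, 0))]
building = abstract(geoms)
print(building.concept, len(building.geometry))  # building 1
```

The point of the second step is data reduction in meaning rather than in volume: many geometric shapes collapse into a single object a user would actually name.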
At the domain level, the objects of interest are those that have
significance for the users of imagery data. In other words, one is
interested in manipulating headquarters rather than collections
of cubes. VEIL developed a domain model to use in describing images.
This domain model is encoded using the Loom® knowledge representation language.
VEIL Knowledge Base
The domain model allows discourse at the most
interesting level for users of imagery. The individual objects
(instances) at the domain level are linked to geometric objects
in a site model. This link allows VEIL access to information
about geometric properties and relations among objects in the
image.
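The link between a domain-level instance and its site-model geometry can be sketched as follows. The class and field names here are invented for illustration; VEIL itself encodes the domain model in Loom, not Python.

```python
# A minimal sketch, assuming a domain instance holds a reference into
# the site model. All names and numbers are hypothetical.

class SiteModelBuilding:
    """Geometric record in the site model."""
    def __init__(self, footprint_area, height):
        self.footprint_area = footprint_area  # square meters
        self.height = height                  # meters

    def volume(self):
        # A geometric property, answered entirely at the site-model level.
        return self.footprint_area * self.height

class DomainInstance:
    """Domain-level object (e.g. a headquarters) linked to geometry."""
    def __init__(self, name, function, geometry):
        self.name = name
        self.function = function   # domain-level property
        self.geometry = geometry   # link into the site model

hq = DomainInstance("HQ-1", "headquarters",
                    SiteModelBuilding(400.0, 10.0))
# The link lets a domain-level object reach geometric properties:
print(hq.geometry.volume())  # 4000.0
```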
Information processing in the VEIL system takes place at multiple
levels of abstraction. Geometric queries (such as retrieving the
volume of a particular building) are computed using the site model.
Queries about the function of buildings are answered using the
domain model. Reasoning can also be split across levels, for example
to find headquarters that are located near barracks. Information about
building function comes from the domain model level, whereas
computations about proximity are carried out at the site model level.
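A query split across the two levels might look like the sketch below: building function is looked up in a (here, hypothetical) domain model, while distances are computed from site-model positions. The dictionaries, threshold, and function names are all invented for illustration.

```python
import math

# Toy stand-ins for the two models. In VEIL, the domain model is a
# Loom knowledge base and the site model holds real geometry.
site = {   # site model: building id -> (x, y) position in meters
    "b1": (0.0, 0.0), "b2": (50.0, 0.0), "b3": (500.0, 500.0),
}
domain = {  # domain model: building id -> function
    "b1": "headquarters", "b2": "barracks", "b3": "barracks",
}

def distance(a, b):
    # Proximity is a site-model (geometric) computation.
    return math.dist(site[a], site[b])

def headquarters_near_barracks(threshold=100.0):
    # Function lookups come from the domain model; distances from the
    # site model. The query spans both levels of abstraction.
    hqs = [b for b, f in domain.items() if f == "headquarters"]
    barracks = [b for b, f in domain.items() if f == "barracks"]
    return [h for h in hqs
            if any(distance(h, b) <= threshold for b in barracks)]

print(headquarters_near_barracks())  # ['b1']
```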
Event Detection Example
Using Loom's context mechanism, VEIL is also able to reason across
multiple images. This capability allows the system to define and
retrieve sequences of images that have interesting
properties.
To demonstrate this capability we have implemented an event detector
that can find multiple images satisfying criteria such as an armored
movement or a field training exercise. For details, see the
event detection example.
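The idea of detecting an event as a pattern across an image sequence can be sketched as below. This is not VEIL's detector (which uses Loom's context mechanism); the record fields, threshold, and criterion are invented to illustrate the shape of the computation.

```python
from dataclasses import dataclass

@dataclass
class ImageRecord:
    """One image in a temporal sequence, with a derived measurement."""
    timestamp: int       # acquisition time (arbitrary units)
    vehicle_count: int   # armored vehicles detected in the image

def detect_armored_movement(images, min_delta=5):
    """Return (earlier, later) timestamp pairs where the vehicle count
    jumps sharply: a crude stand-in for an armored-movement event."""
    images = sorted(images, key=lambda im: im.timestamp)
    events = []
    for earlier, later in zip(images, images[1:]):
        if later.vehicle_count - earlier.vehicle_count >= min_delta:
            events.append((earlier.timestamp, later.timestamp))
    return events

seq = [ImageRecord(1, 2), ImageRecord(2, 3), ImageRecord(3, 12)]
print(detect_armored_movement(seq))  # [(2, 3)]
```

The essential point is that the criterion is stated over a sequence of images rather than a single one, which is what reasoning across multiple images makes possible.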
More Details
More details can be found in the VEIL Final Report
(68pp., 2.9MB). This report is in PDF format. PDF can be
viewed with the free Acrobat®
Reader from Adobe.
Loom is a registered trademark of the University of Southern
California. Acrobat Reader is either a trademark or registered
trademark of Adobe Systems Incorporated in the United States and/or
other countries.