VEIL: Combining Semantic Knowledge with Image Understanding
Image understanding is the process of taking a photographic image and
producing an interpretation of its contents. Developing such an
interpretation involves assigning semantics to
particular regions of the image. VEIL is an experiment directed at
extending the upper levels of semantics used in image interpretation.
A one-page summary of the work in PDF
is also available.
Basic Approach
Most previous work on image understanding involves developing
algorithms that operate at the pixel level of the image and produce
geometric objects (such as lines, cubes, and ribbons) as output. This
abstraction process reduces the amount of raw data by substituting a
smaller number of geometric shapes for a large number of pixels. The
VEIL project explores the next level of semantic abstraction, namely
going from geometric objects to domain-level concepts. This
abstraction process is illustrated below:
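The two abstraction steps can be sketched with a few toy data structures. This is purely illustrative: the class names, fields, and the grouping rule are invented here and are not taken from the VEIL system.

```python
from dataclasses import dataclass

# Hypothetical structures for the levels of abstraction described above.
# Pixel-level processing yields geometric objects; a further step maps
# geometric objects to domain-level concepts.

@dataclass
class GeometricObject:
    """Output of pixel-level processing: a shape with a location."""
    shape: str        # e.g. "line", "cube", "ribbon"
    centroid: tuple   # (x, y) in image coordinates

@dataclass
class DomainObject:
    """A domain-level concept grounded in one or more geometric objects."""
    concept: str      # e.g. "building", "headquarters"
    geometry: list    # the GeometricObjects it abstracts

def abstract(geoms):
    """Toy second abstraction step: treat the cubes as one building."""
    cubes = [g for g in geoms if g.shape == "cube"]
    return DomainObject(concept="building", geometry=cubes)

geoms = [GeometricObject("cube", (10, 20)),
         GeometricObject("line", (0, 0))]
building = abstract(geoms)
print(building.concept, len(building.geometry))  # building 1
```

The point of the second step is data reduction in meaning rather than in volume: many geometric shapes collapse into a single object a user would actually name.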
At the domain level, the objects of interest are those that have
significance for the users of imagery data. In other words, one is
interested in manipulating headquarters rather than collections
of cubes. VEIL developed a domain model to use in describing images.
This domain model is encoded using the Loom® knowledge representation language.
VEIL Knowledge Base
The domain model allows discourse at the most
interesting level for users of imagery. The individual objects
(instances) at the domain level are linked to geometric objects
in a site model. This link allows VEIL access to information
about geometric properties and relations among objects in the
image.
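The link between a domain-level instance and its site-model geometry can be sketched as follows. The class and field names here are invented for illustration; VEIL itself encodes the domain model in Loom, not Python.

```python
# A minimal sketch, assuming a domain instance holds a reference into
# the site model. All names and numbers are hypothetical.

class SiteModelBuilding:
    """Geometric record in the site model."""
    def __init__(self, footprint_area, height):
        self.footprint_area = footprint_area  # square meters
        self.height = height                  # meters

    def volume(self):
        # A geometric property, answered entirely at the site-model level.
        return self.footprint_area * self.height

class DomainInstance:
    """Domain-level object (e.g. a headquarters) linked to geometry."""
    def __init__(self, name, function, geometry):
        self.name = name
        self.function = function   # domain-level property
        self.geometry = geometry   # link into the site model

hq = DomainInstance("HQ-1", "headquarters",
                    SiteModelBuilding(400.0, 10.0))
# The link lets a domain-level object reach geometric properties:
print(hq.geometry.volume())  # 4000.0
```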
Information processing in the VEIL system takes place at multiple
levels of abstraction. Geometric queries (such as retrieving the
volume of a particular building) are computed using the site model.
Queries about the function of buildings are answered using the
domain model. Reasoning can also be split across levels, for example
to find headquarters that are located near barracks. Information about
building function comes from the domain model level, whereas
computations about proximity are carried out at the site model level.
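A query split across the two levels might look like the sketch below: building function is looked up in a (here, hypothetical) domain model, while distances are computed from site-model positions. The dictionaries, threshold, and function names are all invented for illustration.

```python
import math

# Toy stand-ins for the two models. In VEIL, the domain model is a
# Loom knowledge base and the site model holds real geometry.
site = {   # site model: building id -> (x, y) position in meters
    "b1": (0.0, 0.0), "b2": (50.0, 0.0), "b3": (500.0, 500.0),
}
domain = {  # domain model: building id -> function
    "b1": "headquarters", "b2": "barracks", "b3": "barracks",
}

def distance(a, b):
    # Proximity is a site-model (geometric) computation.
    return math.dist(site[a], site[b])

def headquarters_near_barracks(threshold=100.0):
    # Function lookups come from the domain model; distances from the
    # site model. The query spans both levels of abstraction.
    hqs = [b for b, f in domain.items() if f == "headquarters"]
    barracks = [b for b, f in domain.items() if f == "barracks"]
    return [h for h in hqs
            if any(distance(h, b) <= threshold for b in barracks)]

print(headquarters_near_barracks())  # ['b1']
```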
Event Detection Example
Using Loom's context mechanism, VEIL is also able to reason across
multiple images. This capability allows the system to define and
retrieve sequences of images that have interesting
properties.
To demonstrate this capability we have implemented an event detector
that can find multiple images satisfying criteria such as an armored
movement or a field training exercise. For details, see the
event detection example.
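The idea of detecting an event as a pattern across an image sequence can be sketched as below. This is not VEIL's detector (which uses Loom's context mechanism); the record fields, threshold, and criterion are invented to illustrate the shape of the computation.

```python
from dataclasses import dataclass

@dataclass
class ImageRecord:
    """One image in a temporal sequence, with a derived measurement."""
    timestamp: int       # acquisition time (arbitrary units)
    vehicle_count: int   # armored vehicles detected in the image

def detect_armored_movement(images, min_delta=5):
    """Return (earlier, later) timestamp pairs where the vehicle count
    jumps sharply: a crude stand-in for an armored-movement event."""
    images = sorted(images, key=lambda im: im.timestamp)
    events = []
    for earlier, later in zip(images, images[1:]):
        if later.vehicle_count - earlier.vehicle_count >= min_delta:
            events.append((earlier.timestamp, later.timestamp))
    return events

seq = [ImageRecord(1, 2), ImageRecord(2, 3), ImageRecord(3, 12)]
print(detect_armored_movement(seq))  # [(2, 3)]
```

The essential point is that the criterion is stated over a sequence of images rather than a single one, which is what reasoning across multiple images makes possible.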
More Details
More details can be found in the VEIL Final Report
(68pp., 2.9MB). This report is in PDF format. PDF can be
viewed with the free Acrobat®
Reader from Adobe.
Loom is a registered trademark of the University of Southern
California. Acrobat Reader is either a trademark or registered
trademark of Adobe Systems Incorporated in the United States and/or
other countries.