Exploiting Data Semantics to Discover, Extract, and Model Web Sources
José-Luis Ambite
Craig A. Knoblock
Kristina Lerman
Anon Plangprasopchok
Thomas Russ
Cenk Gazen
Steven Minton
Mark Carman
Abstract
We describe Deimos, a system that automatically discovers and models new
sources of information. The system exploits four core technologies
developed by our group that make an end-to-end solution to this problem
possible. First, given an example source, Deimos finds other similar
sources online. Second, it invokes and extracts data from these
sources. Third, given the syntactic structure of a source, Deimos maps
its inputs and outputs to semantic types. Finally, it infers the
source's semantic definition, i.e., the function that maps the inputs to
the outputs. Deimos automates these steps by
exploiting a combination of background knowledge and data semantics. We
describe the challenges in integrating separate components into a
unified approach to discovering, extracting, and modeling new online
sources. We provide an end-to-end validation of the system in two
information domains to show that it can successfully discover and model
new data sources in those domains.
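
The four steps named in the abstract (discovery, invocation and extraction, semantic typing, and source modeling) can be pictured as a simple pipeline. The following is only a minimal sketch under stated assumptions, not the Deimos implementation: every name in it (`SourceModel`, `run_pipeline`, the `_toy_*` stand-ins, the example URLs, and the string form of the definition) is hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class SourceModel:
    """Hypothetical record of what the pipeline learns about one source."""
    url: str
    rows: List[Row] = field(default_factory=list)
    semantic_types: Dict[str, str] = field(default_factory=dict)
    definition: str = ""


def run_pipeline(
    seed_url: str,
    discover: Callable[[str], List[str]],
    extract: Callable[[str], List[Row]],
    label_types: Callable[[List[Row]], Dict[str, str]],
    induce_definition: Callable[[Dict[str, str], List[Row]], str],
) -> List[SourceModel]:
    """Chain the four stages; each stage is passed in as a callable."""
    models = []
    for url in discover(seed_url):                   # 1. find similar sources
        rows = extract(url)                          # 2. invoke and extract data
        if not rows:
            continue                                 # skip sources that yield no data
        types = label_types(rows)                    # 3. map columns to semantic types
        definition = induce_definition(types, rows)  # 4. infer a source definition
        models.append(SourceModel(url, rows, types, definition))
    return models


# Toy stand-ins (assumptions) so the sketch runs end to end.
def _toy_discover(seed_url: str) -> List[str]:
    return ["http://example.org/a", "http://example.org/b"]


def _toy_extract(url: str) -> List[Row]:
    return [{"col0": "90210", "col1": "72"}] if url.endswith("/a") else []


def _toy_label(rows: List[Row]) -> Dict[str, str]:
    return {"col0": "Zipcode", "col1": "Temperature"}


def _toy_induce(types: Dict[str, str], rows: List[Row]) -> str:
    return "source(" + ", ".join(types.values()) + ")"


models = run_pipeline(
    "http://example.org/seed", _toy_discover, _toy_extract, _toy_label, _toy_induce
)
```

Passing the stages in as callables keeps the sketch independent of how any single stage is realized; the second toy source is dropped because its extraction step returns no rows.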
In IEEE International Conference on Data Mining Workshops, pp. 771-779, 2008.
The full paper is available in PDF (8pp).