Discovering and Learning Semantic Models of Online Sources for Information Integration

Abstract

Much work in Information Integration and the Semantic Web assumes that rich semantic models of sources exist. In practice, there is a tremendous amount of data on the Web, but it is typically hard to find, has little or no explicit structure, and rarely comes with any semantic description. We describe an integrated end-to-end system that can automatically discover web sources, invoke them and extract their data, and build semantic models of them. We also describe the challenges of integrating the component technologies into a unified approach to discovering, extracting, and modeling new online sources. We evaluate the integrated system in three different domains and demonstrate that it can automatically discover and model new data sources.

In Proceedings of the IJCAI Workshop on Information Integration on the Web, 2009.

The full paper is available in PDF (6pp).
