Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03)
August 9 - 10, 2003
Acapulco, Mexico
Edited by:
Subbarao Kambhampati, Arizona State University
Craig A. Knoblock, University of Southern California
Organizing Committee
Craig Knoblock (Co-Chair), University of Southern California
Subbarao Kambhampati (Co-Chair), Arizona State University
Lise Getoor, University of Maryland
Alon Halevy, University of Washington
Sheila McIlraith, Stanford University
Program Committee
William Cohen, Carnegie Mellon University
Hasan Davulcu, Arizona State University
Anhai Doan, University of Illinois, Urbana-Champaign
Juliana Freire, Oregon Graduate Institute
C. Lee Giles, Pennsylvania State University
Joseph M. Hellerstein, University of California, Berkeley
Nick Kushmerick, University College Dublin
Andrew McCallum, University of Massachusetts Amherst
Giansalvatore Mecca, Università della Basilicata
Renee Miller, University of Toronto
Ami Motro, George Mason University
Jeffrey Naughton, University of Wisconsin
Louiqa Raschid, University of Maryland
Marie-Christine Rousset, University of Paris-Sud
Sheila Tejada, University of New Orleans
Sponsored by the Research Institute for Advanced Computer Science (RIACS)
Foreword
Effective integration of heterogeneous databases and information sources has been cited as the most pressing challenge in spheres as diverse as corporate data management, homeland security, counter-terrorism and the human genome project. An important impediment to scaling up integration frameworks to large-scale applications has been the fact that the autonomous and decentralized nature of the data sources constrains the mediators to operate with very little information about the structure, scope, profile, quality and inter-relations of the information sources they are trying to integrate.
The purpose of this workshop is to bring together researchers who are working
in a variety of areas that are all related to the larger problem of integrating
information on the Web. This includes research in the areas of machine
learning, data mining, automated planning, constraint reasoning, databases,
view integration, information extraction, semantic web, web services, and other
related areas.
We were fortunate to assemble a diverse group of researchers from the AI and DB
communities to help us in organizing this workshop. The workshop call for
papers had a very good response. We received 40 submissions spanning a diverse
set of issues relevant to information integration. Each submission was reviewed
by at least two members of the program committee. Lise Getoor independently
coordinated the reviews of papers co-authored by the co-chairs.
To encourage discussion, the workshop program is structured into topic-oriented
panels and poster sessions. In addition to the contributed papers, the program
also contains two invited panels--one on the perspectives of companies engaged
in information integration technology and the other on the perspectives of
funding agencies.
We would like to thank our organizing and program committees for their many
invaluable inputs and thoughtful reviews. We thank Alma Nava for handling the
review process and both Alma Nava and Kristin Ghent for assembling the
proceedings. We would also like to thank the Research Institute for Advanced
Computer Science (RIACS) for providing financial support for the workshop.
Subbarao Kambhampati
Craig Knoblock
Workshop Co-Chairs
Workshop Schedule
Aug 9, 2003
8:45-9:00am Opening Remarks
9:00-10:30am: Wrapping and Extracting (Chair: Nick Kushmerick)
Integrating Information to Bootstrap Information Extraction from Web Sites
Fabio Ciravegna, Alexiei Dingli, David Guthrie and Yorick Wilks
Trainability: Developing a responsive learning system
Steven N. Minton, Sorinel I. Ticrea and Jennifer Beach
On the Power of Semantic Partitioning of Web Documents
Guizhen Yang, Saikat Mukherjee, Wenfang Tan, I.V. Ramakrishnan & Hasan Davulcu
10:30-11:00am: Coffee Break
11:00-12:00pm: Name Matching (Chair: Andrew McCallum)
Employing Trainable String Similarity Metrics for Information Integration
Mikhail Bilenko and Raymond J. Mooney
A Comparison of String Distance Metrics for Name-Matching Tasks
William W. Cohen, Pradeep Ravikumar, Stephen E. Fienberg
12:00-1:00pm: Lunch Provided
1:00-2:30: Schema Matching (Chair: Subbarao Kambhampati)
Evaluating Matching Algorithms: the Monotonicity Principle
Ateret Anaby-Tavor, Avigdor Gal and Alberto Trombetta
Object Matching for Information Integration: A Profiler-Based Approach
AnHai Doan, Ying Lu, Yoonkyong Lee and Jiawei Han
Corpus-based Schema Matching
Jayant Madhavan, Philip Bernstein, Kuang Chen, Alon Halevy and Pradeep Shenoy
2:30-4:30: Poster Session (overlaps with coffee break)
2:30-3:00pm 2-minute poster advertisements
Wrapping and Extracting
Information Extraction from Tree Documents by Learning Subtree Delimiters
Boris Chidlovskii
Reconfigurable Web Wrapper Agents for Web Information Integration
Chun-Nan Hsu, Chia-Hui Chang, Harianto Siek, Jiann-Jyh Lu, Jen-Jie Chiou
Expressive Power of Tree and String Based Wrappers
Daisuke Ikeda, Yasuhiro Yamada and Sachio Hirokawa
Domain Event Extraction and Representation with Domain Ontology
Shih-Hung Wu, Tzong-Han Tsai and Wen-Lian Hsu
Name Matching
Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference
Andrew McCallum and Ben Wellner
Meta-Data and Statistics
A Method for Semantically Enhancing the Service Discovery Capabilities of UDDI
Rama Akkiraju, Richard Goodwin, Prashant Doshi and Sascha Roeder
Source Update Capture in Information Agents
Naveen Ashish, Deepak Kulkarni and Yao Wang
Registry-Based Support for Information Integration
Deborah L. McGuinness and Paulo Pinheiro da Silva
Query Processing and Execution
Combining Classification and Transduction for Value Prediction in Speculative Plan Execution
Greg Barish and Craig A. Knoblock
Visual Programming of Web Data Aggregation Applications
Robert Baumgartner, Georg Gottlob and Marcus Herzog
Two-phase Query Modification using Semantic Relations based on Ontologies
Kaoru Hiramatsu, Jun-ichi Akahani and Tetsuji Satoh
Integrating Information, Applications and Services on the Web
Juan C. Lavariega and Lorena G. Gomez-Martinez
Representation & Management
An Ontology-Based Knowledge Management Platform
Arantza Aldea, Rene Banares-Alcantara, Jaime Bocio, Javier Gramajo, David Isern, Antonis Kokossis, Laureano Jiménez, Antonio Moreno and David Riano
Concept Linking for Information Integration in Open Book and Sentinel
Stuart Watt
Building Data Integration Systems: A Mass Collaboration Approach
AnHai Doan and Robert McCann
3:30-4:00: Coffee Break
4:30-6:00: Panel on The Economics of Information Integration: The Practical View of II on the Web (Chair: William Cohen)
William Cohen, Carnegie Mellon
Alon Halevy, University of Washington
Steven Minton, Fetch Technologies
David Pennock, Overture
Aug 10, 2003
9:00-10:45am: Meta-Data and Statistics (Chair: Chen Li)
Statistics Gathering for Learning from Distributed, Heterogeneous and Autonomous Data Sources
Doina Caragea, Jaime Reinoso, Adrian Silvescu, and Vasant Honavar
Deep Annotation for Information Integration
Siegfried Handschuh, Steffen Staab, Raphael Volz and Leo Meyer
Automatically attaching semantic metadata to Web Services
Andreas Hess and Nicholas Kushmerick
Frequency-Based Coverage Statistics Mining for Data Integration
Zaiqing Nie and Subbarao Kambhampati
10:45-11:15am: Coffee Break
11:15-12:30pm: Bio-informatics (Chair: Naveen Ashish)
Exploring Life Sciences Data Sources
Zoe Lacroix, Felix Naumann, Louiqa Raschid, & Maria Esther Vidal
Query Answering Using Ontologies in Agent-based Resource Sharing Environment for Biological Web Information Integrating
Jiann-Jyh Lu & Chun-Nan Hsu
12:30-1:30pm: Lunch Provided
1:30pm-3:30: Query Processing and Execution (Chair: Alon Halevy)
Towards Inconsistency Management in Data Integration Systems
Ariel Fuxman and Renee J. Miller
Querying Distributed Data through Distributed Ontologies: A Simple but Scalable Approach
Francois Goasdoue and Marie-Christine Rousset
Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems
Chen Li
Efficient Execution of Recursive Integration Plans
Snehal Thakkar and Craig A. Knoblock
3:30-4:00: Coffee Break
4:00-5:00: Panel on Future Funding for Information Integration (Chair: Craig Knoblock)
Michael Pazzani, National Science Foundation
Barney Pell, National Aeronautics and Space Administration
Nick Kushmerick, on Science Foundation Ireland
5:00-5:30: Closing Discussion
Table of Contents
Wrapping and Extracting
Information Extraction from Tree Documents by Learning Subtree Delimiters
Boris Chidlovskii
Integrating Information to Bootstrap Information Extraction from Web Sites
Fabio Ciravegna, Alexiei Dingli, David Guthrie and Yorick Wilks
Reconfigurable Web Wrapper Agents for Web Information Integration
Chun-Nan Hsu, Chia-Hui Chang, Harianto Siek, Jiann-Jyh Lu, Jen-Jie Chiou
Expressive Power of Tree and String Based Wrappers
Daisuke Ikeda, Yasuhiro Yamada and Sachio Hirokawa
Trainability: Developing a responsive learning system
Steven N. Minton, Sorinel I. Ticrea and Jennifer Beach
Domain Event Extraction and Representation with Domain Ontology
Shih-Hung Wu, Tzong-Han Tsai and Wen-Lian Hsu
On the Power of Semantic Partitioning of Web Documents
Guizhen Yang, Saikat Mukherjee, Wenfang Tan, I.V. Ramakrishnan & Hasan Davulcu
Schema Matching
Evaluating Matching Algorithms: the Monotonicity Principle
Ateret Anaby-Tavor, Avigdor Gal and Alberto Trombetta
Object Matching for Information Integration: A Profiler-Based Approach
AnHai Doan, Ying Lu, Yoonkyong Lee and Jiawei Han
Corpus-based Schema Matching
Jayant Madhavan, Philip Bernstein, Kuang Chen, Alon Halevy and Pradeep Shenoy
Name Matching
Employing Trainable String Similarity Metrics for Information Integration
Mikhail Bilenko and Raymond J. Mooney
A Comparison of String Distance Metrics for Name-Matching Tasks
William W. Cohen, Pradeep Ravikumar, Stephen E. Fienberg
Toward Conditional Models of Identity Uncertainty with Application to Proper Noun Coreference
Andrew McCallum and Ben Wellner
Meta-Data & Statistics
A Method for Semantically Enhancing the Service Discovery Capabilities of UDDI
Rama Akkiraju, Richard Goodwin, Prashant Doshi and Sascha Roeder
Source Update Capture in Information Agents
Naveen Ashish, Deepak Kulkarni and Yao Wang
Statistics Gathering for Learning from Distributed, Heterogeneous and Autonomous Data Sources
Doina Caragea, Jaime Reinoso, Adrian Silvescu, and Vasant Honavar
Deep Annotation for Information Integration
Siegfried Handschuh, Steffen Staab, Raphael Volz and Leo Meyer
Automatically attaching semantic metadata to Web Services
Andreas Hess and Nicholas Kushmerick
Registry-Based Support for Information Integration
Deborah L. McGuinness and Paulo Pinheiro da Silva
Frequency-Based Coverage Statistics Mining for Data Integration
Zaiqing Nie and Subbarao Kambhampati
Query Processing and Execution
Combining Classification and Transduction for Value Prediction in Speculative Plan Execution
Greg Barish and Craig A. Knoblock
Visual Programming of Web Data Aggregation Applications
Robert Baumgartner, Georg Gottlob and Marcus Herzog
Towards Inconsistency Management in Data Integration Systems
Ariel Fuxman and Renee J. Miller
Querying Distributed Data through Distributed Ontologies: A Simple but Scalable Approach
Francois Goasdoue and Marie-Christine Rousset
Two-phase Query Modification using Semantic Relations based on Ontologies
Kaoru Hiramatsu, Jun-ichi Akahani and Tetsuji Satoh
Integrating Information, Applications and Services on the Web
Juan C. Lavariega and Lorena G. Gomez-Martinez
Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems
Chen Li
Efficient Execution of Recursive Integration Plans
Snehal Thakkar and Craig A. Knoblock
Representation and Management
An Ontology-Based Knowledge Management Platform
Arantza Aldea, Rene Banares-Alcantara, Jaime Bocio, Javier Gramajo, David Isern, Antonis Kokossis, Laureano Jiménez, Antonio Moreno and David Riano
Building Data Integration Systems: A Mass Collaboration Approach
AnHai Doan and Robert McCann
Concept Linking for Information Integration in Open Book and Sentinel
Stuart Watt
Bio-Informatics Integration
Query Answering Using Ontologies in Agent-based Resource Sharing Environment for Biological Web Information Integrating
Jiann-Jyh Lu & Chun-Nan Hsu
Exploring Life Sciences Data Sources
Zoe Lacroix, Felix Naumann, Louiqa Raschid, & Maria Esther Vidal
Abstracts
Using Categorical Clustering in Schema Discovery
Periklis Andritsos and Renee J. Miller
Constraint-driven hierarchical information extraction
Thomas Lee
Author Index