University of Southern California
Scientists today spend a tremendous amount of time extracting, cleaning, and integrating data from various sources and services in the course of conducting their research. Most researchers either perform these tasks manually in a spreadsheet or write specialized programs for them. We are developing an end-user integration framework, called Karma, to support the rapid integration of diverse sources and services. Karma allows users to rapidly import data from a variety of sources, normalize the data fields by example, model the sources with respect to a shared domain ontology so that their data can be integrated, and publish the data in any of a variety of formats. In this talk, I will describe the overall goals of the project, present the techniques for modeling and normalizing data, and give a demonstration of the current Karma system, which is available for others to download and use.
Craig Knoblock is a Research Professor in Computer Science at the University of Southern California (USC) and the Director of Information Integration at the USC Information Sciences Institute. He received his Bachelor of Science degree from Syracuse University, and his Master’s and Ph.D. from Carnegie Mellon University, all in computer science. At USC, Dr. Knoblock leads a team of about 20 researchers, staff and students in developing techniques for rapid, efficient information integration. He focuses on constructing distributed, integrated applications from online sources through information extraction, source modeling, record linkage, machine learning and other technologies and has applied them to geospatial and biological data integration. Dr. Knoblock is a Fellow of the Association for the Advancement of Artificial Intelligence (AAAI), a Distinguished Scientist of the Association for Computing Machinery (ACM), and President and Trustee of the International Joint Conference on Artificial Intelligence (IJCAI).