- Instructor: Dr. Yasser El-Manzalawy
- TA: Xianfeng Tang
Course Goals and Objectives
Data science approaches to data analysis often require integrating data from multiple heterogeneous sources. The ultimate goal of a data integration system is to offer uniform access to a set of autonomous and heterogeneous data sources. This course gives students a background in the acquisition, processing, integration, and fusion of data from heterogeneous sources to support decision-making, situation awareness, predictive modeling, and scientific discovery. Upon completion of this course, students should be able to:
- Explain key concepts, theories and algorithms for data integration and fusion
- Apply data integration techniques to build systems that support flexible sharing and integration of data across multiple autonomous data providers
- Design and implement data integration and/or fusion frameworks based on knowledge-based, probabilistic, and graph-based approaches
- Design and implement data-based and model-based fusion models for real-world applications in domains such as the Internet of Things (IoT), social networks, recommendation systems, and bioinformatics.
Textbook and Course Materials
- AnHai Doan, Alon Y. Halevy, Zachary Ives. Principles of Data Integration. Morgan Kaufmann Publishers, 2012.
- Slides and other materials will be posted on Canvas.
Grading
- Assignments (20%)
- Mid-term Exam (20%)
- Final Exam (20%)
- Project (40%)
- See the course syllabus for details.
Topics
- Introduction to Data Integration and Fusion
- Manipulating Query Expressions
- Describing Data Sources
- String Matching
- Schema Matching and Mapping
- General Schema Manipulation Operators
- Data Matching