Invited Talks

 

Data integration in the AI era: research trends and still open issues

Robert Wrembel (Poznan University of Technology, Poland)

Abstract

Data integration (DI) has been an area of intensive research for decades. These efforts have resulted in a few reference DI architectures, which can be categorized as supporting: (1) virtual integration (federated and mediated), (2) physical integration (data warehouse), and (3) hybrid integration (data lake, data lakehouse, data mesh). Regardless of their specific type, all these architectures rely on a sophisticated integration layer. The layer is implemented by software for designing, orchestrating, and running the so-called DI processes.
Nowadays, in all business domains, large volumes of highly heterogeneous data are produced, e.g., by medical systems, smart cities, and smart agriculture, which requires further advances in data integration technologies. The widespread adoption of artificial intelligence (AI) solutions is now extending towards DI, opening new research paths and generating open problems.
In this talk, I will share my perspective on the application and potential of AI solutions in DI. I will also highlight unresolved issues within the field of DI. Specifically, I will explore: the optimization of DI processes, the role of user-defined functions (UDFs) in DI, data quality with a focus on deduplication, a novel DI architecture based on the connectors-as-a-service paradigm, and data provenance.
The talk will be structured into three main parts: (1) an overview of data integration architectures, (2) selected AI techniques for DI, and (3) open problems in DI. The findings presented in the talk are based on my experience in running DI research and development projects for various business entities.

Robert Wrembel (PhD, Dr. Habil.) is an associate professor in the Faculty of Computing and Telecommunications at Poznan University of Technology (Poland). In 2008 he received a post-doctoral degree in computer science (habilitation), specializing in database systems and data warehouses. He served as deputy dean of the Faculty of Computing and Management (2008-2012) and of the Faculty of Computing (2012-2016). Since January 2023 he has been the chair of the Data Processing Technologies group at Poznan University of Technology. He was a consultant at a software house (2002-2003) and a lecturer at Oracle Poland (1998-2005). Currently he is an IT consultant in a private hospital. Within the last 10 years he has carried out four R&D projects: one for a large financial institution in Poland, one for a company in the energy sector, and two for a corporation in the field of electronics. He cooperates with IBM Software Lab Kraków in Poland. At his university, he led the Erasmus Mundus Joint Doctorate Program - Information Technologies for Business Intelligence - Doctoral College (2013-2020). Robert has visited numerous research and education centers, including: INRAE Clermont-Ferrand (France), Free University of Bozen-Bolzano (Italy), Università degli Studi di Milano (Italy), Universitat Politècnica de Catalunya - BarcelonaTech (Spain), Université Lyon 2 (France), Universidad de Costa Rica (Costa Rica), Klagenfurt University (Austria), Loyola University (USA), INRIA Paris-Rocquencourt (France), and Université Paris Dauphine (France). In 2012 he graduated from a two-month innovation and entrepreneurship program at Stanford University. In 2013 he completed an internship at the BI company Targit (USA). His research interests encompass: data integration, data quality, databases, data warehouses, and data lakes.


Blending Contextual Data with Heterogeneous Time Dimensions for Improved Time Series Analysis

Anton Dignös (Free University of Bozen-Bolzano, Italy)

Abstract

In modern industrial settings, sensors continuously generate vast amounts of time series data critical for automation and process optimization. However, analyzing this data in isolation limits its effectiveness, as it often lacks integration with contextual factors that influence outcomes but are not directly observable. While traditional data fusion techniques aim at combining multimodal data such as images or videos, contextual factors in industrial environments frequently differ not in modality but in temporal structure. We identify four distinct time dimensions - constant, time series, events, and intervals - that commonly characterize contextual data in these settings. By transforming diverse time structures into a unified format, we enable the application of conventional machine learning techniques, enhancing the depth and accuracy of industrial data analysis. This talk presents a case study and initial work on a foundational approach for systematically integrating such temporally heterogeneous contextual factors into time series analysis.
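As a rough illustration of what "a unified format" could mean here, the sketch below aligns the four time dimensions named in the abstract onto one list of timestamps, producing one feature row per timestamp. It is not the speaker's method: the function name `unify`, the feature names, and the choices of forward filling, "time since last event", and half-open intervals are all assumptions made for the example.

```python
from bisect import bisect_right


def unify(timestamps, constant, series, events, intervals):
    """Align four kinds of contextual data to a common list of timestamps.

    Hypothetical sketch; all names and encodings are illustrative assumptions.
    constant  -- dict of values that never change
    series    -- list of (time, value) pairs, sorted by time
    events    -- list of event times
    intervals -- list of (start, end, label), treated as half-open [start, end)
    """
    series_times = [t for t, _ in series]
    rows = []
    for ts in timestamps:
        # Time series: last observation at or before ts (forward fill).
        i = bisect_right(series_times, ts) - 1
        series_value = series[i][1] if i >= 0 else None

        # Events: time elapsed since the most recent event at or before ts.
        past = [e for e in events if e <= ts]
        since_event = ts - max(past) if past else None

        # Intervals: label of the first interval that covers ts, if any.
        label = next(
            (lab for start, end, lab in intervals if start <= ts < end), None
        )

        # Constant: copied unchanged into every row.
        rows.append(
            {
                "time": ts,
                **constant,
                "series_value": series_value,
                "since_event": since_event,
                "interval_label": label,
            }
        )
    return rows


rows = unify(
    timestamps=[0, 5, 10],
    constant={"machine": "M1"},
    series=[(0, 20.0), (4, 21.5), (9, 23.0)],
    events=[3],
    intervals=[(0, 6, "shift_A"), (6, 12, "shift_B")],
)
for row in rows:
    print(row)
```

Each resulting row is a flat record, so the output could be fed to a conventional tabular machine learning pipeline; other encodings (for example, event counts in a window instead of elapsed time) would fit the same structure.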

Anton Dignös is an assistant professor in the Faculty of Engineering at the Free University of Bozen-Bolzano (Italy). He received his PhD in Computer Science in 2014 from the Department of Computer Science at the University of Zurich (Switzerland). His research focuses on database technologies for advanced query processing and data analytics, with particular emphasis on temporal data, time series, and data summarization. He is the program co-chair of DaWaK 2025 and previously served as program co-chair of the EDBT PhD Workshop in 2023 and 2024.


A Hybrid Data Model to Support Transportation Analytics of Emergency Service Vehicles

Carson K. Leung (University of Manitoba, Canada)

Abstract

Using a single type of database solution to support real-world applications is becoming more and more challenging because of the volume and variety of data. For instance, the data collected for the transportation industry comprise both structured and unstructured data. Storing and managing such data with only one type of database solution (either a relational database system or a graph database) can be challenging. As real-world applications ask ever more complex questions about the data, the database solution should be able to answer these questions in a reasonable time. Hence, in this talk, I present a hybrid model, which integrates data to support transportation analytics. The model consists of relational databases and non-relational databases (namely, graph databases), pooling their strengths to meet the demands of modern applications. I also demonstrate this hybrid data model as a practical solution with a case study on improving the response times of emergency services, such as emergency medical services (EMS), with the support of the presented platform.

Carson K. Leung is a Professor in the Department of Computer Science at the University of Manitoba, Canada. He received his B.Sc.(Hons.), M.Sc., and Ph.D. degrees from the University of British Columbia (UBC), Canada. He has contributed more than 400 refereed publications on the topics of analytics, artificial intelligence (AI), big data analytics, bioinformatics, data mining, data science, databases, expert systems, health informatics, knowledge discovery, machine learning, social network analysis, and visual analytics. These include publications in refereed international journals and conferences such as ACM Transactions on Database Systems (TODS), ACM Transactions on Knowledge Discovery from Data (TKDD), IEEE ICDE, IEEE ICDM, and PAKDD. Moreover, he has served as the Editor-in-Chief for Advances in Data Science and Adaptive Analysis (ADSAA) and for Analytics, as well as an Associate Editor for international journals such as Social Network Analysis and Mining (SNAM). He has served on the Organizing Committees of ACM CIKM, ACM KDD, ACM SIGMOD, DaWaK, IEEE DSAA, IEEE ICDM, and other conferences; he has also served as a PC member of numerous conferences, including ACM KDD, ECML/PKDD, and PAKDD.