Privacy in the Era of Big Data, Machine Learning, IoT, and 5G
Elisa Bertino
Samuel D. Conte Professor of Computer Science
CS Department, Purdue University
West Lafayette, IN
Technological advances, such as IoT devices, cyber-physical systems, smart mobile devices, data analytics, social networks, and increased communication capabilities, are making it possible to capture and to quickly process and analyze huge amounts of data from which to extract information critical for many tasks, such as healthcare and cyber security. In the area of cyber security, such tasks include user authentication, access control, anomaly detection, user monitoring, and protection from insider threats. By analyzing and integrating data collected on the Internet and the Web, one can identify connections and relationships among individuals that may in turn help with homeland protection. By collecting and mining data concerning user travels, contacts, and disease outbreaks, one can predict disease spreading across geographical areas. And those are just a few examples. The use of data for those tasks, however, raises major privacy concerns. Collected data, even if anonymized by removing identifiers such as names or social security numbers, may, when linked with other data, lead to the re-identification of the individuals to whom specific data items relate. Also, as organizations, such as governmental agencies, often need to collaborate on security tasks, data sets are exchanged across different organizations, resulting in these data sets being available to many different parties. Privacy breaches may occur at different layers and components in our interconnected systems. In this talk, I first present an interesting privacy attack that exploits the paging occasion in 5G cellular networks, along with possible defenses. This attack shows that achieving privacy is challenging and that no single technique suffices; rather, one must combine different techniques, depending also on the intended use of the data. Examples of these techniques and their applications are presented.
Finally, I discuss the notion of data transparency, which is critical when dealing with sensitive user data, and elaborate on the different dimensions of data transparency.
Elisa Bertino is the Samuel D. Conte Professor of Computer Science at Purdue University. She serves as Director of the Purdue Cyberspace Security Lab (Cyber2SLab). In her role as Director of Cyber2SLab, she leads multi-disciplinary research in data security and privacy. Prior to joining Purdue, she was a professor and department head at the Department of Computer Science and Communication of the University of Milan. She has been a visiting researcher at the IBM Research Laboratory (now Almaden) in San Jose and a visiting professor at the Singapore Management University and the National University of Singapore. Her recent research focuses on cybersecurity and privacy of cellular networks and IoT systems, and on edge analytics and machine learning for cybersecurity. Elisa Bertino is a Fellow of IEEE, ACM, and AAAS. She received the 2002 IEEE Computer Society Technical Achievement Award “For outstanding contributions to database systems and database security and advanced data management systems”, the 2005 IEEE Computer Society Tsutomu Kanai Award for “Pioneering and innovative research contributions to secure distributed systems”, the 2014 ACM SIGSAC Outstanding Contributions Award with citation “For her seminal research contributions and outstanding leadership to data security and privacy for the past 25 years”, and the 2019-2020 ACM Athena Lecturer Award. She is currently serving as ACM Secretary-Treasurer.
Don’t handicap AI without Explicit Knowledge
Prof. Amit Sheth
Founding Director of the AI Institute (#AIISC) at the University of South Carolina
Knowledge representation, as expert system rules or using frames and a variety of logics, played a key role in capturing explicit knowledge during the heyday of AI in the past century. Such knowledge, together with planning and reasoning, is part of what we refer to as Symbolic AI. The resurgent AI of this century, in the form of Statistical AI, has benefitted from massive data and computing. On some tasks, deep learning methods have even exceeded human performance levels. This gave the false sense that data alone is enough and that explicit knowledge is not needed. But as we start chasing machine intelligence that is comparable with human intelligence, there is an increasing realization that we cannot do without explicit knowledge. Neuroscience (role of long-term memory, strong interactions between different specialized regions of data on tasks such as multimodal sensing), cognitive science (bottom brain versus top brain, perception versus cognition), brain-inspired computing, behavioral economics (system 1 versus system 2), and other disciplines point to the need for furthering AI to neuro-symbolic AI (i.e., a hybrid of Statistical AI and Symbolic AI, also referred to as the third wave of AI). As we make this progress, the role of explicit knowledge becomes more evident. I will specifically look at our endeavor to support human-like intelligence, our desire for AI systems to interact with humans naturally, and our need to explain the path and reasons for AI systems’ workings. Nevertheless, the knowledge needed to support understanding and intelligence is varied and complex. Using the example of progressing from NLP to NLU, I will demonstrate the dimensions of explicit knowledge, which may include linguistic, language syntax, common sense, general (world model), specialized (e.g., geographic), and domain-specific (e.g., mental health) knowledge.
I will also argue that, despite this complexity, such knowledge can be scalably created and maintained (even dynamically or continually). Finally, I will describe our work on knowledge-infused learning as an example strategy for fusing statistical and symbolic AI in a variety of ways.
Prof. Amit Sheth is an Educator, Researcher, and Entrepreneur. He is the founding director of the AI Institute (#AIISC) at the University of South Carolina. Current areas of his research include knowledge-infused learning and explainable AI, and applications to personalized and public health, social good and preventing social harm, future manufacturing, and disaster management. He is a fellow of the IEEE, AAAI, AAAS, and ACM. His awards include the IEEE TCSVC Research Innovation Award, University Trustee Award, 10-year award (Intl Semantic Web Conf), OSU Franklin College Alumni Excellence award, and Ohio Faculty Commercialization Award (runner up). For several years through 2018, he was listed among the top 100 most cited computer scientists. Three of the four companies he has (co)founded involved licensing his university research outcomes, including the first Semantic Web company in 1999 that pioneered technology similar to what is found today in Google Semantic Search and Knowledge Graph, and the fourth company (http://cognovilabs.com) at the intersection of emotion and AI.
Extreme-Scale Model-Based Time Series Management with ModelarDB
Torben Bach Pedersen
Professor of Computer Science at Aalborg University, Denmark
To monitor critical industrial devices such as wind turbines, high-quality sensors sampled at a high frequency are increasingly used. Current technology does not handle these extreme-scale time series well, so only simple aggregates are traditionally stored, removing outliers and fluctuations that could indicate problems. As a remedy, we present a model-based approach for managing extreme-scale time series that approximates the time series values using mathematical functions (models) and stores only model coefficients rather than data values. Compression is done both for individual time series and for correlated groups of time series. The keynote will present concepts, techniques, and algorithms from model-based time series management and our implementation of these in the open-source Time Series Management System (TSMS) ModelarDB. Furthermore, it will present our experimental evaluation of ModelarDB on extreme-scale real-world time series, which shows that, compared to widely used Big Data formats, ModelarDB provides up to 14x faster ingestion due to high compression, 113x better compression due to its adaptability, 573x faster aggregation by using models, and close to linear scale-out scalability.
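The idea of storing model coefficients instead of raw values can be illustrated with a minimal sketch. It assumes a piecewise-constant model and an absolute error bound, both chosen here only for illustration; the function names, the segment layout, and the sample readings are hypothetical and are not taken from ModelarDB.

```python
from typing import List, Tuple

# A segment is (start index, number of points, constant value).
# This layout is an assumption made for the sketch.
Segment = Tuple[int, int, float]


def compress_constant(values: List[float], error_bound: float) -> List[Segment]:
    """Greedily cover the series with constant segments.

    A segment grows while the spread (max - min) of its points stays
    within 2 * error_bound, so the midpoint of the range approximates
    every point in the segment to within error_bound.
    """
    segments: List[Segment] = []
    if not values:
        return segments
    start, lo, hi = 0, values[0], values[0]
    for i in range(1, len(values)):
        new_lo, new_hi = min(lo, values[i]), max(hi, values[i])
        if new_hi - new_lo > 2 * error_bound:
            # The new point does not fit: close the current segment.
            segments.append((start, i - start, (lo + hi) / 2))
            start, lo, hi = i, values[i], values[i]
        else:
            lo, hi = new_lo, new_hi
    segments.append((start, len(values) - start, (lo + hi) / 2))
    return segments


def decompress_constant(segments: List[Segment]) -> List[float]:
    """Rebuild an approximate series from the stored segments."""
    out: List[float] = []
    for _, length, value in segments:
        out.extend([value] * length)
    return out


# Hypothetical sensor readings, for illustration only.
readings = [10.0, 10.1, 9.9, 10.0, 20.0, 20.2, 20.1]
segments = compress_constant(readings, error_bound=0.5)
approx = decompress_constant(segments)
```

With these readings, seven values are replaced by two segments, and an aggregate such as a sum can be computed from the segments alone as length times value per segment, without reconstructing each point.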
Torben Bach Pedersen is a Professor of Computer Science at Aalborg University, Denmark. His research interests include Extreme-Scale Data Analytics, Data Warehouses and Data Lakes, and Predictive and Prescriptive Analytics, with a focus on technologies for "Big Multidimensional Data": the integration and analysis of large amounts of complex and highly dynamic multidimensional data. His major application domain is Digital Energy, where he focuses on energy flexibility and analytics on extreme-scale energy time series. He is an ACM Distinguished Scientist, and a member of the Danish Academy of Technical Sciences, the SSTD Endowment, and the SSDBM Steering Committee. He has served as Area Editor for IEEE Transactions on Big Data, Information Systems and Springer EDBS, PC (Co-)Chair for DaWaK, DOLAP, SSDBM, and DASFAA, and regularly serves on the PCs of the major database conferences like SIGMOD, PVLDB, ICDE and EDBT. He received Best Paper/Demo awards from ACM e-Energy and WWW. He is co-founder of the spin-out companies FlexShape and ModelarData.
TrustBUS 2021 Keynote:
Towards privacy-preserving and trustworthy AI
Melek Önen
Associate professor in the Digital Security department at EURECOM
The rise of cloud computing technology led to a paradigm shift in technological services that enabled enterprises to delegate their data analytics tasks to third-party (cloud) servers. Machine Learning as a Service (MLaaS) is one such service, which lets stakeholders easily perform machine learning tasks on a cloud platform. Unfortunately, the advantage of outsourcing these computationally intensive operations comes at a high cost in terms of privacy exposure. The goal is therefore to come up with customized ML algorithms that would by design preserve the privacy of the processed data. Advanced cryptographic techniques such as fully homomorphic encryption or secure multi-party computation enable the execution of some operations over encrypted data and therefore can be considered as potential candidates for these algorithms. Yet, these incur high computational and/or communication costs for some operations. In this talk, we will analyze the tension between ML techniques and relevant cryptographic tools. We will further give an overview of existing solutions addressing both privacy and trust requirements.
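As a rough illustration of computing on data that no single server sees in the clear, the sketch below uses additive secret sharing, one basic building block of secure multi-party computation. It is a toy example under stated assumptions, not a solution presented in the talk: the modulus, the function names, and the input values are all hypothetical.

```python
import secrets
from typing import List

# Public modulus for share arithmetic (an assumption for this toy example).
MODULUS = 2**61 - 1


def share(value: int, n_parties: int) -> List[int]:
    """Split value into n_parties random shares that sum to it mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def add_shares(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its two local shares; no party learns the inputs."""
    return [(x + y) % MODULUS for x, y in zip(a, b)]


def reconstruct(shares: List[int]) -> int:
    """Combine all shares to recover the shared value."""
    return sum(shares) % MODULUS


# Two hypothetical private inputs, split across three servers.
a_shares = share(120, 3)
b_shares = share(80, 3)
total = reconstruct(add_shares(a_shares, b_shares))
```

Addition is cheap here because each party works only on its own shares. Operations such as multiplication or non-linear ML activations require interaction between parties or heavier cryptography, which is where the computational and communication costs mentioned above arise.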
Melek Önen is an associate professor in the Digital Security department at EURECOM (Sophia-Antipolis, France). Her research interests are applied cryptography, information security and privacy. She has worked on the design and development of cryptographic protocols for various technologies including the cloud and the IoT. She holds a PhD in Computer Science from Ecole Nationale Supérieure des Télécommunications de Paris (ENST, 2005) and obtained her "Habilitation à Diriger les Recherches" in 2017. She has been involved in many European and national French research projects. She is the coordinator of the ongoing EU H2020 PAPAYA project, for which her research team developed several privacy-preserving analytics primitives.