aiDM 2021

Fourth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM)

Friday, June 25, 2021

Co-located with ACM SIGMOD/PODS 2021 (Online/In-person Conference)

Overview

Recently, the field of Artificial Intelligence (AI) has been experiencing a resurgence. AI covers a wide swath of techniques, including logic-based approaches, probabilistic graphical models, and machine learning approaches such as deep learning. Advances in specialized hardware capabilities (e.g., Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), Field-Programmable Gate Arrays (FPGAs), etc.), the software ecosystem (e.g., programming languages such as Python, Data Science frameworks, and accelerated ML libraries), and systems infrastructure (e.g., cloud servers with AI accelerators) have led to widespread adoption of AI techniques in a variety of domains. Examples of such domains include image classification, autonomous driving, automatic speech recognition, and conversational systems (e.g., chatbots). AI solutions not only support multiple data types (e.g., images, speech, or text), but are also available in various configurations and settings, from personal devices to large-scale distributed systems.
In spite of the wide-ranging techniques and applications of AI, their interactions with data management systems remain in their infancy. Database management systems have, for a long time, been used simply as repositories for feeding inputs and storing results. Only very recently have we started to see new efforts to use AI techniques in data management systems, e.g., enabling natural language interfaces to relational databases and applying machine learning techniques to query optimization. However, much more needs to be done to fully exploit the power of AI for data management systems and workloads.
aiDM is a one-day workshop that will bring together people from academia and industry to discuss various ways of integrating AI techniques with data management systems. The primary goal of the workshop is to explore opportunities for using AI techniques to enhance various components of data management systems, such as user interfaces, tooling, performance optimization, and support for new query types and workloads. Special emphasis will be given to transparent exploitation of AI techniques using existing data management infrastructures for enterprise-class workloads. We hope this workshop will identify important areas of research and spur new efforts in this emerging field.

Topics of Interest

The goal of the workshop is to take a holistic view of various AI technologies and investigate how they can be applied to different components of an end-to-end data management pipeline. Special emphasis will be given to how AI techniques can enhance the user experience by reducing complexity in tools, providing newer insights, or providing better user interfaces. Topics of interest include, but are not restricted to:

Characterizing different AI approaches: Logic-based, probabilistic graphical models, and machine learning/deep learning approaches

Evaluation of different learning approaches: unsupervised, self-supervised, supervised, or reinforcement learning, transfer learning, zero-shot learning, adversarial networks, and deep probabilistic models

New AI-enabled business intelligence (BI) queries for relational databases

Natural language enablement (e.g., queries, result summarization, chatbot interfaces, etc.)

Explainability and interpretability

Fairness of AI-based system components

Integration with Data Science and Deep Learning toolkits (e.g., sklearn, TensorFlow, PyTorch, ONNX, etc.)

Evaluating quality of approximate results from AI-enabled queries

Supporting multiple datatypes (e.g., images, time-series data, etc.)

Supporting semi-structured, streaming, and graph databases

Reasoning over knowledge bases

Data exploration and visualization

Integrating structured and unstructured data sources

AI-enabled data integration strategies (e.g., entity resolution, schema matching, etc.)

Reinforcement learning for database tuning

Impact of AI on tooling, e.g., ETL or data cleaning

Performance implications of AI-enabled queries

Case studies of AI-accelerated workloads

Social implications of AI-enabled databases (e.g., detection and elimination of bias)

Learned data structures, database algorithms, or system components

Examples of AI-enabled customer use cases

Keynote Speakers

Prof. Sunita Sarawagi, IIT Bombay
Sunita Sarawagi conducts research in the fields of databases and machine learning. She is Institute Chair Professor at IIT Bombay and head of its AI Center. She received her PhD in databases from the University of California at Berkeley under the guidance of Michael Stonebraker. She has also worked at Google Research (2014-2016), CMU (2004), and IBM Almaden Research Center (1996-1999). She was awarded the Infosys Prize in 2019 for Engineering and Computer Science, and the Distinguished Alumnus Award from IIT Kharagpur. She has several publications at the ACM SIGMOD, VLDB, ICDM, NeurIPS, ICML, and EMNLP conferences.
Modern AI for Age-old problems of Data Analytics and Integration
Modern deep learning methods are pushing the frontiers of many challenging problems in data analytics and integration. We will discuss state-of-the-art models that are providing record-breaking accuracy on age-old tasks such as time series forecasting, missing value imputation, and entity resolution. We are also witnessing brand new capabilities that were not possible a few years ago. On multi-dimensional analytical datasets, we can now obtain joint distributions over thousands of interacting time series. We can generate long-term forecasts of realistic-looking temporal event sequences. We can perform entity resolution across heterogeneous, multilingual datasets via actively learned nearest neighbor indices, thereby eliminating the need for hand-designing blocking predicates. In this talk we will go over advances in ML research that are enabling these applications, and present directions for future research.

Johannes Gehrke, Microsoft Research
Johannes Gehrke is a Technical Fellow at Microsoft, the Managing Director of Microsoft Research at Redmond, and the CTO and head of machine learning for Microsoft Teams. He is an ACM Fellow and an IEEE Fellow. From 1999 to 2015, Johannes was on the faculty in the Department of Computer Science at Cornell University, where he graduated 25 PhD students, and from 2005 to 2008 he was Chief Scientist at FAST Search and Transfer.

Database Systems 2.0
Software 2.0 – the augmentation and replacement of traditional code with models, especially deep neural networks – is changing how we develop, deploy, and maintain software. In this talk, I will describe the challenges and opportunities that this change brings with it, focusing on its impact on database research.

Startup Spotlight

Vibhore Kumar and Ufuk Hürriyetoglu, Unscrambl
Have you started speaking to your data, yet?
In this day and age, when users ask Alexa about the nearest COVID-19 vaccination centers, Google notifies them of a possible traffic jam ahead of their evening commute, and Netflix knows the show that they are going to watch later in the evening, getting access to data and insights at work continues to be an experience akin to pulling teeth.
The consumerization of Analytics and BI is now a foregone conclusion; business users expect it. Users, if they had their wish, would have a seamless, AI-powered agent to which they could simply ask a question, using text or voice, and relevant data and insights would, well, just appear. All this without the need to rely on a data analyst and sometimes, serendipitously, even without the need to ask. Importantly, such an agent should be easy to set up, be agile, be able to connect to multiple sources of data, provide built-in analytics, and be accessible in collaboration platforms like Microsoft Teams, Slack, or Zoom, which have become the de facto workplaces.
This talk will detail our journey in building and deploying such an AI-powered agent at multiple enterprises and driving data adoption. Importantly, we will talk about the NLQ technology, which, given a database and its schema, automatically enables natural language to SQL translation.

Workshop Schedule (9 am - 5 pm EST)

Session 1 (9-10.30 am EST): (Chair: Oded Shmueli)

(Keynote 1): Modern AI for Age-old problems of Data Analytics and Integration, Sunita Sarawagi, IIT Bombay

RUSLI: Real-time Updatable Spline Learned Index, Mayank Mishra and Rekha Singhal, Tata Consultancy Services

Coffee Break (10.30-11 am EST)

Session 2 (11 am - 12.30 pm EST): (Chair: Yael Amsterdamer)

A Tailored Regression for Learned Indexes: Logarithmic Error Regression, Martin Eppert, Philipp Fent and Thomas Neumann, Technische Universität München

Balancing Familiarity and Curiosity in Data Exploration with Deep Reinforcement Learning, Aurélien Personnaz, Sihem Amer-Yahia, CNRS, Univ. Grenoble Alpes; Laure Berti-Equille, IRD, ESPACE_DEV, Montpellier; Maximilian Fabricius and Srividya Subramanian, Max Planck Institute for Extraterrestrial Physics

Pre-Trained Web Table Embeddings for Table Discovery, Michael Günther, Maik Thiele, Julius Gonsior and Wolfgang Lehner, Database Systems Group, Technische Universität Dresden

Lunch Break (12.30 - 1.30 pm EST)

Session 3 (1.30-3 pm EST): (Chair: Rajesh Bordawekar)

(Startup Spotlight): Have you started speaking to your data, yet? Vibhore Kumar and Ufuk Hürriyetoglu, Unscrambl

LEA: A Learned Encoding Advisor for Column Stores, Lujing Cen, Andreas Kipf, Ryan Marcus and Tim Kraska, MIT CSAIL

Coffee Break (3-3.30 pm EST)

Session 4 (3.30-5 pm EST): (Chair: Nesime Tatbul)

Leveraging Approximate Constraints for Localized Data Error Detection, Mohan Zhang, Oliver Schulte and Yudong Luo, Simon Fraser University

(Keynote 2): Database Systems 2.0, Johannes Gehrke, Microsoft Research

Organization

Workshop Co-Chairs

Rajesh Bordawekar, IBM T.J. Watson Research Center

Yael Amsterdamer, Department of Computer Science, Bar-Ilan University

Oded Shmueli, Computer Science Department, Technion - Israel Institute of Technology

Nesime Tatbul, Intel Labs and MIT

For questions regarding the workshop, please send an email to bordaw AT us DOT ibm DOT com.
Program Committee

Dana Van Aken, CMU

Joy Arulraj, Georgia Tech

Carsten Binnig, TU Darmstadt

Thomas Heinis, Imperial College

Andreas Kipf, MIT

Nick Koudas, University of Toronto

Amelie Marian, Rutgers University

Yuval Moskovitch, University of Michigan

Vivek Narasayya, Microsoft Research

Apoorva Nitsure, IBM Research

Sunita Sarawagi, IIT Bombay

Seema Sundara, Oracle Labs

Royi Ronen, Microsoft

Saravanan Thirumuruganathan, QCRI, HBKU

Zongheng Yang, University of California, Berkeley

Submission Instructions

Important Dates

Paper Submission: Monday, 22nd March, 2021, 12 pm PST

Notification of Acceptance: Friday, 23rd April, 2021

Camera-ready Submission: Monday, 3rd May, 2021

Workshop Date: Friday, 25th June, 2021

Submission Site

All submissions will be handled electronically via EasyChair.

Formatting Guidelines

We will use the same document templates as the SIGMOD/PODS'21 conferences (the ACM format).
It is the authors' responsibility to ensure that their submissions adhere strictly to the ACM format. In particular, modifying the format in order to squeeze in more material is not allowed. Submissions that do not comply with the formatting detailed here will be rejected without review.

A full paper is limited to 8 pages. However, shorter papers (4 pages) are encouraged as well.

All accepted papers will be indexed in the ACM Digital Library and available for download from the workshop webpage in the Digital Library.