aiDM 2023

Recently, the field of Artificial Intelligence (AI) has been experiencing a resurgence. AI broadly covers a wide swath of techniques, including logic-based approaches, probabilistic graphical models, and machine learning approaches such as deep learning. Advances in specialized hardware (e.g., Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and Field-Programmable Gate Arrays (FPGAs)), the software ecosystem (e.g., programming languages such as Python, Data Science frameworks, and accelerated ML libraries), and systems infrastructure (e.g., cloud servers with AI accelerators) have led to widespread adoption of AI techniques in a variety of domains. Examples of such domains include image classification, autonomous driving, automatic speech recognition, and conversational systems (e.g., chatbots). AI solutions not only support multiple data types (e.g., images, speech, or text), but are also available in various configurations and settings, from personal devices to large-scale distributed systems.
Despite the wide range of AI techniques and applications, their interaction with data management systems remains in its infancy. For a long time, database management systems have been used simply as repositories for feeding inputs and storing results. Only very recently have we started seeing new efforts to use AI techniques in data management systems, e.g., enabling natural language interfaces to relational databases and applying machine learning techniques to query optimization. However, much more remains to be done to fully exploit the power of AI for data management systems and workloads.
aiDM is a one-day workshop that will bring together people from academia and industry to discuss various ways of integrating AI techniques with data management systems. The primary goal of the workshop is to explore opportunities for using AI techniques to enhance various components of data management systems, such as user interfaces, tooling, performance optimization, and support for new query types and workloads. Special emphasis will be given to the transparent exploitation of AI techniques using existing data management infrastructures for enterprise-class workloads. We hope this workshop will identify important areas of research and spur new efforts in this emerging field.

Topics of Interest

The goal of the workshop is to take a holistic view of various AI technologies and investigate how they can be applied to different components of an end-to-end data management pipeline. Special emphasis will be given to how AI techniques could be used to enhance the user experience by reducing tool complexity, providing new insights, or providing better user interfaces. Topics of interest include, but are not restricted to:

Characterizing different AI approaches: Logic-based, probabilistic graphical models, and machine learning/deep learning approaches

Evaluation of different learning approaches: unsupervised, self-supervised, supervised, and reinforcement learning, transfer learning, zero-shot learning, adversarial networks, and deep probabilistic models

New AI-enabled business intelligence (BI) queries for relational databases

Natural language enablement (e.g., queries, result summarization, chatbot interfaces, etc.)

Explainability and interpretability

Fairness of AI-based system components

Integration with Data Science and Deep Learning toolkits (e.g., sklearn, TensorFlow, PyTorch, ONNX, etc.)

Evaluating the quality of approximate results from AI-enabled queries

Supporting multiple data types (e.g., images, time-series data, etc.)

Supporting semi-structured, streaming, and graph databases

Reasoning over knowledge bases

Data exploration and visualization

Integrating structured and unstructured data sources

AI-enabled data integration strategies (e.g., entity resolution, schema matching, etc.)

Reinforcement learning for database tuning

Impact of AI on tooling, e.g., ETL or data cleaning

Performance implications of AI-enabled queries

Case studies of AI-accelerated workloads

Social implications of AI-enabled databases (e.g., detection and elimination of bias)

Learned data structures, database algorithms or systems components

AI-enabled databases for managing and supporting AI workloads

AI strategies for data provenance, access control, anomaly detection, and cybersecurity

Experiences with database systems employing AI-enhanced components and interaction among AI-enhanced components

Workshop Schedule (8.30am - 5pm PST)

Session 1 (8.30-10am PST) (Chair: Yael Amsterdamer)

Introductory Remarks, Yael Amsterdamer, Department of Computer Science, Bar-Ilan University

AutoCure: Automated Tabular Data Curation Technique for ML Pipelines, Mohamed Abdelaal, Software AG; Rashmi Koparde, Otto von Guericke University Magdeburg; and Harald Schoening, Software AG

Tuple Bubble: Learned Tuple Representation for Tunable Approximate Query Processing, Damjan Gjurovski and Sebastian Michel, RPTU Kaiserslautern-Landau

Adversarial and Clean Data Are Not Twins, Zhitao Gong, Auburn University; and Wenlu Wang, Texas A&M University-Corpus Christi

Zero-Shot Cost Models for Parallel Stream Processing, Pratyush Agnihotri, Technical University of Darmstadt; Boris Koldehofe, Technical University of Ilmenau; Carsten Binnig and Manisha Luthra, Technical University of Darmstadt and DFKI

Coffee Break (10-10.30am PST)

Session 2 (10.30am - 12pm PST) (Chair: Oded Shmueli)

Keynote 1: Reasoning in Natural Language, Dan Roth, VP/Distinguished Scientist, AWS AI Labs and the Eduardo D. Glandt Distinguished Professor, CIS, University of Pennsylvania

Dan Roth is the Eduardo D. Glandt Distinguished Professor at the Department of Computer and Information Science, University of Pennsylvania, a VP/Distinguished Scientist at AWS AI Labs, and a Fellow of the AAAS, ACM, AAAI, and ACL. In 2017, Roth was awarded the John McCarthy Award, the highest award the AI community gives to mid-career AI researchers. Roth was recognized “for major conceptual and theoretical advances in the modeling of natural language understanding, machine learning, and reasoning.” Roth has published broadly in machine learning, natural language processing, knowledge representation and reasoning, and learning theory. He was the Editor-in-Chief of the Journal of Artificial Intelligence Research (JAIR), has served as the Program Chair for AAAI, ACL, and CoNLL, and as a Conference Chair for several top conferences. Roth has been involved in several startups; most recently, he was a co-founder and chief scientist of NexLP, a startup that leverages the latest advances in Natural Language Processing (NLP), Cognitive Analytics, and Machine Learning in the legal and compliance domains. NexLP was acquired by Reveal in 2020. Prof. Roth received his B.A. summa cum laude in Mathematics from the Technion, Israel, and his Ph.D. in Computer Science from Harvard University in 1995.

OmniscientDB: A Large Language Model-Augmented DBMS That Knows What Other DBMSs Do Not Know, Matthias Urban, Duc Dat, and Carsten Binnig, Technical University of Darmstadt

Lunch Break (12-1.30pm PST)

Session 3 (1.30-3pm PST) (Chair: Yael Amsterdamer)

Keynote 2: Jun Wan, Databricks

Learned Spatial Data Partitioning, Keizo Hori, Yuya Sasaki, Daichi Amagata, Yuki Murosaki, and Makoto Onizuka, Osaka University

Coffee Break (3-3.30pm PST)

Session 4 (3.30-5pm PST)

(Panel) Foundation Models and Databases: Opportunities and Challenges, Moderator: Rajesh Bordawekar
- Arvind Arasu, Microsoft Research
- Alekh Jindal, Smart Apps
- Tim Kraska, MIT
- Laurel Orr, Stanford
- Jun Wan, Databricks

Organization

Workshop Steering Committee

Rajesh Bordawekar, IBM T.J. Watson Research Center

Oded Shmueli, Hirundo Ltd., and Emeritus Professor at Technion - Israel Institute of Technology

Workshop Program Chairs

Yael Amsterdamer, Department of Computer Science, Bar-Ilan University

Donatella Firmani, Department of Statistical Sciences, Sapienza University of Rome

Andreas Kipf, Amazon Web Services

Program Committee

Zainab Abbas, KTH
Laure Berti-Equille, IRD
Cansu Kaynak Kocberber, Oracle
Nick Koudas, University of Toronto
Manisha Luthra, TU Darmstadt
Umar Farooq Minhas, Apple
Felix Naumann, HPI
Rekha Singhal, Tata Consultancy Services
Anthony Tomasic, CMU
Brit Youngmann, MIT

Submission Instructions

Important Dates

Paper Submission: Friday, 17th March 2023, 12 pm PST

Notification of Acceptance: Monday, 17th April 2023

Camera-ready Submission: Monday, 8th May 2023

Submission Site

All submissions will be handled electronically via EasyChair.

Formatting Guidelines

We will use the same document templates as the SIGMOD/PODS'23 conferences (the ACM format).
It is the authors' responsibility to ensure that their submissions adhere strictly to the ACM format. In particular, modifying the format to squeeze in more material is not allowed. Submissions that do not comply with the formatting detailed here will be rejected without review.

Full papers are limited to 12 pages, plus unlimited pages of references. Shorter papers (4 or 8 pages) are also encouraged.

All accepted papers will be indexed via the ACM digital library and available for download from the workshop webpage in the digital library.