aiDM 2026

Recently, the field of Artificial Intelligence (AI) has been experiencing a resurgence. AI broadly covers a wide swath of techniques, including logic-based approaches, probabilistic graphical models, and machine learning approaches such as deep learning. Advances in specialized hardware (e.g., Graphics Processing Units (GPUs), Tensor Processing Units (TPUs), and Field-Programmable Gate Arrays (FPGAs)), software ecosystems (e.g., programming languages such as Python, data science frameworks, and accelerated ML libraries), and systems infrastructure (e.g., cloud servers with AI accelerators) have led to widespread adoption of AI techniques in a variety of domains, such as image classification, autonomous driving, automatic speech recognition, and conversational systems (e.g., chatbots). AI solutions not only support multiple data types (e.g., images, speech, or text) but are also available in various configurations and settings, from personal devices to large-scale distributed systems.
Despite the widespread adoption of AI across diverse domains, its integration with data management systems remains in its infancy. Currently, most database management systems (DBMS) serve primarily as repositories for feeding input data to AI models and storing results. Recently, there has been increasing interest in using AI techniques within data management systems, including natural language interfaces to relational databases and machine learning techniques for query optimization and performance tuning. However, significant opportunities remain to harness the full potential of AI for enhancing data management workloads.
aiDM'26 is a one-day workshop that will bring together people from academia and industry to explore innovative ways to integrate AI techniques into data management systems. The workshop will focus on leveraging AI to enhance various components of data management systems, including user interfaces, tooling, performance optimizations, and support for new query types and workloads. Special attention will be given to transparently exploiting AI techniques, such as Generative AI frameworks, for enterprise-class data management workloads. We aim to identify key research areas and inspire new initiatives in this emerging and transformative field.

Keynote Speakers

Bringing AI to Data: Lessons and Experiences with Oracle AI Database, Tirthankar Lahiri, Oracle
Tirthankar Lahiri is Senior Vice President of Mission-Critical Data and AI Engines for Oracle Database. He is responsible for data and AI engine technologies for Oracle Database, including AI vector and hybrid search, graph RAG, transactions, indexing, in-memory columnar, data compression, and time travel. He also manages the Oracle TimesTen In-Memory Database and Oracle NoSQLDB product teams. Tirthankar has 30 years of experience in the database industry and has worked on areas such as performance, scalability, manageability, caching, in-memory architectures, and AI-focused functionality. He holds 73 patents, a B.Tech in Computer Science from the Indian Institute of Technology, Kharagpur, and an MS in Electrical Engineering from Stanford University.
Abstract: Enterprise databases are evolving from systems that merely store and retrieve data into platforms that actively reason, orchestrate, and collaborate using enterprise data. This session describes real-world experiences in transforming Oracle Database into an AI-native database platform by integrating capabilities such as AI vector search and hybrid search, built-in RAG pipelines, deep-reasoning retrieval, agentic NL2SQL, and persistent agent memory directly into the converged data foundation provided by Oracle AI Database. We will also present agent-builder capabilities that allow declarative agent creation, introduce a new open agent specification to standardize agent definitions, and outline in-database agent execution patterns. The session shows how converging data management, vector and hybrid search, and agentic AI enables a new generation of intelligent, context-aware, AI-powered data-driven applications.

Contextualizing Large Language Models over Evolving Internal Data and Knowledge, Sunita Sarawagi, IIT Bombay
Sunita Sarawagi researches in the fields of databases, machine learning, and applied NLP. She is Institute Chair Professor in the Computer Science Department and was the founding head of the Center for AI at IIT Bombay. She received her PhD in databases from the University of California at Berkeley and a bachelor's degree from IIT Kharagpur. She has also worked at Google Research (2014-2016), CMU (2004), and IBM Almaden Research Center (1996-1999). She is an ACM Fellow and received the Infosys Prize in 2019 for Engineering and Computer Science and the Distinguished Alumnus Award from IIT Kharagpur. Her publications include papers recognized with awards at the ACM SIGMOD, ICDM, and NeurIPS conferences. She has served on the boards of directors of ACM SIGKDD and the VLDB foundation, as program chair for the ACM SIGKDD 2008 conference and research track co-chair for the VLDB 2011 conference, and on the editorial boards of the ACM TODS and ACM TKDD journals.

Building Effective Unstructured Data Systems, Shreya Shankar, University of California, Berkeley
Shreya Shankar is a graduating PhD student at UC Berkeley and will be joining Carnegie Mellon's Computer Science Department as an Assistant Professor. Her research spans data systems, large language models, and human-computer interaction. In her PhD, advised by Dr. Aditya Parameswaran, she created the DocETL ecosystem for LLM-powered unstructured data analysis, and her work has been recognized with multiple fellowships, EECS Rising Stars, a best paper award at CHI, and a best paper honorable mention at UIST. She also authored the curriculum and companion book for AI Evals for Engineers and PMs, an industry course taken by 4,000+ professionals.
Abstract: Databases and other data systems have successfully democratized data-oriented computation across domains, thanks to decades of research in system internals and end-user interfaces. However, such systems center on structured (i.e., tabular) data; unstructured data—the vast majority of data—has largely been ignored. Large language models (LLMs) now give us a building block for unstructured data analysis, and we face the same questions as in the early days of data systems—e.g., how should users author queries? How do we efficiently execute queries at scale?—but many well-established tenets from traditional data systems no longer hold. In my talk, I will present DocETL, a system I developed for unstructured data analysis. I will discuss how we had to rethink query optimization under these new assumptions, optimizing user-written pipelines for both accuracy and efficiency—as well as end-user interfaces for authoring, iterating on, and debugging pipelines. DocETL is open-source with 3.5k+ GitHub stars; our hosted interface has supported 4.1k+ pipelines across 30+ S&P-500 industries. Query optimization ideas from our work have been adopted in databases such as Snowflake and BigQuery, and our interface design principles have been adopted by companies like LangChain and OpenAI.

Workshop Schedule (9 AM-4.15 PM IST)

Workshop Opening (9-9.10 AM IST) Welcome remarks and overview of the day

Session 1 (9.10-10.15 AM IST)

(Keynote Presentation 1: 9.10-9.55 AM IST) Bringing AI to Data: Lessons and Experiences with Oracle AI Database, Tirthankar Lahiri, Oracle

(9.55-10.15 AM IST) Recall Is Not Enough: Token-Centric Metrics for Agentic Schema Access, Ioana Giurgiu and Michael E. Nidd. IBM Research, Zurich

Coffee Break (10.15-10.35 AM IST)

Session 2 (10.35 AM-12.35 PM IST)

(Keynote Presentation 2: 10.35-11.20 AM IST) Building Effective Unstructured Data Systems, Shreya Shankar, University of California, Berkeley

(11.20-11.45 AM IST) Same Data, Different Schemas: Robustness of LLM-based Text-to-SQL, Nitin Kanchinadam, Aditya Menachery and Amol Deshpande. University of Maryland at College Park

(11.45 AM-12.10 PM IST) VCL: Bridging Natural Language and Structured Control for Domain-Specific Document Analysis, Renzo Arturo Alva Principe, Filippo Armani, Matteo Palmonari and Carlo Batini. University of Milano-Bicocca

(12.10-12.35 PM IST) EcoLLM: Energy-Aware Benchmarking of LLMs for Data Processing Workloads, Pratyush Agnihotri, DFKI and TU Darmstadt, Manisha Luthra, RUB and TU Darmstadt, and Carsten Binnig, TU Darmstadt, DFKI, and hessian.AI

Lunch Break (12.35-1.45 PM IST)

Session 3 (1.45-3.20 PM IST)

(Keynote Presentation 3) Contextualizing Large Language Models over Evolving Internal Data and Knowledge, Sunita Sarawagi, IIT Bombay

(1.45-2.10 PM IST) Deal with it! Towards Self-organizing Data Schemas from Semi-Structured Inserts, Benjamin Hättasch, DFKI and TU Darmstadt, Leon Krüger, TU Darmstadt, and Carsten Binnig, DFKI and TU Darmstadt

(2.10-2.35 PM IST) A Workload-Aware Physical Database Design Advisor Using Tree Transformers, Graph Neural Networks, and Reinforcement Learning, Priya Babu, Raji R. Pillai and Unnikrishnan K. Department of Computer Science and Engineering, Rajiv Gandhi Institute of Technology, Kerala, India

Coffee Break (3.20-3.40 PM IST)

Session 4 (3.40-4.05 PM IST)

The Data-Schema Bottleneck: Benchmarking 16 Deep Learning Architectures for Real-Time Starlink LEO Telemetry Management, Tanmoy Debnath, Sourabhi Debnath, Charles Sturt University, Miroslaw Narbutt, Technological University, Dublin, Ireland, and Maumita Bhattacharya, Charles Sturt University.

Closing Remarks (4.05-4.15 PM IST)

Topics of Interest

The goal of the workshop is to take a holistic view of various AI technologies and investigate how they can be applied to different components of an end-to-end data management pipeline. Special emphasis will be given to how AI techniques can enhance the user experience by reducing tool complexity, providing new insights, or offering better user interfaces. Topics of interest include, but are not restricted to:

Integration into agentic and orchestration frameworks

Enabling different types of RAG capabilities

New AI-enabled business intelligence (BI) queries for relational databases

Integration of Large Language Models with databases and supporting services (e.g., Generative AI)

Supporting Large Reasoning Models

Natural language queries and conversational interfaces

AI-enabled database programming (e.g., natural language queries, SQL co-pilots, etc.)

Design and implementation of vector databases for unstructured data

Ethics, governance, and societal implications of AI-enabled databases

Reasoning over knowledge bases

Self-tuning databases using reinforcement learning

Impact of model interpretability

Supporting multiple data types (e.g., images or time-series data)

Supporting semi-structured, streaming, and graph databases

Impact of AI on tooling, e.g., ETL or data cleaning

Performance implications of AI-enabled queries

AI-enabled databases for managing and supporting AI workloads

AI strategies for data provenance, access control, anomaly detection, and cybersecurity

Case studies of AI-accelerated workloads

AI-driven data compression and storage optimization

Workshop Organization

Workshop Steering Committee

Rajesh Bordawekar, NVIDIA

Oded Shmueli, Hirundo Ltd., and Emeritus Professor at Technion - Israel Institute of Technology

Workshop Program Chairs

Kavitha Srinivas, IBM T. J. Watson Research Center
Manisha Luthra, Ruhr University Bochum
Selim Tekin, Georgia Institute of Technology

Program Committee

Anwesha Saha, Boston University

Weiwei Gong, Oracle

Jerry Liu, Columbia University

Liane Vogel, TU Darmstadt

Arijit Khan, Bowling Green

Fatih Ilhan, Georgia Institute of Technology

Varun Pandey, Technische Universität Nürnberg

Nils Strassenburg, HPI Potsdam

Thaleia-Dimitra Doudali, IMDEA

Xiao Li, ITU Copenhagen

Maximilian Böther, ETH Zürich

Submission Instructions

Important Dates

Paper Submission: Monday, April 6, 2026, 5 pm EST (UPDATED)

Notification of Acceptance: Friday, April 24, 2026

Camera-ready Submission: Monday, May 4, 2026

Submission Site

All submissions will be handled electronically via EasyChair.

Formatting Guidelines

We will use the same document templates as the SIGMOD/PODS conferences (the ACM format). As with SIGMOD/PODS'26, aiDM submissions must be double-blind.
It is the authors' responsibility to ensure that their submissions adhere strictly to the ACM format. In particular, it is not allowed to modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.

Full papers are limited to 12 pages, with unlimited pages for references. Shorter papers (4 or 8 pages) are also encouraged.

All accepted papers will be indexed in the ACM Digital Library and available for download from the workshop's page in the digital library.