Recently, the field of Artificial Intelligence (AI) has been
experiencing a resurgence. AI broadly covers a wide swath of
techniques, which include logic-based approaches, probabilistic
graphical models, and machine learning approaches such as deep
learning. Advances in specialized hardware capabilities (e.g., Graphics
Processing Units (GPUs), Tensor Processing Units (TPUs), and
Field-Programmable Gate Arrays (FPGAs)), software ecosystems (e.g.,
programming languages such as Python, data science frameworks, and
accelerated ML libraries), and systems infrastructure (e.g., cloud
servers with AI accelerators) have led to widespread adoption of AI
techniques in a variety of domains. Examples of such domains include
image classification, autonomous driving, automatic speech
recognition, and conversational systems (e.g., chatbots). AI solutions
not only support multiple data types (e.g., images, speech, or text),
but also are available in various configurations and settings, from
personal devices to large-scale distributed systems.
Despite the widespread adoption of AI across diverse domains, its
integration with data management systems remains in its
infancy. Currently, most database management systems (DBMS) serve
primarily as repositories for feeding input data to AI models and
storing results. Recently, there has been increasing interest in using
AI techniques within data management systems, including natural
language interfaces to relational databases and machine learning
techniques for query optimization and performance tuning. However,
significant opportunities remain to harness the full potential of AI
for enhancing data management workloads.
aiDM'26 is a one-day workshop that will bring
together people from academia and industry to
explore innovative ways to integrate AI techniques into data
management systems. The workshop will focus on leveraging AI to
enhance various components of data management systems, including user
interfaces, tooling, performance optimizations, and support for new
query types and workloads. Special attention will be given to
transparently exploiting AI techniques, such as Generative AI
frameworks, for enterprise-class data management workloads. We aim to
identify key research areas and inspire new initiatives in this
emerging and transformative field.
- Bringing AI to Data: Lessons and Experiences with Oracle AI Database, Tirthankar Lahiri, Oracle
Tirthankar Lahiri is Senior Vice President of Mission-Critical Data and AI Engines for Oracle Database. He is responsible for data and AI engine technologies for Oracle Database, including areas like AI vector and hybrid search, graph RAG, transactions, indexing, in-memory columnar, data compression, time travel, etc. He also manages the Oracle TimesTen In-Memory Database and the Oracle NoSQLDB product teams. Tirthankar has 30 years of experience in the database industry and has worked on a variety of areas such as performance, scalability, manageability, caching, in-memory architectures, and AI-focused functionality. He has 73 patents, a B.Tech in Computer Science from the Indian Institute of Technology, Kharagpur, and an MS in Electrical Engineering from Stanford University.
Abstract: Enterprise databases are evolving from systems that merely store and retrieve data into platforms that actively reason, orchestrate, and collaborate using enterprise data. This session describes real-world experiences in transforming Oracle Database into an AI-native database platform, integrating capabilities such as AI vector search and hybrid search, built-in RAG pipelines, deep-reasoning retrieval, agentic NL2SQL, and persistent agent memory directly into the converged data foundation provided by Oracle AI Database. We will also highlight agent-builder capabilities that allow declarative agent creation, introduce a new open agent specification to standardize agent definitions, and outline in-database agent execution patterns. The session highlights how converging data management, vector and hybrid search, and agentic AI enables a new generation of intelligent, context-aware, AI-powered, data-driven applications.
- Contextualizing Large Language Models over Evolving Internal Data and Knowledge, Sunita Sarawagi, IIT Bombay
Sunita Sarawagi conducts research in the fields of databases, machine
learning, and applied NLP. She is Institute Chair Professor in the
Computer Science Department and was the founding head of the Center
for AI at IIT Bombay. She received her PhD in databases from the
University of California at Berkeley and a bachelor's degree from IIT
Kharagpur. She has also worked at Google Research (2014-2016), CMU
(2004), and IBM Almaden Research Center (1996-1999). She is an ACM
Fellow and has received the Infosys Prize in 2019 for Engineering and
Computer Science and the Distinguished Alumnus Award from IIT
Kharagpur. She has several publications, including notable paper awards
at the ACM SIGMOD, ICDM, and NeurIPS conferences. She has served on the
boards of directors of ACM SIGKDD and the VLDB Foundation, as program
chair for the ACM SIGKDD 2008 conference, as research track co-chair
for the VLDB 2011 conference, and on the editorial boards of the ACM
TODS and ACM TKDD journals.
- Building Effective Unstructured Data Systems, Shreya Shankar, University of California, Berkeley
Shreya Shankar is a graduating PhD student at UC Berkeley and will be joining Carnegie Mellon's Computer Science Department as an Assistant Professor. Her research spans data systems, large language models, and human-computer interaction. In her PhD, advised by Dr. Aditya Parameswaran, she created the DocETL ecosystem for LLM-powered unstructured data analysis, and her work has been recognized with multiple fellowships, EECS Rising Stars, a best paper award at CHI, and a best paper honorable mention at UIST. She also authored the curriculum and companion book for AI Evals for Engineers and PMs, an industry course taken by 4,000+ professionals.
Abstract: Databases and other data systems have successfully democratized data-oriented computation across domains, thanks to decades of research in system internals and end-user interfaces. However, such systems center on structured (i.e., tabular) data; unstructured data—the vast majority of data—has largely been ignored. Large language models (LLMs) now give us a building block for unstructured data analysis, and we face the same questions as in the early days of data systems—e.g., how should users author queries? How do we efficiently execute queries at scale?—but many well-established tenets from traditional data systems no longer hold. In my talk, I will present DocETL, a system I developed for unstructured data analysis. I will discuss how we had to rethink query optimization under these new assumptions, optimizing user-written pipelines for both accuracy and efficiency—as well as end-user interfaces for authoring, iterating on, and debugging pipelines. DocETL is open-source with 3.5k+ GitHub stars; our hosted interface has supported 4.1k+ pipelines across 30+ S&P-500 industries. Query optimization ideas from our work have been adopted in databases such as Snowflake and BigQuery, and our interface design principles have been adopted by companies like LangChain and OpenAI.
Workshop Opening (9-9.10 AM IST) Welcome remarks and overview of the day
Session 1 (9.10-10.15 AM IST)
- (Keynote Presentation 1: 9.10-9.55 AM IST) Bringing AI to Data: Lessons and Experiences with Oracle AI Database, Tirthankar Lahiri, Oracle
- (9.55-10.15 AM IST) Recall Is Not Enough: Token-Centric Metrics for Agentic Schema Access, Ioana Giurgiu and Michael E. Nidd. IBM Research, Zurich
Coffee Break (10.15-10.35 AM IST)
Session 2 (10.35 AM-12.35 PM IST)
- (Keynote Presentation 2: 10.35-11.20 AM IST) Building Effective Unstructured Data Systems, Shreya Shankar, University of California, Berkeley
- (11.20-11.45 AM IST) Same Data, Different Schemas: Robustness of LLM-based Text-to-SQL, Nitin Kanchinadam, Aditya Menachery and Amol Deshpande. University of Maryland at College Park
- (11.45 AM-12.10 PM IST) VCL: Bridging Natural Language and Structured Control for Domain-Specific Document Analysis, Renzo Arturo Alva Principe, Filippo Armani, Matteo Palmonari and Carlo Batini. University of Milano-Bicocca
- (12.10-12.35 PM IST) EcoLLM: Energy-Aware Benchmarking of LLMs for Data Processing Workloads, Pratyush Agnihotri, DFKI and TU Darmstadt, Manisha Luthra, RUB and TU Darmstadt, and Carsten Binnig, TU Darmstadt, DFKI, and hessian.AI
Lunch Break (12.35-1.45 PM IST)
Session 3 (1.45-3.20 PM IST)
- (Keynote Presentation 3) Contextualizing Large Language Models over Evolving Internal Data and Knowledge, Sunita Sarawagi, IIT Bombay
- (1.45-2.10 PM IST) Deal with it! Towards Self-organizing Data Schemas from Semi-Structured Inserts, Benjamin Hättasch, DFKI and TU Darmstadt, Leon Krüger, TU Darmstadt and Carsten Binnig, DFKI and TU Darmstadt
- (2.10-2.35 PM IST) A Workload-Aware Physical Database Design Advisor Using Tree Transformers, Graph Neural Networks, and Reinforcement Learning, Priya Babu, Raji R. Pillai and Unnikrishnan K. Department of Computer Science and Engineering, Rajiv Gandhi Institute of Technology, Kerala, India
Coffee Break (3.20-3.40 PM IST)
Session 4 (3.40-4.05 PM IST)
- The Data-Schema Bottleneck: Benchmarking 16 Deep Learning Architectures for Real-Time Starlink LEO Telemetry Management, Tanmoy Debnath, Sourabhi Debnath, Charles Sturt University, Miroslaw Narbutt, Technological University, Dublin, Ireland, and Maumita Bhattacharya, Charles Sturt University.
Closing Remarks (4.05-4.15 PM IST)
The goal of the workshop is to take a holistic view of various AI technologies and
investigate how they can be applied to different components of an end-to-end data management
pipeline. Special emphasis will be given to how AI techniques can be used to enhance
user experience by reducing complexity in tools, providing new insights, or providing
better user interfaces. Topics of interest include, but are not restricted to:
- Integration into Agentic and Orchestration Frameworks
- Enabling different types of RAG Capabilities
- New AI-enabled business intelligence (BI) queries for relational databases
- Integration of Large Language Models with databases and supporting services (e.g., Generative AI)
- Supporting Large Reasoning Models
- Natural language queries and conversational interfaces
- AI-enabled database programming (e.g., natural language queries, SQL co-pilots, etc.)
- Design and Implementation of Vector Databases for unstructured data
- Ethics, governance, and societal implications of AI-enabled databases
- Reasoning over knowledge bases
- Self-tuning Databases using reinforcement learning
- Impact of model interpretability
- Supporting multiple datatypes (e.g., images or time-series data)
- Supporting semi-structured, streaming, and graph databases
- Impact of AI on tooling, e.g., ETL or data cleaning
- Performance implications of AI-enabled queries
- AI-enabled databases for managing and supporting AI workloads
- AI strategies for data provenance, access control, anomaly detection, and cybersecurity
- Case studies of AI-accelerated workloads
- AI-driven data compression and storage optimization
Workshop Steering Committee
Workshop Program Chairs
- Kavitha Srinivas, IBM T. J. Watson Research Center
- Manisha Luthra, Ruhr University Bochum
- Selim Tekin, Georgia Institute of Technology
Program Committee
- Anwesha Saha, Boston University
- Weiwei Gong, Oracle
- Jerry Liu, Columbia University
- Liane Vogel, TU Darmstadt
- Arijit Khan, Bowling Green
- Fatih Ilhan, Georgia Institute of Technology
- Varun Pandey, Technische Universität Nürnberg
- Nils Strassenburg, HPI Potsdam
- Thaleia-Dimitra Doudali, IMDEA
- Xiao Li, ITU Copenhagen
- Maximilian Böther, ETH Zürich
Important Dates
- Paper Submission: Monday, April 6, 2026, 5 pm EST (UPDATED)
- Notification of Acceptance: Friday, April 24, 2026
- Camera-ready Submission: Monday, May 4, 2026
Submission Site
All submissions will be handled electronically via EasyChair.
Formatting Guidelines
We will use the same document templates as the SIGMOD/PODS conferences (the ACM format). Like SIGMOD/PODS'26, aiDM submissions should be double-blind. It is the authors' responsibility to ensure that their submissions adhere strictly to the ACM format. In particular, modifying the format to squeeze in more material is not allowed. Submissions that do not comply with the formatting detailed here will be rejected without review.
Full papers are limited to 12 pages, with unlimited pages of references. However, shorter papers (4 or 8 pages) are encouraged as well.
All accepted papers will be indexed via the ACM Digital Library and available for download from the workshop webpage in the Digital Library.