Recently, the field of Artificial Intelligence
(AI) has been experiencing a
resurgence. AI broadly covers
a wide swath of techniques,
which include logic-based
approaches, probabilistic
graphical models, and machine
learning approaches such as
deep learning. Advances in
specialized hardware
capabilities (e.g., Graphics
Processing Units (GPUs),
Tensor Processing Units
(TPUs), Field-Programmable
Gate Arrays (FPGAs), etc.),
software ecosystem (e.g.,
programming languages such as
Python, Data Science frameworks, and
accelerated ML libraries), and
systems infrastructure (e.g.,
cloud servers with AI
accelerators) have led to
widespread adoption of AI
techniques in a variety of
domains. Examples of such
domains include image
classification, autonomous
driving, automatic speech
recognition, and
conversational systems (e.g.,
chatbots). AI solutions not
only support multiple data
types (e.g., images, speech,
or text), but also are
available in various
configurations and settings,
from personal devices to
large-scale distributed
systems.
In spite
of the wide-ranging techniques
and applications of AI, their
interactions with data
management systems remain in
their infancy. Database management
systems have been, for a long
time, simply used as
repositories for feeding
inputs and storing
results. Only very recently
have we started seeing some
new efforts in using AI
techniques in data management
systems, e.g., enabling
natural language interfaces to
relational databases and
applying machine learning
techniques for query
optimization. However, a lot
more needs to be done to fully
exploit the power of AI for
data management systems and
workloads.
aiDM is a one-day workshop that will bring
together people from academia
and industry to discuss
various ways of integrating AI
techniques with data
management systems. The
primary goal of the workshop
is to explore opportunities
for using AI techniques in
enhancing various components
of data management systems,
such as user interfaces, tooling, performance optimization, and support
for new query types and workloads. Special emphasis will be given to
transparent exploitation of AI techniques using existing data
management infrastructures for enterprise-class workloads. We hope this workshop will
identify important areas of research and spur new efforts in this
emerging field.
The goal of the workshop is to take a holistic view of various AI technologies and
investigate how they can be applied to different components of an end-to-end data management
pipeline. Special emphasis will be given to how AI techniques can be used to enhance
user experience by reducing complexity in tools, providing newer insights, or providing
better user interfaces. Topics of interest include, but are not restricted to:
- Characterizing different AI approaches: logic-based, probabilistic graphical models, and machine learning/deep learning approaches
- Evaluation of different learning approaches: unsupervised,
self-supervised, supervised, or reinforcement learning, transfer learning,
zero-shot learning, adversarial networks, and deep probabilistic models
- New AI-enabled business intelligence (BI) queries for relational databases
- Natural language enablement (e.g., queries, result summarization,
chatbot interfaces, etc.)
- Explainability and interpretability
- Fairness of AI-based system components
- Integration with Data Science and Deep Learning toolkits (e.g.,
sklearn, TensorFlow, PyTorch, ONNX, etc.)
- Evaluating quality of approximate results from AI-enabled queries
- Supporting multiple datatypes (e.g., images, time-series data, etc.)
- Supporting semi-structured, streaming, and graph databases
- Reasoning over knowledge bases
- Data exploration and visualization
- Integrating structured and unstructured data sources
- AI-enabled data integration strategies (e.g., entity resolution,
schema matching, etc.)
- Reinforcement learning for database tuning
- Impact of AI on tooling, e.g., ETL or data cleaning
- Performance implications of AI-enabled queries
- Case studies of AI-accelerated workloads
- Social implications of AI-enabled databases (e.g., detection and
elimination of bias)
- Learned data structures, database algorithms or systems
components
- Examples of AI-enabled customer use cases
- Prof. Sunita Sarawagi, IIT Bombay
Sunita Sarawagi conducts research in the fields of databases and machine learning. She is an Institute Chair Professor at IIT Bombay and
head of its AI Center. She received her PhD in databases from the University
of California at Berkeley under the guidance of Michael Stonebraker.
She has also worked at Google Research (2014-2016), CMU (2004),
and IBM Almaden Research Center (1996-1999). She was awarded the Infosys Prize in 2019 for
Engineering and Computer Science, and the Distinguished Alumnus Award
from IIT Kharagpur. She has several publications at the ACM SIGMOD, VLDB, ICDM, NeurIPS,
ICML, and EMNLP conferences.
Modern AI for Age-old problems of Data Analytics and Integration
Modern deep
learning methods
are pushing the
frontiers of many
challenging
problems in data
analytics and
integration. We
will discuss
state-of-the-art models
that are providing
record-breaking
accuracy on
age-old tasks such
as time series
forecasting,
missing value
imputation, and
entity
resolution. We are
also witnessing
brand new
capabilities that
were not possible
a few years back.
On
multi-dimensional
analytical
datasets, we can
now obtain joint
distributions over
thousands of
interacting time
series. We can
generate long-term
forecasts of
realistic-looking
temporal event
sequences. We can
perform entity
resolution across
heterogeneous,
multilingual
datasets via
actively learned
nearest neighbor
indices, thereby
eliminating the
need for
hand-designing
blocking
predicates. In
this talk we will
go over advances
in ML research
that are enabling
these
applications, and
present directions
for future
research.
- Johannes Gehrke,
Microsoft Research
Johannes Gehrke is a Technical Fellow at Microsoft, the Managing Director of
Microsoft Research at Redmond, and the CTO and head of machine learning
for Microsoft Teams. He is an
ACM Fellow and
an IEEE
Fellow. From
1999 to 2015,
Johannes was on
the faculty in
the Department
of Computer
Science at
Cornell University, where
he graduated 25
PhD students,
and from 2005 to
2008, he was
Chief Scientist
at FAST Search
and Transfer.
Database Systems 2.0
Software 2.0 – the augmentation and replacement of traditional code with models, especially deep neural networks – is changing how we develop, deploy, and maintain software. In this talk, I will describe the challenges and opportunities that this change brings with it, focusing on its impact on database research.
- Vibhore Kumar and Ufuk Hürriyetoglu, Unscrambl
Have you started speaking to your data, yet?
In this day and
age, where users
ask Alexa about
the nearest
COVID-19
vaccination
centers, Google
notifies them of
the possible
traffic jam
ahead of their
evening commute
and Netflix
knows the show
that they are
going to watch
later in the
evening, getting
access to data
and insights at
work continues
to be an
experience akin
to pulling
teeth.
The consumerization of Analytics and BI is now a foregone conclusion;
business users expect it. Users, if they had their wish, would have a
seamless, AI-powered agent that they could simply ask a question,
using text or voice, and relevant data and insights would, well,
just appear. This would happen without the need to rely on a
data analyst and sometimes, serendipitously, even without the
need to ask. Importantly,
such an agent
should be easy
to set up, be
agile, be able
to connect to
multiple sources
of data, provide
built-in
analytics and be
accessible in
collaboration
platforms like
Microsoft Teams,
Slack or Zoom,
which have become the de facto workplaces.
This talk will detail our journey in building and deploying such an
AI-powered agent at multiple enterprises and driving data
adoption. Importantly, we will talk about the NLQ technology, which,
given a database and its schema, automatically enables natural
language to SQL translation.
Session 1 (9-10.30 am EST): (Chair: Oded Shmueli)
- (Keynote 1): Modern AI for Age-old problems of Data Analytics and Integration, Sunita Sarawagi, IIT Bombay
- RUSLI: Real-time Updatable Spline Learned Index, Mayank Mishra and Rekha Singhal, Tata Consultancy Services
Coffee Break (10.30-11 am EST)
Session 2 (11 am - 12.30 pm EST): (Chair: Yael Amsterdamer)
- A Tailored Regression for Learned Indexes: Logarithmic Error Regression, Martin Eppert, Philipp Fent and Thomas Neumann, Technische Universität München
- Balancing Familiarity and Curiosity in Data Exploration with Deep Reinforcement Learning, Aurélien Personnaz, Sihem Amer-Yahia, CNRS, Univ. Grenoble Alpes; Laure Berti-Equille, IRD, ESPACE_DEV, Montpellier; Maximilian Fabricius and Srividya Subramanian, Max Planck Institute for Extraterrestrial Physics
- Pre-Trained Web Table Embeddings for Table Discovery, Michael Günther, Maik Thiele, Julius Gonsior and Wolfgang Lehner, Database Systems Group, Technische Universität Dresden
Lunch Break (12.30 - 1.30 pm EST)
Session 3 (1.30-3 pm EST): (Chair: Rajesh Bordawekar)
- (Startup Spotlight): Have you started speaking to your data, yet? Vibhore Kumar and Ufuk Hürriyetoglu, Unscrambl
- LEA: A Learned Encoding Advisor for Column Stores, Lujing Cen, Andreas Kipf, Ryan Marcus and Tim Kraska, MIT CSAIL
Coffee Break (3-3.30 pm EST)
Session 4 (3.30-5 pm EST): (Chair: Nesime Tatbul)
- Leveraging Approximate Constraints for Localized Data Error Detection, Mohan Zhang, Oliver Schulte and Yudong Luo, Simon Fraser University
- (Keynote 2): Database Systems 2.0, Johannes Gehrke, Microsoft Research
Workshop Co-Chairs
For questions regarding the
workshop, please send email to bordaw AT us DOT ibm DOT com.
Program Committee
- Dana Van Aken, CMU
- Joy Arulraj, Georgia Tech
- Carsten Binnig, TU Darmstadt
- Thomas Heinis, Imperial College
- Andreas Kipf, MIT
- Nick Koudas, University of Toronto
- Amelie Marian, Rutgers University
- Yuval Moskovitch, University of Michigan
- Vivek Narasayya, Microsoft Research
- Apoorva Nitsure, IBM Research
- Sunita Sarawagi, IIT Bombay
- Seema Sundara, Oracle Labs
- Royi Ronen, Microsoft
- Saravanan Thirumuruganathan, QCRI, HBKU
- Zongheng Yang, University of California, Berkeley
Important Dates
- Paper Submission: Monday, 22nd March 2021, 12 pm PST
- Notification of Acceptance: Friday, 23rd April, 2021
- Camera-ready Submission: Monday, 3rd May, 2021
- Workshop Date: Friday, 25th June, 2021
Submission Site
All submissions will be handled electronically via EasyChair.
Formatting Guidelines
We will use the same document templates as the SIGMOD/PODS'21
conferences (the
ACM format). It is the authors' responsibility to ensure that
their submissions adhere
strictly to the ACM
format. In particular, it is not allowed to modify the format with the objective of squeezing in more material. Submissions that do not comply with the formatting detailed here will be rejected without review.
Full papers are limited to 8
pages. However, shorter papers
(4 pages)
are encouraged as
well.
All accepted papers will be
indexed via the ACM Digital
Library and will be available for
download from the workshop
webpage in the digital
library.