Learned Systems

Cardinality Estimation Benchmark

Authors: Parimarjan Negi, Ryan Marcus, Andreas Kipf

In this blog post, we want to go over the motivations and applications of the Cardinality Estimation Benchmark (CEB), which was a part of the VLDB 2021 Flow-Loss paper.

There has been a lot of interest in using ML for cardinality estimation. The motivating application is often query optimization: when searching for the best execution plan, a query optimizer needs to estimate intermediate result sizes. In the most simplified setting, a better query plan may need to process smaller sized intermediate results, thereby utilizing fewer resources, and executing faster.

read more

Defeating Duplicates: A Re-Design of the LearnedSort Algorithm

Author: Ani Kristo

LearnedSort is a novel sorting algorithm that uses fast ML models to boost the sorting speed. We introduced the algorithm in SIGMOD 2020 together with a large set of benchmarks that showed outstanding performance as compared to state-of-the-art sorting algorithms.

However, given the nature of the underlying model, its performance was affected on high-duplicate inputs. In this post we introduce LearnedSort 2.0: a re-design of the algorithm that maintains the leading edge even for high-duplicate inputs. Extensive benchmarks demonstrate that it is on average 4.78× faster than the original LearnedSort for high-duplicate datasets, and 1.60× for low-duplicate datasets.

read more

LEA: A Learned Encoding Advisor for Column Stores

Authors: Lujing Cen, Andreas Kipf

We are presenting LEA, our new learned encoding advisor, at aiDM @ SIGMOD 2021. Check out our presentation and paper.

In this blog post, we will be going over a high level overview of LEA. LEA helps the database choose the best encoding for each column. At the moment, it can optimize for compressed size or query speed. On TPC-H, LEA achieves 19% lower query latency while using 26% less space compared to the encoding advisor of a commercial column store.

read more

More Bao Results: Learned Distributed Query Optimization on Vertica, Redshift, and Azure Synapse

Author: Ryan Marcus

Next week, we’ll present our new system for learned query optimization, Bao, SIGMOD21, where we are thrilled to receive a best paper award.

In our paper, we show how Bao can be applied to the open-source PostgreSQL DBMS, as well as an unnamed commercial system. Both DBMSes ran in a traditional, single-node environment. Here, we’ll give a brief overview of the Bao system and then walk through our early attempts at applying Bao to commercial, cloud-based, distributed database management systems.

read more

Announcing the Learned Indexing Leaderboard

Authors: Allen Huang, Andreas Kipf, Ryan Marcus, and Tim Kraska

Learned indexes have received a lot of attention over the past few years. The idea is to replace existing index structures, like B-trees, with learned models. In recent a paper, which we did in collaboration with TU Munich and are going to present at VLDB 2021, we compared learned index structures against various highly tuned traditional index structures for in-memory read-only workloads. The benchmark, which we published as open source including all datasets and implementations, confirmed that learned indexes are indeed significantly smaller while providing similar or better performance than their traditional counterparts on real-world datasets.

read more


In the coming weeks, we’ll start sharing regular research updates on learned systems by MIT DSG. Stay tuned!

To receive email notifications about new posts, you can subscribe here.

read more