Home | Learned Systems
Authors: Dominik Horn, Andreas Kipf, Pascal Pfeil
We are happy to present LSI: A Learned Secondary Index Structure and our accompanying open-source C++ implementation, which can easily be included in other projects using CMake FetchContent.
LSI is a learned secondary index. It offers competitive lookup performance on
real-world datasets while reducing space usage by up to 6x compared to
state-of-the-art secondary index structures.
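To give a flavor of how a learned secondary index can answer lookups, here is a minimal C++ sketch, assuming a sorted copy of the keys paired with base-table row offsets and a deliberately crude linear CDF model. The class and member names are illustrative; this is not LSI's actual design or API.

```cpp
// Minimal sketch of the general learned-secondary-index idea (not the actual
// LSI design): keys arrive in arbitrary base-table order, so we sort copies of
// the keys together with their row offsets, fit a simple linear model to the
// key CDF, and use the model's prediction plus a bounded local search at
// lookup time. Assumes a non-empty set of entries for brevity.
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct Entry {
  uint64_t key;
  uint64_t row_offset;  // position of the row in the (unsorted) base table
};

class SketchSecondaryIndex {
 public:
  explicit SketchSecondaryIndex(std::vector<Entry> entries)
      : entries_(std::move(entries)) {
    std::sort(entries_.begin(), entries_.end(),
              [](const Entry& a, const Entry& b) { return a.key < b.key; });
    // Fit position ~ slope * key + intercept using the first and last key
    // (a deliberately crude stand-in for a proper learned CDF model).
    const double key_range =
        static_cast<double>(entries_.back().key - entries_.front().key);
    slope_ = key_range > 0 ? (entries_.size() - 1) / key_range : 0.0;
    intercept_ = -slope_ * static_cast<double>(entries_.front().key);
    // Record the maximum prediction error to bound the local search.
    for (size_t i = 0; i < entries_.size(); ++i) {
      const size_t pred = Predict(entries_[i].key);
      max_error_ = std::max(max_error_, pred > i ? pred - i : i - pred);
    }
  }

  // Returns the row offset of `key`, or std::nullopt if it is absent.
  std::optional<uint64_t> Lookup(uint64_t key) const {
    const size_t pred = Predict(key);
    const size_t lo = pred > max_error_ ? pred - max_error_ : 0;
    const size_t hi = std::min(entries_.size(), pred + max_error_ + 1);
    auto it = std::lower_bound(
        entries_.begin() + lo, entries_.begin() + hi, key,
        [](const Entry& e, uint64_t k) { return e.key < k; });
    if (it != entries_.begin() + hi && it->key == key) return it->row_offset;
    return std::nullopt;
  }

 private:
  size_t Predict(uint64_t key) const {
    const double pos = slope_ * static_cast<double>(key) + intercept_;
    return static_cast<size_t>(
        std::clamp(pos, 0.0, static_cast<double>(entries_.size() - 1)));
  }

  std::vector<Entry> entries_;
  double slope_ = 0.0, intercept_ = 0.0;
  size_t max_error_ = 0;
};
```

The important property in this sketch is that the search range is bounded by the model's maximum training error, so lookups remain correct even when the model is inaccurate.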
Authors: Parimarjan Negi, Ryan Marcus, Andreas Kipf
In this blog post, we want to go over the motivations and applications of the Cardinality Estimation Benchmark (CEB), which was part of the VLDB 2021 Flow-Loss paper.
There has been a lot of interest in using ML for cardinality estimation. The motivating application is often query optimization: when searching for the best execution plan, a query optimizer needs to estimate intermediate result sizes. In the most simplified setting, a better query plan only needs to process smaller intermediate results, thereby using fewer resources and executing faster.
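As a toy illustration of that point (with made-up cardinalities, not numbers from CEB), consider how a trivial cost model reacts to two join orders:

```cpp
// Toy illustration with hypothetical estimates: a cost model that charges for
// every intermediate tuple prefers the join order whose intermediate result is
// smaller, which is exactly where bad cardinality estimates can push the
// optimizer toward a slow plan.
#include <cstdint>
#include <iostream>

int main() {
  // Hypothetical estimates for a three-way join of tables R, S, and T.
  const uint64_t est_r_join_s = 1000000;  // |R JOIN S|
  const uint64_t est_s_join_t = 5000;     // |S JOIN T|
  const uint64_t est_final    = 2000;     // |R JOIN S JOIN T|

  // Simplistic cost: intermediate result size plus final result size.
  const uint64_t cost_rs_first = est_r_join_s + est_final;
  const uint64_t cost_st_first = est_s_join_t + est_final;

  std::cout << "(R JOIN S) JOIN T estimated cost: " << cost_rs_first << "\n"
            << "R JOIN (S JOIN T) estimated cost: " << cost_st_first << "\n";
  return 0;
}
```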
Author: Ani Kristo
LearnedSort is a novel sorting algorithm that uses fast ML models to boost sorting speed. We introduced the algorithm at SIGMOD 2020 together with a large set of benchmarks that showed outstanding performance compared to state-of-the-art sorting algorithms.
However, given the nature of the underlying model, its performance suffered on inputs with many duplicate keys. In this post we introduce LearnedSort 2.0: a redesign of the algorithm that maintains its leading edge even on high-duplicate inputs. Extensive benchmarks demonstrate that it is on average 4.78× faster than the original LearnedSort on high-duplicate datasets, and 1.60× faster on low-duplicate datasets.
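For intuition, here is a minimal C++ sketch of the model-based partitioning idea, assuming a crude two-point linear CDF model. The real LearnedSort trains a far more accurate model and handles bucket overflows and duplicates much more carefully; this is only a sketch of the general technique.

```cpp
// Minimal sketch of the model-based partitioning idea behind learned sorting
// (not the actual LearnedSort implementation): a crude linear CDF model routes
// each element to an approximately correct bucket, the buckets are sorted
// individually, and the results are concatenated.
#include <algorithm>
#include <cstddef>
#include <vector>

void model_partition_sort(std::vector<double>& data, size_t num_buckets = 64) {
  if (data.size() < 2) return;
  const auto [min_it, max_it] = std::minmax_element(data.begin(), data.end());
  const double min_v = *min_it, max_v = *max_it;
  if (min_v == max_v) return;  // all keys equal, already sorted

  // "Model": a two-point linear approximation of the empirical CDF.
  const double scale = static_cast<double>(num_buckets) / (max_v - min_v);

  std::vector<std::vector<double>> buckets(num_buckets);
  for (double v : data) {
    size_t b = static_cast<size_t>((v - min_v) * scale);
    if (b >= num_buckets) b = num_buckets - 1;
    buckets[b].push_back(v);
  }

  // Sort each (hopefully small) bucket and write back in order. Note how many
  // duplicates of one key pile up in a single bucket and skew its size; that
  // skew is the kind of issue a high-duplicate-aware redesign has to address.
  size_t out = 0;
  for (auto& bucket : buckets) {
    std::sort(bucket.begin(), bucket.end());
    for (double v : bucket) data[out++] = v;
  }
}
```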
Authors: Lujing Cen, Andreas Kipf
We are presenting LEA, our new learned encoding advisor, at aiDM @ SIGMOD 2021. Check out our presentation and paper.
In this blog post, we give a high-level overview of LEA. LEA helps the database choose the best encoding for each column. At the moment, it can optimize for compressed size or query speed. On TPC-H, LEA achieves 19% lower query latency while using 26% less space compared to the encoding advisor of a commercial column store.
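Conceptually, an encoding advisor can be thought of as a learned cost model per candidate encoding plus an argmin over the predictions. The sketch below uses hypothetical column statistics and interfaces of our own invention; it is not LEA's actual design or feature set.

```cpp
// Hypothetical sketch of the encoding-advisor idea (not LEA's actual models):
// for each candidate encoding, a learned cost model estimates the compressed
// size from simple column statistics, and the advisor picks the minimum.
#include <cstdint>
#include <functional>
#include <limits>
#include <string>
#include <vector>

struct ColumnStats {
  uint64_t row_count;
  uint64_t distinct_count;
  double avg_run_length;  // average length of runs of equal values
};

struct EncodingCandidate {
  std::string name;
  // Stand-in for a trained regression model mapping stats -> estimated bytes.
  std::function<double(const ColumnStats&)> estimated_bytes;
};

std::string choose_encoding(const ColumnStats& stats,
                            const std::vector<EncodingCandidate>& candidates) {
  std::string best;
  double best_bytes = std::numeric_limits<double>::max();
  for (const auto& c : candidates) {
    const double bytes = c.estimated_bytes(stats);
    if (bytes < best_bytes) {
      best_bytes = bytes;
      best = c.name;
    }
  }
  return best;
}
```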
Author: Ryan Marcus
Next week, we’ll present Bao, our new system for learned query optimization, at SIGMOD 2021, where we are thrilled to have received a best paper award.
In our paper, we show how Bao can be applied to the open-source PostgreSQL DBMS, as well as an unnamed commercial system. Both DBMSes ran in a traditional, single-node environment. Here, we’ll give a brief overview of the Bao system and then walk through our early attempts at applying Bao to commercial, cloud-based, distributed database management systems.
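As a rough sketch of the core loop (Bao steers the existing optimizer by choosing a set of hints per query with a learned value model, then learns from the observed runtime), here is a hedged C++ fragment. The Optimizer and ValueModel interfaces are stand-ins we made up, not Bao's actual API.

```cpp
// Sketch of the plan-steering loop at the heart of Bao-style optimization:
// each candidate hint set is handed to the existing optimizer, the resulting
// plan is scored by a learned value model, and the cheapest prediction wins.
// The Optimizer and ValueModel interfaces here are placeholders, not Bao's API.
#include <limits>
#include <string>
#include <vector>

struct Plan { std::string tree; };  // placeholder plan representation

struct Optimizer {
  // In a real system this would call the DBMS optimizer with the given hints.
  Plan plan_with_hints(const std::string& sql,
                       const std::vector<std::string>& hints) const;
};

struct ValueModel {
  // In Bao this role is played by a learned model over plan trees;
  // here it is just an interface.
  double predicted_latency(const Plan& plan) const;
};

std::vector<std::string> choose_hint_set(
    const std::string& sql, const Optimizer& opt, const ValueModel& model,
    const std::vector<std::vector<std::string>>& candidate_hint_sets) {
  std::vector<std::string> best;
  double best_latency = std::numeric_limits<double>::max();
  for (const auto& hints : candidate_hint_sets) {
    const Plan plan = opt.plan_with_hints(sql, hints);
    const double latency = model.predicted_latency(plan);
    if (latency < best_latency) {
      best_latency = latency;
      best = hints;
    }
  }
  return best;  // execute with this hint set; feed the observed runtime back
}
```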
Authors: Allen Huang, Andreas Kipf, Ryan Marcus, and Tim Kraska
Learned indexes have received a lot of attention over the past few years. The idea is to replace existing index structures, like B-trees, with learned models. In a recent paper, which we did in collaboration with TU Munich and are going to present at VLDB 2021, we compared learned index structures against various highly tuned traditional index structures for in-memory read-only workloads. The benchmark, which we published as open source including all datasets and implementations, confirmed that learned indexes are indeed significantly smaller while providing similar or better performance than their traditional counterparts on real-world datasets.
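At its core, this kind of read-only comparison boils down to timing batches of equality lookups against each competitor and reporting nanoseconds per lookup. The snippet below is a hedged sketch of such a harness (not the benchmark's actual code), parameterized over any index type exposing a lookup() method; the function name is ours.

```cpp
// Hedged sketch of a read-only lookup micro-benchmark: time a batch of
// equality lookups against any index exposing lookup() and report ns/lookup.
#include <chrono>
#include <cstdint>
#include <iostream>
#include <vector>

template <typename Index>
double nanos_per_lookup(const Index& index, const std::vector<uint64_t>& keys) {
  const auto start = std::chrono::steady_clock::now();
  uint64_t checksum = 0;  // prevents the compiler from optimizing lookups away
  for (uint64_t key : keys) checksum += index.lookup(key);
  const auto stop = std::chrono::steady_clock::now();
  std::cout << "checksum: " << checksum << "\n";
  const auto ns =
      std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start);
  return static_cast<double>(ns.count()) / keys.size();
}
```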
In the coming weeks, we’ll start sharing regular research updates on learned systems by MIT DSG. Stay tuned!
To receive email notifications about new posts, you can subscribe here.