LFCS Seminar: Tuesday, 26 November: Dan Olteanu
Title: Cardinality Estimation using Lp norms on Degree Sequences
Abstract: Cardinality estimation is the problem of estimating the size of the output of a query, without actually evaluating the query. The cardinality estimator is a critical piece of any query optimizer, and is often the main culprit when the optimizer chooses a poor plan. In this talk, I will introduce LpBound, a pessimistic cardinality estimator that computes a guaranteed upper bound on the size of the query output. Besides common statistics used for cardinality estimation, such as the sizes of the input relations, the domain sizes for various columns, histograms, and the most common key values in join columns, LpBound also uses Lp norms on the degree sequences of the join columns. The computed cardinality bound is an optimal solution of a linear program whose constraints encode such statistics and Shannon inequalities. I will also report on an on-going experimental evaluation of LpBound for a variety of queries from the JOB, STATS, and subgraph matching benchmarks: acyclic and cyclic queries, full and group-by queries, queries with and without range and equality predicates. LpBound takes time and uses memory very close to traditional estimators used by PostgreSQL, DuckDB, and a commercial database system. Yet the range of LpBound overestimates is much smaller than that of the traditional estimators, which underestimate or overestimate by orders of magnitude. When fed the LpBound cardinalities, PostgreSQL can take less time to evaluate the queries than when using its own estimates or those of other estimators. This is joint work with Mahmoud Abo Khamis (RelationalAI), Chris Mayer (U Zurich), Dan Suciu (UW), and Haozhe Zhang (U Zurich).
Bio: Dan Olteanu is a professor at the University of Zurich, where he leads the Data Systems and Theory group, and a computer scientist at RelationalAI, where he currently works on incremental view maintenance, cardinality estimation, and data science projects. He currently serves as chair of the ICDT council and as associate editor for TODS and VLDB Journal. He previously served as associate editor for PVLDB and the SIGMOD Record database principles column, as PC vice-chair for SIGMOD, and as PC chair for ICDT. He co-authored the book "Probabilistic Databases" (2011). He is the recipient of the ICDT 2022 Test of Time award for factorised databases, the ICDT 2019 best paper award for optimal maintenance of counting triangles in a graph, a PVLDB distinguished associate editor award, a SIGMOD distinguished PC member award, an ERC consolidator grant, a Google research faculty award, and an Oxford outstanding teaching award. He recently gave the Gems of PODS 2024 lecture on incremental view maintenance. Half a dozen of his works on cardinality estimation, in-database machine learning, maintenance for queries and analytics, and probabilistic programming were invited to special journal issues of best-of ICDT, EDBT, OOPSLA, and PODS conferences. More than a dozen of his BSc and MSc students received the prize for the best thesis or best overall performance at the University of Oxford. His PhD student Max Schleich received an honorable mention to the 2021 SIGMOD Jim Gray doctoral dissertation award.
LFCS Seminar: Tuesday, 26 November: Dan Olteanu
G.03, IF