Lab Lunch: 18 December 2018 - Yang Cao

Title: Parallel Query Processing with Bounded Communication for SQL-over-NoSQL systems

Abstract:

The SQL-over-NoSQL architecture has found prevalent use in industrial systems that process massive datasets, e.g., Google's Spanner, Facebook's MyRocks, Apache Hive and SparkSQL with Cassandra, among others. In these systems, data is organized as a key-value store in a storage cluster made of commodity machines, while query processing is carried out in an elastic SQL layer. Such an architecture offers good horizontal scalability, availability, reliability, and cost-efficiency. However, such systems suffer from two major bottlenecks when answering SQL queries: (a) bulk operations like scans are particularly slow over key-value stores, and (b) communication cost is heavy for parallel query evaluation.

In this talk, I will discuss how to mitigate these issues by rethinking the data model used for representing relations in these systems. In particular, we will see how to avoid costly scans and even bound the communication cost with an embarrassingly simple new data model for SQL-over-NoSQL systems.

Date: 18 December 2018

Speaker: Yang Cao

MF2 level 4