Skip to content
This repository has been archived by the owner on Feb 8, 2019. It is now read-only.

Latest commit

 

History

History
108 lines (92 loc) · 5.87 KB

DEV_README.md

File metadata and controls

108 lines (92 loc) · 5.87 KB

Apache Quickstep (Incubating)

Travis Widget

Apache Quickstep is an experimental high-performance database engine designed with the aim of Data at Bare-Metal Speed. It began life in 2011 as a research project at the University of Wisconsin and was acquired by Pivotal in 2015. Quickstep entered incubation at the Apache Software Foundation in April, 2016.

Getting Started (Building)

A build guide is available which includes instructions for building Quickstep for the first time. You may also find it useful to use one of the pre-made Vagrant boxes for Quickstep that are already set up with all of the development tools needed to build Quickstep.

Documentation

All publicly-visible classes and functions in the Quickstep code base have Doxygen documentation. Simply run doxygen in the root of the Quickstep source to generate browsable HTML documentation. Of course, the Doxygen comments should also be useful when reading header files directly.

In addition to the Doxygen and inline code comments explaining implementation details, a high-level overview for each module that comprises Quickstep is included in the README files in each subdirectory.

Architectural Overview

Quickstep is composed of several different modules that handle different concerns of a database system. The main modules are:

  • Utility - Reusable general-purpose code that is used by many other modules.
  • Threading - Provides a cross-platform abstraction for threads and synchronization primitives that abstracts the underlying OS threading features.
  • Types - The core type system used across all of Quickstep. Handles details of how SQL types are stored, parsed, serialized & deserialized, and converted. Also includes basic containers for typed values (tuples and column-vectors) and low-level operations that apply to typed values (e.g. basic arithmetic and comparisons).
  • Catalog - Keeps track of database schema as well as physical storage information for relations (e.g. which physical blocks store a relation's data, and any physical partitioning and placement information).
  • Storage - Handles the physical storage of relation data in self-contained, self-describing blocks, both in-memory and on persistent storage (disk or a distributed filesystem). Also includes some heavyweight run-time data structures used in query processing (e.g. hash tables for join and aggregation). Includes a buffer manager component for managing memory use and a file manager component that handles data persistence.
  • Compression - A simple implementation of ordered dictionary compression. Several storage formats in the Storage module are capable of storing compressed column data and evaluating some expressions directly on compressed data without decompressing. The common code supporting compression is in this module.
  • Expressions - This module builds on the simple operations provided by the Types module to support arbitrarily complex expressions over data, including scalar expressions, predicates, and aggregate functions with and without grouping.
  • Relational Operators - This module provides the building blocks for queries in Quickstep. A query is represented as a directed acyclic graph of relational operators, each of which is responsible for applying some relational-algebraic operation(s) to tranform its input. Operators generate individual self-contained "work orders" that can be executed independently. Most operators are parallelism-friendly and generate one work-order per storage block of input.
  • Query Execution - Handles the actual scheduling and execution of work from a query at runtime. The central class is the Foreman, an independent thread with a global view of the query plan and progress. The Foreman dispatches work-orders to stateless Worker threads and monitors their progress, and also coordinates streaming of partial results between producers and consumers in a query plan DAG to maximize parallelism. This module also includes the QueryContext class, which holds global shared state for an individual query and is designed to support easy serialization/deserialization for distributed execution.
  • Parser - A simple SQL lexer and parser that parses SQL syntax into an abstract syntax tree for consumption by the Query Optimizer.
  • Query Optimizer - Takes the abstract syntax tree generated by the parser and transforms it into a runable query-plan DAG for the Query Execution module. The Query Optimizer is responsible for resolving references to relations and attributes in the query, checking it for semantic correctness, and applying optimizations (e.g. filter pushdown, column pruning, join ordering) as part of the transformation process.
  • Command-Line Interface - An interactive SQL shell interface to Quickstep.

Licensing

Quickstep is licensed under the Apache License, Version 2.0. See LICENSE for the full license text.

Disclaimer

Apache Quickstep is an effort undergoing incubation at the Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.

Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.

While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.