DuckDB 0.9.0 has been released. The DBMS combines the properties of SQLite, such as compactness, the ability to be embedded as a library, storage of the database in a single file, and a convenient CLI, with tools and optimizations for executing analytical queries that cover a significant portion of the stored data, for example aggregating the entire contents of tables or joining several large tables. The project code is distributed under the MIT license. Development is still at the stage of experimental releases, since the storage format has not yet been stabilized and changes from version to version.
DuckDB provides an extended dialect of SQL that includes additional capabilities for handling very complex and long-running queries. It supports complex types such as arrays, structures, and associations, and can execute arbitrary and nested correlated subqueries. The system also supports running multiple queries concurrently, running queries directly against files in CSV and Parquet formats, and importing data from the PostgreSQL DBMS.
In addition to shell code from SQLite, the project uses the Date Math component from MonetDB, its own implementation of window functions (based on the Segment Tree Aggregation algorithm), a regular expression engine based on the RE2 library, its own query optimizer, and a Multi-Version Concurrency Control (MVCC) mechanism for managing concurrently running tasks. It also includes a vectorized query execution engine based on the Hyper-Pipelining Query Execution algorithm, which processes large sets of values in a single operation.
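To give a rough idea of why a segment tree suits window functions, here is a minimal Python sketch. It is a textbook bottom-up segment tree, not DuckDB's actual code, and the names SegmentTree and sliding_window are hypothetical, chosen only for this illustration. The tree is built once per partition; each window frame is then answered as a range query in O(log n) instead of rescanning the frame for every row.

```python
import math
import operator
from typing import Any, Callable, List, Sequence


class SegmentTree:
    """Textbook bottom-up segment tree (illustrative, not DuckDB's code)."""

    def __init__(self, values: Sequence[Any],
                 combine: Callable[[Any, Any], Any], identity: Any) -> None:
        self.n = len(values)
        self.combine = combine
        self.identity = identity
        # Leaves live at indices n..2n-1; node i aggregates children 2i and 2i+1.
        self.tree: List[Any] = [identity] * self.n + list(values)
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = combine(self.tree[2 * i], self.tree[2 * i + 1])

    def query(self, lo: int, hi: int) -> Any:
        """Aggregate values[lo:hi] (half-open range) in O(log n)."""
        left_acc, right_acc = self.identity, self.identity
        lo += self.n
        hi += self.n
        while lo < hi:
            if lo & 1:  # lo is a right child: take it and move past it
                left_acc = self.combine(left_acc, self.tree[lo])
                lo += 1
            if hi & 1:  # hi - 1 is a left child: take it
                hi -= 1
                right_acc = self.combine(self.tree[hi], right_acc)
            lo >>= 1
            hi >>= 1
        return self.combine(left_acc, right_acc)


def sliding_window(values: Sequence[Any], preceding: int, following: int,
                   combine: Callable[[Any, Any], Any], identity: Any) -> List[Any]:
    """Emulate ROWS BETWEEN <preceding> PRECEDING AND <following> FOLLOWING."""
    tree = SegmentTree(values, combine, identity)
    n = len(values)
    return [tree.query(max(0, i - preceding), min(n, i + following + 1))
            for i in range(n)]


values = [3, 1, 4, 1, 5, 9]
window_sums = sliding_window(values, 1, 1, operator.add, 0)
window_maxes = sliding_window(values, 1, 1, max, -math.inf)
print(window_sums)   # [4, 8, 6, 10, 15, 14]
print(window_maxes)  # [3, 4, 4, 5, 9, 9]
```

The same tree serves any associative aggregate (sum, min, max) and any frame bounds, which is what makes the approach attractive for frames that cannot be maintained incrementally.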
Notable changes in the new release include:
- Significantly improved performance when processing large datasets with “GROUP BY” or “DISTINCT” expressions. The problem of running out of memory during aggregation when the hash tables do not fit into RAM has been resolved. For example, the processing time for the query “SELECT COUNT(*)