DuckDB 0.6.0 released, an SQLite-like DBMS for analytical queries

The release of the DBMS DuckDB 0.6.0 is now available. It combines SQLite properties such as compactness, the ability to embed as a library, storage of the database in a single file and a convenient CLI, with tools and optimizations for executing analytical queries that cover a significant share of the stored data, for example aggregating the entire contents of tables or joining several large tables. The project code is distributed under the MIT license. Development is still at the experimental stage, since the storage format is not yet stabilized and changes from version to version.

DuckDB provides an extended dialect of SQL that includes additional capabilities for handling very complex and long-running queries. It supports complex types (arrays, structures, unions) and the execution of arbitrary and nested correlated subqueries. Simultaneous execution of several queries is supported, as is querying data directly from files in CSV and Parquet formats. Data can also be imported from the PostgreSQL DBMS.
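To illustrate the extended dialect, the sketch below shows DuckDB-style queries using nested types and querying files directly; the file names `events.parquet` and `data.csv` are hypothetical placeholders, not part of the release announcement.

```sql
-- Complex types: a list literal and a struct literal in one query.
SELECT
    [1, 2, 3]                     AS arr,   -- LIST type
    {'name': 'duck', 'weight': 2} AS rec;   -- STRUCT type

-- Query a Parquet file directly, without importing it first
-- (events.parquet is a hypothetical file name).
SELECT count(*) FROM 'events.parquet';

-- CSV files can be queried the same way.
SELECT * FROM read_csv_auto('data.csv') LIMIT 10;
```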

In addition to shell code from SQLite, the project uses an SQL parser derived from PostgreSQL, the Date Math component from MonetDB, its own implementation of window functions (based on the Segment Tree Aggregation algorithm), a regular-expression engine based on the RE2 library, its own query optimizer, an MVCC mechanism for concurrency control (Multi-Version Concurrency Control), and a vectorized query execution engine based on the HyPer-Pipelining algorithm, which processes large sets of values in a single operation.

Among the changes in the new release:

  • Work continued on improving the storage format. An optimistic disk-write mode is implemented: when a large data set is loaded in a single transaction, the data is compressed and written in streaming mode to the database file without waiting for the transaction to be confirmed with the COMMIT command. By the time COMMIT is received, the data has already been written to disk, and on ROLLBACK it is discarded. Previously, the data was first kept entirely in memory and written to disk only on commit.
  • Added support for parallel data loading into individual tables, which significantly increases loading speed on multi-core systems. For example, in the previous release loading a database of 150 million rows on a 10-core CPU took 91 seconds, while in the new version the same operation completes in 17 seconds. Two parallel loading modes are provided: one that preserves the insertion order of records and one that does not.
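The choice between the two parallel-loading modes is made via a session setting. The sketch below shows how this might look in DuckDB 0.6.0; the table name `lineitem` and file name `lineitem.parquet` are hypothetical placeholders.

```sql
-- By default the insertion order of rows is preserved during parallel loads.
-- Disabling order preservation lets DuckDB parallelize more aggressively:
SET preserve_insertion_order = false;

-- Bulk-load in a single transaction; with the optimistic write mode,
-- compressed data is streamed to the database file before COMMIT.
BEGIN TRANSACTION;
INSERT INTO lineitem SELECT * FROM 'lineitem.parquet';
COMMIT;
```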