The OpenZFS project has released updates 2.1.14 and 2.2.2 of its ZFS file system implementation for Linux and FreeBSD. The updates fix a bug in the code that checks the consistency of the dnode cache, which could corrupt data in files containing empty areas (holes).
An earlier attempt to fix the problem in version 2.2.1 proved ineffective. The bug went unnoticed for a long time and began to surface after changes were made to the “cp” utility in the coreutils 9.x package. It is believed that the issue does not affect Red Hat Enterprise Linux and its derivatives, as they ship coreutils 8.x, in which the “cp” utility uses different logic.
The problem manifests itself with file utilities that detect empty areas in files and optimize their handling. Corruption can occur when a file is copied immediately after being modified, while part of the data is still only in the dirty cache and has not yet been written to disk.
To optimize the handling of empty areas in files, OpenZFS has supported the SEEK_HOLE and SEEK_DATA operations since release 0.6.2. These operations allow the empty areas of a file to be skipped when reading from disk. Empty areas are recognized, and information about them is stored, only after all data related to the file has been written to disk. OpenZFS therefore includes a check that determines whether the cache still holds changes that have not reached the disk and ensures that the information needed by SEEK_HOLE and SEEK_DATA is written out first.
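As an illustration of how a program uses these operations, the following is a minimal sketch that lists the byte ranges a file system reports as data. It assumes a Linux or FreeBSD system where Python exposes os.SEEK_DATA and os.SEEK_HOLE; the helper name data_segments is hypothetical and is not part of OpenZFS or coreutils.

```python
import errno
import os


def data_segments(fd):
    """Yield (start, end) byte ranges that the file system reports as data.

    Hypothetical helper for illustration only. On file systems without
    hole support the whole file is typically reported as one data range.
    """
    size = os.fstat(fd).st_size
    offset = 0
    while offset < size:
        try:
            # Jump to the next byte at or after `offset` that holds data.
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                # Only a hole remains between `offset` and end of file.
                break
            raise
        # Find where this data range ends; end of file counts as a hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end
```

Everything outside the returned ranges is treated by the caller as zeros without being read, which is why the accuracy of the hole information matters.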
Unfortunately, this check was incomplete and, under certain circumstances, did not correctly determine whether the data had been flushed. As a result, if a request fell within a short time window between flush operations, the on-disk information about the file contents was outdated. During that window, read operations that skip empty areas could mistakenly skip part of the data, treating it as empty. With the “cp” utility, this could produce a copy containing empty areas that did not exist in the original file.
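To show why an incorrect hole report ends up as missing data in the copy, here is a simplified sketch of a hole-aware copy loop. It is not the actual implementation used by coreutils; the function name sparse_copy and the chunk parameter are assumptions made for this example. The loop reads only the ranges reported as data, so any range wrongly reported as a hole is never read and is left as zeros in the destination.

```python
import errno
import os


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy a file, reading only the ranges reported as data (illustrative)."""
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            size = os.fstat(src).st_size
            offset = 0
            while offset < size:
                try:
                    start = os.lseek(src, offset, os.SEEK_DATA)
                except OSError as exc:
                    if exc.errno == errno.ENXIO:
                        break  # the rest of the file is reported as a hole
                    raise
                end = os.lseek(src, start, os.SEEK_HOLE)
                pos = start
                while pos < end:
                    buf = os.pread(src, min(chunk, end - pos), pos)
                    if not buf:
                        break  # source shrank while copying
                    # Advance by the bytes actually written; a short write
                    # is retried by the next pread from the new position.
                    pos += os.pwrite(dst, buf, pos)
                offset = end
            # Extend the destination to full length; skipped ranges stay
            # as holes and read back as zeros.
            os.ftruncate(dst, size)
        finally:
            os.close(dst)
    finally:
        os.close(src)
```

On a file system that reports holes correctly, the copy is byte-for-byte identical to the source. The copy loop itself has no way to detect a wrong report, because it trusts the offsets returned by the file system.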