Josh Poimboeuf of Red Hat recently published a patch that yields a 2.6% performance improvement in the “per_thread_ops” benchmark, which measures the number of operations a single thread can carry out. The small patch replaces the slow barrier_nospec() call with faster pointer masking in the 64-bit copy_from_user() function, which copies blocks of data from user space into the kernel.
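The idea behind pointer masking can be illustrated with a small user-space sketch. It is not the kernel's actual code: the function name mask_user_address_sketch and the example addresses are assumptions chosen for illustration, and the sketch assumes the x86-64 convention that kernel addresses have the top bit set. Instead of stalling speculation with a barrier, the address is combined with a mask derived from its own top bit, so a kernel-half address collapses to an all-ones value that cannot be used to read kernel data, while a user-half address passes through unchanged.

```c
#include <stdint.h>

/*
 * Hypothetical illustration of branchless pointer masking.
 * Assumption: addresses with bit 63 set belong to the kernel half
 * of the address space (x86-64 convention).
 */
static uint64_t mask_user_address_sketch(uint64_t addr)
{
    /* addr >> 63 is 0 or 1; negating it gives 0 or all ones. */
    uint64_t mask = 0 - (addr >> 63);

    /* User-half addresses are unchanged; kernel-half ones become ~0. */
    return addr | mask;
}
```

Because the result is computed with plain arithmetic rather than a conditional branch, there is no branch outcome for the CPU to mispredict, which is why no speculation barrier is needed in this sketch. For example, 0x00007fffdeadbeef is returned unchanged, while 0xffff888000001000 becomes 0xffffffffffffffff.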
Linus Torvalds has already merged this optimization into the 6.12 kernel branch.