French Programmer Fabrice Bellard Introduces New Audio Coding Format
French programmer Fabrice Bellard, known for projects such as QEMU and FFmpeg, recently unveiled TSAC, a new audio coding format for compressing and decompressing sound files. The format targets exceptionally low bitrates, 5.5 kb/s (kilobits per second) for mono and 7.5 kb/s for stereo, while maintaining the quality of music and speech. With TSAC, a 3.5-minute stereo composition sampled at 44.1 kHz can be compressed into a file of about 192 KB that sounds nearly identical to the original.
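The file size quoted above can be checked with simple arithmetic, reading the bitrates as kilobits per second. The sketch below is only an illustration: the function name is hypothetical, and the uncompressed comparison assumes 16-bit PCM, which the article does not state.

```python
def compressed_size_bytes(bitrate_kbps: float, duration_s: float) -> float:
    """Size in bytes of a constant-bitrate stream (1 kb = 1000 bits)."""
    return bitrate_kbps * 1000 * duration_s / 8


duration_s = 3.5 * 60  # a 3.5-minute track, in seconds

stereo_bytes = compressed_size_bytes(7.5, duration_s)  # stereo at 7.5 kb/s
mono_bytes = compressed_size_bytes(5.5, duration_s)    # mono at 5.5 kb/s

# Uncompressed size for comparison.
# Assumption: 16-bit (2-byte) samples, 2 channels, 44.1 kHz.
pcm_bytes = 44100 * 2 * 2 * duration_s

print(f"stereo: {stereo_bytes / 1024:.1f} KiB")
print(f"mono:   {mono_bytes / 1024:.1f} KiB")
print(f"ratio vs. 16-bit PCM: {pcm_bytes / stereo_bytes:.0f}x")
```

At 7.5 kb/s, 210 seconds of audio comes to 196,875 bytes, or roughly 192 KiB, which matches the figure given for the stereo example.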
The TSAC project code is released under the MIT license and is based on the Descript Audio Codec (DAC), which has been extended to support stereo sound. Bellard added a machine learning model based on the Transformer neural network architecture, which raises the compression ratio by taking human auditory perception into account.
The TSAC encoder can run on a CPU, where AVX2 instructions are supported for acceleration, but a GPU is recommended for optimal performance. Currently, the CUDA API can be used for acceleration on NVIDIA GPUs of the Ampere, Ada, and Hopper series (RTX 3090, RTX 4090, RTX A6000, A100, and H100) with at least 4 GB of video memory. FFmpeg is used to convert sound files before encoding.
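As a rough illustration of the conversion step, the sketch below builds an FFmpeg command that resamples an input file to 44.1 kHz stereo. The helper name, the file names, and the choice of WAV as the target are assumptions made for this example; the article does not say which input format TSAC expects or how it invokes FFmpeg. Only the standard FFmpeg options `-y`, `-i`, `-ar`, and `-ac` are used.

```python
import shlex


def build_ffmpeg_cmd(src: str, dst: str, sample_rate: int = 44100,
                     channels: int = 2) -> list[str]:
    """Build an ffmpeg argument list that resamples src into dst.

    Hypothetical helper: -ar sets the sample rate, -ac the channel count,
    and -y overwrites the output file if it already exists.
    """
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ar", str(sample_rate),
        "-ac", str(channels),
        dst,
    ]


# Placeholder file names, used only for illustration.
cmd = build_ffmpeg_cmd("song.mp3", "song.wav")
print(shlex.join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a system where FFmpeg is installed. Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.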