Facebook (banned in the Russian Federation) has introduced EnCodec, a new audio codec that uses machine learning methods to increase the compression ratio without loss of quality. The codec can be used both for streaming audio in real time and for encoding audio to be saved to files. The EnCodec reference implementation is written in Python using the PyTorch framework and is distributed under the CC BY-NC 4.0 license (Creative Commons Attribution-NonCommercial), which permits use for non-commercial purposes only.
Two pretrained models are available for download:
- A causal model operating at a 24 kHz sampling rate that supports only monophonic audio and was trained on diverse audio data (suitable for speech coding). The model can be used to compress audio for transmission at bitrates of 1.5, 3, 6, 12 and 24 kbps.
- A non-causal model operating at a 48 kHz sampling rate that supports stereo audio and was trained only on music. The model supports bitrates of 3, 6, 12 and 24 kbps.
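The two configurations listed above can be summarized in a small lookup table. The sketch below is illustrative only: the names `ModelSpec`, `MODELS` and `payload_bytes` are hypothetical and are not part of the EnCodec package. It estimates how many bytes a clip of a given duration occupies at one of the listed bitrates, assuming 1 kbps = 1000 bit/s and ignoring any container overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical record describing one of the pretrained models from the text."""
    sample_rate_hz: int
    channels: int
    causal: bool
    bitrates_kbps: tuple


# Values copied from the list above; the dictionary keys are made up for this sketch.
MODELS = {
    "24khz": ModelSpec(24_000, 1, True, (1.5, 3, 6, 12, 24)),
    "48khz": ModelSpec(48_000, 2, False, (3, 6, 12, 24)),
}


def payload_bytes(model_key: str, bitrate_kbps: float, seconds: float) -> float:
    """Estimate the compressed size of a clip, rejecting bitrates not listed for the model."""
    spec = MODELS[model_key]
    if bitrate_kbps not in spec.bitrates_kbps:
        raise ValueError(f"{bitrate_kbps} kbps is not listed for model {model_key!r}")
    # kbps -> bit/s, times duration, divided by 8 bits per byte
    return bitrate_kbps * 1000 * seconds / 8


if __name__ == "__main__":
    # One minute of audio at 6 kbps with the 24 kHz model.
    print(payload_bytes("24khz", 6, 60))
```

Under these assumptions, one minute of audio at 6 kbps takes about 45 KB, and requesting 1.5 kbps from the 48 kHz model raises an error because that bitrate is listed only for the 24 kHz model.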
An additional language model has been prepared for each model, which makes it possible to increase the compression ratio further (by up to 40%) without loss of quality. Unlike earlier projects that applied machine learning methods to audio compression, EnCodec can be used not only for speech but also for compressing music at a sampling rate of 48 kHz, comparable to the quality level of audio CDs.
According to the developers, compared with MP3 at a bitrate of 64 kbps, the new codec achieves roughly ten times higher compression at the same level of quality (MP3 requires a bandwidth of 64 kbps, whereas EnCodec needs 6 kbps to transmit audio of the same quality).
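The arithmetic behind these figures can be sketched as follows. The function names are hypothetical, and the second calculation applies the full "up to 40%" language-model gain mentioned earlier to the 6 kbps figure purely as an illustrative assumption; the source does not state that the two combine this way.

```python
def compression_gain(reference_kbps: float, codec_kbps: float) -> float:
    """How many times less bandwidth the codec needs than the reference format."""
    return reference_kbps / codec_kbps


def with_extra_saving(bitrate_kbps: float, saving: float) -> float:
    """Bitrate after an additional fractional saving (0.4 means 40%)."""
    return bitrate_kbps * (1.0 - saving)


if __name__ == "__main__":
    # Figures quoted in the text: MP3 at 64 kbps versus EnCodec at 6 kbps.
    print(round(compression_gain(64, 6), 2))
    # Illustrative assumption: the full 40% gain applied on top of 6 kbps.
    print(round(with_extra_saving(6, 0.4), 2))
```

With the quoted numbers, the gain over MP3 works out to about 10.67 times, and the assumed best case with the language model would be about 3.6 kbps.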
The codec's architecture is built on a neural network with the architecture “Transformer