Google has introduced a new audio codec, Lyra, optimized to deliver maximum speech quality even over very slow communication channels.
The Lyra implementation is written in C++ and released under the Apache 2.0 license, but its dependencies include a proprietary component, required for operation, whose source code is not available. This component handles the mathematical computations and is linked in as the libsparse_inference.so library. The proprietary component is said to be temporary: Google promises to develop an open replacement in the future and to provide support for platforms other than Linux.
At low bitrates, the quality of the voice data transmitted by Lyra significantly exceeds that of traditional codecs, which rely on digital signal processing methods. To achieve high-quality voice transmission with a limited amount of transmitted data, Lyra combines conventional audio compression and signal transformation methods with a speech model based on a machine learning system, which recreates the missing information from typical speech characteristics. The model used to generate sound was trained on several thousand hours of voice recordings in more than 70 languages.
The codec includes an encoder and a decoder. The encoder's algorithm comes down to extracting voice parameters every 40 milliseconds, compressing them, and transmitting them to the recipient over the network. A communication channel with a speed of 3 kilobits per second is sufficient for the transmission. The transmitted sound parameters include log mel spectrograms, which capture the speech energy characteristics in different frequency bands and are prepared taking the model of human auditory perception into account.
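To make the encoder side more concrete, below is a minimal sketch of computing a log mel spectrogram for a single 40 ms frame and of the resulting bit budget at 3 kbit/s. It is not Lyra's actual code: the sample rate, FFT size, and number of mel bands are assumptions chosen for illustration; only the 40 ms frame interval and the 3 kbit/s rate come from the article.

```python
# Illustrative sketch (not Lyra's implementation): log mel features for one
# 40 ms frame of 16 kHz audio, using only NumPy.
import numpy as np

SAMPLE_RATE = 16000   # assumed sample rate
FRAME_MS = 40         # feature extraction interval from the article
N_FFT = 1024          # assumed FFT size
N_MELS = 80           # assumed number of mel bands


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters mapping FFT bins to mel bands."""
    mel_points = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):
            fb[m - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[m - 1, k] = (right - k) / max(right - center, 1)
    return fb


def log_mel_frame(frame, filterbank):
    """Log mel energies of a single windowed frame."""
    windowed = frame * np.hanning(len(frame))
    power_spectrum = np.abs(np.fft.rfft(windowed, n=N_FFT)) ** 2
    return np.log(filterbank @ power_spectrum + 1e-10)


frame_len = SAMPLE_RATE * FRAME_MS // 1000        # 640 samples per 40 ms frame
frame = np.random.randn(frame_len)                # stand-in for real speech samples
features = log_mel_frame(frame, mel_filterbank(N_MELS, N_FFT, SAMPLE_RATE))
print(features.shape)                             # (80,) log mel-band energies

# Bit budget: at 3 kbit/s a 40 ms frame carries only 3000 * 0.04 = 120 bits,
# so the extracted features must be quantized very aggressively before sending.
print(3000 * FRAME_MS / 1000)                     # 120.0 bits per frame
```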
The decoder uses a generative model that recreates the speech signal from the transmitted sound parameters. To reduce computational complexity, a lightweight model based on a recurrent neural network is used, a variant of the WaveRNN speech synthesis model.
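The sketch below illustrates the autoregressive idea behind such a decoder: a recurrent network generates audio sample by sample, conditioned on its hidden state, the previous sample, and the features received for the current frame. It is a simplification under stated assumptions, not Lyra's or WaveRNN's actual model: the hidden size, the single tanh recurrence, and the random weights are placeholders (WaveRNN itself uses a GRU and predicts a distribution over quantized sample values).

```python
# Minimal sketch of an autoregressive RNN vocoder step (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

HIDDEN = 64           # assumed hidden state size
N_FEATS = 80          # conditioning feature size (mel bands), assumed
FRAME_SAMPLES = 640   # samples per 40 ms frame at an assumed 16 kHz

# Randomly initialized weights stand in for a trained model.
W_h = rng.standard_normal((HIDDEN, HIDDEN)) * 0.1
W_x = rng.standard_normal((HIDDEN, 1 + N_FEATS)) * 0.1
w_out = rng.standard_normal(HIDDEN) * 0.1


def synthesize_frame(features, h, prev_sample):
    """Generate one frame of audio samples autoregressively."""
    out = np.empty(FRAME_SAMPLES)
    for t in range(FRAME_SAMPLES):
        x = np.concatenate(([prev_sample], features))  # previous sample + conditioning
        h = np.tanh(W_h @ h + W_x @ x)                 # recurrent state update
        prev_sample = float(np.tanh(w_out @ h))        # next sample in [-1, 1]
        out[t] = prev_sample
    return out, h, prev_sample


# Decode a few frames from dummy received feature vectors.
h = np.zeros(HIDDEN)
sample = 0.0
audio = []
for _ in range(3):
    feats = rng.standard_normal(N_FEATS)  # stand-in for dequantized log mel features
    frame, h, sample = synthesize_frame(feats, h, sample)
    audio.append(frame)
print(np.concatenate(audio).shape)        # (1920,) = 3 frames of 40 ms at 16 kHz
```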