Amazon has published a natural-language dataset covering 51 languages

Amazon has published, under the CC BY 4.0 license, the MASSIVE dataset (Multilingual Amazon SLU resource package (SLURP) for Slot-filling, Intent classification, and Virtual-assistant Evaluation), together with machine learning models and tools for training your own models for natural language understanding (NLU). The kit includes more than a million annotated and classified utterances covering 51 languages.
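A minimal sketch of how such a dataset might be inspected, assuming it is distributed as JSON-lines files per locale with fields such as "utt" (the utterance), "intent", and "annot_utt" (the slot-annotated utterance); the file path and field names here are assumptions taken from the dataset description and may differ in practice.

```python
import json
from collections import Counter
from pathlib import Path

# Sketch: load one locale file from a MASSIVE-style dataset and look at it.
# Assumes JSON-lines records with "utt", "intent" and "annot_utt" fields;
# the path below is hypothetical.
def load_locale(path):
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f]

records = load_locale("data/de-DE.jsonl")

# Count how many utterances fall into each intent class.
intent_counts = Counter(r["intent"] for r in records)
for intent, count in intent_counts.most_common(5):
    print(f"{intent}: {count}")

# Show one example pairing the raw utterance with its slot markup.
sample = records[0]
print(sample["utt"])        # plain-text utterance
print(sample["annot_utt"])  # utterance with slot annotations
```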

One of the goals of creating and publishing the dataset is to adapt voice assistants such as Alexa to handle input in many languages at once, as well as to encourage third-party developers to build applications and services that extend the capabilities of voice assistants. To attract developers' attention, Amazon has launched a competition to create the best universal model using the published dataset.

Currently, voice assistants support only a few languages and rely on machine learning models tied to a specific language. The MASSIVE project aims to address this limitation by enabling universal models and machine learning systems that can parse and process input in several languages at once.

The MASSIVE set was built on top of the SLURP collection, which was originally available only in English and was localized into 50 other languages with the involvement of professional translators. The natural language understanding (NLU) technology used in the Alexa voice assistant first transcribes speech to text and then applies several NLU models that analyze the text for keywords to determine the intent behind the user's request.
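To make the NLU step concrete, here is a minimal sketch of the slot-filling side of that pipeline: given an utterance annotated in the bracketed [slot : value] convention used by SLURP/MASSIVE-style data (the markup format is an assumption based on the dataset description), it extracts the slots a downstream component would act on. The intent-classification step is not modeled here.

```python
import re

# Pattern for annotations of the form "[slot_name : slot value]".
SLOT_PATTERN = re.compile(r"\[\s*([^:\]]+?)\s*:\s*([^\]]+?)\s*\]")

def extract_slots(annotated_utterance):
    """Return a dict mapping slot names to their surface values."""
    return {name: value for name, value in SLOT_PATTERN.findall(annotated_utterance)}

# Hypothetical transcription of a spoken request, already slot-annotated.
annot = "wake me up at [time : nine am] on [date : friday]"
print(extract_slots(annot))
# {'time': 'nine am', 'date': 'friday'}
```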
