Explosion AI has published a release of spaCy, a free library that implements natural language processing (NLP) algorithms. In practice, the project can be used to build autoresponders, bots, text classifiers and various dialogue systems that determine the meaning of phrases. The library is written in Python with parts in Cython, a Python extension that allows direct calls to C functions. The project code is distributed under the MIT license. Language models are available for 58 languages, including Russian.
The library is designed to provide a stable API that is not tied to the underlying algorithms and is ready for use in real products. It uses the latest advances in NLP and the most efficient algorithms available. When a more efficient algorithm appears, the library switches to it, and the switch does not affect the API or the applications built on it. Another feature of spaCy is an architecture designed to process entire documents, without a separate preprocessing step that splits the document into phrases. The models are offered in two versions: one optimized for maximum performance and one for maximum accuracy.
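The whole-document approach can be illustrated with a minimal sketch. It assumes the spaCy 3.x API and uses a blank English pipeline with the rule-based `sentencizer` component, so no trained model has to be downloaded. The sample text is made up for illustration.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
# Sentence splitting runs inside the pipeline, not as a separate preprocessor.
nlp.add_pipe("sentencizer")

# Illustrative input; the whole text is passed in as one document.
text = "spaCy processes whole documents. Sentences are found inside the pipeline."
doc = nlp(text)

# Sentences and tokens are views into the same Doc object.
sentences = [sent.text for sent in doc.sents]
tokens = [token.text for token in doc]

for sentence in sentences:
    print(sentence)
```

A trained pipeline would normally take sentence boundaries from its parser or a dedicated component instead. The rule-based splitter is used here only to keep the sketch self-contained.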
The main features of spaCy:
- Support for about 60 languages.
- Pretrained models available for different languages and applications.
- Multi-task learning with pretrained transformers such as BERT (Bidirectional Encoder Representations from Transformers).
- Support for pretrained word vectors and embeddings.
- High performance.
- Model training system ready for production use.
- Linguistically motivated tokenization.
- Ready-made components for named entity linking, part-of-speech tagging, text classification, labeled dependency parsing, sentence segmentation, morphological analysis, lemmatization and more.
- Support for extending functionality with custom components and attributes.
- Support for creating your own models based on PyTorch, TensorFlow and other frameworks.
- Built-in visualizers for syntax and named entity recognition (NER).
- An easy process for packaging and deploying models, and managing the workflow.
- High accuracy.
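The support for custom components and attributes listed above might look roughly like the following sketch, assuming the spaCy 3.x API. The component name `count_long_tokens`, the attribute `n_long_tokens` and the length threshold are hypothetical and chosen only for illustration.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical custom attribute, exposed as doc._.n_long_tokens.
if not Doc.has_extension("n_long_tokens"):
    Doc.set_extension("n_long_tokens", default=0)


@Language.component("count_long_tokens")
def count_long_tokens(doc):
    # Count tokens longer than six characters (arbitrary threshold).
    doc._.n_long_tokens = sum(1 for token in doc if len(token.text) > 6)
    return doc


# Blank English pipeline, so no trained model is needed.
nlp = spacy.blank("en")
nlp.add_pipe("count_long_tokens")

doc = nlp("Linguistically motivated tokenization splits text.")
print(nlp.pipe_names, doc._.n_long_tokens)
```

A custom component is registered under a string name and then added to the pipeline like a built-in one. Any data it computes is stored under the `._.` namespace, which keeps it separate from the library's own attributes.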