Firefox Developers to Add Machine Learning-Based Text Generation
Firefox developers have announced plans to extend the browser with machine learning features. Firefox 130, scheduled for release on September 3, will introduce automatic generation of text descriptions for images. Work on the feature is already under way in Firefox nightly builds, where similar functionality has been integrated into the built-in PDF viewer.
The feature is intended to produce descriptions that screen readers can read aloud, helping people with visual impairments. The descriptions are generated by a machine learning model that runs locally on the user's system without relying on external services, in the same way as Firefox's built-in translation system.
The model has 182 million parameters and occupies approximately 200 MB of disk space. It uses an encoder based on the Vision Transformer (ViT) model to analyze the image and the distilgpt2 model to generate the text. To run the model, the browser uses the ONNX Runtime compiled to WebAssembly (WASM) together with the transformers.js library.
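The announcement only names the components, but their division of labor follows the usual encoder-decoder captioning pattern: the image encoder runs once to turn the picture into features, and the text decoder then produces the description one token at a time, conditioned on those features and on the words generated so far. The toy TypeScript sketch below shows that control flow with greedy decoding. `encodeImage`, `decoderScores`, and the five-word vocabulary are hypothetical stand-ins, not ViT, distilgpt2, or the transformers.js API, and greedy decoding is an assumption rather than a documented detail of Firefox's implementation.

```typescript
// Toy illustration of an encoder-decoder image captioner with greedy decoding.
// encodeImage and decoderScores are HYPOTHETICAL stand-ins for a ViT encoder
// and a distilgpt2 decoder; only the overall control flow is meant to match.

const VOCAB = ["<eos>", "a", "bright", "dark", "image"] as const;
type Token = (typeof VOCAB)[number];

// Stand-in encoder: reduces the image to one feature, the mean pixel
// brightness in [0, 1]. A real ViT produces an embedding per image patch.
function encodeImage(pixels: number[]): number[] {
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  return [mean];
}

// Stand-in decoder step: scores every vocabulary entry given the image
// features and the tokens generated so far (a real decoder returns logits).
function decoderScores(features: number[], prefix: Token[]): number[] {
  const scores: number[] = VOCAB.map(() => 0);
  let next: Token;
  if (prefix.length === 0) {
    next = "a";
  } else {
    const last = prefix[prefix.length - 1];
    if (last === "a") next = features[0] > 0.5 ? "bright" : "dark";
    else if (last === "bright" || last === "dark") next = "image";
    else next = "<eos>";
  }
  scores[VOCAB.indexOf(next)] = 1;
  return scores;
}

// Index of the highest score: the greedy choice at each step.
function argmax(xs: number[]): number {
  let best = 0;
  for (let i = 1; i < xs.length; i++) {
    if (xs[i] > xs[best]) best = i;
  }
  return best;
}

// Encode the image once, then keep appending the highest-scoring token
// until the decoder emits <eos> or the length limit is reached.
function caption(pixels: number[], maxTokens = 20): string {
  const features = encodeImage(pixels);
  const tokens: Token[] = [];
  while (tokens.length < maxTokens) {
    const next = VOCAB[argmax(decoderScores(features, tokens))];
    if (next === "<eos>") break;
    tokens.push(next);
  }
  return tokens.join(" ");
}

console.log(caption([0.9, 0.8, 0.7])); // a bright image
```

In the real pipeline each decoder step would be a full forward pass of the text model executed by the ONNX Runtime, and the vocabulary contains tens of thousands of tokens rather than five; the loop structure, however, stays the same.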