Google has developed a new neural network architecture called Titans that addresses the memory limitations of LLMs. Models built on it can process more data at once without a corresponding increase in compute costs, and Titans helps them find and retain important pieces of information even in very long texts.
Titans combines standard attention blocks with a "neural memory" module, which lets models work with the current context while also remembering data for long-term use. Such models can handle millions of tokens and still run faster and more accurately than classic LLMs or alternatives such as Mamba and Samba.
Conventional models use the self-attention mechanism, which analyzes the connections between words. But when the text gets long, compute costs rise sharply, because every token attends to every other token. Some cheaper alternatives sacrifice accuracy, since they cannot take all parts of the input into account. Titans offers a balance: store and use both short-term and long-term memory.
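To see where that cost comes from, here is a toy PyTorch illustration (not Titans code): plain self-attention materializes an N x N score matrix, so doubling the input length quadruples the work.

```python
import torch

# Toy self-attention: queries, keys, and values are all x for brevity.
# Real models use separate Q/K/V projections; the quadratic term is the same.
def self_attention(x: torch.Tensor) -> torch.Tensor:
    # x: (N, D)
    scores = x @ x.T / x.shape[-1] ** 0.5    # (N, N) -- grows quadratically
    return torch.softmax(scores, dim=-1) @ x

for n in (1_000, 4_000):
    out = self_attention(torch.randn(n, 64))
    print(n, out.shape)  # the intermediate score matrix held n * n floats
```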
The key idea of Titans is a "neural long-term memory" module that can memorize new information on the fly, while the model is running. If incoming data is surprising, that is, it differs from what the model already knows, the module treats it as important and retains it. To keep the memory from overflowing, information that is no longer needed is gradually forgotten.
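To make the "surprise" idea concrete, here is a minimal PyTorch sketch of a test-time memory update, under the assumption that the memory is a small MLP whose weights store information: a large prediction error on a new key-value pair counts as surprise, momentum carries recent surprise forward, and weight decay acts as forgetting. The class and parameter names (NeuralMemory, lr, beta, decay) are illustrative assumptions, not the released Titans code.

```python
import torch
import torch.nn.functional as F

class NeuralMemory(torch.nn.Module):
    """Hypothetical sketch of a surprise-driven memory updated at inference time."""

    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        # The memory itself is a small MLP; its *weights* are the storage.
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, dim),
        )
        # One momentum buffer per parameter carries past surprise forward.
        self.momentum = [torch.zeros_like(p) for p in self.net.parameters()]

    def update(self, key: torch.Tensor, value: torch.Tensor,
               lr: float = 0.01, beta: float = 0.9, decay: float = 0.001):
        """One memorization step: high prediction error = surprising = store it."""
        loss = F.mse_loss(self.net(key), value)  # how "unexpected" is this pair?
        grads = torch.autograd.grad(loss, list(self.net.parameters()))
        with torch.no_grad():
            for p, m, g in zip(self.net.parameters(), self.momentum, grads):
                m.mul_(beta).add_(g, alpha=-lr)   # momentum over surprise
                p.mul_(1.0 - decay).add_(m)       # weight decay = forgetting

    def recall(self, query: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            return self.net(query)

# Usage: memorize a key-value pair during inference, then recall it.
mem = NeuralMemory(dim=64)
k, v = torch.randn(1, 64), torch.randn(1, 64)
mem.update(k, v)
print(mem.recall(k).shape)  # torch.Size([1, 64])
```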
The Titans architecture includes three components (a rough sketch of how they fit together follows the list):
- core. It acts as short-term memory and processes the current context using a classic self-attention mechanism;
- long-term memory. It stores data outside the current context;
- persistent memory. It keeps the basic knowledge gained during training.
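As a rough, hypothetical sketch of how the three components could be wired together, the snippet below prepends learned persistent tokens and entries recalled from the long-term memory to the current segment before the core attention runs, in the spirit of treating memory as extra context. TitansStyleBlock and its shapes are assumptions for illustration, not Google's implementation.

```python
import torch

class TitansStyleBlock(torch.nn.Module):
    def __init__(self, dim: int, n_heads: int = 4, n_persist: int = 8):
        super().__init__()
        # Persistent memory: learned tokens, fixed after training.
        self.persistent = torch.nn.Parameter(torch.randn(n_persist, dim))
        # Core short-term memory: ordinary self-attention over the context.
        self.attn = torch.nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, segment: torch.Tensor, recalled: torch.Tensor):
        # segment:  (B, S, D) current chunk of the input
        # recalled: (B, R, D) entries fetched from the long-term memory module
        B = segment.shape[0]
        persist = self.persistent.unsqueeze(0).expand(B, -1, -1)
        ctx = torch.cat([persist, recalled, segment], dim=1)
        out, _ = self.attn(ctx, ctx, ctx)
        return out[:, -segment.shape[1]:]  # keep only the segment positions

block = TitansStyleBlock(dim=64)
seg, rec = torch.randn(2, 16, 64), torch.randn(2, 4, 64)
print(block(seg, rec).shape)  # torch.Size([2, 16, 64])
```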
In tests of Titans models with 170-760 million parameters, the architecture showed strong performance, especially on tasks with long texts. For example, on tasks where the model must find specific data within a large amount of information, Titans outperformed GPT-4, GPT-4o mini, and Llama 3.
The Titans context window was extended to 2 million tokens, while memory costs remained low. This opens up opportunities for applications that feed new data directly into requests, without complex additional methods. Google plans to release Titans implementations for PyTorch and JAX so that developers can use the architecture.