Large language models (LLMs) are getting a significant upgrade through Google’s new Titans architecture, which reimagines how AI systems store and process information by separating different types of memory components.

Key innovation: Google researchers have developed a neural network architecture called Titans that extends model memory capabilities while keeping computational costs manageable.

  • The architecture introduces a three-part system that divides information processing and storage across dedicated components, rather than relying on a single attention mechanism as traditional LLMs do
  • By segregating memory functions, Titans can process sequences up to 2 million tokens in length
  • Early testing shows Titans outperforming GPT-4 on long-sequence tasks despite using fewer parameters

Technical framework: The Titans architecture consists of three distinct modules that work together to process and retain information (a code sketch of how they might compose follows the list below).

  • A “core” module manages short-term memory using traditional attention mechanisms
  • A “long-term memory” component employs a neural memory module, updated as new input arrives, to store important information
  • A “persistent memory” module maintains fixed parameters after training, serving as a stable knowledge base
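
For intuition, here is a minimal PyTorch sketch of one way the three modules could fit together, loosely in the spirit of the paper’s “memory as context” arrangement: the core attention reads over the persistent tokens, a readout from the long-term memory, and the current input segment. The class name, dimensions, and memory interface are illustrative assumptions, not Google’s released implementation.

```python
import torch
import torch.nn as nn


class TitansBlockSketch(nn.Module):
    def __init__(self, d_model=512, n_heads=8, n_persistent=16):
        super().__init__()
        # "Core": standard attention over a bounded window (short-term memory).
        self.core_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # "Persistent memory": learnable tokens, fixed after training, that
        # encode task knowledge independent of the input.
        self.persistent = nn.Parameter(torch.randn(n_persistent, d_model))
        # "Long-term memory": a small network queried for a readout here;
        # its surprise-driven test-time update is sketched in the next section.
        self.long_term = nn.Sequential(
            nn.Linear(d_model, d_model), nn.SiLU(), nn.Linear(d_model, d_model)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        b = x.size(0)
        mem_readout = self.long_term(x)  # recall from long-term memory
        persist = self.persistent.unsqueeze(0).expand(b, -1, -1)
        # Attention sees [persistent tokens | memory readout | input segment].
        ctx = torch.cat([persist, mem_readout, x], dim=1)
        out, _ = self.core_attn(x, ctx, ctx)
        return out


block = TitansBlockSketch()
segment = torch.randn(2, 128, 512)   # one segment of a much longer stream
print(block(segment).shape)          # torch.Size([2, 128, 512])
```

The point of the composition is that the attention window stays bounded: information from far earlier in the sequence reaches the core only through the memory readout, so compute does not grow quadratically with total sequence length.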

Memory management innovation: Titans employs a “surprise” mechanism, which gauges how unexpected a new input is to the memory, to determine which information deserves long-term storage (a simplified sketch of the update rule follows the list below).

  • This selective approach helps optimize memory usage and computational efficiency
  • The system can maintain longer context windows without the steep cost increases that typically come with widening an attention-based model’s context window
  • The architecture reduces reliance on retrieval-augmented generation (RAG), a commonly used technique for extending model context
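
Per the accompanying paper, “surprise” can be read as the gradient of an associative-recall loss: the more the memory mispredicts the incoming token, the more surprising that token is and the more strongly it gets written. The sketch below is a simplified, hypothetical rendering of that test-time update, with a momentum buffer carrying “past surprise” and a decay term acting as forgetting; all names and hyperparameter values are illustrative.

```python
import torch
import torch.nn as nn


class NeuralMemorySketch(nn.Module):
    """Long-term memory as a small MLP, updated at inference time."""

    def __init__(self, d=512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, d), nn.SiLU(), nn.Linear(d, d))
        # Momentum buffers hold the running "past surprise" per parameter.
        self.surprise = [torch.zeros_like(p) for p in self.net.parameters()]

    def forward(self, k):
        return self.net(k)

    def update(self, k, v, lr=0.01, momentum=0.9, forget=0.001):
        # Momentary surprise: gradient of the associative-recall loss
        # ||M(k) - v||^2 with respect to the memory's parameters. A large
        # gradient means the input is unexpected, hence worth storing.
        loss = (self.net(k) - v).pow(2).mean()
        grads = torch.autograd.grad(loss, self.net.parameters())
        with torch.no_grad():
            for p, g, s in zip(self.net.parameters(), grads, self.surprise):
                # Accumulated surprise: decayed past surprise plus new gradient.
                s.mul_(momentum).sub_(lr * g)
                # Forgetting term: stale memories fade instead of piling up.
                p.mul_(1.0 - forget).add_(s)
        return loss.item()


# Usage: write one (key, value) pair into memory at test time.
mem = NeuralMemorySketch()
k, v = torch.randn(4, 512), torch.randn(4, 512)
print(mem.update(k, v))  # high loss = high surprise = strong write
```

Because each write is a fixed-size gradient step rather than an ever-growing key-value cache, the memory’s footprint stays constant no matter how long the sequence runs, which is where the cost advantage over plain attention comes from.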

Implementation and accessibility: Google is taking steps to make this technology available to the broader AI community.

  • Plans are in place to release both PyTorch and JAX implementations for training and evaluation
  • The architecture could be integrated into Google’s existing models like Gemini and Gemma
  • The open-source release will allow researchers and developers to build upon and improve the technology

Future implications: The Titans architecture represents a significant step toward making large language models more practical and cost-effective for enterprise applications, though questions remain about real-world performance at scale and integration challenges with existing systems.
