Memory Lane: Navigating Long-Term Memory in LLMs
Jul 1, 2024
Large Language Models (LLMs) are incredible at processing and understanding massive amounts of data. However, they often struggle with retaining information over long periods. This limitation can pose challenges for tasks that require remembering details from previous interactions or maintaining context over extended conversations.
By design, LLMs are stateless, meaning each query is processed independently of previous interactions. Imagine having a conversation with someone who forgets everything you said a moment ago. Frustrating, right?
This is essentially how LLMs operate—they excel at understanding context within a single session but falter when asked to recall information from weeks, months, or even years ago.
Memory of a goldfish 🐠
Even within a single conversation, LLMs can struggle to maintain context, often repeating themselves or losing track of the thread. Long-term memory for LLMs is a workaround layered on top, not a built-in feature: beyond what it learned in training, a model can only attend to a limited context window of additional information.
When you start a new conversation, the initial context is empty, gradually building up as messages and responses accumulate. However, there's a limit to this context length—typically just a few pages of text.
Solution?
When your chat history hits the limit, there are a couple of solutions:
Active Forgetting: The simplest approach is to drop the oldest messages to make room for new ones. Once a message is dropped, it's gone forever, as if it never happened. You can cap either the number of messages kept or the total token count (see the first sketch after this list). Think of it as selective amnesia for the LLM.
Summarization: A smoother way is to ask the model to summarize the entire conversation history so far. Replace the detailed history with the summary, add the latest message, and proceed (the second sketch below shows this). This way, the model retains the essence of the conversation without hitting the context limit.
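Here's a minimal sketch of active forgetting. The whitespace-based count_tokens is a crude stand-in for a real tokenizer, and MAX_TOKENS and the message format are assumptions made for illustration:

```python
MAX_TOKENS = 2000  # assumed context budget for this example

def count_tokens(text: str) -> int:
    """Rough estimate; swap in a real tokenizer in practice."""
    return len(text.split())

def truncate_history(messages: list[dict]) -> list[dict]:
    """Drop the oldest messages until the rest fit the token budget."""
    kept, total = [], 0
    for message in reversed(messages):  # walk newest-first
        cost = count_tokens(message["content"])
        if total + cost > MAX_TOKENS:
            break  # everything older than this point is forgotten
        kept.append(message)
        total += cost
    return list(reversed(kept))  # restore chronological order
```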
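And a sketch of the summarization strategy. The llm function is a hypothetical placeholder for whatever model API you call, and the prompt wording is just one reasonable choice:

```python
def llm(prompt: str) -> str:
    """Hypothetical model call; wire up your provider's client here."""
    raise NotImplementedError

def compact_history(messages: list[dict]) -> list[dict]:
    """Collapse the conversation so far into one short summary message."""
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
    summary = llm(
        "Summarize this conversation, keeping names, decisions, "
        f"and open questions:\n\n{transcript}"
    )
    # The summary replaces the detailed history as the new context.
    return [{"role": "system", "content": f"Conversation so far: {summary}"}]
```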
Introducing the Memory Stream
To create the perception of LLMs remembering details about you, we combine them with a memory abstraction layer. Enter the Memory Stream—a log of all your assistant's interactions. Each new message and response is added to this stream and stored in a vector database. When the assistant receives a new input, it retrieves the most relevant memories from the stream and assembles them into an ephemeral context—created, used, and discarded with each query.
The Memory Stream allows the LLM to develop a higher-level understanding through reflection. By dynamically compiling memories during each user query, the assistant can maintain a sense of continuity and context, even if it doesn't remember every single word.
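Here's a rough sketch of what a Memory Stream could look like. The embed function is a hypothetical placeholder for any sentence-embedding model, and the brute-force cosine search stands in for a real vector database:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical embedding call; any sentence-embedding model works."""
    raise NotImplementedError

class MemoryStream:
    """Append-only log of interactions, each stored with its embedding."""

    def __init__(self) -> None:
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        """Log a new message or response along with its embedding."""
        self.texts.append(text)
        self.vectors.append(embed(text))

    def retrieve(self, query: str, k: int = 5) -> list[str]:
        """Return the k stored memories most similar to the query."""
        if not self.texts:
            return []
        q = embed(query)
        matrix = np.stack(self.vectors)
        # Cosine similarity between the query and every stored memory.
        scores = (matrix @ q) / (
            np.linalg.norm(matrix, axis=1) * np.linalg.norm(q) + 1e-9
        )
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]
```

Per query, the flow is: retrieve the most relevant memories, assemble them into a throwaway context, answer, then add the new exchange back into the stream.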
Solving the Memory Puzzle
Imagine each chat message as a dot in 3D space, with similar messages clustering together. Why not stop at three dimensions? Because hundreds or thousands of axes let us capture far more nuanced shades of meaning. Instead of worrying about what to keep or summarize in a limited chat context, we can build context dynamically for each message.
To do this, we convert each message into a high-dimensional vector. These vectors are indexed alongside their original messages in a vector store. When a user asks a question, we transform the new message into the same vector space to pinpoint its semantic location, then fetch the closest, most relevant chat history to build a custom context for the response.
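To make the geometry concrete, here's a toy example with made-up 3D vectors; real embeddings use hundreds or thousands of dimensions, and the messages and numbers below are invented for illustration:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity: 1.0 means same direction, 0 means unrelated."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3D "embeddings" of past chat messages.
memories = {
    "My dog's name is Biscuit": np.array([0.9, 0.1, 0.0]),
    "I work as a data engineer": np.array([0.1, 0.9, 0.2]),
    "Biscuit loves the beach": np.array([0.8, 0.2, 0.1]),
}

query = np.array([0.85, 0.15, 0.05])  # e.g. "What's my pet called?"

# The dog-related memories point in nearly the same direction as the
# query, so they score highest and win the retrieval.
best = max(memories, key=lambda text: cosine(query, memories[text]))
print(best)  # -> "My dog's name is Biscuit"
```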
This approach ensures we pull relevant chat history, avoiding irrelevant clutter, and always have the right memories to handle user queries effectively. It's a smarter, more efficient way to manage and utilize chat context, even if it means dealing with dimensions we can't easily visualize.
Watch the demo
From Forgetfulness to Innovation
The limitations of LLM memory are not a dead end but a springboard for innovation. As we tackle these challenges, we pave the way for a future where AI isn't just smart but also capable of building long-lasting, meaningful relationships, even without perfect recall.
So, next time you chat with an LLM, remember: they might not have the vast memory palaces we do, but they're constantly learning, growing, and evolving. Who knows? Maybe one day, they'll be able to hold onto that delightful joke you told them a year ago, just like the good friends we cherish.