Developers are racing to bring AI agents to market, but a major obstacle has been the lack of memory. Without the ability to recall past interactions, agents treat each conversation as if it were the first, leading to repetitive questions, an inability to remember user preferences, and a general lack of personalization. The result is frustration for both users and developers.
Historically, developers have tried to mitigate this by inserting entire session dialogues directly into an LLM's context window. However, this approach is computationally expensive and inefficient, resulting in higher inference costs and slower response times. In addition, feeding the model too much information, especially irrelevant details, can degrade its output quality, causing problems such as “lost in the middle” and “context rot”.
Introducing Vertex AI Memory Bank
To overcome these limitations, Google Cloud has announced the public preview of Memory Bank, a new managed service in the Vertex AI Agent Engine. Memory Bank is designed to help you build highly personalized conversational agents that facilitate more natural, contextual, and continuous engagements.
Consider, for example, a personalized healthcare agent: key information about a user's allergy and the symptoms mentioned in past sessions is needed to provide a more informed response in the current session.
Memory Bank addresses the fundamental memory problem in several key ways:
- Personalize interactions: Go beyond generic scripts by remembering user preferences, key events, and past choices to tailor each response.
- Maintain continuity: Conversations can pick up seamlessly where they left off, even across multiple sessions that may span days or weeks.
- Provide better context: Agents are equipped with the necessary background on a user, leading to more relevant, insightful, and helpful responses.
- Improve the user experience: Users no longer have the frustration of repeating information, which makes conversations more natural, efficient, and engaging.
How Memory Bank works
Memory Bank works through an intelligent multi-stage process, leveraging Google's Gemini models and novel research:
- Understands and extracts memories: Memory Bank analyzes a user's conversation history (stored in Agent Engine Sessions) to extract key facts, preferences, and context. This process happens asynchronously in the background, generating new memories without requiring developers to build complex extraction pipelines.
- Stores and updates memories intelligently: Key information, such as “I prefer sunny days,” is stored and organized by a defined scope, such as a user ID. When new information emerges, Memory Bank, using Gemini, can consolidate it with existing memories, resolving contradictions and keeping the memories up to date.
- Recalls relevant information: When a new conversation session begins, the agent can retrieve these stored memories. This retrieval can be a simple recall of all facts or a more advanced similarity search using embeddings to find the memories most relevant to the current topic. This ensures that the agent is always equipped with the right context.
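The store, consolidate, and recall steps above can be illustrated with a small in-memory sketch. This is not the Memory Bank API or its algorithm: `ToyMemoryStore`, `upsert`, `all_facts`, and `search` are hypothetical names invented for illustration, consolidation is reduced to "the newest fact on a topic replaces the older one" (the real service uses Gemini), and a bag-of-words cosine similarity stands in for real embeddings. Facts are assumed to have been extracted already.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyMemoryStore:
    """Hypothetical illustration only; not the Memory Bank API."""

    def __init__(self):
        # scope (for example a user ID) -> {topic: fact}
        self._memories = {}

    def upsert(self, scope, topic, fact):
        """Store a fact; a newer fact on the same topic replaces the old one."""
        self._memories.setdefault(scope, {})[topic] = fact

    def all_facts(self, scope):
        """Simple recall: every fact stored for this scope."""
        return list(self._memories.get(scope, {}).values())

    def search(self, scope, query, top_k=1):
        """Similarity recall: the top_k facts closest to the query."""
        q = embed(query)
        facts = self.all_facts(scope)
        ranked = sorted(facts, key=lambda f: cosine(q, embed(f)), reverse=True)
        return ranked[:top_k]


store = ToyMemoryStore()
store.upsert("user-123", "weather", "I prefer sunny days")
store.upsert("user-123", "allergy", "I am allergic to peanuts")
# A contradicting statement on the same topic replaces the older memory.
store.upsert("user-123", "weather", "I prefer rainy days")

everything = store.all_facts("user-123")
relevant = store.search("user-123", "Is this snack safe given my peanuts allergy?")
```

Keying memories by scope keeps one user's facts isolated from another's, and keying by topic within a scope is one simple way to make contradictions detectable.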
This entire process is grounded in a novel research method from Google Research, accepted by ACL 2025, which provides an intelligent, topic-based approach to how agents learn and recall information, setting a new standard for agent memory performance. For example, a personal beauty companion agent can remember how a user's skin type evolves in order to make personalized product recommendations.
Get started with Memory Bank
Memory Bank is integrated with the Agent Development Kit (ADK) and Agent Engine Sessions. Developers can define an agent using ADK and enable Agent Engine Sessions to manage conversation history within individual sessions. Memory Bank can then be enabled to provide long-term memory across multiple sessions.
You can integrate Memory Bank into your agent in two main ways:
- Develop an agent with Google's Agent Development Kit (ADK) for an out-of-the-box experience.
- Develop an agent that orchestrates API calls to Memory Bank if you are building your agent with another framework, including popular ones such as LangGraph and CrewAI.
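For the second option, the orchestration pattern might look roughly like the sketch below: retrieve memories when a session starts, put them in the prompt, and hand the transcript back for memory generation when the session ends. Everything here is an assumption made for illustration: `MemoryClient`, `InMemoryClient`, `retrieve`, `generate`, and `run_session` are hypothetical names, not the Memory Bank API, and the "extraction" rule is a deliberately naive stand-in for what the managed service does with Gemini.

```python
from typing import Callable, Protocol


class MemoryClient(Protocol):
    """Hypothetical interface; NOT the real Memory Bank API."""

    def retrieve(self, scope: dict) -> list[str]: ...

    def generate(self, scope: dict, transcript: list[str]) -> None: ...


class InMemoryClient:
    """Stand-in backend so the sketch runs without any cloud service."""

    def __init__(self) -> None:
        self._store: dict[tuple, list[str]] = {}

    def retrieve(self, scope: dict) -> list[str]:
        return list(self._store.get(tuple(sorted(scope.items())), []))

    def generate(self, scope: dict, transcript: list[str]) -> None:
        # Naive placeholder for extraction: keep user turns starting with "I ".
        key = tuple(sorted(scope.items()))
        facts = [turn for turn in transcript if turn.startswith("I ")]
        self._store.setdefault(key, []).extend(facts)


def run_session(
    client: MemoryClient,
    scope: dict,
    user_turns: list[str],
    llm: Callable[[str, str], str],
) -> list[str]:
    """Run one session: recall memories, answer each turn, then save new memories."""
    memories = client.retrieve(scope)
    system_prompt = "Known about this user:\n" + "\n".join(f"- {m}" for m in memories)
    replies = [llm(system_prompt, turn) for turn in user_turns]
    client.generate(scope, user_turns)
    return replies


def fake_llm(system_prompt: str, user_turn: str) -> str:
    """Placeholder model that just returns the prompt it was given."""
    return system_prompt


client = InMemoryClient()
scope = {"user_id": "user-123"}
first = run_session(client, scope, ["I am allergic to peanuts", "What should I eat?"], fake_llm)
second = run_session(client, scope, ["Any snack ideas?"], fake_llm)
```

In the first session the prompt carries no memories; in the second, the allergy recorded earlier is included. Because the agent depends only on the small interface, the same loop can sit inside any framework's agent runtime.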
For those who are new to Google Cloud but already use ADK, express mode registration for Agent Engine Sessions and Memory Bank lets you sign up with a Gmail account to receive an API key and build within free-tier usage quotas before moving seamlessly to a full Google Cloud project for production.
