Everything about AI Memory

1. Why memory is the problem right now

Every major model (Claude, GPT, Gemini, Llama) is stateless. When a conversation ends, the model forgets everything. It doesn't carry forward what you told it, what it learned about you, or what happened last time. The next session starts from zero.

Platforms like claude.ai add memory features on top (project instructions, conversation memory, memory edits). But these are platform features, not model capabilities. The model itself has no persistent state. Everything it "remembers" is text that the platform stuffs into the context window before each message.

This is the core problem the entire field is trying to solve: how do you give a stateless text generator the ability to accumulate knowledge over time?

The approaches differ on three axes:

Where memory lives. In the context window? In an external database? In the model's own weights?
How memory is managed. By hand-designed rules? By the agent itself? By learned policy?
What kind of memory it is. Raw conversation history? Extracted facts? Structured knowledge graphs? Procedural patterns?

Every framework, paper, and product in this document takes a position on those three axes.

2. What is vector-based memory?

Before diving into frameworks, one concept needs explaining because it underpins most of what follows.

When an AI system stores a memory, it usually does more than save the raw text in a traditional database. It converts the text into a vector, a list of numbers (typically 768 to 3,072 numbers long) that represents the meaning of that text in mathematical space. This conversion is called embedding, and it's done by a specialized model (an embedding model, separate from the chat model).

The key property: texts with similar meanings produce vectors that are close together in this mathematical space. "I prefer Python for data work" and "Use Python, not R, for analysis" would produce vectors that are near neighbors, even though they share few words. "My daughter's birthday is in May" would be far away from both.

Retrieval works by similarity search. When you ask a question, the system embeds your question into the same vector space and finds the stored memories whose vectors are closest. This is called semantic search: it finds things by meaning, not by keyword matching.

Embedding turns text into a vector; vector DB stores them; similarity search finds nearest matches at query time.
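The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the three-dimensional vectors are written by hand and are purely hypothetical (a real embedding model would produce them), and the names MEMORIES, cosine_similarity, and search are made up for the example.

```python
import math

# Hypothetical 3-dimensional "embeddings", written by hand for illustration.
# A real embedding model would produce hundreds or thousands of dimensions.
MEMORIES = {
    "I prefer Python for data work": [0.9, 0.1, 0.0],
    "Use Python, not R, for analysis": [0.8, 0.2, 0.1],
    "My daughter's birthday is in May": [0.0, 0.1, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, memories, top_k=2):
    """Return the top_k stored texts ranked by similarity to the query vector."""
    scored = [(cosine_similarity(query_vector, vec), text) for text, vec in memories.items()]
    scored.sort(reverse=True)
    return [text for _, text in scored[:top_k]]


# Hypothetical embedding of the question "Which language should I use for analysis?"
query = [0.85, 0.15, 0.05]
results = search(query, MEMORIES)
print(results)
```

The two Python-related memories come back and the birthday memory does not, even though the query shares almost no words with either stored sentence. A vector database does the same ranking, but with index structures that avoid comparing the query against every stored vector.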

A vector database (Pinecone, Qdrant, Chroma, pgvector) is purpose-built for storing these vectors and doing fast similarity search across millions of them. It's the foundation layer that most memory frameworks build on top of.

May 2026 movement at this layer:

Pinecone launched Nexus and KnowQL in early access as part of its May Launch Week, on top of its existing serverless + Pinecone Inference (hosted embedding and reranking) stack.
Chroma's 2025 Rust rewrite is now in wide use, delivering ~4x faster writes and queries vs the original Python implementation (still positioned for development speed; production scale at 50M+ vectors typically lands on Pinecone or Milvus).
Qdrant remains the best price-performance option for self-hosted deployments (millions of vectors on a small VPS at $30-50/month).

The limitation: vectors capture similarity but not structure. "Alice works at Google" and "Alice left Google" are semantically similar (both about Alice and Google) but factually opposite, so their vectors land close together and pure vector search can't reliably tell them apart. This is why temporal knowledge graphs (Zep) and multi-graph architectures (MAGMA) add structure on top of vectors.

3. The established frameworks (actively maintained, in production)

These are the tools people are actually using in production today. All originated in 2024-2025 but are actively developed and releasing updates in 2026. They represent three distinct architectural approaches.

Three architectural approaches in production today. Pick by what your task actually needs.

3.1 Mem0: vector-based memory layer

What it is: A standalone memory service. You call add() to store memories and search() to retrieve them. Under the hood, it converts text to numerical vectors (embeddings) and stores them in a vector database. Retrieval works by semantic similarity.

Key design choice: Framework-agnostic. It doesn't care what agent framework you use. You import the SDK, point it at Mem0, and your agent has persistent memory. Your orchestration, tool management, and agent logic stay where they are.

What it does well: If you told it something two weeks ago and ask a related question today, it finds the connection. It handles updates: if you correct a preference, it updates the existing memory rather than storing a duplicate. It scopes memory per user, per session, and per agent. On the Pro tier ($249/month), it adds a knowledge graph layer that extracts entities and relationships for multi-hop queries.

What it doesn't do: Time. Memories are stored and retrieved, not modeled as time-bounded facts. "You worked at Microsoft" is stored the same way whether it was true yesterday or five years ago. For agents that need to reason about how things changed, this is a gap.

Scale: ~48,000 GitHub stars. Integrations with 21 frameworks. The ECAI 2025 paper (arXiv:2504.19413) benchmarked ten approaches to memory, the broadest head-to-head comparison published to date.

May 2026 updates: Mem0 shipped Memory Decay in early May, adding recency-aware ranking to search so memories accessed recently get a soft boost and idle ones gently move lower (nothing deletes; old facts can still surface when genuinely relevant). Raised a $24M Series A on May 12, 2026. A token-efficient memory algorithm built on single-pass hierarchical extraction and multi-signal retrieval landed in April.
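Recency-aware ranking of the kind described above can be pictured as a soft multiplier on the similarity score. The sketch below is an assumption-laden illustration, not Mem0's algorithm: the formula, the half_life_days and max_boost parameters, and the decayed_score name are all hypothetical.

```python
import math


def decayed_score(similarity, days_since_access, half_life_days=30.0, max_boost=0.2):
    """Blend semantic similarity with a soft recency boost (hypothetical formula).

    The boost starts at max_boost for a memory accessed today and halves every
    half_life_days. Nothing is removed: an idle memory simply falls back to
    its base similarity score.
    """
    boost = max_boost * math.pow(0.5, days_since_access / half_life_days)
    return similarity * (1.0 + boost)


candidates = [
    # (text, similarity to the query, days since last access)
    ("Prefers dark mode", 0.80, 2),
    ("Asked about dark mode in the old dashboard", 0.82, 400),
    ("Allergic to peanuts", 0.97, 900),
]
ranked = sorted(candidates, key=lambda c: decayed_score(c[1], c[2]), reverse=True)
for text, _, _ in ranked:
    print(text)
```

With these made-up numbers the recently accessed memory overtakes a slightly more similar idle one, while the long-idle but highly relevant memory still ranks first, which is the "soft boost, nothing deletes" behavior the release describes.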

3.2 Zep / Graphiti: temporal knowledge graphs

What it is: A memory layer powered by Graphiti, a temporally-aware knowledge graph engine. Every fact stored has a validity window: "Kendra works at Google (March 2024 - September 2025)" is not just a stored string but a fact with a temporal bound.

How it works: The graph has three layers modeled after human memory:

Episode subgraph: raw conversational episodes (what happened, who said what)
Semantic entity subgraph: extracted entities and relationships (who knows whom, who works where)
Community subgraph: higher-level clusters of related knowledge for broad-topic queries

When new information contradicts old, Graphiti invalidates the old without discarding the historical record. "Kendra now works at Meta" doesn't delete the Google fact: it marks it as no longer valid and creates a new valid fact. This is non-destructive temporal modeling.
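A minimal sketch of non-destructive invalidation, assuming each relation holds one value at a time. This is not Graphiti's data model or API; the Fact and TemporalFactStore classes and their method names are hypothetical, and the exact start and end days are chosen only to match the example above.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Fact:
    subject: str
    relation: str
    obj: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still valid"


class TemporalFactStore:
    """Hypothetical store where contradicted facts are closed, never deleted."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def assert_fact(self, subject: str, relation: str, obj: str, as_of: date) -> None:
        """Record a new fact and end the validity window of any fact it supersedes."""
        for fact in self.facts:
            if fact.subject == subject and fact.relation == relation and fact.valid_to is None:
                fact.valid_to = as_of
        self.facts.append(Fact(subject, relation, obj, valid_from=as_of))

    def query(self, subject: str, relation: str, at: date) -> Optional[str]:
        """Return the value that was valid for (subject, relation) on the given date."""
        for fact in self.facts:
            ended = fact.valid_to is not None and at >= fact.valid_to
            if fact.subject == subject and fact.relation == relation and fact.valid_from <= at and not ended:
                return fact.obj
        return None


store = TemporalFactStore()
store.assert_fact("Kendra", "works_at", "Google", date(2024, 3, 1))
store.assert_fact("Kendra", "works_at", "Meta", date(2025, 9, 1))
print(store.query("Kendra", "works_at", date(2025, 1, 1)))   # the Google fact is still queryable
print(store.query("Kendra", "works_at", date(2025, 10, 1)))  # the current fact
```

Both facts remain in the store after the update, so a question about the past ("where did Kendra work in January 2025?") and a question about the present get different, correct answers.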

Why it matters: On LongMemEval using GPT-4o, Zep scores 63.8% vs. Mem0's 49.0%, a gap of nearly 15 points. That gap widens on temporal questions ("when did X change jobs?") and multi-hop reasoning ("what was true when Y happened?"). The architectural advantage of temporal fact modeling over flat vector storage shows up most when the question involves change over time.

Tradeoffs: More complex to set up and operate than Mem0. The temporal graph adds overhead that most straightforward use cases don't need.

Recent benchmarks and infrastructure: On the DMR benchmark, Zep scores 94.8% vs a 93.4% baseline. On LongMemEval, it reports accuracy improvements of up to 18.5% and a 90% reduction in response latency vs baseline. Graphiti MCP Server v1.0 shipped in November 2025, compatible with Claude Desktop, Cursor, and any MCP client. Infrastructure work in late 2025 scaled Zep 30x in two weeks and brought P95 graph search latency from 600ms down to 150ms.

3.3 Letta (MemGPT): OS-inspired tiered memory

What it is: A full agent runtime built around the idea from UC Berkeley's MemGPT paper: treat the model's context window like RAM, and external storage like disk. The agent manages its own memory the way an operating system manages virtual memory.

Three tiers from MemGPT: core (in-context) / recall (searchable) / archival (long-term). Agent self-curates.

The radical part: The agent itself decides what to remember. It has tools to write to its own memory, search its memory, and page things in and out. This is not a passive system extracting facts from conversations. The agent actively curates what it keeps.
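The paging idea can be illustrated with a toy two-tier store in which the methods stand in for tools the agent would call on its own memory. This is a sketch under simplifying assumptions (two tiers instead of three, keyword search instead of embeddings, oldest-first eviction); TieredMemory and its method names are hypothetical and are not Letta's API.

```python
from typing import List


class TieredMemory:
    """Toy model of in-context "core" memory backed by out-of-context "archival" storage."""

    def __init__(self, core_capacity: int = 3) -> None:
        self.core_capacity = core_capacity
        self.core: List[str] = []      # always visible to the model (the "RAM")
        self.archival: List[str] = []  # only seen when searched or paged in (the "disk")

    # The methods below stand in for tools the agent itself would invoke.
    def core_append(self, note: str) -> None:
        """Write to core memory, evicting the oldest note to archival when full."""
        self.core.append(note)
        while len(self.core) > self.core_capacity:
            self.archival.append(self.core.pop(0))

    def archival_search(self, keyword: str) -> List[str]:
        """Keyword lookup over archival storage (a real system would use embeddings)."""
        return [n for n in self.archival if keyword.lower() in n.lower()]

    def page_in(self, keyword: str) -> None:
        """Move matching archival notes back into core memory."""
        for note in self.archival_search(keyword):
            self.archival.remove(note)
            self.core_append(note)


memory = TieredMemory(core_capacity=2)
for note in ["User is named Ada", "User prefers Rust", "Deploys happen on Fridays"]:
    memory.core_append(note)
print(memory.core)      # the two most recent notes
print(memory.archival)  # the evicted note
```

In a self-editing design, the decision to call core_append or page_in is made by the model during its reasoning, which is why each memory operation costs inference tokens.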

Tradeoff: Predictability vs intelligence. Mem0's passive extraction is consistent and cheap in terms of compute. Letta's self-editing approach is more adaptive but memory quality depends entirely on the model's judgment. If the model fails to save something, it's gone. Every memory operation also costs inference tokens, since the agent has to reason about what to store and how.

Recent additions: The "Conversations API" (December 2025) allows agents to maintain shared memory across parallel sessions. Each conversation is tied to a specific agent, and experiences within a conversation become memories that transfer across all conversations. Letta Code (the memory-first coding agent that is the top-rated model-agnostic agent on Terminal-Bench) now defaults to the Conversations API endpoints under the hood, so multiple concurrent Letta Code sessions retain memory and experience across all of them. As of December 1, 2025, the Letta API also supports programmatic tool calling for any LLM.

4. Frontier research (2026 papers)

These are the ideas that haven't become products yet. Some may never. But they represent where the thinking is going.

4.1 MAGMA: memory as multiple graphs

Published: January 2026 (arXiv:2601.03236)

The idea: Instead of one knowledge graph (like Zep), represent each memory item across four separate, orthogonal graphs: semantic, temporal, causal, and entity. When you ask a question, the system figures out which graph(s) to traverse based on what you're actually asking.

"What happened?" routes to the temporal graph. "Why?" routes to causal. "Who's involved?" routes to entity. "What's this related to?" routes to semantic. The system traverses each independently and fuses the results.

Why it matters: Consider "Why did the project timeline change?" Semantic search finds memories about "timeline" and "project." Temporal search finds what happened in order. Neither finds the why. A causal graph can trace: timeline changed because the vendor missed a deadline, which happened because their API broke, which happened because they upgraded without testing. That chain of causation is a different structure from similarity or chronology.
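The query-routing step can be sketched with a simple cue-word table. This is only an illustration of the idea of picking a graph per question type: the ROUTES table and the route_query and fuse helpers are hypothetical, and the paper's actual routing and fusion mechanisms are not reproduced here.

```python
from typing import Dict, List

# Hypothetical routing table: cue words in the question -> graph view to traverse.
ROUTES: Dict[str, str] = {
    "why": "causal",
    "when": "temporal",
    "what happened": "temporal",
    "who": "entity",
    "related": "semantic",
}


def route_query(question: str) -> List[str]:
    """Pick which graph views to traverse; fall back to semantic if no cue matches."""
    q = question.lower()
    selected: List[str] = []
    for cue, graph in ROUTES.items():
        if cue in q and graph not in selected:
            selected.append(graph)
    return selected or ["semantic"]


def fuse(results_per_graph: Dict[str, List[str]]) -> List[str]:
    """Merge per-graph results, keeping first-seen order and dropping duplicates."""
    fused: List[str] = []
    for hits in results_per_graph.values():
        for hit in hits:
            if hit not in fused:
                fused.append(hit)
    return fused


print(route_query("Why did the project timeline change?"))
```

Keyword matching is a deliberately crude stand-in; the point is only that the question type, not just its topic, determines which structure gets traversed.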

Benchmarks: Outperforms Mem0, Zep, and other established systems on LoCoMo and LongMemEval for long-horizon reasoning tasks.

4.2 ZenBrain: neuroscience-grounded memory

Published: April 26, 2026 (arXiv:2604.23878).

The argument: Every existing memory system borrows metaphors from computer science (virtual memory paging in Letta, databases in Mem0, note-taking systems). None integrates principles from how the brain actually handles memory, despite a century of empirical neuroscience research on the topic.

Seven memory layers:

Working: what you're holding in mind right now
Short-term: fades in minutes
Episodic: specific events ("that meeting where Bob said X")
Semantic: abstracted knowledge ("Bob is risk-averse")
Procedural: skills and patterns ("how we do deploys here")
Core: identity-level beliefs
Cross-context: bridges between different domains of your life

Fifteen neuroscience algorithms orchestrate these layers, including:

A four-channel NeuromodulatorEngine simulating dopamine, norepinephrine, serotonin, and acetylcholine dynamics
Prediction-error-gated reconsolidation (memories update when reality violates expectations)
TripleCopyMemory with divergent decay (fast/medium/deep copies with logarithmic deep-copy consolidation)
Sleep-time consolidation in three phases: slow-wave sleep (SWS) for consolidation, REM for creative recombination, synaptic homeostasis for pruning

On sleep consolidation: This is not as out-there as it sounds. Anthropic's Claude Code already ships a production "Auto Dream" pipeline that processes and consolidates memories during idle time. ZenBrain formalizes the concept with neuroscience-grounded phases. Their sleep consolidation achieves a 37% stability improvement with a 47.4% storage reduction.

Benchmarks: On LongMemEval-500, ZenBrain holds the highest mean rank across all evaluation cells. Reaches 91.3% of oracle (perfect) accuracy at 1/106th the per-query token budget. Multi-layer routing beats flat single-layer baseline by 20.7% F1 on LoCoMo.

Open source with 11,589 automated test cases.

4.3 Field-theoretic memory: memory as physics

Published: February 2026 (arXiv:2602.21220)

The idea: Treat memories as continuous fields governed by partial differential equations rather than discrete entries in a database. Memories diffuse through semantic space, decay thermodynamically based on importance, and interact through field coupling.

Instead of storing "user prefers dark mode" as a row in a database and retrieving it by similarity search, this approach models it as a field with energy that spreads, fades, and interferes with other memories according to physics equations.
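To make "spreads and fades" concrete, here is a toy one-dimensional reaction-diffusion step. It is a generic textbook discretization chosen for illustration, not the paper's equations: the constant decay rate (the paper ties decay to importance), the ring-shaped grid, and the step_field name are all assumptions.

```python
def step_field(u, diffusion=0.1, decay=0.05, dt=1.0):
    """One explicit Euler step of du/dt = D * d2u/dx2 - lambda * u on a ring.

    u is a list of memory "activation" values at evenly spaced points (dx = 1)
    along one axis of a toy semantic space, with periodic boundaries.
    """
    n = len(u)
    new_u = []
    for i in range(n):
        left, right = u[(i - 1) % n], u[(i + 1) % n]
        laplacian = left - 2.0 * u[i] + right  # discrete second derivative
        new_u.append(u[i] + dt * (diffusion * laplacian - decay * u[i]))
    return new_u


# A single memory written at position 2 with unit strength.
field = [0.0, 0.0, 1.0, 0.0, 0.0]
after_one_step = step_field(field)
print(after_one_step)
```

After one step the peak has dropped and its neighbors have picked up some activation (diffusion), while the total across the field has shrunk by the decay factor. Retrieval in such a model would read the field's value near the query's position rather than look up a row.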

Why this one is worth paying attention to: Field coupling enables multi-agent knowledge sharing without centralized coordination. Two agents' memory fields can overlap and influence each other without a shared database. This is the first approach that addresses multi-person (or multi-agent) memory architecturally, not as a product feature bolted on, but as a mathematical property of the system itself.

Benchmarks: +116% F1 on multi-session reasoning (p < 0.01), +43.8% on temporal reasoning (p < 0.001), +27.8% retrieval recall on knowledge updates vs. existing baselines. Open-source implementation with JAX acceleration achieving 518x speedup through JIT compilation.

4.4 Second Me: memory baked into model weights

Published: March 2025 (arXiv:2503.08102). Open-source, 15K+ GitHub stars.

The idea: Every other approach stores memories outside the model and retrieves them into the context window at query time. Second Me asks: what if you fine-tuned a small personal model on your own data so the memories live inside the weights?

Three-layer architecture:

L0: raw data capture (conversations, documents, interactions)
L1: structured extraction and abstraction (entities, topics, patterns)
L2: a "Lifelong Personal Model" (LPM) fine-tuned on your personal data using parameter-efficient methods (LoRA adapters)

The L2 model is positioned as a context provider aligned with the user's perspective, not a task executor. It doesn't do things for you. It represents you. When another AI system needs to know something about you, it queries your Second Me instead of asking you to repeat yourself.

The difference: External retrieval (Mem0, Zep) finds relevant facts and stuffs them into the prompt. Parametric memory (Second Me) has the knowledge woven into the model's weights: it "knows" things the way you know your own phone number, without looking it up. The tradeoff: retrieval-based memory can be updated instantly. Parametric memory requires retraining to absorb new information.

4.5 GSW: episodic memory as structured situations

Published: November 2025 (arXiv:2511.07587). Presented at AAAI 2026.

The idea: Standard retrieval systems break history into chunks or isolated facts. But human episodic memory doesn't work that way. When you remember a meeting, you don't remember isolated facts; you remember a situation with roles, actions, spatial context, and temporal flow.

GSW builds structured, interpretable representations of evolving situations. Instead of extracting "Bob said the deadline is Friday" as a fact, GSW preserves the whole situation model: a planning meeting where Bob (project lead) set Friday as a deadline, Sarah pushed back citing resource constraints, and the group compromised on Monday. The situation is a structure, not decomposed atoms.

Benchmarks: Outperforms RAG baselines by up to 20% on the Episodic Memory Benchmark. More importantly, it reduces query-time context tokens by 51% because structured situations give the model what it needs without padding the context with loosely related chunks.

4.6 OCR-Memory: visual memory

Published: April 29, 2026 (arXiv:2604.26622).

The idea: Instead of storing agent history as text, render it as images annotated with visual identifiers. Retrieve experience by locating and transcribing from those images.

The reasoning: images are a denser information format than text. A screenshot of a conversation carries layout, structure, speaker identity, emphasis, all in one visual chunk. Text-based retrieval has to chunk, embed, search, and reconstruct all of that separately. Image-based retrieval just looks at it.

Early results show consistent gains under strict context limits on long-horizon benchmarks.

4.7 MemMachine: ground-truth-preserving memory

Published: April 2026 (arXiv:2604.04853)

The idea: Focus on accuracy over architectural novelty. MemMachine scores 0.9169 on LoCoMo with gpt-4.1-mini, among the strongest published results for open memory frameworks, above reported scores for Mem0, Zep, Memobase, LangMem, and OpenAI baselines. The paper includes a systematic ablation study across six optimization dimensions (sentence chunking, query bias correction, context formatting, retrieval depth, search prompt design, answer model selection), achieving 93.0% overall accuracy on LongMemEval.

Worth reading for the methodology: it's less about a new architecture and more about what actually matters when you tune existing approaches carefully.

4.8 Human-like lifelong memory: emotional and epistemic memory

Published: March 2026 (arXiv:2603.29023)

The idea: Integrates emotional valuation, System 1/System 2 retrieval routing (fast intuitive lookup vs. slow deliberate search), curiosity-driven gist formation, epistemic trust channels, and identity persistence through a belief hierarchy. No existing framework integrates more than two of these mechanisms. This paper proposes design principles rather than a working system, but the conceptual architecture is the most complete attempt at modeling how human memory actually functions, including the messy emotional and social parts.

5. The survey papers (read these for the big picture)

5.1 "Memory in the Age of AI Agents" (Dec 2025, revised Jan 2026)

46 authors. 1.9K GitHub stars on the companion repo. The most comprehensive survey to date.

The taxonomy from the December 2025 survey. Three types of memory, three dynamics that act on them.

Key contribution: a functional taxonomy that replaces the old short-term/long-term split.

Factual memory: knowledge ("Python 3.12 uses X syntax")
Experiential memory: insights and skills ("last time this approach failed because...")
Working memory: active context management (what's relevant right now)

Plus a dynamics framework: Formation (how memory is extracted), Evolution (how it consolidates and what gets forgotten), Retrieval (how it's accessed).

Identifies open frontiers: multi-agent memory, multimodal memory, RL-integrated memory management, and trustworthiness (can you trust what the memory system tells you?).

5.2 LongMemEval-V2 (May 12, 2026)

A second-generation benchmark from the original LongMemEval authors: "LongMemEval-V2: Evaluating Long-Term Agent Memory Toward Experienced Colleagues" (arXiv:2605.12493). It contains 451 manually curated questions covering five core memory abilities for web agents: static state recall, dynamic state tracking, workflow knowledge, environment gotchas, and premise awareness. It frames the evaluation around "can your memory system help an agent acquire the experience needed to behave like a knowledgeable colleague in a customized environment." This is a higher bar than the original LongMemEval's information-extraction / multi-session-reasoning split. Worth watching as new benchmark publications cluster around it through the second half of 2026.

5.3 "Memory for Autonomous LLM Agents" (March 2026)

Complements the above with a more mechanistic view. Formalizes memory as a write-manage-read loop and identifies five mechanism families:

Context-resident compression: squish it into the prompt
Retrieval-augmented stores: look it up from a database (Mem0, Zep)
Reflective self-improvement: learn from what you did
Hierarchical virtual context: the Letta/MemGPT paging approach
Policy-learned management: use reinforcement learning to learn when and what to remember

That fifth one is the most significant frontier. Instead of a human engineer designing extraction rules, the agent learns its own memory strategy by optimizing for downstream task performance.

6. Raw vs derived: the central design spectrum

Every memory system is choosing a position on the raw vs derived spectrum.

The raw-vs-derived spectrum. Every memory architecture picks a point; most production systems hybridize.

Raw memory stores everything verbatim. The full conversation, every email, every document, with timestamps. Retrieval is exact. The system never loses fidelity but storage and retrieval get expensive at scale, and the model has to do a lot of work to extract relevant signal from all the noise.

Derived memory stores summaries, abstractions, embeddings. The system extracts the gist and discards the raw. Retrieval is fast and cheap. The system loses the long tail of detail and the abstraction's quality depends entirely on how well the summarization step worked.

Neither extreme works. Pure raw memory is expensive and noisy. Pure derived memory loses the specific details that often matter most ("you said it'd be done by Tuesday" vs "you mentioned a deadline"). Production memory systems hybridize: keep the raw for some window, derive after that, allow the agent to ask for raw retrieval when the derived isn't enough.
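The hybrid pattern (keep raw for a window, derive after that, allow raw retrieval on request) can be sketched as follows. The HybridMemory class, its method names, and the truncate_summary stand-in are hypothetical; a real system would call a summarization model where this sketch just keeps the first few words.

```python
from typing import Callable, List


def truncate_summary(text: str, max_words: int = 4) -> str:
    """Stand-in for a real summarizer: keep only the first few words."""
    words = text.split()
    return " ".join(words[:max_words]) + (" ..." if len(words) > max_words else "")


class HybridMemory:
    """Hypothetical store: recent turns verbatim, older turns as gists."""

    def __init__(self, raw_window: int = 2, summarize: Callable[[str], str] = truncate_summary) -> None:
        self.raw_window = raw_window
        self.summarize = summarize
        self.raw: List[str] = []           # most recent messages, verbatim
        self.derived: List[str] = []       # older messages, reduced to a gist
        self.cold_archive: List[str] = []  # verbatim copies kept for on-demand raw retrieval

    def add(self, message: str) -> None:
        self.raw.append(message)
        while len(self.raw) > self.raw_window:
            oldest = self.raw.pop(0)
            self.derived.append(self.summarize(oldest))
            self.cold_archive.append(oldest)

    def context(self) -> List[str]:
        """What gets sent to the model by default: gists first, then recent raw turns."""
        return self.derived + self.raw

    def fetch_raw(self, index: int) -> str:
        """Let the agent ask for the verbatim text behind derived entry `index`."""
        return self.cold_archive[index]


memory = HybridMemory(raw_window=2)
for turn in ["You said the report would be done by Tuesday", "Thanks", "What is next?"]:
    memory.add(turn)
print(memory.context())
print(memory.fetch_raw(0))
```

The gist in the default context has lost "by Tuesday", which is exactly the kind of detail the text warns about; fetch_raw is the escape hatch that recovers it.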

7. LeCun's world models and System M

Yann LeCun (Turing Award winner and former Chief AI Scientist at Meta) has been arguing since 2022 that LLMs are fundamentally limited and that a different architecture is needed for human-level intelligence. His work connects to the memory landscape in non-obvious but important ways.

7.1 The JEPA architecture

JEPA (Joint Embedding Predictive Architecture) is LeCun's alternative to autoregressive text generation (which is how every LLM works: predict the next token, repeat). Instead of predicting raw data (pixels, words), JEPA predicts in an abstract representation space. It learns by predicting missing parts of an input at a higher level of abstraction than the raw data.

This has produced real research artifacts:

I-JEPA: predicts missing parts of images in representation space
V-JEPA: extends this to video, learning temporal dynamics
V-JEPA 2 (2025): trained on over 1 million hours of internet video, achieves state-of-the-art on human action anticipation. This is the one directly relevant to embodied AI and robotics.
LeWorldModel (March 2026, co-authored by LeCun): a JEPA-based world model that trains end-to-end from pixels with ~15M parameters on a single GPU, planning up to 48x faster than foundation-model-based world models

7.2 The A-B-M framework (March 2026 paper)

In a March 2026 position paper (arXiv:2603.15381, "Why AI systems don't learn and what to do about it"), LeCun, Dupoux, and Malik proposed a three-system architecture.

LeCun's three-system architecture. System M is the meta-controller that decides when to learn from observation vs action.

System A: Learning from Observation. Self-supervised learning. LLMs, CLIP, V-JEPA, DINO. Passive intake, statistical pattern extraction, world modeling from sensory streams. Scales well. Cannot distinguish correlation from causation.

System B: Learning from Action. Reinforcement learning, control theory, planning. Trial-and-error interaction with an environment. Grounded in interaction but horrifically sample-inefficient.

System M: Meta-Controller. The new piece. A system that coordinates information flow between A and B. Decides WHEN to observe vs. act, WHAT data is worth learning from, HOW to switch between learning modes, and WHERE to route information between subsystems. Inspired by how children flexibly switch between watching, trying, following instructions, and imagining.

Their core argument: current AI has no System M. All meta-control is performed by human engineers making training decisions. The machine itself has zero ability to adapt after deployment.

7.3 Where this connects to memory

System M is a memory orchestrator. It decides what's worth remembering (formation), when to consolidate observations into knowledge (evolution), and when to retrieve past experience to inform current action (retrieval). Those are the same three dynamics from the memory survey taxonomy.

ZenBrain's seven-layer architecture with neuroscience-grounded routing is the closest existing memory system to what System M would need. The RL-learned memory management direction (where the agent learns its own memory strategy) is the closest mechanism. LeCun's world models (V-JEPA 2) provide the perception side (understanding the physical world through video), while the memory architectures provide the persistence side (remembering what was understood).

The gap: LeCun's work focuses on perception and representation learning. The memory research focuses on storage, retrieval, and management. Nobody has connected them into a single system where a world model's learned representations feed into a persistent memory architecture managed by a meta-controller. That integration is the frontier.

8. Open problems nobody has solved

These keep appearing across papers, surveys, and frameworks. They are the unsolved fundamentals.

Cross-device and cross-session identity resolution. For users interacting across multiple devices or authentication methods, resolving whether two interactions came from the same person is a non-trivial identity problem that memory systems do not currently address.

Memory staleness. A highly-retrieved memory about a user's employer is highly relevant until it isn't, at which point it becomes confidently wrong rather than just outdated. Detecting when high-relevance memories become stale is an open research problem. Zep's temporal validity windows help, but they require the system to learn that something changed.

Multi-user memory spaces. Every framework assumes one user, one memory store. No production system handles "these three people share some context but not all of it." Letta's Conversations API (Dec 2025) and the field-theoretic paper's field coupling are the only two approaches that even gesture at this.

Memory trustworthiness. If a system extracts a wrong fact and then retrieves it 50 times, the confidence in that fact grows while its accuracy doesn't. How do you build a memory system that knows what it doesn't know?

The evolution problem. You said you wanted to learn guitar in March. By June you've stopped mentioning it. Does the system keep nudging? Forget it? Ask? This is the "evolution" dimension from the survey taxonomy. Nobody has cracked it. Current systems either remember everything forever or forget on a fixed schedule. Neither is how humans work.

9. What the major LLM providers are doing

The memory frameworks above (Mem0, Zep, Letta, etc.) are independent tools you wire up yourself. But the major LLM providers are also building memory directly into their platforms.

9.1 Microsoft: the most active across two surfaces

GitHub Copilot Memory (public preview, March 2026, on by default for Pro/Pro+). The most architecturally interesting thing any major provider is doing. Copilot agents automatically discover and store insights ("memories") as they work: coding conventions, architectural patterns, cross-file dependencies. Key properties:

Cross-agent: what one Copilot agent learns, others can use. Code review discovers a pattern, cloud agent applies it later.
Validated before use: agents check memories against the current codebase before applying them, so stale or inaccurate context is filtered out rather than applied.
28-day expiry with renewal: memories auto-expire after 28 days, but if a memory is validated and used by Copilot, a new one with the same details is stored, extending its life. Memories that keep proving useful survive. Others die. Simple but effective staleness management.
Repository-scoped, plus user-scoped as of May 15, 2026: repository memories are available to all users with access; the new user-level preferences (early access for Pro and Pro+ users) capture personal preferences that follow you across all your repositories and Copilot agents without affecting other users in the same repository. Two distinct scopes that share the same memory infrastructure.
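The expiry-with-renewal rule above can be sketched as a small store. This is not GitHub's implementation: the ExpiringMemoryStore and RepoMemory names, the still_valid callback (standing in for the check against the current codebase), and the choice to drop a memory that fails validation are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict, Optional

TTL = timedelta(days=28)


@dataclass
class RepoMemory:
    text: str
    stored_on: date

    def expired(self, today: date) -> bool:
        return today >= self.stored_on + TTL


class ExpiringMemoryStore:
    """Hypothetical store: memories expire after 28 days unless validated use renews them."""

    def __init__(self) -> None:
        self.memories: Dict[str, RepoMemory] = {}

    def store(self, key: str, text: str, today: date) -> None:
        self.memories[key] = RepoMemory(text, today)

    def use(self, key: str, today: date, still_valid: Callable[[str], bool]) -> Optional[str]:
        """Return a memory only if it is unexpired and passes validation; renew it on success."""
        memory = self.memories.get(key)
        if memory is None or memory.expired(today) or not still_valid(memory.text):
            self.memories.pop(key, None)  # assumption: expired or stale entries are dropped
            return None
        self.store(key, memory.text, today)  # re-store with a fresh 28-day clock
        return memory.text


store = ExpiringMemoryStore()
store.store("lint", "Project uses ruff for linting", date(2026, 3, 1))
first_use = store.use("lint", date(2026, 3, 20), lambda text: True)
print(first_use)
```

A memory that keeps being validated and used has its clock reset each time, so it can outlive the original 28 days, while one that is never touched disappears on schedule.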

M365 Copilot Memory (GA July 2025). Preference-based memory across Word, Excel, PowerPoint, Outlook, Teams. Uses intent-based storage: "I prefer Python for data science" gets stored; "Write Python code for clustering" does not.

Memory Poisoning Research (February 2026). Microsoft published research on AI Recommendation Poisoning: attackers injecting hidden instructions into an AI's memory through prompts embedded in web pages. Formally recognized by MITRE ATLAS as "AML.T0080: Memory Poisoning."

9.2 OpenAI: shipped in GPT-5.5, not GPT-6

The story has changed since the April 26 source was written. Sam Altman had positioned long-term memory as the headline feature for the next-generation system (GPT-6). What actually shipped on April 23, 2026 was GPT-5.5 (codenamed "Spud" during development), carrying the personalization features that had been pitched as GPT-6 territory. GPT-5.5 Instant became the default ChatGPT model on May 5, 2026.

The memory layer in GPT-5.5: enhanced personalization from past chats, files, and connected Gmail. Rolling out to Plus and Pro users on web (mobile coming, plans to expand to Free, Go, Business, Enterprise). The model uses this context to tailor suggestions, pick up ongoing work, and reduce the need to repeat context across conversations. Plus the existing "Saved Memories" (explicit) and "Chat History" (implicit, paid tiers) two-tier system from prior releases.

GPT-6 remains unannounced as of mid-May 2026: no architecture paper, no parameter count, no pricing, no launch date. The pattern points to OpenAI leaning into frequent point releases (5.4, 5.5, and onward) rather than another big-number leap. Memory is no longer being held back as the GPT-6 differentiator; it's the active layer in the current generation.

9.3 Google: platform integration

Gemini memory with import from other AI services: you can bring your ChatGPT memories into Gemini. Google Cloud's Vertex AI Agent Engine includes a Memory Bank as a managed service with sessions (short-term) and persistent memory bank (long-term). More infrastructure play than architectural innovation. The Siri deal (Gemini powering Apple's Siri overhaul in 2026) includes conversational memory and on-screen awareness as key features.

9.4 Anthropic: Claude's memory + sleep consolidation

Claude's built-in memory works through user memory edits, conversation-derived memory, and project instructions.

The more interesting innovation: Claude Code's Auto Dream pipeline, a production sleep-consolidation system where the agent processes and consolidates memories during idle time. ZenBrain's paper cites this as independent validation of their neuroscience-inspired sleep consolidation approach. This is the only major provider shipping something that looks like the memory evolution/consolidation stage from the research taxonomy, rather than just storage and retrieval.

What Auto Dream actually does, in four phases mirroring sleep consolidation: converts relative dates to absolute ("Yesterday we decided to use Redis" becomes "On 2026-03-15 we decided to use Redis"); deletes contradicted facts (if you switched from Express to Fastify three weeks ago, the old "API uses Express" entry gets removed); merges overlapping entries (three separate notes about the same build-command quirk consolidate into one clean entry); and prunes stale low-signal notes. Triggers automatically after 24 hours and at least 5 sessions of new activity, or manually with /dream. Visible as "Auto-dream: on" in the Claude Code selector when active. Documented in the Claude API docs under managed agents.
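Two of the four steps above (relative-to-absolute dates and merging overlapping entries) are easy to picture in code. The sketch below is a rough illustration only, not Anthropic's implementation: it handles just the words "yesterday" and "today", merges only near-exact duplicates, and the absolutize and consolidate names are hypothetical.

```python
import re
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical mapping from relative words to "days before the note was written".
OFFSETS = {"yesterday": 1, "today": 0}


def absolutize(note: str, written_on: date) -> str:
    """Rewrite 'yesterday' / 'today' as an absolute date, relative to when the note was written."""

    def swap(match):
        day = written_on - timedelta(days=OFFSETS[match.group(0).lower()])
        stamp = "on " + day.isoformat()
        # Capitalize only when the relative word opened the sentence.
        return stamp.capitalize() if match.start() == 0 else stamp

    return re.sub(r"\b(yesterday|today)\b", swap, note, flags=re.IGNORECASE)


def consolidate(notes: List[Tuple[str, date]]) -> List[str]:
    """Normalize dates, then keep one copy of notes that become identical."""
    seen = set()
    merged: List[str] = []
    for text, written_on in notes:
        fixed = absolutize(text, written_on)
        key = fixed.lower().strip().rstrip(".")
        if key not in seen:
            seen.add(key)
            merged.append(fixed)
    return merged


notes = [
    ("Yesterday we decided to use Redis", date(2026, 3, 16)),
    ("On 2026-03-15 we decided to use Redis.", date(2026, 3, 20)),
]
print(consolidate(notes))
```

Normalizing dates first matters for the merge step: the two notes only become recognizable as the same entry once "yesterday" has been pinned to a calendar date.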

9.5 AWS: infrastructure primitives, not a memory product

AWS does not have a standalone consumer-facing memory product. What it offers are infrastructure building blocks that developers assemble into memory architectures.

AgentCore Memory is the managed memory service inside Amazon Bedrock AgentCore:

Short-term memory: within a session. Conversation history, tool results.
Long-term memory: across sessions.
Built-in memory strategies: Semantic, User preference, Summary.
Memory branching: agents share a memory branch point (common context) but diverge into separate branches (agent-specific context).

Amazon Bedrock Knowledge Bases handles the full managed RAG pipeline (ingestion, embedding, storage in OpenSearch Serverless / Aurora pgvector / Pinecone / Redis, retrieval with citations).

Amazon Neptune (graph DB) can serve as a backend for graph-based memory; Mem0 supports Neptune as of late 2025; Graphiti's knowledge-graph architecture can run on top of Neptune.

10. How memory connects to agents

Memory is one of several capabilities an agent needs, alongside reasoning, planning, tool use, and perception. (For the full agents picture, see the Agents dossier.)

An agent without memory restarts from zero every task. Memory is what turns a stateless tool into something that improves over time.

Where memory frameworks plug into agent architectures. Agent frameworks (LangChain, CrewAI, AutoGen, Mastra) define the loop: perceive, reason, act, repeat. Memory plugs in at two points:

Before reasoning: retrieved memories get injected into the prompt alongside the user's message.
After acting: new information from the interaction gets extracted and stored for future retrieval.
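The two plug-in points can be shown in a single agent turn. This is a minimal sketch with stand-ins throughout: KeywordMemory replaces a real memory service, echo_llm replaces a real model call, and run_turn is a hypothetical name rather than any framework's API.

```python
from typing import Callable, List


class KeywordMemory:
    """Minimal stand-in for a memory service: stores strings, retrieves by shared words."""

    def __init__(self) -> None:
        self.items: List[str] = []

    def search(self, query: str) -> List[str]:
        words = set(query.lower().split())
        return [m for m in self.items if words & set(m.lower().split())]

    def add(self, text: str) -> None:
        self.items.append(text)


def run_turn(user_message: str, memory: KeywordMemory, llm: Callable[[str], str]) -> str:
    # 1. Before reasoning: inject retrieved memories into the prompt.
    recalled = memory.search(user_message)
    prompt = "Known facts:\n" + "\n".join(recalled) + "\n\nUser: " + user_message
    # 2. Reason and act (here: a single model call).
    reply = llm(prompt)
    # 3. After acting: store new information for future turns.
    memory.add(user_message)
    return reply


def echo_llm(prompt: str) -> str:
    """Fake model that just reports how many known facts it was given."""
    facts = prompt.split("\n\nUser: ")[0].splitlines()[1:]
    return f"I was given {len(facts)} fact(s)."


memory = KeywordMemory()
first_reply = run_turn("I prefer Python", memory, echo_llm)
second_reply = run_turn("Should I use Python here?", memory, echo_llm)
print(first_reply)
print(second_reply)
```

The first turn runs with no recalled facts; the second sees what the first one stored. A production system would extract facts from the interaction rather than store the raw message, but the two hook points are the same.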

Mem0, Zep, and Graphiti are "memory-as-a-service": they handle storage and retrieval while the agent framework handles everything else. Letta is different: it's both an agent framework and a memory system. The agent runs inside Letta, and memory management is part of the agent loop itself.

The RL frontier. Policy-learned memory management via reinforcement learning is where memory and agents become inseparable. Instead of hand-designing what gets remembered, the agent learns its own memory strategy by optimizing for task performance.

Multi-agent memory. When multiple agents collaborate, they need shared memory. Letta's Conversations API (Dec 2025) is the first production attempt. The field-theoretic paper's field coupling is the most novel research approach. Most multi-agent systems today use shared files or databases, not purpose-built shared memory architectures.

11. Resources and reading lists

Established framework links

Mem0: Paper (arXiv:2504.19413), Product, State of AI Agent Memory 2026
Zep / Graphiti: Paper, Product, Graphiti open-source
Letta (MemGPT): MemGPT concept, Product, GitHub

Frontier research papers

MAGMA: arXiv:2601.03236
ZenBrain: arXiv:2604.23878, Code
Field-theoretic memory: arXiv:2602.21220
Second Me: arXiv:2503.08102, Code
GSW: arXiv:2511.07587, Code
OCR-Memory: arXiv:2604.26622
MemMachine: arXiv:2604.04853

Survey papers

"Memory in the Age of AI Agents": arXiv:2512.13564; Paper list
"Memory for Autonomous LLM Agents": arXiv:2603.07670

Curated reading lists (actively updated)

VoltAgent/awesome-ai-agent-papers (2026 papers, weekly updates)
Shichun-Liu/Agent-Memory-Paper-List (1.9K stars, memory-specific)

LeCun and world models

A-B-M position paper ("Why AI systems don't learn and what to do about it"): arXiv:2603.15381

Vendor memory docs

Microsoft: GitHub Copilot Memory blog, docs, Memory Poisoning research
AWS: AgentCore Memory, Bedrock Knowledge Bases