Persistent memory for AI agents. Store, recall, and forget — three API calls. +43% accuracy vs OpenAI Memory.
API
Everything your agent needs to remember, retrieve, and forget — in a clean, production-ready API.
Your agent learns something new? One call. MemDB extracts facts, deduplicates, and persists — so your agent never asks the same question twice.
Ask in plain English, get the right memories back. Hybrid vector + fulltext search, ranked by relevance and recency. Sub-50ms.
User asks to delete their data? One call. Remove a single memory, a topic, or everything. Instant, no reindex.
Benchmark
Measured on a standard QA recall benchmark. Agents using MemDB answer correctly 89.2% of the time vs 62.1% for OpenAI Memory.
| Provider | Accuracy | Latency |
|---|---|---|
| MemDB | 89.2% | 45ms |
| OpenAI Memory | 62.1% | 120ms |
| Mem0 | 71.4% | 85ms |
| Zep | 68.8% | 95ms |
| Raw RAG | 54.3% | 150ms |
Recall accuracy on MemQA-1K benchmark · 2026
Quickstart
Native SDKs for Python and Go. REST API for everything else. Ship in five minutes.
```python
import memdb

client = memdb.Client(api_key="your-key")

# Store a memory
client.store(
    agent_id="agent-123",
    content="User prefers concise answers in bullet points",
    tags=["preference", "style"],
)

# Recall relevant memories
memories = client.recall(
    agent_id="agent-123",
    query="how should I format my response?",
    top_k=5,
)

# Use in your prompt
context = "\n".join(m.content for m in memories)

# Forget a specific memory
client.forget(agent_id="agent-123", memory_id="mem_abc123")
```
How it works
pgvector + BM25 hybrid search, ONNX embeddings, sub-100ms retrieval at 10M+ memories per agent.
LLM extractor classifies memory type, extracts entities, resolves identity via HNSW cosine similarity, embeds with ONNX multilingual-e5-large (1024-dim), and persists to pgvector + Qdrant.
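The identity-resolution step can be sketched with plain cosine similarity. This is an illustrative toy, not MemDB's internals: 3-dimensional vectors instead of 1024-dim e5 embeddings, a made-up 0.92 threshold, and a brute-force scan standing in for the HNSW index.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_duplicate(new_vec, existing_vecs, threshold=0.92):
    """Treat a new memory as a duplicate of an existing one if any
    stored embedding is closer than the similarity threshold."""
    return any(cosine(new_vec, v) >= threshold for v in existing_vecs)

# Toy embeddings for two already-stored memories.
stored = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]

print(is_duplicate([0.89, 0.12, 0.01], stored))  # near-identical -> True
print(is_duplicate([0.0, 0.0, 1.0], stored))     # novel -> False
```

In production the brute-force scan is replaced by an approximate nearest-neighbor lookup, which is what makes dedup cheap at scale.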
Dual index: pgvector HNSW (halfvec, 2× smaller) for vector search + tsvector GIN for fulltext. Redis VSET hot cache for sub-5ms dedup lookups on recent memories.
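To make the fulltext half of the dual index concrete, here is a minimal in-memory inverted index. It is a toy stand-in for the tsvector + GIN machinery (the vector half, served by the HNSW index, is omitted); the corpus, tokenizer, and function names are invented for illustration.

```python
from collections import defaultdict

# Toy corpus standing in for stored memories.
memories = {
    "m1": "user prefers concise bullet points",
    "m2": "user timezone is UTC+2",
    "m3": "project deadline moved to friday",
}

# Inverted index: token -> set of memory ids containing it.
inverted = defaultdict(set)
for mid, text in memories.items():
    for token in text.split():
        inverted[token].add(mid)

def fulltext_search(query):
    """Rank memories by how many query tokens they contain."""
    hits = defaultdict(int)
    for token in query.split():
        for mid in inverted.get(token, ()):
            hits[mid] += 1
    return sorted(hits, key=lambda m: -hits[m])

print(fulltext_search("concise bullet summary"))  # ['m1']
```

A real fulltext engine adds stemming, stop-word removal, and BM25-style scoring, but the lookup structure is the same idea.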
Hybrid search merges vector and fulltext results via Reciprocal Rank Fusion (RRF). Exponential temporal decay (180-day half-life) weights recency. Optional LLM reranker sharpens top-K precision. MCP tool included.
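The fusion and decay steps can be sketched in a few lines. Assumptions here: the conventional RRF constant k=60, toy ranked lists and ages, and function names invented for illustration.

```python
import math

def rrf_merge(vector_ranked, fulltext_ranked, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranked in (vector_ranked, fulltext_ranked):
        for rank, mem_id in enumerate(ranked, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return scores

def temporal_decay(score, age_days, half_life_days=180.0):
    """Exponential decay: the weight halves every half_life_days."""
    return score * math.exp(-math.log(2) * age_days / half_life_days)

vec = ["m2", "m1", "m3"]   # ranked by vector similarity
txt = ["m1", "m3"]         # ranked by fulltext match
fused = rrf_merge(vec, txt)

ages = {"m1": 10, "m2": 400, "m3": 90}
final = {m: temporal_decay(s, ages[m]) for m, s in fused.items()}
best = max(final, key=final.get)
print(best)  # -> m1 (high in both lists and recent)
```

Note how decay reorders things: m2 tops the vector list but is 400 days old, so the recent m1, present in both lists, wins after fusion and decay.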
MemDB handles chunking, embedding, hybrid search, deduplication, and memory lifecycle management out of the box. A raw vector DB requires you to build all of that yourself — and get it right. MemDB is the production-ready layer on top.
ONNX-optimized multilingual-e5-large (1024 dimensions, halfvec storage). Graph-optimized with O3 fusion for 300× speedup on ARM. Supports 100+ languages. VoyageAI fallback available.
Yes. MemDB is open-source. You can self-host with Docker Compose in under 5 minutes. The hosted API is for teams who want zero-ops. Both use the same protocol.
We measure recall accuracy on a standard QA dataset where the agent must retrieve the right memory to answer each question. MemDB scores 89.2% vs 62.1% for OpenAI Memory — a 43.6% relative improvement.
Data is isolated by agent_id and API key. Memories are never used to train models. Self-hosted deployments keep all data on your infrastructure.
Median search latency is 45ms with HNSW (halfvec) indexing. Redis VSET hot cache brings dedup lookups to under 5ms. The LLM reranker adds ~200ms when enabled, but only runs on top-K candidates.
Private Beta
We're onboarding 50 teams in private beta. Join the waitlist and we'll reach out when your spot is ready.
No spam. One email when your spot opens, nothing else.