AI Memory API — Private Beta

Your AI agent forgets everything.
MemDB fixes that.

Persistent memory for AI agents. Store, recall, and forget — three API calls. +43% accuracy vs OpenAI Memory.

API

Three calls. Full memory.

Everything your agent needs to remember, retrieve, and forget — in a clean, production-ready API.

Store

Your agent learns something new? One call. MemDB extracts facts, deduplicates, and persists — so your agent never asks the same question twice.

client.store(agent_id="agent-123", content="User prefers bullet points over paragraphs")

Recall

Ask in plain English, get the right memories back. Hybrid vector + fulltext search, ranked by relevance and recency. Sub-50ms.

results = client.recall(agent_id="agent-123", query="How does this user like responses?")

Forget

User asks to delete their data? One call. Remove a single memory, a topic, or everything. Instant, no reindex.

client.forget(agent_id="agent-123", memory_ids=["mem_abc", "mem_def"])

Benchmark

+43% accuracy.
3× faster recall.

Measured on a standard QA recall benchmark. Agents using MemDB answer correctly 89.2% of the time vs 62.1% for OpenAI Memory.

+43.70% accuracy gain vs OpenAI Memory
35.24% fewer tokens in the context window

Provider        Accuracy  Latency
MemDB           89.2%     45ms
OpenAI Memory   62.1%     120ms
Mem0            71.4%     85ms
Zep             68.8%     95ms
Raw RAG         54.3%     150ms

Recall accuracy on MemQA-1K benchmark · 2026

Quickstart

Python, Go, or curl. Pick one.

Native SDKs for Python and Go. REST API for everything else. Ship in five minutes.

Python · example.py
import memdb

client = memdb.Client(api_key="your-key")

# Store a memory
client.store(
    agent_id="agent-123",
    content="User prefers concise answers in bullet points",
    tags=["preference", "style"]
)

# Recall relevant memories
memories = client.recall(
    agent_id="agent-123",
    query="how should I format my response?",
    top_k=5
)

# Use in your prompt
context = "\n".join(m.content for m in memories)

# Forget a specific memory
client.forget(agent_id="agent-123", memory_ids=["mem_abc123"])

How it works

Built for production from day one.

pgvector + BM25 hybrid search, ONNX embeddings, sub-100ms retrieval at 10M+ memories per agent.

01

Ingest

LLM extractor classifies memory type, extracts entities, resolves identity via HNSW cosine similarity, embeds with ONNX multilingual-e5-large (1024-dim), and persists to pgvector + Qdrant.
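The identity-resolution step above can be sketched in a few lines. A minimal illustration, assuming a cosine-similarity threshold of 0.95 (the threshold is hypothetical, and the real pipeline resolves identity through an HNSW index rather than the linear scan shown here):

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_duplicate(new_emb, recent_embs, threshold=0.95):
    """Treat a new memory as a duplicate if it is near-identical to a recent one."""
    return any(cosine(new_emb, e) >= threshold for e in recent_embs)

# An exact repeat is caught; an unrelated embedding is not.
print(is_duplicate([1.0, 0.0], [[1.0, 0.0]]))   # True
print(is_duplicate([1.0, 0.0], [[0.0, 1.0]]))   # False
```

In production the "recent embeddings" side of this check is served by the Redis hot cache described in the Index step, so most dedup lookups never touch the database.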

02

Index

Dual index: pgvector HNSW (halfvec, 2× smaller) for vector search + tsvector GIN for fulltext. Redis VSET hot cache for sub-5ms dedup lookups on recent memories.
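The dual index can be expressed as PostgreSQL DDL. The sketch below is hypothetical — the table, column, and index names are assumptions, not MemDB's actual schema — but it shows the shape: an HNSW index over half-precision embeddings next to a GIN index over a generated tsvector column.

```python
# Hypothetical schema sketch (names assumed): pgvector HNSW over halfvec
# embeddings plus a GIN fulltext index, matching the dual-index design.
DDL = """
CREATE TABLE IF NOT EXISTS memories (
    id         text PRIMARY KEY,
    agent_id   text NOT NULL,
    content    text NOT NULL,
    embedding  halfvec(1024),  -- multilingual-e5-large, half precision (2x smaller)
    content_ts tsvector GENERATED ALWAYS AS (to_tsvector('simple', content)) STORED,
    created_at timestamptz DEFAULT now()
);

-- Vector side: HNSW over half-precision embeddings, cosine distance
CREATE INDEX IF NOT EXISTS memories_embedding_idx
    ON memories USING hnsw (embedding halfvec_cosine_ops);

-- Fulltext side: GIN over the generated tsvector column
CREATE INDEX IF NOT EXISTS memories_content_ts_idx
    ON memories USING gin (content_ts);
"""
print(DDL)
```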

03

Retrieve

Hybrid search merges vector + fulltext via RRF. Temporal decay (exp, 180-day half-life) weights recency. Optional LLM reranker for top-K precision. MCP tool included.
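The fusion and decay steps can be sketched as follows — a minimal illustration of Reciprocal Rank Fusion with the conventional k=60 constant and an exponential 180-day half-life, not the retriever's exact scoring code:

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: merge ranked ID lists into one score per ID."""
    scores = {}
    for ranked_ids in rankings:
        for rank, mem_id in enumerate(ranked_ids, start=1):
            scores[mem_id] = scores.get(mem_id, 0.0) + 1.0 / (k + rank)
    return scores

def temporal_decay(age_days, half_life_days=180.0):
    """Exponential recency weight: halves every 180 days."""
    return 0.5 ** (age_days / half_life_days)

# Merge a vector ranking and a fulltext ranking (IDs are illustrative).
vector_hits = ["mem_a", "mem_b", "mem_c"]
fulltext_hits = ["mem_b", "mem_a", "mem_d"]
fused = rrf([vector_hits, fulltext_hits])

# Weight fused scores by how old each memory is (in days).
ages = {"mem_a": 10, "mem_b": 400, "mem_c": 1, "mem_d": 90}
final = {m: s * temporal_decay(ages[m]) for m, s in fused.items()}
top = sorted(final, key=final.get, reverse=True)
```

Here mem_a and mem_b tie on fused rank, but mem_b is over a year old, so decay pushes it below the fresher results — which is exactly the recency behavior the retriever is after.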

pgvector HNSW · tsvector + vector hybrid · multilingual-e5-large · PostgreSQL 17 + Apache AGE · ONNX Runtime · Qdrant · Redis VSET · LLM reranker

FAQ

Questions answered.

Still curious? Open an issue on GitHub.

How is MemDB different from just using a vector database?

MemDB handles chunking, embedding, hybrid search, deduplication, and memory lifecycle management out of the box. A raw vector DB requires you to build all of that yourself — and get it right. MemDB is the production-ready layer on top.

What embedding model do you use?

ONNX-optimized multilingual-e5-large (1024 dimensions, stored as halfvec). Graph optimization with O3-level operator fusion yields up to a 300× speedup on ARM. It supports 100+ languages, with a VoyageAI fallback available.

Can I run MemDB on my own infrastructure?

Yes. MemDB is open-source. You can self-host with Docker Compose in under 5 minutes. The hosted API is for teams who want zero-ops. Both use the same protocol.

How does the accuracy benchmark work?

We measure recall accuracy on a standard QA dataset where agents must retrieve the correct memory to answer correctly. MemDB scores 89.2% vs 62.1% for OpenAI Memory — a 43.7% relative improvement.

Is my data private?

Data is isolated by agent_id and API key. Memories are never used to train models. Self-hosted deployments keep all data on your infrastructure.

What's the latency at scale?

Median search latency is 45ms with HNSW (halfvec) indexing. Redis VSET hot cache brings dedup lookups to under 5ms. The LLM reranker adds ~200ms when enabled, but only runs on top-K candidates.

Private Beta

Be first to ship with MemDB.

We're onboarding 50 teams in private beta. Join the waitlist and we'll reach out when your spot is ready.

No spam. We'll only email you when your spot is ready.

50
beta spots
89.2%
recall accuracy
45ms
median latency