Why Your AI Agent's Memory Is Broken (And How to Fix It With SQLite)

Every developer who has built a "memory-enabled" chatbot knows the drill: chunk the conversation, generate embeddings, shove everything into Qdrant or Pinecone, fetch Top-K by cosine similarity. Done, right?

Wrong. And by the time your agent serves its 500th conversation, you will understand exactly why.

The Problem: Classic RAG Destroys Long-Lived Agents

Here is a concrete failure case from building a persistent local AI agent:

A user said: "I prefer Python." A week later: "I am writing in Rust now." Another week: "What language should I use for a CLI tool?"

The agent fetched both facts from the vector store with nearly identical cosine scores and delivered an answer that blended Python's click and argparse with Rust's clap and structopt. The result was incoherent.

Vector databases do not know about time. They do not know that one fact supersedes another. They do not forget - ever. And that is a fundamental architectural mismatch for agents that need to maintain a coherent model of a person over weeks and months.

The real problems compound fast:

  • Context pollution - after a month, thousands of fragments compete for attention with equal weight
  • No conflict resolution - user worked at Company A, now works at Company B, agent sees both with equal confidence
  • No provenance - where did this fact come from? When? Can we trust it?
  • Zero forgetting - irrelevant info from 6 months ago competes with critical facts from yesterday

In 2026, with agents handling increasingly long-horizon tasks, this is not a minor UX issue. It is a fundamental reliability problem.

Why It Matters More Than Ever

AI agents in 2026 are not just answering questions - they are managing codebases, scheduling tasks, maintaining context across multi-day projects. The gap between a chatbot that remembers and an agent with genuine working memory is enormous.

The good news: you do not need a distributed vector database or a specialized graph DB service. You need SQLite and a solid architecture.

The Solution: Graph Cognitive Memory on a Single SQLite File

Instead of a flat vector store, implement a typed graph with four distinct node types, five edge types, full-text search, vector search, and the Ebbinghaus forgetting curve - all in one .db file.

The Node Schema

CREATE TABLE nodes (
    id          TEXT PRIMARY KEY,
    type        TEXT NOT NULL CHECK(type IN (
                    'episodic','semantic','procedural','opinion')),
    content     TEXT NOT NULL,
    embedding   BLOB,
    event_time  INTEGER NOT NULL,
    valid_from  INTEGER NOT NULL,
    valid_until INTEGER,
    confidence  REAL NOT NULL DEFAULT 1.0,
    decay_rate  REAL NOT NULL DEFAULT 0.1,
    session_id  TEXT
);
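
As a minimal sketch of how this schema might be used, the snippet below creates the table in an in-memory database, stores the Python and Rust preferences from the earlier example, and reads back only the facts that are still valid. The `current_facts` helper, the node ids, and the timestamps are illustrative assumptions, not part of the original implementation.

```python
import sqlite3

SCHEMA = """
CREATE TABLE nodes (
    id          TEXT PRIMARY KEY,
    type        TEXT NOT NULL CHECK(type IN (
                    'episodic','semantic','procedural','opinion')),
    content     TEXT NOT NULL,
    embedding   BLOB,
    event_time  INTEGER NOT NULL,
    valid_from  INTEGER NOT NULL,
    valid_until INTEGER,
    confidence  REAL NOT NULL DEFAULT 1.0,
    decay_rate  REAL NOT NULL DEFAULT 0.1,
    session_id  TEXT
);
"""


def current_facts(conn):
    """Return the content of nodes with no end of validity (hypothetical helper)."""
    rows = conn.execute(
        "SELECT content FROM nodes WHERE valid_until IS NULL ORDER BY valid_from"
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

week = 7 * 86400
t0 = 1_700_000_000  # arbitrary example timestamp (Unix seconds)

# The older preference gets a valid_until once the newer one arrives.
conn.execute(
    "INSERT INTO nodes (id, type, content, event_time, valid_from, valid_until) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("n1", "opinion", "User prefers Python", t0, t0, t0 + week),
)
conn.execute(
    "INSERT INTO nodes (id, type, content, event_time, valid_from) "
    "VALUES (?, ?, ?, ?, ?)",
    ("n2", "opinion", "User writes in Rust now", t0 + week, t0 + week),
)

print(current_facts(conn))
```

Only the Rust preference comes back, because the Python one carries a valid_until timestamp. The CHECK constraint also rejects any node whose type is not one of the four listed.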

Hybrid Search: FTS5 + Vector + Graph

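def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists of node ids into one ranking."""
    scores = {}
    for results in result_lists:
        for rank, node_id in enumerate(results):
            # rank is 0-based, so the top hit of each list contributes 1 / (k + 1).
            scores[node_id] = scores.get(node_id, 0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Each input is a list of node ids ordered best-first by its own retriever.
combined = reciprocal_rank_fusion([fts_results, vector_results, graph_results])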
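
To make the fusion step concrete, here is a small worked example. The three input lists are made-up node ids standing in for FTS5, vector, and graph results. The function is repeated so the snippet runs on its own.

```python
def reciprocal_rank_fusion(result_lists, k=60):
    scores = {}
    for results in result_lists:
        for rank, node_id in enumerate(results):
            scores[node_id] = scores.get(node_id, 0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical ranked results from three retrievers, best hit first.
fts_results = ["a", "b", "c"]
vector_results = ["b", "a", "d"]
graph_results = ["b", "d"]

combined = reciprocal_rank_fusion([fts_results, vector_results, graph_results])
print(combined)  # ['b', 'a', 'd', 'c']
```

Node "b" wins because it ranks near the top of all three lists, while "c" appears only once and lands last. With k=60 the score differences between adjacent ranks are small, so agreement across retrievers matters more than a single first place.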

The Write Path: 50ms Hot Path, No LLM

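The critical principle: LLM on write, algorithms on read. Even on write, the LLM runs only in background consolidation. The hot path, which targets roughly 50 ms per write, must never block on an LLM call.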

All writes go through a single-writer queue to avoid SQLite "database is locked" errors under concurrent async workloads. WAL mode is enabled. Reads use a separate connection with PRAGMA query_only=ON.
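
One way the single-writer pattern could look with only the standard library is sketched below. The `writer_loop`, `open_reader`, and `demo` names and the `notes` table are assumptions made for illustration. A real implementation would likely also move the blocking SQLite calls off the event loop.

```python
import asyncio
import os
import sqlite3
import tempfile


async def writer_loop(db_path, queue):
    """Sole owner of the write connection; applies queued statements in order."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT NOT NULL)")
    conn.commit()
    while True:
        item = await queue.get()
        if item is None:  # sentinel value: stop the writer
            break
        sql, params = item
        conn.execute(sql, params)
        conn.commit()
    conn.close()


def open_reader(db_path):
    """Separate connection that refuses any statement that would modify data."""
    conn = sqlite3.connect(db_path)
    conn.execute("PRAGMA query_only=ON")
    return conn


async def demo(db_path):
    queue = asyncio.Queue()
    writer = asyncio.create_task(writer_loop(db_path, queue))

    async def produce(i):
        # Producers only enqueue work; they never touch the write connection.
        await queue.put(("INSERT INTO notes (body) VALUES (?)", (f"note {i}",)))

    await asyncio.gather(*(produce(i) for i in range(20)))
    await queue.put(None)
    await writer

    reader = open_reader(db_path)
    count = reader.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
    try:
        reader.execute("INSERT INTO notes (body) VALUES ('blocked')")
        read_only = False
    except sqlite3.OperationalError:
        read_only = True
    reader.close()
    return count, read_only


with tempfile.TemporaryDirectory() as tmp:
    count, read_only = asyncio.run(demo(os.path.join(tmp, "memory.db")))
print(count, read_only)
```

Because every INSERT passes through one queue and one connection, concurrent producers cannot collide on the write lock. The read connection stays usable while writes are in flight thanks to WAL mode.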

Forgetting: The Ebbinghaus Curve

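import math
import time

class DecayService:
    THRESHOLD = 0.05  # nodes whose confidence falls below this are pruned

    async def apply(self, storage):
        # Episodic nodes are expected to be excluded by get_decayable_nodes().
        nodes = await storage.get_decayable_nodes()
        updates, to_prune = [], []
        now = time.time()

        for node in nodes:
            # last_accessed is assumed to be a Unix timestamp in seconds;
            # it does not appear in the schema excerpt shown earlier.
            days_since = (now - node["last_accessed"]) / 86400.0
            new_confidence = node["confidence"] * math.exp(
                -node["decay_rate"] * (max(0.0, days_since) ** 0.8)
            )
            if new_confidence < self.THRESHOLD:
                to_prune.append(node["id"])
            else:
                updates.append((node["id"], new_confidence))

        await storage.soft_delete_nodes(to_prune)
        return {"decayed": len(updates), "pruned": len(to_prune)}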

The 0.8 exponent makes decay sub-exponential, matching how human memory actually fades. Episodic nodes never decay - they are the immutable audit trail.
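
To get a feel for the numbers, the sketch below evaluates the same formula, confidence * exp(-decay_rate * days^0.8), and solves it for the day on which a node crosses the 0.05 pruning threshold. The helper names `decayed_confidence` and `days_until_pruned` are made up for this example.

```python
import math

THRESHOLD = 0.05  # pruning threshold used by DecayService
EXPONENT = 0.8    # stretch exponent from the decay formula


def decayed_confidence(confidence, decay_rate, days):
    """Confidence after `days` without access."""
    return confidence * math.exp(-decay_rate * max(0.0, days) ** EXPONENT)


def days_until_pruned(confidence, decay_rate, threshold=THRESHOLD):
    """Solve confidence * exp(-decay_rate * t**EXPONENT) == threshold for t."""
    if confidence <= threshold:
        return 0.0
    return (math.log(confidence / threshold) / decay_rate) ** (1.0 / EXPONENT)


# Schema defaults: confidence 1.0, decay_rate 0.1.
print(round(days_until_pruned(1.0, 0.1)))     # roughly 70 days

# Values that resolve_conflict assigns to a superseded fact.
print(round(days_until_pruned(0.3, 0.5), 1))  # roughly 4.9 days
```

For comparison, a pure exponential (exponent 1.0) with the same defaults would reach the threshold after about 30 days. The 0.8 exponent decays slightly faster within the first day and noticeably slower afterwards.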

Conflict Resolution in Practice

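async def resolve_conflict(old_node_id, new_node_id):
    # Demote the outdated fact and make it decay faster.
    await storage.update_node_fields(old_node_id, {
        "confidence": 0.3,
        "decay_rate": 0.5
    })
    # Soft-delete it so current-fact queries no longer return it.
    await storage.soft_delete_node(old_node_id)
    # Record that the new fact replaces the old one.
    await storage.insert_edge({
        "source_id": new_node_id,
        "target_id": old_node_id,
        "relation_type": "supersedes",
        "confidence": 0.95
    })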

The old fact is now invisible to queries that filter on current facts (WHERE valid_until IS NULL), and the new fact is authoritative.
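
The same idea can be sketched directly against SQLite. The `edges` table layout and the `supersede` function are assumptions (the article does not show the edge schema), and the nodes table is trimmed to the columns this example needs.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (
    id          TEXT PRIMARY KEY,
    content     TEXT NOT NULL,
    valid_until INTEGER,
    confidence  REAL NOT NULL DEFAULT 1.0,
    decay_rate  REAL NOT NULL DEFAULT 0.1
);
-- Hypothetical edge table for this sketch.
CREATE TABLE edges (
    source_id     TEXT NOT NULL REFERENCES nodes(id),
    target_id     TEXT NOT NULL REFERENCES nodes(id),
    relation_type TEXT NOT NULL,
    confidence    REAL NOT NULL
);
""")


def supersede(conn, old_id, new_id, now=None):
    """Close out the old fact and link the new fact to it in one transaction."""
    now = int(time.time()) if now is None else now
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE nodes SET confidence = 0.3, decay_rate = 0.5, valid_until = ? "
            "WHERE id = ?",
            (now, old_id),
        )
        conn.execute(
            "INSERT INTO edges (source_id, target_id, relation_type, confidence) "
            "VALUES (?, ?, 'supersedes', 0.95)",
            (new_id, old_id),
        )


conn.execute("INSERT INTO nodes (id, content) VALUES ('a', 'Works at Company A')")
conn.execute("INSERT INTO nodes (id, content) VALUES ('b', 'Works at Company B')")
supersede(conn, "a", "b")

visible = [row[0] for row in conn.execute(
    "SELECT content FROM nodes WHERE valid_until IS NULL")]
history = [row[0] for row in conn.execute(
    "SELECT n.content FROM edges e JOIN nodes n ON n.id = e.target_id "
    "WHERE e.source_id = 'b' AND e.relation_type = 'supersedes'")]

print(visible)  # ['Works at Company B']
print(history)  # ['Works at Company A']
```

The old employer drops out of current-fact queries but stays reachable through the supersedes edge, which preserves provenance without polluting retrieval.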

Key Takeaways

  1. Vector databases alone are not memory - they are search indexes
  2. SQLite is enough - FTS5 + sqlite-vec + WAL mode in a single file
  3. Keep LLM off the hot path - 50ms writes, background consolidation
  4. Implement forgetting - Ebbinghaus curve prevents context bloat
  5. Type your edges - temporal, causal, supersedes, derived_from carry structural information no embedding can express

Full implementation on GitHub: github.com/VitalyOborin/yodoca

Built something similar? Check out gerus-lab.com
