Memory · the long version

It remembers.
On purpose.

A vector‑store backed memory layer the model writes to on its own. A frozen profile that survives prompt‑cache. Facts and notes scored by semantic relevance with recency decay. Auto‑sweep session summaries. Pin what matters; archive what doesn't. Recall is one tool call away.

Catalyst memory manager: stored facts keyed, ranked, with pin and archive controls

The actual memory manager — every fact keyed, ranked, and recallable, with pin and archive controls. No black box.

Three kinds of memory

Profile, facts, notes.

One unified LanceDB table, three semantic kinds. The model picks the right one when it calls update_memory. You curate from the UI.

profile

~2000 chars. Frozen at session start.

Who you are, what you do, how you like things explained. Baked into the system prompt at session start so prompt‑cache stays warm. The model edits it via tool — mid‑session writes don't destabilize the cache because the system prompt doesn't re‑render live.

shape short narrative + bullets writes tool · model‑initiated limit 2000 chars · auto self‑prune
facts

Atomic, recallable, ranked.

"Our CRO prioritizes net retention through FY26." "We deploy on Cloudflare Pages." One claim per row. Vector‑recallable when relevant. Semantic dedup merges duplicates; recency decay (30‑day half‑life) pushes stale ones down without deleting them.

recall ≥50% relevance threshold decay 30‑day half‑life curate pin · archive · merge
notes

Session summaries, auto‑swept.

After 10+ minutes of session inactivity, sessions with ≥3 messages get summarized in the background and saved as notes. No manual "save this session" step. The conversation becomes recallable without your intervention.

trigger 10 min idle min size 3 messages runs background sweep
Catalyst recalling the user's profile and facts in a brand-new chat via recall_memory

A brand‑new chat — the profile is already in the system prompt, and the model pulls the rest with recall_memory. You stop re‑briefing it.

The mechanics

Vector recall, opinionated.

Vector stores tend to return everything ranked. That's a quality problem in chat — the model gets 30 dusty rows and burns context. Ours is opinionated.

relevance gate

≥50% or it's not returned.

The recall tool drops anything under the threshold before it reaches the model. No noise, no "loosely related" injection that bloats the prompt and confuses the answer.

recency decay

30‑day half‑life on last reference.

Every time a fact gets surfaced or referenced, its last_referenced timestamp updates. Older ones rank lower. Yesterday's note beats a six‑month‑old one with identical similarity.

pin overrides

Pinned entries always surface.

For the handful of facts you never want to lose — the org chart, the brand guidelines, the "always use postgres‑replica for reads" rule — pin them. Recall returns them regardless of score.

archive ≠ delete

Quiet without losing it.

Archive an entry and it stays in the database but stops appearing in recall results. Un‑archive when relevant again. Nothing's gone; nothing's noisy.

semantic dedup

Duplicates merge automatically.

The background curator runs semantic comparison across facts; near‑duplicates merge with the canonical row keeping the better wording and the union of sources.

force self‑prune

Profile guards its own size.

When the 2000‑char profile fills up, the model is asked to prune itself before adding new content — keeping the highest‑signal claims and discarding noise. The cache stays warm; the profile stays sharp.

local · zero egress

Your memory stays yours.

LanceDB on disk (or wherever you point it). Sentence‑transformer embeddings computed locally. The recall tool returns rows to the model in your tenant — they never leave the deployment. If you self‑host, no third party sees what your AI remembers about you.

  • One table, three kinds. memory_entries with kind: fact / memory / note. Unified queries, unified UI.
  • Profile is in the system prompt. Bakes in at session start. Cache‑friendly. Mid‑session writes update the row but don't re‑render the prompt.
  • Curator runs in the background. Dedupes, archives stale entries (90+ days unreferenced), forces self‑prune on profile overflow.
  • Pinned + archived state visible in UI. No magic. You see what the model sees.
memory.recall("Q4 board priorities")
fact · pinned1.00

CRO prioritizes net retention over new logo growth through FY26.

note · session #2140.88

Board agreed to defer mid‑market hiring until pipeline coverage exceeds 3.2×.

fact · 14d ago0.81

Enterprise expansion ARR concentrated in top 14 accounts; renewals Q1–Q2.

fact · 6mo ago0.43

(below 50% threshold — not returned)

Stop re‑explaining yourself

Tell it once.
It remembers forever.

Every session inherits the profile. Every important fact is recallable. Nothing leaves your tenant. The model gets sharper; you stop repeating yourself.