
Cross-Session Identity and Personalization

Cross-session identity for LLM agents: user profiles, personas, the cold-start staircase, sensitivity-gated writes, and the deletion path.

Jatin Bansal@blog:~/ai-engineering$ open cross-session-identity

The agent has shipped to production. Every individual memory tier is wired up correctly. The conversation buffer trims, the working-memory scratchpad tracks the plan, the episodic store holds three months of past interactions, the semantic store has 312 distilled facts, the procedural store caches the deployment ritual. Each tier passes its eval. The product still feels off. Users say “it doesn’t remember me.” They are not wrong. The agent retrieves their preferences correctly when asked. It does not introduce itself knowing them. Session 47 opens the same way session 1 did — “Hi, how can I help?” — and the user has to drive the recall: ask the question, wait for the retrieval, see the answer that proves the system knows. The recall machinery works. The identity layer on top of it does not. That layer is what makes a session-N response feel like it’s coming from “the same agent that talked to me yesterday” rather than “a stateless API with a memory bolt-on.” This article is the deep dive on it.

Opening bridge

Yesterday’s procedural-memory article walked the fourth CoALA tier — the JIT-compiled-routine cache that turns repeated task re-derivation into retrieval. One of its closing failure modes was cross-user contamination: a procedure that worked for user A getting cached and retrieved for user B whose setup is completely different. The mitigation was “scope by namespace” or “scope by precondition” — both of which require the agent to know which user it’s talking to and to carry that scope through every read and write. That scoping mechanism — the user-identification layer on top of the memory stack — is what cross-session identity is. Today’s piece pulls the thread together: where the procedural article handled the scope as a parameter on the store, this one treats the user as a first-class object in the memory architecture, with its own lifecycle, its own retrieval semantics, and its own privacy boundary. Procedural memory was about caching the skills; cross-session identity is about caching the user.

Definition

Cross-session identity is the layer that maintains a stable, retrievable representation of a user (or other long-lived entity — a project, a workspace, a tenant) across the gaps between conversations, and that injects the right subset of that representation into the prompt at the start of every new session. Three properties distinguish it from the underlying memory tiers. First, the unit of identity is durable — it outlives any single session, agent instance, or conversation thread; deleting a session must not delete the user. Second, identity is bootstrapped at session-start, not on-demand mid-turn — the read happens once before the first response and is materialized into context, rather than retrieved lazily; this is the classic difference between a profile and a cache. Third, identity has its own write semantics — promotions to the profile are slower, more deliberate, and more privacy-sensitive than promotions to the episodic store; not every fact mentioned in conversation deserves to become identity.

The simplest mental model: identity is to the agent’s memory what /etc/passwd and the user’s home directory are to a Unix login. The system knows who you are before you start typing; your shell starts with your dotfiles loaded; your background services have your preferences applied. The session is bootstrapped with you, not toward you.

Intuition

The reason every multi-session agent eventually grows an identity layer — and the reason building it cleanly is harder than it looks — comes down to a clock-skew problem. Four clocks govern a long-running agent’s relationship with a user, and they tick at different rates:

  1. The session clock. Bounded — the conversation starts, runs for some number of turns, ends. The conversation buffer, the working-memory scratchpad, the in-session context all live on this clock. Their default lifetime is the session.
  2. The episode clock. Per-event — every meaningful interaction emits an episode. The episodic store lives on this clock; episode boundaries don’t respect session boundaries (one session can produce many episodes; one long task can span many sessions).
  3. The account clock. Multi-year — the user’s account exists from signup to deletion. The identity layer lives on this clock.
  4. The persona clock. Per-context — the same human can present as different personas depending on the workspace (“I’m on my work account today; treat me as a developer,” vs “I’m on my personal account; this is for hobby work”). The persona clock is a scoped sub-clock of the account clock.

The bugs almost all come from confusing these clocks. Storing a user preference on the session clock and being surprised when it doesn’t survive logout. Storing a one-off episode on the account clock and being surprised when it surfaces six months later as if it’s still relevant. Treating the persona as if it were the account. Pretending the account is the session. The distributed-systems parallel — and the deepest one in this article — is that identity is the durable, slow-moving anchor that lets all the other clocks be lossy without the user noticing.
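One way to make clock confusion harder is to tag every stored item with the clock it lives on, so lifetime is decided by the storage layer rather than by each call site. The sketch below is illustrative only: `Clock`, `MemoryItem`, and `end_session` are hypothetical names, not part of any library or of the listing later in this article.

```python
from dataclasses import dataclass
from enum import Enum


class Clock(Enum):
    """The four clocks from the list above (names are illustrative)."""
    SESSION = "session"
    EPISODE = "episode"
    ACCOUNT = "account"
    PERSONA = "persona"


@dataclass(frozen=True)
class MemoryItem:
    key: str
    value: str
    clock: Clock


def survives_session_end(item: MemoryItem) -> bool:
    """Only session-clock items are dropped when the conversation closes."""
    return item.clock is not Clock.SESSION


def end_session(items: list[MemoryItem]) -> list[MemoryItem]:
    """Return what is still stored after the session closes."""
    return [i for i in items if survives_session_end(i)]


items = [
    MemoryItem("scratch_db_name", "foo", Clock.SESSION),
    MemoryItem("ui_theme", "dark", Clock.ACCOUNT),
    MemoryItem("tone", "formal", Clock.PERSONA),
]
remaining = end_session(items)
print([i.key for i in remaining])
```

The point of the design is that "does this survive logout?" becomes a property of the data, so a preference cannot silently end up on the session clock.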

The distributed systems parallel

The closest single analogue is the user-session cookie versus the user account record in a multi-tenant web service. Three properties of that analogue are load-bearing:

  • The cookie is short-lived; the account is long-lived; the cookie is the index into the account. Web services don’t put the user’s preferences in the cookie — the cookie carries an opaque user-ID and the preferences live in a database row. The cookie’s contents are bounded; the database row’s contents are not. The agent analogue: the in-context prompt carries a bounded identity block (typically 200-2000 tokens); the durable identity store can be unbounded. The block is the cookie; the store is the database row.
  • The login event is what materializes the cookie from the account. A user logging in triggers a database lookup, a session record creation, and a cookie set in the response. The agent analogue: the session-start event triggers a profile retrieval, an in-context-identity-block assembly, and a system-prompt render. The pattern is the same — durable storage, bounded materialization, explicit boot step.
  • The cookie is invalidated on logout or expiration; the account is deleted on a different (much slower) path. Logout drops the cookie but leaves the account intact; account deletion is a separate, audited operation. The agent analogue: ending a session drops the in-context identity block; the durable profile survives until an explicit deletion request, with its own audit trail. Conflating “I ended the chat” with “delete my data” is the same UX bug that confused early-2000s web auth systems and that confuses every naive AI-memory implementation that ships without an explicit deletion path.

The second-tier analogue is DNS resolution with caching. The user’s identity has a canonical record (the durable profile) and materialized snapshots (the in-context blocks) that can become stale. The TTL on the materialized snapshot is the session lifetime — refresh it on every session start. Same caching pyramid, same invalidation discipline.
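The cookie-versus-account split and the session-lifetime TTL can be sketched in a few lines. This is a toy model under stated assumptions: `Snapshot`, `SnapshotCache`, and the `summary`/`version` fields are hypothetical, and the canonical profile is a plain dict standing in for a real store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Bounded, read-only view materialized at session start (the 'cookie')."""
    user_id: str
    block: str
    profile_version: int


class SnapshotCache:
    """Holds one snapshot per session; a new session always re-reads the profile."""

    def __init__(self, profiles: dict[str, dict]):
        self._profiles = profiles  # canonical records (the 'database row')
        self._by_session: dict[str, Snapshot] = {}

    def start_session(self, session_id: str, user_id: str) -> Snapshot:
        profile = self._profiles[user_id]
        snap = Snapshot(user_id, f"<identity>{profile['summary']}</identity>", profile["version"])
        self._by_session[session_id] = snap
        return snap

    def get(self, session_id: str) -> Snapshot:
        return self._by_session[session_id]

    def end_session(self, session_id: str) -> None:
        # Dropping the snapshot never touches the canonical profile.
        self._by_session.pop(session_id, None)


profiles = {"u-1": {"summary": "Prefers concise answers.", "version": 1}}
cache = SnapshotCache(profiles)
s1 = cache.start_session("sess-a", "u-1")
profiles["u-1"] = {"summary": "Prefers detailed answers.", "version": 2}
stale = cache.get("sess-a")  # unchanged for the rest of this session
cache.end_session("sess-a")
s2 = cache.start_session("sess-b", "u-1")  # refreshed at the next session start
```

The snapshot is allowed to go stale within a session because its TTL is the session itself; ending the session removes the snapshot and leaves the profile alone.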

Mechanics

A production cross-session identity layer has six load-bearing pieces. Each maps to a clock and a write/read path.

1. The identity record

The durable substrate. A typed row keyed by user_id (or tenant_id, workspace_id — whatever the agent’s primary scope is). The schema is opinionated and matters:

text
identity {
  user_id: str
  display_name: str
  preferences: {key: {value, confidence, source_episode_ids, last_updated, half_life_days}}
  facts: {key: {value, confidence, source_episode_ids, last_updated, sensitivity}}
  persona_active: str   # references one of personas[]
  personas: [{persona_id, scope, preferences_override, facts_override}]
  consents: {memory_enabled: bool, sharing_allowed: bool, retention_days: int}
  created_at, updated_at
  schema_version: int
}

Four things to notice. The schema is typed and bounded: preferences and facts are keyed dictionaries, not free-form text, because the retrieval path needs to know whether to fetch by key (preferences) or by similarity (episodes). The personas array allows the same human to present as multiple identities under one account, each with its own override slice; this is the production answer to “I’m on my work account today.” The consents block is first-class, not an afterthought: every read on the identity record has to check consents.memory_enabled before materializing anything into context, and consents.retention_days governs the slow-eviction path. The schema_version field is what makes the identity record migratable; identity schemas evolve faster than most other persisted state because the product surface evolves.
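The schema carries `half_life_days` on each preference but does not say how the half-life is applied. A minimal sketch, assuming plain exponential decay of the stored confidence since `last_updated` (the function name `effective_confidence` and the decay form are assumptions, not something the schema prescribes):

```python
from __future__ import annotations

import time

SECONDS_PER_DAY = 86_400


def effective_confidence(entry: dict, now: float | None = None) -> float:
    """Decay stored confidence by the entry's half-life (assumed exponential form)."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - entry["last_updated"]) / SECONDS_PER_DAY)
    return entry["confidence"] * 0.5 ** (age_days / entry["half_life_days"])


entry = {"value": "dark mode", "confidence": 0.8, "last_updated": 0.0, "half_life_days": 180}
after_one_half_life = effective_confidence(entry, now=180 * SECONDS_PER_DAY)
print(round(after_one_half_life, 3))  # 0.4
```

Computing the decayed value at read time, rather than rewriting the stored confidence on a schedule, keeps the record itself stable and makes a re-confirmation a simple reset of `last_updated`.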

2. The session-start materialization

The boot path. On session open — whether triggered by a new HTTP connection, a fresh conversation thread, or a cold-started agent instance — the harness runs a single read against the identity store and materializes a bounded in-context identity block. The structure mirrors Letta’s core-memory blocks but with a typed schema:

text
<identity>
You are talking to Maya Chen (she/her), a senior backend engineer at a startup
called Helix Analytics. She prefers concise technical answers, uses Python/Postgres
primarily, and is allergic to peanuts (medically relevant for any restaurant
recommendations). She has opted in to memory; her current persona is "work".
Last conversation: 2 days ago, about a Postgres slow-query issue. Open task:
debug the LATERAL JOIN regression.
</identity>

The block is small (typically 200-800 tokens), pinned to the top of the system prompt for primacy, and stable across the session — it does not change mid-conversation. The stability is what makes it cacheable in the prompt cache; the boundedness is what keeps it from competing with the rest of the prompt for attention. Both properties are deliberate engineering choices, not accidents.
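Boundedness has to be enforced somewhere. The following is a rough sketch of a budgeted renderer, assuming a crude four-characters-per-token estimate in place of a real tokenizer; `approx_tokens` and `build_block` are hypothetical helpers, and the 800-token default simply echoes the upper end of the range quoted above.

```python
def approx_tokens(text: str) -> int:
    """Very rough heuristic: about 4 characters per token (an assumption, not a tokenizer)."""
    return max(1, len(text) // 4)


def build_block(header: str, items: list[tuple[float, str]], budget_tokens: int = 800) -> str:
    """Add items in descending confidence until the budget would be exceeded."""
    lines = [header]
    used = approx_tokens(header)
    # sorted() is stable, so equal-confidence items keep their input order and
    # the same record always renders to the same (cacheable) block.
    for _, line in sorted(items, key=lambda it: it[0], reverse=True):
        cost = approx_tokens(line)
        if used + cost > budget_tokens:
            break
        lines.append(line)
        used += cost
    return "<identity>\n" + "\n".join(lines) + "\n</identity>"


items = [
    (0.60, "- Fact: editor = vim"),
    (0.95, "- Fact: primary_stack = Python/Postgres"),
]
print(build_block("You are talking to Maya Chen.", items))
```

Deterministic ordering matters as much as the budget: if the block reshuffles between sessions for the same record, the prompt-cache prefix stops matching.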

3. The write path

The harder direction. Every turn produces candidate updates to the identity record — the user mentioned a new preference, corrected an old fact, declared a persona switch. Most of those candidates should not become identity-level writes. The memory write policies article’s four-stage pipeline (triage, extract, dedupe, persist) applies here, but with two identity-specific gates layered on top:

  • The durability gate. Identity writes need to outlive the immediate session. A user saying “I like dark mode” mid-debugging is probably a durable preference; a user saying “let’s pretend the database is named foo for this example” is not. The gate is a small classifier that asks “would this still be true next month?” and rejects ephemeral content.
  • The sensitivity gate. The agent can extract some facts (medical conditions, financial details, political views) that it should not persist to a long-lived profile without explicit user consent. The gate compares the candidate against a sensitivity taxonomy and routes high-sensitivity facts through a confirmation flow (“I’m about to remember that you have a pacemaker. Should I keep that on your profile?”) before the write lands.

Identity writes are slower, more deliberate, and more confirmation-driven than episodic writes. The asymmetry is by design.
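The confirmation flow behind the sensitivity gate can be sketched as a small holding area: a high-sensitivity candidate is parked, the user is asked, and only an approval produces a writable candidate. `PendingConfirmations` is a hypothetical helper; the `explicit_consent` flag matches the field the write path in the listing below checks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PendingConfirmations:
    """Holds high-sensitivity candidates until the user answers (hypothetical helper)."""
    pending: dict[str, dict] = field(default_factory=dict)

    def request(self, candidate: dict) -> str:
        """Park the candidate and return the question to show the user."""
        self.pending[candidate["key"]] = candidate
        return (
            f"I'm about to remember that {candidate['key']} = {candidate['value']}. "
            "Should I keep that on your profile?"
        )

    def resolve(self, key: str, approved: bool) -> dict | None:
        """Return the candidate marked with explicit consent, or None if declined."""
        candidate = self.pending.pop(key, None)
        if candidate is None or not approved:
            return None
        return {**candidate, "explicit_consent": True}


queue = PendingConfirmations()
question = queue.request({"kind": "fact", "key": "medical_device", "value": "pacemaker"})
approved_candidate = queue.resolve("medical_device", approved=True)
```

A declined candidate is dropped from the queue rather than retried, so the agent does not keep asking about something the user has already refused.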

4. The cold-start staircase

The most-asked question about identity systems is what to do when there isn’t one yet. The memory-cognitive-taxonomy article named three starting points (seed from a knowledge base, seed from an existing profile, accept the cold-start cost). The production refinement is a staircase — the system has different cold-start strategies for different identity dimensions, ascending in confidence as the data arrives:

  1. Step 0 (anonymous): No identity. The agent operates against an in-memory pseudo-profile that lasts the session and is discarded on close. The user-facing experience is “I don’t know you yet; tell me what you need.”
  2. Step 1 (claimed): The user provides minimum-viable identity — name, role, primary tool stack — typically via an onboarding flow. The profile exists but is mostly empty. Retrieval still falls back to “I don’t know much about you yet” framings.
  3. Step 2 (bootstrapped): The agent has run a reflection pass over the first 2-5 sessions and distilled a handful of stable preferences. The identity block now has real signal but is small. Personalization is visible without being aggressive.
  4. Step 3 (matured): The agent has 20+ sessions of history; the identity block is reasonably stable; the personalization quality is high. The system can proactively reference past context (“last time we discussed this, you preferred X”).
  5. Step 4 (calibrated): The agent has explicit feedback (thumbs, edits, “no, I prefer X actually”) and has used it to recalibrate the personalization. The profile is now learned, not just observed.

The trap most cold-start designs fall into is jumping the staircase: using a Step-1 profile as if it were a Step-3 profile, which leads to confident but wrong personalization. The mitigation is to render the identity block with explicit confidence markers in the early steps, such as “you may prefer X (low confidence based on 2 sessions)”, so the model knows to verify before relying on the personalization. Production systems that ship without confidence markers in early-step identity blocks hit a worse failure mode, in which the model confidently misrepresents the user back to themselves.
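A sketch of the staircase as code, driven by session count and feedback rather than by how many items the profile holds. The cut-offs are assumptions: the steps above only say "first 2-5 sessions" and "20+ sessions", so everything between is treated as Step 2 here, and Step 4 is only reachable after Step 3. `maturity_step` and `render_preference` are hypothetical names.

```python
def maturity_step(has_profile: bool, sessions: int, has_feedback: bool) -> int:
    """Map observable signals to a staircase step (thresholds are assumptions)."""
    if not has_profile:
        return 0  # anonymous
    if sessions < 2:
        return 1  # claimed: onboarding data only
    if sessions < 20:
        return 2  # bootstrapped: some distilled preferences
    return 4 if has_feedback else 3  # calibrated vs. matured


def render_preference(key: str, value: str, step: int, sessions: int) -> str:
    """Hedge early-step preferences so the model verifies before relying on them."""
    if step <= 2:
        return f"- may prefer {key} = {value} (low confidence, based on {sessions} sessions)"
    return f"- prefers {key} = {value}"


step = maturity_step(has_profile=True, sessions=3, has_feedback=False)
print(render_preference("answer_style", "concise", step, sessions=3))
```

Keeping the step computation separate from rendering means the hedging language can be tuned without touching how maturity is measured.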

5. The persona switch

The persona clock. The same account can have multiple personas — work vs personal, team-lead vs IC, primary-language-English vs primary-language-Spanish. Each persona has a scoped override slice on top of the base identity. The switch is an explicit event: the agent observes a signal that the user has changed personas (an explicit “switch to my personal account,” a workspace boundary crossing, a tool that’s only available to one persona) and routes the rest of the session through the new persona’s slice. The naive implementation makes persona a free-form field; the production implementation makes it a typed enum with explicit boundaries, because cross-persona contamination is one of the highest-severity bugs in this layer — the work agent surfacing the user’s personal-account preferences in front of their team is a real production incident.
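A minimal sketch of the typed-enum approach, assuming two personas and flat string preferences. `Persona`, `parse_persona`, and `resolve_view` are hypothetical names; the listing later in this article does not model overrides.

```python
from enum import Enum


class Persona(Enum):
    WORK = "work"
    PERSONAL = "personal"


def parse_persona(raw: str) -> Persona:
    """Reject free-form persona strings; raises ValueError for unknown values."""
    return Persona(raw.strip().lower())


def resolve_view(
    base: dict[str, str],
    overrides: dict[Persona, dict[str, str]],
    active: Persona,
) -> dict[str, str]:
    """Base preferences with only the active persona's overrides layered on top."""
    view = dict(base)
    view.update(overrides.get(active, {}))
    return view


base = {"answer_style": "concise"}
overrides = {
    Persona.WORK: {"tone": "formal"},
    Persona.PERSONAL: {"tone": "casual", "hobby": "climbing"},
}
work_view = resolve_view(base, overrides, parse_persona("Work"))
print(work_view)
```

Because the resolver only ever reads the active persona's slice, the personal-account `hobby` entry has no path into a work session's identity block.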

6. The privacy and deletion path

The slow-moving compliance layer. Every identity record needs three operations: export (the user can download everything stored about them), edit (the user can correct a wrong fact directly, not via a conversation), and delete (the user can request hard erasure). The first two are straightforward; the third is the deep one. As recent surveys of GDPR-compliant LLM memory deletion have documented, what constitutes deletion of a user from an LLM-backed system is genuinely contested — the durable profile and the episodic store can be hard-deleted, but content that has flowed into a fine-tuned model or a cached prefix is much harder to extricate. The full pipeline (seven steps including derived-artifact rebuilds and a verification query) lives in the memory privacy and multi-tenancy article; the identity-layer summary is:

  • Hard-delete the identity record and every episode in the user’s namespace.
  • Invalidate every cached prompt prefix that references the user.
  • Audit-log the deletion event itself for compliance demonstration.
  • Do not claim that “all memory of you is gone”; be explicit that data that flowed into training cycles or backups may persist according to the data-retention policy.

The honesty is itself a product feature; users notice when a deletion claim is overstated.
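The identity-layer summary can be sketched against SQLite. This is an illustration under assumptions: the `episodes` table is a hypothetical relational stand-in for the episodic store, `delete_user` is not part of the listing below, and prompt-cache invalidation is left out because it depends on the provider.

```python
import sqlite3
import time


def delete_user(conn: sqlite3.Connection, user_id: str) -> dict:
    """Hard-delete profile and episodes, log the event, and return an honest receipt."""
    with conn:  # one transaction: either every statement lands or none does
        profile = conn.execute(
            "DELETE FROM identity WHERE user_id = ?", (user_id,)
        ).rowcount
        episodes = conn.execute(
            "DELETE FROM episodes WHERE user_id = ?", (user_id,)
        ).rowcount
        conn.execute(
            "INSERT INTO deletion_audit (user_id, deleted_at) VALUES (?, ?)",
            (user_id, time.time()),
        )
    return {
        "profile_rows_deleted": profile,
        "episode_rows_deleted": episodes,
        "caveat": "Copies in backups or training data may persist per the retention policy.",
    }


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE identity (user_id TEXT PRIMARY KEY, record_json TEXT NOT NULL);
    CREATE TABLE episodes (id INTEGER PRIMARY KEY, user_id TEXT NOT NULL, body TEXT);
    CREATE TABLE deletion_audit (user_id TEXT NOT NULL, deleted_at REAL NOT NULL);
    INSERT INTO identity VALUES ('u-maya', '{}');
    INSERT INTO episodes (user_id, body) VALUES ('u-maya', 'a'), ('u-maya', 'b'), ('u-other', 'c');
    """
)
receipt = delete_user(conn, "u-maya")
print(receipt)
```

Returning the caveat as part of the receipt keeps the user-facing message tied to what was actually deleted, instead of leaving the wording to the UI layer.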

Code: Python

A minimal but production-shaped identity layer in Python. It uses the Anthropic SDK for the model and a SQLite-backed identity record. Chroma is the intended episodic store; this listing imports chromadb but leaves the episodic wiring out. Install: pip install anthropic chromadb.

python
"""
cross_session_identity.py

A minimal cross-session identity layer:
  - SQLite-backed identity records (durable, typed)
  - Chroma-backed episodic store (per-user namespace)
  - Session-start materialization into a bounded identity block
  - Durability-gated and sensitivity-gated write path
  - Hard-deletion path
"""

from __future__ import annotations
import json
import sqlite3
import time
import uuid
from dataclasses import dataclass, field, asdict
from typing import Any
import anthropic
import chromadb

# ---------------------------------------------------------------------------
# Schema
# ---------------------------------------------------------------------------


@dataclass
class IdentityRecord:
    user_id: str
    display_name: str
    persona_active: str = "default"
    preferences: dict[str, dict[str, Any]] = field(default_factory=dict)
    facts: dict[str, dict[str, Any]] = field(default_factory=dict)
    consents: dict[str, Any] = field(
        default_factory=lambda: {
            "memory_enabled": True,
            "sharing_allowed": False,
            "retention_days": 365,
        }
    )
    created_at: float = field(default_factory=time.time)
    updated_at: float = field(default_factory=time.time)
    schema_version: int = 1


SENSITIVE_KEYS = {"medical", "financial", "political", "religious", "sexuality"}


# ---------------------------------------------------------------------------
# Storage layer
# ---------------------------------------------------------------------------


class IdentityStore:
    """Durable identity records, keyed by user_id."""

    def __init__(self, db_path: str = "./identity.db"):
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS identity (
                user_id TEXT PRIMARY KEY,
                record_json TEXT NOT NULL,
                updated_at REAL NOT NULL
            )
            """
        )
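        self.conn.commit()
        # Create the audit table up front so hard_delete() never hits a missing table.
        self.__init_audit_table()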

    def upsert(self, record: IdentityRecord) -> None:
        record.updated_at = time.time()
        self.conn.execute(
            "INSERT OR REPLACE INTO identity (user_id, record_json, updated_at) VALUES (?, ?, ?)",
            (record.user_id, json.dumps(asdict(record)), record.updated_at),
        )
        self.conn.commit()

    def get(self, user_id: str) -> IdentityRecord | None:
        row = self.conn.execute(
            "SELECT record_json FROM identity WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        return IdentityRecord(**json.loads(row[0]))

    def hard_delete(self, user_id: str) -> None:
        """Compliance-grade deletion. Audits the event."""
        self.conn.execute(
            "INSERT INTO deletion_audit (user_id, deleted_at) VALUES (?, ?)",
            (user_id, time.time()),
        )
        self.conn.execute("DELETE FROM identity WHERE user_id = ?", (user_id,))
        self.conn.commit()

    def __init_audit_table(self) -> None:
        self.conn.execute(
            """
            CREATE TABLE IF NOT EXISTS deletion_audit (
                user_id TEXT NOT NULL,
                deleted_at REAL NOT NULL
            )
            """
        )
        self.conn.commit()


# ---------------------------------------------------------------------------
# Materialization
# ---------------------------------------------------------------------------


def materialize_identity_block(
    record: IdentityRecord, last_session_summary: str | None = None
) -> str:
    """Bounded identity block, pinned to the top of the system prompt.

    Renders confidence markers explicitly so the model knows when to verify
    rather than confidently misrepresent the user.
    """
    if not record.consents.get("memory_enabled", False):
        return "<identity>The user has opted out of memory. Treat each session as fresh.</identity>"

    persona = record.persona_active or "default"
    # Persona overrides are not modeled in this minimal record, so the views
    # below are plain copies of the base preferences and facts.
    prefs_view = dict(record.preferences)
    facts_view = dict(record.facts)

    lines = [
        f"You are talking to {record.display_name} (persona: {persona}).",
        f"Profile maturity: {_maturity_step(record)} (see staircase).",
    ]
    # Render only high-confidence, low-sensitivity items in the block.
    for k, v in prefs_view.items():
        if v["confidence"] >= 0.6 and v.get("sensitivity", "low") != "high":
            marker = "" if v["confidence"] >= 0.85 else " (medium confidence)"
            lines.append(f"- Preference: {k} = {v['value']}{marker}")
    for k, v in facts_view.items():
        if v["confidence"] >= 0.6 and v.get("sensitivity", "low") != "high":
            marker = "" if v["confidence"] >= 0.85 else " (medium confidence)"
            lines.append(f"- Fact: {k} = {v['value']}{marker}")
    if last_session_summary:
        lines.append(f"Last conversation: {last_session_summary}")

    body = "\n".join(lines)
    return f"<identity>\n{body}\n</identity>"


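def _maturity_step(record: IdentityRecord) -> str:
    # Proxy only: the staircase is defined by sessions and feedback, which this
    # record does not track, so the number of stored items stands in for maturity.
    n = len(record.preferences) + len(record.facts)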
    if n == 0:
        return "Step 0 (anonymous)"
    if n < 3:
        return "Step 1 (claimed)"
    if n < 10:
        return "Step 2 (bootstrapped)"
    if n < 30:
        return "Step 3 (matured)"
    return "Step 4 (calibrated)"


# ---------------------------------------------------------------------------
# Write path with durability + sensitivity gates
# ---------------------------------------------------------------------------


def candidate_to_identity(
    record: IdentityRecord,
    candidate: dict[str, Any],
    client: anthropic.Anthropic,
) -> bool:
    """Returns True if the candidate was promoted to the identity record.

    Gates the candidate through:
      1. Durability classifier  ("would this still be true next month?")
      2. Sensitivity classifier (high-sensitivity items require explicit consent)
      3. Dedup against existing keys (UPDATE vs ADD vs NOOP)
    """
    durable = _is_durable(candidate, client)
    if not durable:
        return False  # Ephemeral; route to episodic store instead.

    sensitivity = _sensitivity_of(candidate)
    if sensitivity == "high" and not candidate.get("explicit_consent"):
        return False  # Surfaces to the user as a confirmation prompt elsewhere.

    bucket = "preferences" if candidate["kind"] == "preference" else "facts"
    target = getattr(record, bucket)
    key = candidate["key"]

    if key in target and target[key]["value"] == candidate["value"]:
        # NOOP: re-confirmation, but boost the freshness clock.
        target[key]["last_updated"] = time.time()
        target[key]["confidence"] = min(1.0, target[key]["confidence"] + 0.05)
        return True

    target[key] = {
        "value": candidate["value"],
        "confidence": candidate.get("confidence", 0.7),
        "source_episode_ids": candidate.get("source_episode_ids", []),
        "last_updated": time.time(),
        "sensitivity": sensitivity,
        "half_life_days": candidate.get("half_life_days", 180),
    }
    return True


def _is_durable(candidate: dict[str, Any], client: anthropic.Anthropic) -> bool:
    """Cheap classifier: would this candidate still be true next month?"""
    resp = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=10,
        messages=[
            {
                "role": "user",
                "content": (
                    "Would the following fact about a user still likely be true "
                    "in one month? Answer only YES or NO.\n\n"
                    f"Fact: {candidate['key']} = {candidate['value']}"
                ),
            }
        ],
    )
    answer = resp.content[0].text.strip().upper()
    return answer.startswith("YES")


def _sensitivity_of(candidate: dict[str, Any]) -> str:
    """Rule-based sensitivity classifier. A real system uses a learned model."""
    key = candidate["key"].lower()
    if any(s in key for s in SENSITIVE_KEYS):
        return "high"
    if any(s in str(candidate["value"]).lower() for s in SENSITIVE_KEYS):
        return "high"
    return "low"


# ---------------------------------------------------------------------------
# Demo
# ---------------------------------------------------------------------------


def demo():
    client = anthropic.Anthropic()
    store = IdentityStore()

    # Step 0 -> Step 1: claimed identity from an onboarding signal.
    rec = IdentityRecord(user_id="u-maya", display_name="Maya Chen")
    store.upsert(rec)

    # Mid-session, a candidate update arrives from the extraction layer.
    candidates = [
        {"kind": "preference", "key": "answer_style", "value": "concise technical", "confidence": 0.9},
        {"kind": "fact", "key": "primary_stack", "value": "Python/Postgres", "confidence": 0.85},
        {"kind": "fact", "key": "medical_allergy", "value": "peanuts", "confidence": 0.95, "explicit_consent": True},
        # Ephemeral: should be rejected by the durability gate.
        {"kind": "fact", "key": "current_focus", "value": "this one PR", "confidence": 0.6},
    ]
    for c in candidates:
        promoted = candidate_to_identity(rec, c, client)
        print(f"{c['key']:>20s} promoted={promoted}")

    store.upsert(rec)

    # Session start: materialize the block.
    block = materialize_identity_block(
        rec, last_session_summary="2 days ago, debugged a Postgres LATERAL JOIN regression."
    )
    print("\n=== Identity block ===")
    print(block)

    # Demonstrate the deletion path.
    # store.hard_delete("u-maya")


if __name__ == "__main__":
    demo()

The shape that matters: durable identity record, bounded session-start materialization, gated writes, explicit deletion path. Everything else is implementation detail.

Code: TypeScript

Functionally equivalent in TypeScript using the Anthropic SDK. Install: npm install @anthropic-ai/sdk better-sqlite3.

typescript
// cross_session_identity.ts
import Anthropic from "@anthropic-ai/sdk";
import Database from "better-sqlite3";

interface PreferenceEntry {
  value: string;
  confidence: number;
  sourceEpisodeIds: string[];
  lastUpdated: number;
  sensitivity: "low" | "high";
  halfLifeDays: number;
}

interface IdentityRecord {
  userId: string;
  displayName: string;
  personaActive: string;
  preferences: Record<string, PreferenceEntry>;
  facts: Record<string, PreferenceEntry>;
  consents: { memoryEnabled: boolean; sharingAllowed: boolean; retentionDays: number };
  createdAt: number;
  updatedAt: number;
  schemaVersion: number;
}

const SENSITIVE_KEYS = ["medical", "financial", "political", "religious", "sexuality"];

class IdentityStore {
  private db: Database.Database;
  constructor(dbPath = "./identity.db") {
    this.db = new Database(dbPath);
    this.db.exec(`
      CREATE TABLE IF NOT EXISTS identity (
        user_id TEXT PRIMARY KEY,
        record_json TEXT NOT NULL,
        updated_at REAL NOT NULL
      );
      CREATE TABLE IF NOT EXISTS deletion_audit (
        user_id TEXT NOT NULL,
        deleted_at REAL NOT NULL
      );
    `);
  }

  upsert(rec: IdentityRecord): void {
    rec.updatedAt = Date.now() / 1000;
    this.db.prepare(
      "INSERT OR REPLACE INTO identity (user_id, record_json, updated_at) VALUES (?, ?, ?)"
    ).run(rec.userId, JSON.stringify(rec), rec.updatedAt);
  }

  get(userId: string): IdentityRecord | null {
    const row = this.db
      .prepare("SELECT record_json FROM identity WHERE user_id = ?")
      .get(userId) as { record_json: string } | undefined;
    return row ? (JSON.parse(row.record_json) as IdentityRecord) : null;
  }

  hardDelete(userId: string): void {
    this.db.prepare("INSERT INTO deletion_audit (user_id, deleted_at) VALUES (?, ?)").run(
      userId,
      Date.now() / 1000
    );
    this.db.prepare("DELETE FROM identity WHERE user_id = ?").run(userId);
  }
}

function maturityStep(rec: IdentityRecord): string {
  const n = Object.keys(rec.preferences).length + Object.keys(rec.facts).length;
  if (n === 0) return "Step 0 (anonymous)";
  if (n < 3) return "Step 1 (claimed)";
  if (n < 10) return "Step 2 (bootstrapped)";
  if (n < 30) return "Step 3 (matured)";
  return "Step 4 (calibrated)";
}

function materializeIdentityBlock(
  rec: IdentityRecord,
  lastSessionSummary?: string
): string {
  if (!rec.consents.memoryEnabled) {
    return "<identity>The user has opted out of memory. Treat each session as fresh.</identity>";
  }
  const lines: string[] = [
    `You are talking to ${rec.displayName} (persona: ${rec.personaActive}).`,
    `Profile maturity: ${maturityStep(rec)} (see staircase).`,
  ];
  for (const [k, v] of Object.entries(rec.preferences)) {
    if (v.confidence >= 0.6 && v.sensitivity !== "high") {
      const marker = v.confidence >= 0.85 ? "" : " (medium confidence)";
      lines.push(`- Preference: ${k} = ${v.value}${marker}`);
    }
  }
  for (const [k, v] of Object.entries(rec.facts)) {
    if (v.confidence >= 0.6 && v.sensitivity !== "high") {
      const marker = v.confidence >= 0.85 ? "" : " (medium confidence)";
      lines.push(`- Fact: ${k} = ${v.value}${marker}`);
    }
  }
  if (lastSessionSummary) {
    lines.push(`Last conversation: ${lastSessionSummary}`);
  }
  return `<identity>\n${lines.join("\n")}\n</identity>`;
}

async function isDurable(
  client: Anthropic,
  key: string,
  value: string
): Promise<boolean> {
  const resp = await client.messages.create({
    model: "claude-haiku-4-5",
    max_tokens: 10,
    messages: [
      {
        role: "user",
        content:
          "Would the following fact about a user still likely be true in one month? Answer only YES or NO.\n\n" +
          `Fact: ${key} = ${value}`,
      },
    ],
  });
  // Treat an empty or non-text response as "not durable" instead of throwing.
  const block = resp.content[0];
  if (!block || block.type !== "text") return false;
  return block.text.trim().toUpperCase().startsWith("YES");
}

function sensitivityOf(key: string, value: string): "low" | "high" {
  const lower = (key + " " + value).toLowerCase();
  return SENSITIVE_KEYS.some((s) => lower.includes(s)) ? "high" : "low";
}

interface Candidate {
  kind: "preference" | "fact";
  key: string;
  value: string;
  confidence?: number;
  explicitConsent?: boolean;
  sourceEpisodeIds?: string[];
}

async function candidateToIdentity(
  rec: IdentityRecord,
  cand: Candidate,
  client: Anthropic
): Promise<boolean> {
  if (!(await isDurable(client, cand.key, cand.value))) return false;
  const sensitivity = sensitivityOf(cand.key, cand.value);
  if (sensitivity === "high" && !cand.explicitConsent) return false;

  const bucket = cand.kind === "preference" ? rec.preferences : rec.facts;
  if (bucket[cand.key] && bucket[cand.key].value === cand.value) {
    bucket[cand.key].lastUpdated = Date.now() / 1000;
    bucket[cand.key].confidence = Math.min(1.0, bucket[cand.key].confidence + 0.05);
    return true;
  }

  bucket[cand.key] = {
    value: cand.value,
    confidence: cand.confidence ?? 0.7,
    sourceEpisodeIds: cand.sourceEpisodeIds ?? [],
    lastUpdated: Date.now() / 1000,
    sensitivity,
    halfLifeDays: 180,
  };
  return true;
}

async function demo() {
  const client = new Anthropic();
  const store = new IdentityStore();

  const rec: IdentityRecord = {
    userId: "u-maya",
    displayName: "Maya Chen",
    personaActive: "default",
    preferences: {},
    facts: {},
    consents: { memoryEnabled: true, sharingAllowed: false, retentionDays: 365 },
    createdAt: Date.now() / 1000,
    updatedAt: Date.now() / 1000,
    schemaVersion: 1,
  };

  const candidates: Candidate[] = [
    { kind: "preference", key: "answer_style", value: "concise technical", confidence: 0.9 },
    { kind: "fact", key: "primary_stack", value: "Python/Postgres", confidence: 0.85 },
    {
      kind: "fact",
      key: "medical_allergy",
      value: "peanuts",
      confidence: 0.95,
      explicitConsent: true,
    },
    { kind: "fact", key: "current_focus", value: "this one PR", confidence: 0.6 },
  ];

  for (const c of candidates) {
    const ok = await candidateToIdentity(rec, c, client);
    console.log(`${c.key.padStart(20)} promoted=${ok}`);
  }
  store.upsert(rec);

  console.log("\n=== Identity block ===");
  console.log(
    materializeIdentityBlock(
      rec,
      "2 days ago, debugged a Postgres LATERAL JOIN regression."
    )
  );
}

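// Surface failures (for example a missing API key) instead of leaving an
// unhandled promise rejection.
demo().catch((err) => {
  console.error(err);
  process.exit(1);
});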

The TypeScript version is schema-equivalent — same identity record, same maturity staircase, same durability and sensitivity gates, same deletion semantics. A production system swaps better-sqlite3 for Postgres (or Mem0’s API, or Letta’s core_memory blocks) and the in-memory candidate flow for an extraction pipeline that runs over real conversation turns; the architectural shape is what carries.

Trade-offs, failure modes, and gotchas

The “personalization is creepy” trap. Personalization that surfaces too aggressively in the early steps of the cold-start staircase reads as surveillance rather than service: “As I remember, you mentioned last week that…” on session 2, when the system has seen the user exactly once. The mitigation is the maturity-step rendering from the snippet: early-step blocks are factual about what the system knows (“you mentioned X once”) rather than familiar (“as you usually prefer”). The familiar tone earns its place at Step 3+; before that it’s a confidence-on-stilts UX failure mode. The empirical sign is users disabling memory after the third session.
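One way to make that tone gate concrete is to pick the phrasing register from the staircase step. The sketch below is an illustration only: `phraseIdentityItem`, the numeric `step` argument, and the `timesSeen` counter are hypothetical names that do not appear in the snippet above.

```typescript
// Hypothetical helper: choose a phrasing register from the staircase step.
// Steps 0-2 state only what was observed; Step 3 and above may sound familiar.
function phraseIdentityItem(
  step: number,
  key: string,
  value: string,
  timesSeen: number
): string {
  if (step >= 3) {
    return `- As the user usually prefers: ${key} = ${value}`;
  }
  const times = timesSeen === 1 ? "once" : `${timesSeen} times`;
  return `- The user mentioned ${key} = ${value} ${times}; do not assume it is a habit.`;
}
```

The threshold of 3 mirrors the "Step 3+" boundary in the paragraph above; the exact wording of each register is a product decision, not something this sketch settles.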

The “stale identity beats accurate retrieval” failure mode. The identity block at the top of the system prompt is cached and not refreshed mid-session. If the user updates a preference mid-conversation (“actually, I switched to Rust six months ago, please update that”), the identity block in the current session is now out of date. The naive fix is to re-materialize the block on every turn — which defeats the prompt cache and burns latency. The better fix is to write the update to the durable record (it’ll be reflected in the next session’s block) but also inject a delta into the current session’s context — typically as a system-prompt suffix that says “(Note: during this session, the user updated primary_stack to Rust)”. The block stays cached; the delta carries the mid-session correction; the next session reads the unified record. This is the same shape as a delta-coded segment in a write-ahead log layered over an immutable snapshot.
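A minimal sketch of the cached-block-plus-delta idea, assuming the harness controls how the system prompt is assembled. `SessionContext` and `recordUpdate` are hypothetical names, and the durable write to the identity record is left out of scope here.

```typescript
interface SessionDelta {
  key: string;
  newValue: string;
}

class SessionContext {
  // Materialized once at session start and never rewritten, so the cached
  // prompt prefix stays byte-identical across turns.
  private readonly cachedIdentityBlock: string;
  private deltas: SessionDelta[] = [];

  constructor(cachedIdentityBlock: string) {
    this.cachedIdentityBlock = cachedIdentityBlock;
  }

  recordUpdate(key: string, newValue: string): void {
    // A later correction to the same key replaces the earlier one.
    this.deltas = this.deltas.filter((d) => d.key !== key);
    this.deltas.push({ key, newValue });
  }

  // Stable prefix first, volatile suffix last.
  systemPrompt(): string {
    if (this.deltas.length === 0) return this.cachedIdentityBlock;
    const notes = this.deltas
      .map((d) => `(Note: during this session, the user updated ${d.key} to ${d.newValue})`)
      .join("\n");
    return `${this.cachedIdentityBlock}\n${notes}`;
  }
}
```

Because the suffix is only ever appended after the block, the prompt with deltas always begins with the original cached text, which is the property a prefix cache depends on.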

The persona-leak production incident. The user is on their work account, hears a Slack ping, switches to their personal account, comes back to the work conversation and asks a question — except the harness silently latched the personal persona’s identity block into the work conversation’s cache. The model now sees “Personal Maya”’s preferences in a work context, surfaces them, and the team in the channel sees something that should have been private. The mitigation has three parts. First, persona switches are explicit and audited — the harness emits a log line on every switch, not silent state mutation. Second, the identity block is scoped to the persona, not just the user — the cache key includes the persona ID and the same user-in-different-personas gets different cached blocks. Third, the switch invalidates the in-context block — the next turn after a switch re-materializes with the new persona’s slice. Production systems that skip the third step ship the cross-persona leak.
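The three parts of that mitigation can be sketched as a small in-memory cache. This is an assumption-laden illustration: `PersonaScopedBlockCache`, `switchPersona`, and `blockFor` are hypothetical names, and a real harness would persist the switch log rather than hold it in an array.

```typescript
interface SwitchLogEntry {
  userId: string;
  from: string;
  to: string;
  at: number;
}

class PersonaScopedBlockCache {
  private blocks = new Map<string, string>();
  private active = new Map<string, string>(); // userId -> active persona ID
  readonly switchLog: SwitchLogEntry[] = [];

  // Part 2: the cache key carries the persona, not just the user.
  private keyFor(userId: string, persona: string): string {
    return JSON.stringify([userId, persona]);
  }

  switchPersona(userId: string, to: string): void {
    const from = this.active.get(userId) ?? "(none)";
    if (from === to) return;
    // Part 1: every switch is an explicit, logged event.
    this.switchLog.push({ userId, from, to, at: Date.now() });
    // Part 3: drop any block cached for the incoming persona so the next
    // turn re-materializes instead of reusing a stale copy.
    this.blocks.delete(this.keyFor(userId, to));
    this.active.set(userId, to);
  }

  blockFor(userId: string, materialize: (persona: string) => string): string {
    const persona = this.active.get(userId);
    if (persona === undefined) throw new Error(`no active persona for ${userId}`);
    const key = this.keyFor(userId, persona);
    let block = this.blocks.get(key);
    if (block === undefined) {
      block = materialize(persona);
      this.blocks.set(key, block);
    }
    return block;
  }
}
```

Throwing when no persona is active is deliberate in this sketch: a missing persona should fail loudly rather than fall back to whichever block happens to be cached.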

The confidence-marker-ignored-by-the-model failure. The identity block includes “(medium confidence based on 2 sessions)” markers, the system prompt asks the model to verify before relying on them, and the model ignores the markers and answers as if the preference were ground truth. The empirical fix is to put the verification instruction at the top of the system prompt, not inside the identity block — instructions buried inside the identity content get treated as identity, not as policy. A short policy preamble (“If any identity item is marked medium-confidence, ask the user to confirm before relying on it”) at the head of the prompt does the work the inline marker alone cannot. The prompt-caching article notes the same shape — policy goes high, content goes lower; this is the identity-specific instance of that pattern.
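The ordering described above can be sketched as a prompt assembler that places the policy ahead of everything else. `assembleSystemPrompt` and `VERIFY_POLICY` are hypothetical names; the policy text is the preamble quoted in the paragraph.

```typescript
const VERIFY_POLICY =
  "If any identity item is marked medium-confidence, ask the user to confirm before relying on it.";

function assembleSystemPrompt(basePrompt: string, identityBlock: string): string {
  // Policy first, then task instructions, then identity content last, so the
  // verification rule is read as policy rather than as part of the profile.
  return [VERIFY_POLICY, basePrompt, identityBlock].join("\n\n");
}
```

Including the policy unconditionally, rather than only when a medium-confidence item is present, keeps the head of the prompt identical across sessions.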

The “every user is a unique snowflake” cost problem. A multi-tenant agent system with N users has N identity records, N per-user prompt caches, and at least one session-start materialization per active user per day. At scale, this dominates the cost curve. The mitigation is identity sharding by stable preference cluster: users with very similar profile slices share a cached base persona prompt, and only the per-user delta gets injected. The base prompt is cache-warm across users; the delta is small and per-call. This is the same shape as the Linux kernel sharing read-only pages between processes under copy-on-write (COW): the duplicate state shares physical pages, and the unique state gets per-process pages. Production systems that implement this on top of infrastructure aware of Anthropic’s prompt cache see meaningful cost reductions versus naive, fully unique per-user prompts.
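A rough sketch of the base-plus-delta split, under the simplifying assumption that a cluster base is just a fixed set of preference values. `ClusterBase`, `renderPrefs`, and `splitIntoBaseAndDelta` are hypothetical names; how clusters are discovered is out of scope.

```typescript
type Prefs = Record<string, string>;

interface ClusterBase {
  id: string;
  prefs: Prefs;
}

function renderPrefs(p: Prefs): string {
  // Sorted keys keep the rendered text byte-stable, which a prefix cache needs.
  return Object.keys(p)
    .sort()
    .map((k) => `- ${k} = ${p[k]}`)
    .join("\n");
}

function splitIntoBaseAndDelta(user: Prefs, bases: ClusterBase[]) {
  let best: ClusterBase | null = null;
  for (const b of bases) {
    // A base is eligible only if the user holds every preference in it,
    // so the shared text never asserts something untrue about this user.
    const fits = Object.entries(b.prefs).every(([k, v]) => user[k] === v);
    if (!fits) continue;
    if (best === null || Object.keys(b.prefs).length > Object.keys(best.prefs).length) {
      best = b;
    }
  }
  const basePrefs: Prefs = best ? best.prefs : {};
  const deltaPrefs: Prefs = {};
  for (const [k, v] of Object.entries(user)) {
    if (!(k in basePrefs)) deltaPrefs[k] = v;
  }
  return {
    baseId: best ? best.id : null,
    shared: renderPrefs(basePrefs), // cache-warm across every user in the cluster
    delta: renderPrefs(deltaPrefs), // small, per-user, injected after the base
  };
}
```

Two users in the same cluster get the same `shared` string and different `delta` strings; a user who fits no cluster gets an empty base and carries the whole profile in the delta.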

The deletion-as-supersession bug from the procedural-memory article, generalized. When a user says “stop remembering anything about my job” (a partial-scope deletion), the naive implementation treats this as a contradiction (writing a new fact “user requested job-related deletion” alongside the existing job-related facts). The right answer is a scoped hard-deletion — match the deletion intent against a key prefix or category, delete every matching entry in the identity record and every linked episode, and audit-log the operation. Conflating partial-scope deletion with contradiction is the same family of bug as the user-asserted-deletion-versus-supersession ambiguity that bit the conflict-resolver design; the identity layer is where it hits hardest because identity content is user-facing in a way that raw episodes are not.
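A minimal sketch of a scoped hard-deletion, assuming the deletion intent has already been resolved to a key prefix such as `job_`. `ProfileRecord`, `scopedHardDelete`, and the in-memory `episodes` map are hypothetical stand-ins for the identity record and episode store.

```typescript
interface Entry {
  value: string;
  sourceEpisodeIds: string[];
}

interface ProfileRecord {
  preferences: Record<string, Entry>;
  facts: Record<string, Entry>;
}

interface DeletionAuditRow {
  keyPrefix: string;
  deletedKeys: string[];
  deletedEpisodeIds: string[];
  at: number;
}

function scopedHardDelete(
  rec: ProfileRecord,
  episodes: Map<string, string>,
  keyPrefix: string,
  audit: DeletionAuditRow[]
): DeletionAuditRow {
  const deletedKeys: string[] = [];
  const episodeIds = new Set<string>();
  for (const bucket of [rec.preferences, rec.facts]) {
    for (const [key, entry] of Object.entries(bucket)) {
      if (!key.startsWith(keyPrefix)) continue;
      entry.sourceEpisodeIds.forEach((id) => episodeIds.add(id));
      delete bucket[key]; // hard delete: no tombstone, no superseding fact
      deletedKeys.push(key);
    }
  }
  episodeIds.forEach((id) => episodes.delete(id));
  const row: DeletionAuditRow = {
    keyPrefix,
    deletedKeys: deletedKeys.sort(),
    deletedEpisodeIds: Array.from(episodeIds).sort(),
    at: Date.now(),
  };
  audit.push(row); // the audit row records keys and IDs, never the deleted values
  return row;
}
```

One limitation of this sketch: an episode linked to both a deleted entry and a surviving entry is removed, and the surviving entry keeps a dangling ID. A production version would need to decide how to handle shared episodes.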

The “did I migrate everyone?” silent schema drift. Identity schemas change: a new field for pronouns lands, the consents block grows a model_routing_allowed boolean, the maturity step taxonomy splits. The naive migration runs an online ALTER and updates the application code to read the new field. Users created before the migration have records without the new field; users created after have records with it. The retrieval path either has to check for missing fields everywhere (cluttered code, and a single skipped check is a latent bug) or run a backfill (operational cost). The mitigation is the schema_version field on the identity record (schemaVersion in the TypeScript snippet): every read checks the version, runs a lazy upgrade on access, and writes the upgraded record back. The Postgres-tables-with-migration-versions parallel is direct; an identity layer without this discipline accumulates inconsistency that surfaces six months later as “why does the new feature work for some users and not others?”
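The lazy-upgrade read path might look like the following sketch. The version numbers, the `pronouns` default, and the camel-cased `modelRoutingAllowed` field are illustrative assumptions, as are the names `StoredRecord`, `upgraders`, and `readWithLazyUpgrade`.

```typescript
type StoredRecord = { schemaVersion: number; [field: string]: unknown };

const CURRENT_SCHEMA_VERSION = 3;

// One upgrader per historical version; each returns a record one version newer.
const upgraders: Record<number, (r: StoredRecord) => StoredRecord> = {
  1: (r) => ({ ...r, pronouns: null, schemaVersion: 2 }),
  2: (r) => ({
    ...r,
    consents: {
      ...((r.consents ?? {}) as Record<string, unknown>),
      modelRoutingAllowed: false, // conservative default for pre-existing users
    },
    schemaVersion: 3,
  }),
};

function readWithLazyUpgrade(
  stored: StoredRecord,
  writeBack: (upgraded: StoredRecord) => void
): StoredRecord {
  let rec = stored;
  while (rec.schemaVersion < CURRENT_SCHEMA_VERSION) {
    const step = upgraders[rec.schemaVersion];
    if (!step) throw new Error(`no upgrader from schema v${rec.schemaVersion}`);
    rec = step(rec);
  }
  if (rec !== stored) writeBack(rec); // persist once, so the next read is a no-op
  return rec;
}
```

Chaining single-version upgraders means a record that skipped several releases is walked forward step by step, and downstream code only ever sees the current shape.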

The over-precision-in-the-block trap. Identity blocks rendered with full source episode IDs, exact timestamps, confidence scores to three decimals, and complete persona metadata are machine-friendly and model-confusing. The model gets the right information but spends attention parsing it, and the rest of the prompt suffers. The mitigation is two-tier rendering: the model sees a natural-language identity block (“Maya, senior backend engineer, prefers Python and Postgres, allergic to peanuts”); the harness keeps the structured representation for read/write/audit/export operations and passes it to the model only when the user explicitly asks (“what do you actually know about me?”). The pattern is the same as a database query result being formatted differently for a CLI tool versus for a Grafana dashboard: same data, different rendering depending on the consumer.
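Two-tier rendering can be sketched as two functions over the same data. `IdentityItem`, `renderForModel`, and `renderStructured` are hypothetical names, and the natural-language template is deliberately crude.

```typescript
interface IdentityItem {
  value: string;
  confidence: number;
  sourceEpisodeIds: string[];
  lastUpdated: number;
}

// Tier 1: what the model sees by default. No IDs, timestamps, or scores.
function renderForModel(displayName: string, items: Record<string, IdentityItem>): string {
  const phrases = Object.entries(items).map(
    ([key, item]) => `${key.replace(/_/g, " ")}: ${item.value}`
  );
  return phrases.length === 0
    ? `${displayName}.`
    : `${displayName}. ${phrases.join("; ")}.`;
}

// Tier 2: the full structured record, for audit and export paths and for an
// explicit "what do you actually know about me?" request.
function renderStructured(displayName: string, items: Record<string, IdentityItem>): string {
  return JSON.stringify({ displayName, items }, null, 2);
}
```

Both renderers read the same record, so the tiers cannot drift apart; only the consumer decides which one is called.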

The session-summary-loop pollution. The “last session summary” line in the identity block (“Last conversation: 2 days ago, debugged a Postgres slow-query issue”) is generated by a summarizer over the prior session. If the summarizer is run on the conversation that already includes the previous session’s summary, the summaries chain — each session’s summary is a summary-of-summaries. After 20 sessions, the summary line is unrecognizable. The mitigation is to generate the summary from the raw episodes, not from the previous summary — same input, fresh distillation each time. The context-compression article worked the chained-summary failure mode in detail; the identity layer’s session-summary line is a specific instance of that anti-pattern.
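A small sketch of summarizing from raw turns only, assuming injected summaries are tagged when they enter the transcript. The `Turn` shape, the `injected_summary` role, and `summarizeSession` are hypothetical; `summarize` stands in for whatever model call produces the summary.

```typescript
interface Turn {
  sessionId: string;
  role: "user" | "assistant" | "injected_summary";
  text: string;
}

function summarizeSession(
  turns: Turn[],
  sessionId: string,
  summarize: (rawTranscript: string) => string
): string {
  // Only raw user and assistant turns from the target session reach the
  // summarizer; previously injected summaries are filtered out so they
  // cannot be summarized again.
  const raw = turns
    .filter((t) => t.sessionId === sessionId && t.role !== "injected_summary")
    .map((t) => `${t.role}: ${t.text}`)
    .join("\n");
  return summarize(raw);
}
```

The tag is what makes the filter possible: if the prior summary were injected as ordinary system or assistant text, it would be indistinguishable from raw conversation.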

The right-to-be-forgotten that wasn’t. The user requests deletion. The harness hard-deletes the identity record and the episode store. But cached prompt prefixes still reference the user; the next request that hits the cache returns content based on the now-deleted user. Worse, a fine-tuned variant of the model that was trained on user data still surfaces it on certain queries. The honest mitigation has three layers. First, invalidate every cache on deletion (explicit cache invalidation API, audit-logged). Second, exclude the user’s data from whatever flows into any future fine-tune: deletion before the next training cycle is the tractable boundary, and user data already baked into a deployed fine-tune is much harder to remove. Third, be transparent about what deletion covers: the product surface should say “your durable profile and episodes have been hard-deleted; data already used in training cycles or backups may persist according to our retention policy.” The honesty is itself the engineering deliverable.
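One way to keep the deletion path honest is to have it return a receipt listing what actually ran, what failed, and what is out of reach. The sketch below assumes each layer is exposed as a step function; `DeletionStep`, `DeletionReceipt`, and `runDeletion` are hypothetical names.

```typescript
interface DeletionStep {
  name: string;
  run: (userId: string) => void;
}

interface DeletionReceipt {
  userId: string;
  completed: string[];
  failed: string[];
  notCovered: string[];
}

function runDeletion(
  userId: string,
  steps: DeletionStep[],
  notCovered: string[]
): DeletionReceipt {
  const receipt: DeletionReceipt = {
    userId,
    completed: [],
    failed: [],
    notCovered: notCovered.slice(), // e.g. deployed fine-tunes, backups
  };
  for (const step of steps) {
    try {
      step.run(userId);
      receipt.completed.push(step.name);
    } catch {
      // A failed layer is recorded, not swallowed: the receipt must not claim
      // more than what actually happened.
      receipt.failed.push(step.name);
    }
  }
  return receipt;
}
```

The receipt is what the product surface renders back to the user, and the `notCovered` list is where the retention-policy caveat comes from.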

The “personalization works in eval, fails in production” failure mode. Identity layers are typically evaluated against synthetic profile-injection tests (“given this profile, does the model surface the right preference?”) which the system passes easily — the model can read the prompt. The real failure mode is proactive personalization — surfacing the preference without being asked, at the right moment, with the right framing. Synthetic evals don’t measure this; it requires longitudinal user-experience evals (does the user say “you remembered!” or “stop assuming things about me!”). The benchmark literature is starting to catch up here — recent work on the State of AI Agent Memory in 2026 reports that proactive-personalization accuracy lags reactive-personalization accuracy by a noticeable margin even in the best frameworks. Plan for the gap by evaluating the proactive case explicitly, not as a corollary of the reactive case.

Further reading

  • The State of AI Agent Memory 2026 — Mem0 — the cleanest published-recently survey of where production agent-memory systems are with personalization, including the proactive-vs-reactive accuracy gap and the per-user scoping patterns. The §3 architectural-patterns section and the §5 benchmarks section are the must-reads for grounding the identity-layer design against current production evidence.
  • Letta — Stateful Agents and Core Memory Blocks — the production-framework reference for the persona and human core-memory blocks that this article’s identity layer is structurally similar to. The cleanest available docs for how a real agent runtime exposes identity as a typed, bounded, agent-editable surface, including the tool-call protocol for the agent to self-update the block.
  • Mem0 Documentation — Memory Scoping (User, Session, Agent) — the per-user/per-session/per-agent scoping model that productionizes the tenant-isolation discipline the rest of this article keeps invoking. The required-parameter pattern (the user_id argument is structural, not optional) is the safer-by-default approach the snippet borrowed.
  • Right to Be Forgotten in AI: Navigating LLM Training Data — 2026 industry guidance — the most current-as-of-publishing summary of GDPR-style deletion semantics applied to LLM-backed systems, including the boundary between profile/episodic hard-deletion (tractable) and training-data unlearning (much harder). The §3 layered-remediation discussion is the right reference for designing an identity-layer deletion path that’s honest about what it can and can’t guarantee.
  • Memory and new controls for ChatGPT — OpenAI — the production reference for one of the highest-volume identity layers in deployment. The user-facing controls (memory on/off, individual-fact edits, full export, temporary-chat opt-out) are the de-facto baseline for what consumer-grade agent identity needs to expose, and the architecture summary is the closest thing to a publicly-documented reference implementation at scale.
  • Long-Term Memory: Vector-Backed Episodic Storage — the substrate the identity layer reads against. Identity records and episodic storage are companion structures: identity is the small, durable, keyed record; episodes are the large, time-stamped, similarity-retrieved corpus. The two together cover the cross-session continuity story.
  • Hierarchical Memory: Working / Episodic / Semantic Tiers — the architectural frame the identity block fits into. The identity block is the canonical content of the hot tier (the always-in-context portion); the staircase, the persona switch, and the cache-aware materialization are all hot-tier engineering concerns.
  • Memory Privacy, Isolation, and Multi-Tenancy — the next piece, and the structural generalization of this article’s deletion path. Where this piece worked persona switches and per-user opt-out, the privacy article works per-tenant boundaries, verifiable GDPR deletion with derived-artifact rebuilds, and the MINJA/MEXTRA memory-injection attack model that any production identity layer has to assume against itself.
  • Memory Conflict, Forgetting, and Embedding Drift — the read/write conflict patterns this article’s update path inherits. The contradiction resolver, the verification-boost mechanic, and the user-asserted-deletion-versus-supersession classifier are all directly applicable to identity writes; the difference is that identity contradictions are higher-stakes because they’re more visible to the user.