diff --git a/docs/rag-setup-plan.md b/docs/rag-setup-plan.md
new file mode 100644
index 0000000..0481ad5
--- /dev/null
+++ b/docs/rag-setup-plan.md
@@ -0,0 +1,1223 @@
+# RAG Setup Plan — Cross-project reference
+
+> **Purpose:** Plan for setting up Hybrid RAG (Option A) in projects whose MD context exceeds 1M tokens. Reusable across projects: SOLUTION_ERP is the baseline reference, and bro's 2 upcoming projects will apply this pattern.
+> **Last updated:** 2026-05-12 (Session 21 turn 1+)
+> **Status:** 📝 Plan saved, not yet implemented; target: Week 1-4 trial on the 2 upcoming projects
+> **Owner:** pqhuy1987@gmail.com + Claude (main agent + 4 sub-agents)
+
+---
+
+## 📋 Table of Contents
+
+1. [Context + Why](#1-context--why)
+2. [Architecture overview](#2-architecture-overview)
+3. [BLANKET load list (~100K tokens, 28%)](#3-blanket-load-list)
+4. [RAG store list (~254K tokens, 72%)](#4-rag-store-list)
+5. [Tool stack recommend](#5-tool-stack-recommend)
+6. [Setup scripts (copy-paste ready)](#6-setup-scripts)
+7. [Audit procedure (3-tier cadence)](#7-audit-procedure)
+8. [Multi-AI client access](#8-multi-ai-client-access)
+9. [Timeline rollout (~10-14h dedicated)](#9-timeline-rollout)
+10. [Caveats + risks](#10-caveats--risks)
+11. [Success metrics + decision gate](#11-success-metrics)
+12. [Future enhancements](#12-future-enhancements)
+
+---
+
+## 1. Context + Why
+
+### Problem statement
+
+```
+Current lazy blanket pattern (main agent + 4 agents):
+  - Main agent loads ~120K MD upfront (35% of project)
+  - Lazy Read when needed: main agent GUESSES which files are relevant
+  - 4 agents, each spawn writes ~188K to cache
+  - Heavy session ~700K effective billed
+  - Lost-in-middle threshold reached after ~5.75h productive
+
+Scale-up to 2 projects > 1M MD tokens each:
+  ❌ Blanket NOT feasible (exceeds 1M context cap)
+  ❌ Lazy Read recall ~30-60% (main agent misses files it does not think of)
+  ❌ 4 agents Read the same files repeatedly (cumulative ~240K wasted)
+  ❌ Vietnamese-English synonym misses (keyword grep only)
+  ❌ Cross-project context impossible without manual switching
+```
+
+### Solution
+
+**Hybrid RAG Option A** — blanket critical + retrieve on-demand:
+
+```
+KEEP blanket: ~100K static (core stable + current state + agent + skills + memory critical)
+ADD RAG layer: 70% MD remaining accessible via semantic retrieve
+SHARE cache: 4 agents reuse retrieved chunks (multi-agent leverage)
+```
+
+### Benefits settled in earlier analysis sessions
+
+| Metric | Lazy current | Option A | Δ |
+|---|---|---|---|
+| Quality recall | 30-60% | **85%** | **+25-55pp** |
+| Heavy session token | 700K | **560K** | -20% |
+| Session productive hours | 5.75h | **7.6h** | **+1.85h** |
+| Tasks before lost-in-middle | ~23 | **~38** | **+65%** |
+| Net successful tasks/session | 25 | **50** | **2×** |
+| Multi-agent shared cache | ❌ | **✅ 60-90% cache hit** | real leverage |
+| Vietnamese-English semantic search | ❌ grep only | **✅ Voyage multilingual** | unlock |
+| Scale > 1M MD | ❌ break | **✅ work** | **enable** |
+
+### Trade-off
+
+- ⚠️ Setup cost: ~10-14h dedicated session (one-time investment)
+- ⚠️ Maintenance: ~30 min/week audit
+- ⚠️ Beta features (Memory tool, Files API): may introduce breaking changes
+- ⚠️ Retrieval miss risk ~5-10% (mitigated by citations + fallback Read)
+- ⚠️ Voyage API cost: ~$0.36 initial embed + ~$0.20/month delta
+
+---
+
+## 2. Architecture overview
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 1 — Static blanket (cache hot, 5min-1h TTL)           │
+├─────────────────────────────────────────────────────────────┤
+│ Main agent + 4 sub-agents auto-inject ~100K core context:   │
+│   • rules.md, architecture.md, CLAUDE.md, PROJECT-MAP       │
+│   • STATUS top 100 line, HANDOFF top 150 line               │
+│   • 5 agent .md (README + 4 agent identity)                 │
+│   • 6 SKILL.md descriptions (auto-inject)                   │
+│   • 6 memory entries critical cross-cutting                 │
+└─────────────────────────────────────────────────────────────┘
+                              ↓
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 2 — Vector DB retrieve on-demand                      │
+├─────────────────────────────────────────────────────────────┤
+│ Qdrant local (~50MB binary, ~200MB index per project):      │
+│   • Session logs cumulative (49% MD, biggest)               │
+│   • Gotchas detail (chunk per entry)                        │
+│   • Archives + Recently Done + Migration-todos              │
+│   • Flows + Database guides                                 │
+│   • SKILL.md detail (description already in blanket)        │
+│   • Memory entries non-critical                             │
+│   • Guides ops conditional                                  │
+└─────────────────────────────────────────────────────────────┘
+                              ↑
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 3 — Embedding service (Voyage AI cloud)               │
+├─────────────────────────────────────────────────────────────┤
+│ voyage-3-large, 26 languages (strong Vietnamese-English):   │
+│   • Index time: embed chunks → vectors (one-time + delta)   │
+│   • Query time: embed query → search Qdrant top-K           │
+│   • Cost: $0.18/M tokens, ~$0.36 init + ~$0.20/month        │
+└─────────────────────────────────────────────────────────────┘
+                              ↕
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 4 — MCP retriever server (FastMCP Python)             │
+├─────────────────────────────────────────────────────────────┤
+│ Tool exposed: rag_retrieve(query, scope, k, time_range)     │
+│ Transport: stdio (Claude Code) or HTTP/SSE (multi-AI)       │
+│ Auth: API key per client (multi-AI mode)                    │
+└─────────────────────────────────────────────────────────────┘
+                              ↕
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 5 — Multi-AI clients                                  │
+├─────────────────────────────────────────────────────────────┤
+│ Claude Code (main agent + 4 agents): primary                │
+│ Claude Desktop: secondary                                   │
+│ GPT-4 / Cursor / Continue / Custom agent: optional          │
+└─────────────────────────────────────────────────────────────┘
+                              ↑
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 6 — Re-index pipeline                                 │
+├─────────────────────────────────────────────────────────────┤
+│ Pre-commit hook: delta re-index changed MD                  │
+│ Weekly full re-index: catch missed (Saturday off-peak)      │
+│ Batch API 50% discount for mass re-index                    │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Index-time flow (one-time init + delta)
+
+```
+1. Walk filesystem → docs/ + .claude/ + memory/
+2. Adaptive chunking per doc_type (custom Python chunker)
+3. Batch embed via Voyage API (≤128 chunks/batch, also within Voyage's per-request token cap)
+4. Upsert into Qdrant with metadata (source, doc_type, project, last_modified)
+5. Total init: ~10-15 minutes for 1M MD tokens
+```
+
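+Counting chunks is not enough for step 3. Voyage also limits the total tokens per request, and whole-file session logs are large, so 128 of them could exceed that cap. Below is a minimal sketch of batching by both item count and an approximate token budget. The ~4 characters per token estimate and the 100K default budget are assumptions: check Voyage's current per-model limits, and expect Vietnamese text to tokenize more densely.
+
+```python
+from typing import Iterator, List
+
+
+def approx_tokens(text: str) -> int:
+    # Heuristic assumption: ~4 characters per token. Use the provider's
+    # tokenizer when exact counts matter.
+    return max(1, len(text) // 4)
+
+
+def batch_by_budget(texts: List[str], max_items: int = 128,
+                    max_tokens: int = 100_000) -> Iterator[List[str]]:
+    """Yield consecutive batches that respect an item cap and an approximate token cap.
+
+    A single text larger than max_tokens is still yielded, alone in its own batch,
+    so nothing is silently dropped (the API may then reject or truncate it).
+    """
+    batch: List[str] = []
+    batch_tokens = 0
+    for text in texts:
+        t = approx_tokens(text)
+        # Close the current batch before adding this text would overflow either limit.
+        if batch and (len(batch) >= max_items or batch_tokens + t > max_tokens):
+            yield batch
+            batch, batch_tokens = [], 0
+        batch.append(text)
+        batch_tokens += t
+    if batch:
+        yield batch
+```
+
+In the indexer, `for batch in batch_by_budget(texts): voyage.embed(batch, model=EMBED_MODEL, input_type="document")` would replace the fixed-stride loop. Order is preserved, so the embeddings still line up with the chunks.
+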
+### Query-time flow (each call from the main agent or a sub-agent)
+
+```
+1. Main agent / sub-agent: rag_retrieve("query keyword", scope, k)
+2. MCP server: embed query → Voyage API (~100ms)
+3. MCP server: Qdrant search top-K (~50ms local)
+4. MCP server: return chunks với metadata + score
+5. Total: ~150-200ms per query (network-bound)
+6. Cache (planned; the §6.3 server has no cache yet): repeated query → ~10ms target
+```
+
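+Step 6 assumes a cache that the §6.3 server does not have yet. Below is a minimal sketch that assumes an in-process LRU cache around the query-embedding call only; the Qdrant search still runs every time. `make_cached_embedder` is a hypothetical helper name. Whitespace is normalized, so `"silent  403"` and `"silent 403"` share one entry.
+
+```python
+from functools import lru_cache
+from typing import Callable, List, Tuple
+
+
+def make_cached_embedder(embed_one: Callable[[str], List[float]], maxsize: int = 1024):
+    """Wrap a single-query embedding function with an LRU cache."""
+
+    @lru_cache(maxsize=maxsize)
+    def _cached(normalized: str) -> Tuple[float, ...]:
+        # Store as a tuple so callers cannot mutate the cached vector.
+        return tuple(embed_one(normalized))
+
+    def embed(query: str) -> Tuple[float, ...]:
+        # Collapse runs of whitespace; case is kept because it can change the embedding.
+        return _cached(" ".join(query.split()))
+
+    embed.cache_info = _cached.cache_info  # hit/miss stats for the audit
+    return embed
+```
+
+In the server, this would wrap `lambda q: voyage.embed([q], model="voyage-3-large", input_type="query").embeddings[0]`, and the result would be passed to Qdrant as `list(vec)`. The cache lives in one process, so hits are shared only among clients that talk to the same server process.
+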
+---
+
+## 3. BLANKET load list
+
+> **Total: ~100K tokens (28% project MD)**
+> Auto-loaded on every spawn of the main agent + 4 agents.
+
+### A. Core stable docs (~30K, rarely change)
+
+| File | Token | Why blanket |
+|---|---:|---|
+| `docs/rules.md` | ~7K | Stable coding conventions, referenced by every task |
+| `CLAUDE.md` (root pointer) | ~3K | Auto-inject system reminder |
+| `docs/CLAUDE.md` | ~3K | Tech stack overview baseline |
+| `docs/architecture.md` | ~7K | 4-layer Clean Arch baseline |
+| `docs/PROJECT-MAP.md` | ~3K | Navigation map |
+| `docs/workflow-contract.md` | ~4K | 9-phase state machine, core of the Contract domain |
+| `docs/forms-spec.md` | ~3K | Catalog of 8 forms (domain knowledge) |
+
+### B. Current state (~25K, main agent knows it directly without retrieval)
+
+| File | Strategy | Token |
+|---|---|---:|
+| `docs/STATUS.md` **top 100 line** | Current phase + In Progress + 1-2 Recently Done top | ~15K |
+| `docs/HANDOFF.md` **top 150 line** | Last updated + TL;DR latest session + next priority | ~10K |
+
+→ **Drop from blanket:** STATUS Recently Done rows beyond the newest 5 (retrieve if needed), HANDOFF TL;DRs older than 1 week.
+
+### C. Agent infrastructure (~25K, stable agent identity)
+
+| File | Token |
+|---|---:|
+| `.claude/agents/README.md` | ~5K |
+| `.claude/agents/investigator.md` | ~3.5K |
+| `.claude/agents/implementer.md` | ~4K |
+| `.claude/agents/reviewer.md` | ~3.5K |
+| `.claude/agents/cicd-monitor.md` | ~5K |
+| `.claude/agent-memory/{4 agents}/MEMORY.md` auto-inject 25KB first 200 lines | ~4K total |
+
+### D. Skills descriptions (~5K, auto-injected; not full SKILL.md)
+
+| File | Strategy | Token |
+|---|---|---:|
+| `.claude/skills/README.md` | Full | ~2.5K |
+| 6 SKILL.md descriptions | Auto-inject by Claude Code | ~1K total |
+| 6 SKILL.md detail | **Not blanket** → RAG retrieve when triggered | n/a |
+
+### E. Memory user-level critical (~15K)
+
+| File | Token | Why critical |
+|---|---:|---|
+| `project_solution_erp.md` | ~3.5K | Project overview narrative |
+| `feedback_md_compact_narrative.md` (§6.5) | ~2K | Core rule for all doc work |
+| `feedback_uat_skip_verify.md` | ~2K | Phase 9 current mode rule |
+| `feedback_multi_agent_setup.md` | ~3K | 4-agent discipline |
+| `feedback_per_chunk_commit.md` | ~2K | Implementer pattern reusable |
+| `feedback_audit_reuse_before_clone.md` | ~2K | Investigator natural pattern |
+
+→ **Drop from blanket:** the remaining 11 memory entries (retrieve when the pattern is triggered).
+
+### TOTAL BLANKET ≈ 100K tokens
+
+---
+
+## 4. RAG store list
+
+> **Total: ~254K tokens (72% project MD)**
+> Indexed into Qdrant, retrieved on demand.
+
+### F. Session logs (~150K, the largest share: 49% of MD)
+
+```
+Path: docs/changelog/sessions/*.md (41+ files growing)
+Chunk strategy: 1 file = 1 chunk (preserve narrative §6.5)
+Metadata:
+  - session_date: extracted from filename
+  - phase: extracted from content
+  - topic: extracted from H1
+  - commit_sha_range: extracted from "Commits:" line
+  - doc_type: "session_log"
+Scope filter: time_range="last_week|last_month|last_quarter|all"
+```
+
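+The §6.2 indexer stores only `doc_type` for session logs. Below is a sketch of the metadata extraction listed above. It assumes filenames contain an ISO date (e.g. `2026-05-12-session-21.md`) and that logs have a `Commits: <sha>..<sha>` line. Both formats are assumptions, so adjust the patterns to the real logs. `session_metadata` is a hypothetical helper.
+
+```python
+import re
+from pathlib import Path
+
+DATE_RE = re.compile(r"(\d{4}-\d{2}-\d{2})")
+H1_RE = re.compile(r"^# (.+)$", re.M)
+PHASE_RE = re.compile(r"\bPhase\s+(\d+)", re.I)
+# Assumed format: "Commits: a1b2c3d..e4f5a6b" at the start of a line.
+COMMITS_RE = re.compile(r"^Commits:\s*([0-9a-f]{7,40})\.\.([0-9a-f]{7,40})", re.M)
+
+
+def session_metadata(path: str, content: str) -> dict:
+    """Extract session_date / topic / phase / commit_sha_range where present."""
+    meta = {"doc_type": "session_log"}
+    if m := DATE_RE.search(Path(path).name):
+        meta["session_date"] = m.group(1)  # ISO date strings also sort correctly as text
+    if m := H1_RE.search(content):
+        meta["topic"] = m.group(1).strip()
+    if m := PHASE_RE.search(content):
+        meta["phase"] = int(m.group(1))
+    if m := COMMITS_RE.search(content):
+        meta["commit_sha_range"] = f"{m.group(1)}..{m.group(2)}"
+    return meta
+```
+
+Merging `{**chunk, **session_metadata(path, content)}` into each session-log chunk would give `time_range` a `session_date` field to filter on.
+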
+### G. Gotchas (~9K, looked up per debug)
+
+```
+Path: docs/gotchas.md (44+ entries)
+Chunk strategy: split per "### N. ..." numbered heading
+Metadata:
+  - gotcha_id: integer
+  - category: extracted from content (tech/EF/Workflow/CICD/Security/...)
+  - doc_type: "gotcha"
+Scope filter: scope="gotcha"
+```
+
+### H. Archives + Recently Done (~75K)
+
+| File | Strategy | Token |
+|---|---|---:|
+| `docs/STATUS.md` rest beyond top 100 | Per H2 section | ~8K |
+| `docs/HANDOFF.md` rest beyond top 150 | Per H2 section | ~21K |
+| `docs/changelog/migration-todos.md` | Per H3 task | ~18K |
+| `docs/changelog/recently-done-archive-*.md` | Per H3 phase | ~6K |
+| `docs/_archive/forms-spec-raw.md` | Full file (cold archive) | ~23K |
+| `docs/_archive/workflow-raw.md` | Full file (cold archive) | ~4K |
+
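+As written, the §6.2 indexer splits all of STATUS.md and HANDOFF.md per H2. The top 100 / 150 lines, which are already blanket-loaded, would therefore also land in the RAG store. Below is a sketch of the split this table describes: it skips the blanket head and chunks only the tail per H2. `rag_tail_sections` is a hypothetical helper.
+
+```python
+import re
+from typing import List
+
+
+def rag_tail_sections(text: str, skip_lines: int) -> List[str]:
+    """Drop the first `skip_lines` lines (already blanket-loaded) and split the rest per H2.
+
+    If the cut lands mid-section, the first returned chunk is that section's
+    remainder without its "## " heading.
+    """
+    tail = "\n".join(text.splitlines()[skip_lines:])
+    # Zero-width split before every line that starts with "## " (not "### ").
+    parts = re.split(r"^(?=## )", tail, flags=re.M)
+    return [p.strip() for p in parts if p.strip()]
+```
+
+For example, `rag_tail_sections(status_text, 100)` for STATUS.md and `rag_tail_sections(handoff_text, 150)` for HANDOFF.md.
+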
+### I. Flows + Database (~17K, task-conditional)
+
+| File | Token | When retrieved |
+|---|---:|---|
+| `docs/flows/README.md` | ~1K | Index, when a flow is needed |
+| `docs/flows/auth-flow.md` | ~1K | Task auth |
+| `docs/flows/permission-flow.md` | ~1.5K | Task permission |
+| `docs/flows/contract-creation-flow.md` | ~1.5K | Task Contract |
+| `docs/flows/contract-approval-flow.md` | ~1.5K | Task approval |
+| `docs/flows/form-render-flow.md` | ~1K | Task form |
+| `docs/flows/sla-expiry-flow.md` | ~1K | Task SLA |
+| `docs/database/database-guide.md` | ~3K | Task schema |
+| `docs/database/schema-diagram.md` | ~12K | Task ERD |
+
+### J. SKILL.md detail (~40K, retrieved when the skill is triggered)
+
+| File | Token |
+|---|---:|
+| `.claude/skills/contract-workflow/SKILL.md` | ~7K |
+| `.claude/skills/form-engine/SKILL.md` | ~5K |
+| `.claude/skills/permission-matrix/SKILL.md` | ~5K |
+| `.claude/skills/dependency-audit-erp/SKILL.md` | ~5K |
+| `.claude/skills/ef-core-migration/SKILL.md` | ~5.5K |
+| `.claude/skills/iis-deploy-runbook/SKILL.md` | ~6K |
+
+### K. Ops guides, conditional (~10K)
+
+| File | Token | When retrieved |
+|---|---:|---|
+| `docs/guides/deployment-iis.md` | ~2.5K | Task deploy |
+| `docs/guides/cicd.md` | ~2K | Task CI/CD |
+| `docs/guides/security-checklist.md` | ~2K | Audit security |
+| `docs/guides/vps-setup.md` | ~2.5K | Setup VPS |
+| `docs/guides/runbook.md` | ~1K | Ops debug |
+
+### L. Memory entries non-critical (~50K, pattern lookup)
+
+```
+Remaining 11 memory entries (user-level):
+  - feedback_n_stage_workflow_pattern.md (DEPRECATED post-Mig 21)
+  - feedback_designtime_runtime_db.md
+  - feedback_drastic_refactor_scope.md
+  - feedback_cron_monthly_limitation.md
+  - feedback_user_manual_style.md
+  - feedback_node_cicd.md
+  - feedback_unittest_timing.md
+  - feedback_responsive_laptop_breakpoint.md
+  - feedback_service_hook_vs_endpoint.md
+  - reference_session_prompts.md
+  - MEMORY.md index
+```
+
+### M. Audit logs (~2K, grow)
+
+```
+docs/changelog/skill-audit-{YYYY-MM}.md (monthly audit log)
+```
+
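+### TOTAL RAG STORE ≈ 254K tokens
+
+> Note: the per-section estimates F to M above add up to ~350K, not 254K. Re-measure after the first index run before relying on the 72% split.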
+
+---
+
+## 5. Tool stack recommend
+
+| Component | Tool | Reason | Cost |
+|---|---|---|---|
+| **Vector DB** | **Qdrant local** | Rust binary 50MB, no Docker, fast, metadata filtering, admin UI | $0 |
+| **Embedding** | **Voyage-3-large** | Anthropic partner, multilingual 26 lang, no GPU needed | $0.18/M (~$0.36 init) |
+| **MCP server framework** | **FastMCP Python** | Standalone FastMCP 2.x (v1 was merged into the official MCP Python SDK), ~100 LOC, auto schema | $0 |
+| **Chunking** | **Custom Python adaptive** | ~50 LOC, transparent, §6.5 compliant | $0 |
+| **Re-index pipeline** | **Pre-commit hook** | Native git, ~10 LOC bash | $0 |
+| **Monitoring** | **Qdrant Dashboard + custom audit** | Built-in UI port 6333 | $0 |
+| **Auth (multi-AI)** | **Bearer token + rate limit** | Custom middleware ~30 LOC | $0 |
+| **Batch re-index** | **Voyage Batch API** | 50% discount for mass re-embed | -50% |
+
+### Rejected stack + reasons
+
+| Alternative | Reason rejected |
+|---|---|
+| Chroma vector DB | Python ecosystem, slower than Qdrant Rust |
+| pgvector | Requires PostgreSQL setup; extra overhead |
+| OpenAI text-embedding-3-small | Weaker Vietnamese quality than Voyage |
+| BGE-M3 local | Needs a GPU with >= 4GB (Intel Iris Xe is not enough) |
+| LangChain / LlamaIndex | Heavy abstraction, hard-to-debug black box, built-in chunkers do not follow §6.5 |
+| TypeScript MCP SDK | More verbose than Python FastMCP |
+| Pinecone cloud | Paid + vendor lock-in; that scale is not needed |
+
+---
+
+## 6. Setup scripts
+
+### 6.1 `requirements.txt`
+
+```text
+fastmcp>=2.0
+voyageai>=0.3
+qdrant-client>=1.12
+python-frontmatter>=1.1
+```
+
+### 6.2 `scripts/rag-indexer.py` (~120 LOC)
+
+```python
+"""
+RAG Indexer: embed MD files and upsert them into Qdrant.
+
+Usage:
+  python rag-indexer.py                    # full index
+  python rag-indexer.py --files "a.md b.md"  # delta re-index
+"""
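+import os, glob, re, sys, uuid
+from voyageai import Client
+from qdrant_client import QdrantClient
+from qdrant_client.models import (
+    Distance, VectorParams, PointStruct,
+    Filter, FieldCondition, MatchValue, FilterSelector,
+)
+
+# Standalone Qdrant server (REST + dashboard on :6333). QdrantClient(path=...)
+# is an embedded local mode with no dashboard, and it locks the folder, so the
+# MCP server and this indexer (pre-commit hook) could not open it concurrently.
+QDRANT_URL = os.environ.get("QDRANT_URL", "http://localhost:6333")
+COLLECTION = "project_md"  # rename per project
+EMBED_MODEL = "voyage-3-large"
+DIM = 1024  # voyage-3-large default output dimension
+EMBED_BATCH = 32  # Voyage caps texts AND total tokens per request; session logs are large
+UPSERT_BATCH = 256  # keep each upsert request well under Qdrant's request-size limit
+
+voyage = Client(api_key=os.environ["VOYAGE_API_KEY"])
+qdrant = QdrantClient(url=QDRANT_URL)
+
+def chunk_file(path: str) -> list[dict]:
+    """Adaptive chunking by doc type."""
+    path = path.replace("\\", "/")  # Windows glob yields "\"; the checks below expect "/"
+    with open(path, encoding="utf-8") as f:
+        content = f.read()
+    base = {"source": path, "size_chars": len(content)}
+
+    if "/changelog/sessions/" in path:
+        return [{**base, "content": content, "doc_type": "session_log"}]
+
+    if path.endswith("gotchas.md"):
+        entries = re.split(r"^### (\d+)\.", content, flags=re.M)
+        return [
+            {**base, "content": f"### {entries[i]}.{entries[i+1]}",
+             "doc_type": "gotcha", "entry_id": int(entries[i])}
+            for i in range(1, len(entries), 2)
+        ]
+
+    if "/skills/" in path:
+        return [{**base, "content": content, "doc_type": "skill"}]
+
+    if "/agents/" in path:
+        return [{**base, "content": content, "doc_type": "agent"}]
+
+    if path.endswith("MEMORY.md") or "/memory/" in path:
+        return [{**base, "content": content, "doc_type": "memory"}]
+
+    # Default: split per H2 heading
+    sections = re.split(r"^## ", content, flags=re.M)
+    return [
+        {**base, "content": ("## " + s) if i > 0 else s,
+         "doc_type": "doc", "section_idx": i}
+        for i, s in enumerate(sections) if len(s.strip()) > 200
+    ]
+
+def main(files: list[str] | None = None):
+    # Init collection (idempotent)
+    if not qdrant.collection_exists(COLLECTION):
+        qdrant.create_collection(
+            COLLECTION,
+            vectors_config=VectorParams(size=DIM, distance=Distance.COSINE)
+        )
+
+    # Determine paths
+    if files:
+        paths = files
+    else:
+        paths = (
+            glob.glob("docs/**/*.md", recursive=True) +
+            glob.glob(".claude/**/*.md", recursive=True)
+            # user-level memory/ (flow step 1) lives outside the repo; add its glob here
+        )
+        paths = [p for p in paths
+                 if "node_modules" not in p and "_user-guide" not in p]
+    paths = [p.replace("\\", "/") for p in paths]
+
+    # Chunk; chunk_idx (position within its file) feeds the stable point id below
+    chunks = []
+    for path in paths:
+        try:
+            for idx, chunk in enumerate(chunk_file(path)):
+                chunks.append({**chunk, "chunk_idx": idx})
+        except Exception as e:
+            print(f"Skip {path}: {e}")
+    print(f"Chunking: {len(chunks)} chunks from {len(paths)} files")
+
+    # Batch embed (small batches: whole-file session logs are token-heavy)
+    texts = [c["content"] for c in chunks]
+    embeddings = []
+    for i in range(0, len(texts), EMBED_BATCH):
+        batch = texts[i:i+EMBED_BATCH]
+        result = voyage.embed(batch, model=EMBED_MODEL, input_type="document")
+        embeddings.extend(result.embeddings)
+        print(f"Embedded {i+len(batch)}/{len(texts)}")
+
+    # Remove the old points of these files first: upsert only overwrites equal ids,
+    # so deleted or renumbered sections would otherwise linger as stale chunks.
+    for path in set(paths):
+        qdrant.delete(
+            collection_name=COLLECTION,
+            points_selector=FilterSelector(filter=Filter(
+                must=[FieldCondition(key="source", match=MatchValue(value=path))]
+            )),
+        )
+
+    # Deterministic, unique ids: built-in hash() of a str is salted per process
+    # (PYTHONHASHSEED), and gotcha chunks had no section_idx, so ids collided.
+    points = [
+        PointStruct(
+            id=str(uuid.uuid5(uuid.NAMESPACE_URL, f"{c['source']}#{c['chunk_idx']}")),
+            vector=emb,
+            payload=c
+        )
+        for c, emb in zip(chunks, embeddings)
+    ]
+    for i in range(0, len(points), UPSERT_BATCH):
+        qdrant.upsert(collection_name=COLLECTION, points=points[i:i+UPSERT_BATCH])
+    print(f"Indexed {len(points)} chunks → Qdrant")
+
+if __name__ == "__main__":
+    files = sys.argv[2].split() if len(sys.argv) > 2 and sys.argv[1] == "--files" else None
+    main(files)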
+```
+
+### 6.3 `scripts/rag-mcp-server.py` (~80 LOC)
+
+```python
+"""
+MCP retriever server: exposes the rag_retrieve tool for Claude Code + agents.
+
+Run: python rag-mcp-server.py  (stdio default)
+     python rag-mcp-server.py --http :7777  (HTTP/SSE for multi-AI)
+"""
+import os, sys
+from fastmcp import FastMCP
+from voyageai import Client
+from qdrant_client import QdrantClient
+from qdrant_client.models import Filter, FieldCondition, MatchValue
+
+mcp = FastMCP("project-rag")
+voyage = Client(api_key=os.environ["VOYAGE_API_KEY"])
+qdrant = QdrantClient(url=os.environ.get("QDRANT_URL", "http://localhost:6333"))  # same server the indexer writes to
+COLLECTION = "project_md"
+
+@mcp.tool()
+def rag_retrieve(
+    query: str,
+    scope: str = "all",
+    k: int = 5
+) -> list[dict]:
+    """
+    Semantic search MD context.
+    
+    Args:
+        query: Search query (Vietnamese or English; mixing is OK)
+        scope: Filter by doc_type:
+               "all" | "session_log" | "gotcha" | "memory" | 
+               "skill" | "agent" | "doc"
+        k: Top chunks to return (1-15, default 5)
+    
+    Returns:
+        List[dict] with keys: content, source, doc_type, score
+    
+    Use cases:
+        - Historical session log: rag_retrieve("Mig 26 V2", scope="session_log")
+        - Gotcha lookup: rag_retrieve("silent 403", scope="gotcha")
+        - Pattern reuse: rag_retrieve("audit clone", scope="memory")
+        - Cross-section: rag_retrieve("query", scope="all", k=10)
+    """
+    k = min(max(k, 1), 15)
+    
+    # Embed query
+    query_vec = voyage.embed(
+        [query], model="voyage-3-large", input_type="query"
+    ).embeddings[0]
+    
+    # Filter
+    filter_dict = None
+    if scope != "all":
+        filter_dict = Filter(
+            must=[FieldCondition(key="doc_type", match=MatchValue(value=scope))]
+        )
+    
+    # Search (query_points is the current API; search() is deprecated in recent qdrant-client)
+    results = qdrant.query_points(
+        collection_name=COLLECTION,
+        query=query_vec,
+        query_filter=filter_dict,
+        limit=k
+    ).points
+    
+    return [
+        {
+            # Truncated: whole-file session logs can be long; Read(source) for the full text
+            "content": r.payload["content"][:3000],
+            "source": r.payload["source"],
+            "doc_type": r.payload["doc_type"],
+            "score": round(r.score, 3)
+        }
+        for r in results
+    ]
+
+@mcp.tool()
+def rag_stats() -> dict:
+    """Return collection stats (for audit)."""
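+    info = qdrant.get_collection(COLLECTION)
+    return {
+        "total_chunks": info.points_count,
+        "status": info.status.value,  # "green" when healthy
+        "vector_dim": info.config.params.vectors.size,
+        "distance": info.config.params.vectors.distance.value,
+        # optimizer_status is an ok/error state, not an indexing timestamp
+        "optimizer_status": str(info.optimizer_status),
+    }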
+
+if __name__ == "__main__":
+    # Default: stdio mode for Claude Code
+    # HTTP/SSE mode: python rag-mcp-server.py --http :7777
+    if "--http" in sys.argv:
+        port = int(sys.argv[sys.argv.index("--http") + 1].lstrip(":"))
+        mcp.run(transport="sse", port=port)
+    else:
+        mcp.run()  # stdio default
+```
+
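+The architecture lists `time_range` as a `rag_retrieve` parameter, and §4.F defines `last_week|last_month|last_quarter|all`, but the server above does not implement it. Below is a sketch of mapping those values to a cutoff date. The window lengths (7/30/91 days) are assumptions.
+
+```python
+from datetime import date, timedelta
+from typing import Optional
+
+# Assumed window lengths in days; tune as needed.
+TIME_RANGE_DAYS = {"last_week": 7, "last_month": 30, "last_quarter": 91}
+
+
+def time_range_cutoff(time_range: str, today: Optional[date] = None) -> Optional[str]:
+    """Return the ISO date lower bound for `time_range`, or None for "all"."""
+    if time_range == "all":
+        return None
+    if time_range not in TIME_RANGE_DAYS:
+        raise ValueError(f"unknown time_range: {time_range!r}")
+    today = today or date.today()
+    return (today - timedelta(days=TIME_RANGE_DAYS[time_range])).isoformat()
+```
+
+Inside `rag_retrieve`, a non-None cutoff would become one more `must` condition, e.g. `FieldCondition(key="session_date", range=DatetimeRange(gte=cutoff))` from `qdrant_client.models`. This assumes session logs carry a `session_date` payload in a format Qdrant's datetime filter accepts; otherwise, store an integer day number and use `Range`.
+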
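+### 6.4 Register in Claude Code (`.mcp.json`)
+
+Project-scoped MCP servers live in `.mcp.json` at the repo root (`claude mcp add --scope project` writes the same file). Claude Code expands `${VAR}` references from the environment, so the API key stays out of git:
+
+```json
+{
+  "mcpServers": {
+    "project-rag": {
+      "command": "python",
+      "args": ["scripts/rag-mcp-server.py"],
+      "env": {
+        "VOYAGE_API_KEY": "${VOYAGE_API_KEY}"
+      }
+    }
+  }
+}
+```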
+
+### 6.5 Pre-commit hook
+
+```bash
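+#!/bin/sh
+# .git/hooks/pre-commit (make it executable: chmod +x .git/hooks/pre-commit)
+# Re-index changed MD files (working-tree content). Best-effort: never block the commit.
+changed_md=$(git diff --cached --name-only --diff-filter=AMR | grep -E "\.md$")
+if [ -n "$changed_md" ]; then
+    echo "RAG re-indexing $(echo "$changed_md" | wc -l) MD files..."
+    python scripts/rag-indexer.py --files "$changed_md" \
+        || echo "RAG re-index failed; commit continues (weekly full re-index catches up)"
+fi
+exit 0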
+```
+
+### 6.6 Agent .md frontmatter update
+
+```yaml
+# In each .claude/agents/{agent}.md, add the tool (Claude Code expects a comma-separated list):
+tools: Read, Grep, Glob, Bash, mcp__project-rag__rag_retrieve, ...
+```
+
+Add this section to each agent's system prompt:
+
+```markdown
+## RAG retriever usage (rag_retrieve tool)
+
+**WHEN to use:**
+- Historical session log lookup (older than 1 week)
+- Gotcha pattern matching while debugging
+- Memory pattern reuse ("clone X into Y")
+- Cross-section semantic search
+
+**WHEN to use Read instead:**
+- Current state (STATUS + HANDOFF top): already blanket-loaded
+- Active file editing (needs the full file)
+- Architecture review (stable docs, blanket)
+
+**Query examples:**
+- rag_retrieve("silent 403 non-admin", scope="gotcha", k=3)
+- rag_retrieve("PE V2 wire pattern", scope="session_log", k=5)
+- rag_retrieve("audit reuse clone", scope="memory", k=3)
+```
+
+---
+
+## 7. Audit procedure
+
+### 7.1 Weekly quick audit (~30 min, every Saturday)
+
+**Goal:** weekly check of index health and cost trend.
+
+**Checklist:**
+
+```bash
+# 1. Index health
+curl http://localhost:6333/collections/project_md
+# Verify: points_count grows and status is "green"
+
+# 2. Re-index lag: MD files changed this week vs. sources present in Qdrant
+git log --since="1 week ago" --name-only --pretty=format: | grep -E "\.md$" | sort -u > /tmp/changed_md.txt
+wc -l < /tmp/changed_md.txt
+# Compare these paths with the distinct payload "source" values in the collection
+# (scroll with with_payload=["source"]); a changed path missing there is re-index lag.
+
+# 3. Voyage cost
+# Visit voyageai.com dashboard → check last 7 days usage
+# Target: <$1/week steady state
+
+# 4. Random query quality (manual, 5 queries)
+# Sample queries:
+#   - "Recent Mig" → expect session log top
+#   - "silent 403" → expect gotcha #44 top
+#   - "audit reuse" → expect memory entry top
+# Score 1-5 per query (relevant chunks in the top 5)
+
+# 5. Storage size
+du -sh ./rag-data/
+# Target: <500MB per project
+```
+
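+Check 2 can be scripted. The sketch below assumes the collection stores repo-relative `source` paths with forward slashes; the helper names are hypothetical. It reports changed files that have no points at all, plus orphans whose file is gone. It does not detect a file that was indexed once and edited since: that would need a content hash or mtime in the payload.
+
+```python
+import subprocess
+from typing import Dict, List, Set
+
+
+def changed_md_since(since: str = "1 week ago") -> Set[str]:
+    """MD paths touched in git history since `since` (same query as the bash check)."""
+    out = subprocess.run(
+        ["git", "log", f"--since={since}", "--name-only", "--pretty=format:"],
+        capture_output=True, text=True, check=True,
+    ).stdout
+    return {line.strip() for line in out.splitlines() if line.strip().endswith(".md")}
+
+
+def indexed_sources(client, collection: str) -> Set[str]:
+    """Distinct payload `source` values, paging through the collection with scroll()."""
+    sources: Set[str] = set()
+    offset = None
+    while True:
+        points, offset = client.scroll(
+            collection_name=collection, limit=256, offset=offset,
+            with_payload=["source"], with_vectors=False,
+        )
+        sources.update(p.payload["source"] for p in points)
+        if offset is None:
+            return sources
+
+
+def lag_report(changed: Set[str], indexed: Set[str], on_disk: Set[str]) -> Dict[str, List[str]]:
+    return {
+        # Changed recently, still exists, but has no points in Qdrant.
+        "never_indexed": sorted((changed & on_disk) - indexed),
+        # Indexed, but the file no longer exists (also the monthly stale-audit item).
+        "orphans": sorted(indexed - on_disk),
+    }
+```
+
+Pass `on_disk` as every MD path the indexer would glob today, and `indexed` as `indexed_sources(QdrantClient(url="http://localhost:6333"), "project_md")`.
+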
+**Log:** `docs/changelog/rag-audit-weekly-{YYYY-WW}.md` (1 page)
+
+### 7.2 Monthly deep audit (~2-3h, start of each month)
+
+**Goal:** quality benchmark + chunking review + stale cleanup.
+
+**Checklist:**
+
+```python
+# 1. Quality benchmark: 30-query test set
+test_queries = [
+    # Categories: state, historical, debug, pattern, cross-stack
+    ("Phase hiện tại", "doc"),
+    ("Mig 26 PE Level Opinions UPSERT", "session_log"),
+    ("silent 403 non-admin Forbidden", "gotcha"),
+    ("audit reuse trước clone B từ A", "memory"),
+    # ... 30 total covering all scopes
+]
+
+results = []
+for query, expected_scope in test_queries:
+    retrieved = rag_retrieve(query, k=10)
+    # Manual score:
+    # - Recall: % of expected sources in the top 10
+    # - Precision: % retrieved chunks actually relevant
+    results.append({"query": query, "recall": ..., "precision": ...})
+
+# Target: avg recall > 80%, precision > 75%
+
+# 2. Chunking review: sample 10 random chunks
+# Check: are chunks cut mid-narrative (violating §6.5)?
+# Action: tune the chunker if issues are found
+
+# 3. Stale audit
+# Files not re-indexed for > 14 days → flag
+# Files deleted from the repo but still in Qdrant → clean up
+
+# 4. Cost trend
+# Monthly Voyage spend vs target
+# Target: <$3/month steady
+
+# 5. Capacity check
+# Total chunks vs disk space projection
+# If the project grows significantly (>20% MoM) → plan for scale
+```
+
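+The `...` placeholders above can be filled by a small scorer. The sketch assumes each test query is labeled with the chunk keys it should hit. Keys such as `docs/gotchas.md#44` (source plus entry id) are a hypothetical convention, so that several gotchas in one file count separately. Recall is the share of expected keys found in the top-k. Precision is the share of retrieved chunks judged relevant; it defaults to the expected set when there is no separate manual judgment.
+
+```python
+from statistics import mean
+from typing import Dict, List, Optional, Sequence, Set, Tuple
+
+
+def recall_precision(retrieved: Sequence[str], expected: Set[str],
+                     relevant: Optional[Set[str]] = None) -> Tuple[float, float]:
+    relevant = expected if relevant is None else relevant
+    found = set(retrieved) & expected
+    recall = len(found) / len(expected) if expected else 1.0
+    precision = (sum(key in relevant for key in retrieved) / len(retrieved)) if retrieved else 0.0
+    return recall, precision
+
+
+def summarize(rows: List[Tuple[float, float]]) -> Dict[str, float]:
+    """Average recall and precision across the benchmark queries."""
+    recalls, precisions = zip(*rows)
+    return {"recall": mean(recalls), "precision": mean(precisions)}
+
+
+def meets_target(summary: Dict[str, float], min_recall: float = 0.80,
+                 min_precision: float = 0.75) -> bool:
+    # Targets from the checklist: avg recall > 80%, precision > 75%.
+    return summary["recall"] > min_recall and summary["precision"] > min_precision
+```
+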
+**Log:** `docs/changelog/rag-audit-monthly-{YYYY-MM}.md` (2-3 pages)
+
+### 7.3 Quarterly major audit (~4-6h, each quarter)
+
+**Goal:** strategic review + major upgrades.
+
+**Checklist:**
+
+1. **Embedding model upgrade decision**
+   - New Voyage model available? Test it side by side with voyage-3-large
+   - Quality benchmark on the 30-query test set
+   - Decision: upgrade if recall improves by at least 5pp
+
+2. **Chunking strategy iteration**
+   - Review 50 random chunks
+   - Identify patterns: bad cuts, missing overlap, missing metadata
+   - Tune chunker code → re-index full
+
+3. **Collection rebuild from scratch**
+   - Back up current → drop collection → re-index all
+   - Purpose: clean orphan chunks + apply new chunking
+   - Effort: ~30 min for 1M MD
+
+4. **Multi-AI client access audit**
+   - Active clients (Claude Code / Desktop / GPT / Cursor)
+   - Per-client query volume + token spend
+   - Security: rotate auth tokens, review rate limits
+
+5. **Cross-project namespace audit** (if multi-project)
+   - Project isolation working correctly?
+   - Cross-project query intentional vs accidental?
+   - Adjust metadata filter rules
+
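+For item 5, the simplest guard is to pin the project in every query filter. The sketch below builds the filter in Qdrant's REST/JSON form. It assumes a shared collection whose points carry a `project` payload key; the index flow lists `project` as metadata, but the §6.2 indexer does not write it yet. With one collection per project, as in the Week 2 plan, isolation comes from the collection name instead. `build_filter` is a hypothetical helper.
+
+```python
+from typing import Dict, List
+
+DOC_TYPES = {"session_log", "gotcha", "memory", "skill", "agent", "doc"}
+
+
+def build_filter(project: str, scope: str = "all") -> Dict[str, List[dict]]:
+    """Qdrant JSON filter: always restrict to one project, optionally to one doc_type."""
+    if not project:
+        raise ValueError("project is required, so queries never span namespaces by accident")
+    must = [{"key": "project", "match": {"value": project}}]
+    if scope != "all":
+        if scope not in DOC_TYPES:
+            raise ValueError(f"unknown scope: {scope!r}")
+        must.append({"key": "doc_type", "match": {"value": scope}})
+    return {"must": must}
+```
+
+A `keyword` payload index on `project` (`create_payload_index(collection_name=..., field_name="project", field_schema="keyword")`) keeps this filter cheap as the collection grows.
+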
+**Log:** `docs/changelog/rag-audit-quarterly-{YYYY-Q}.md` (5-10 pages)
+
+### 7.4 Trigger-based audit (ad-hoc)
+
+| Trigger | Action |
+|---|---|
+| Critical retrieval miss (reported by main agent) | Audit why the relevant chunk was missed + tune |
+| Cost spike >50% MoM | Audit query patterns + rate limit clients |
+| Re-index hang >1h | Audit indexer logs + Qdrant health |
+| Quality regression observed by main agent | Spot-check + run the monthly audit early |
+| New project added | Setup namespace + initial index audit |
+
+---
+
+## 8. Multi-AI client access
+
+### 8.1 MCP protocol — agnostic
+
+MCP (Model Context Protocol) is an **open standard protocol**. Any AI client that supports MCP can use the same server:
+
+```
+              Qdrant (single source)
+                    ↓
+            MCP server :7777 (HTTP/SSE)
+       ↙          ↓           ↓          ↘
+  Claude Code  Claude     Cursor     GPT-4 +
+              Desktop      IDE       custom adapter
+```
+
+### 8.2 Transport modes
+
+| Mode | Use case | Setup |
+|---|---|---|
+| **stdio** | Single client (Claude Code local), default | `python rag-mcp-server.py` |
+| **HTTP/SSE** | Multi-client (network access); SSE is the older MCP HTTP transport, newer spec revisions use Streamable HTTP | `python rag-mcp-server.py --http :7777` |
+| **WebSocket** | Bi-directional (rare); not a standard MCP spec transport | Custom config |
+
+### 8.3 Setup multi-AI mode
+
+**Step 1: Run MCP server HTTP mode**
+
+```bash
+# Terminal 1: MCP server (keep running)
+export VOYAGE_API_KEY="pa-xxxx"
+python scripts/rag-mcp-server.py --http :7777
+
+# Server endpoint: http://localhost:7777/sse
+```
+
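+**Step 2: Add auth middleware (recommended for multi-client)**
+
+```python
+# Update rag-mcp-server.py
+# HYPOTHETICAL: `fastmcp.middleware.bearer_auth` is illustrative, not a confirmed
+# FastMCP import; FastMCP's auth hooks vary by version, so check its auth docs.
+# Intent: map each bearer token to a client name, 30 requests/min per token.
+from fastmcp import FastMCP
+from fastmcp.middleware import bearer_auth  # hypothetical
+
+ALLOWED_TOKENS = {
+    "claude-code-token": "claude-code-primary",
+    "claude-desktop-token": "claude-desktop",
+    "gpt4-token": "gpt4-cursor-integration",
+    "custom-agent-token": "custom-research-agent",
+}
+
+mcp = FastMCP("project-rag", middleware=[
+    bearer_auth(tokens=ALLOWED_TOKENS, rate_limit_per_minute=30)
+])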
+```
+
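+The `bearer_auth` import above cannot be relied on, so here is the intended logic as a framework-agnostic sketch. It does a constant-time token lookup and applies a 60-second sliding-window limit per client. `TokenGate` is a hypothetical class. How it is wired into HTTP depends on the FastMCP version: for example, an ASGI middleware could read the `Authorization` header and map the errors to 401/429.
+
+```python
+import hmac
+import time
+from collections import defaultdict, deque
+from typing import Callable, Deque, Dict, Optional
+
+
+class TokenGate:
+    """Bearer-token check plus a per-client sliding-window rate limit."""
+
+    def __init__(self, tokens: Dict[str, str], per_minute: int = 30,
+                 clock: Callable[[], float] = time.monotonic):
+        self._tokens = tokens          # token -> client name
+        self._per_minute = per_minute
+        self._clock = clock            # injectable for tests
+        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
+
+    def check(self, authorization: Optional[str]) -> str:
+        """Return the client name; raise PermissionError("401 ...") or ("429 ...")."""
+        if not authorization or not authorization.startswith("Bearer "):
+            raise PermissionError("401 missing bearer token")
+        presented = authorization[len("Bearer "):].encode()
+        client = None
+        for token, name in self._tokens.items():
+            # Constant-time compare per token; the loop does not stop early on a match.
+            if hmac.compare_digest(token.encode(), presented):
+                client = name
+        if client is None:
+            raise PermissionError("401 unknown token")
+        now = self._clock()
+        window = self._hits[client]
+        while window and now - window[0] >= 60:
+            window.popleft()  # forget requests older than the 60 s window
+        if len(window) >= self._per_minute:
+            raise PermissionError("429 rate limit exceeded")
+        window.append(now)
+        return client
+```
+
+Logging `client`, the query, and the result count at this point would also cover the audit-log row in §8.4.
+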
+**Step 3: Register per-client config**
+
+#### Claude Code (main agent + 4 agents)
+```jsonc
+// .mcp.json (Claude Code names the remote transport with "type")
+{
+  "mcpServers": {
+    "project-rag": {
+      "type": "sse",
+      "url": "http://localhost:7777/sse",
+      "headers": {
+        "Authorization": "Bearer claude-code-token"
+      }
+    }
+  }
+}
+```
+
+#### Claude Desktop
+```jsonc
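+// claude_desktop_config.json
+// This file launches stdio servers; to reach the SSE endpoint, bridge with the
+// mcp-remote npm package (assumed available via npx; flags per its README).
+{
+  "mcpServers": {
+    "project-rag": {
+      "command": "npx",
+      "args": [
+        "mcp-remote", "http://localhost:7777/sse",
+        "--header", "Authorization: Bearer claude-desktop-token"
+      ]
+    }
+  }
+}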
+```
+
+#### Cursor IDE
+```jsonc
+// .cursor/mcp.json (Cursor's project-level MCP config)
+{
+  "mcpServers": {
+    "project-rag": {
+      "url": "http://localhost:7777/sse",
+      "headers": {
+        "Authorization": "Bearer gpt4-token"
+      }
+    }
+  }
+}
+```
+
+#### GPT-4 via custom adapter
+```python
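+# Function-calling adapter: an OpenAI model calls this Python function, which in
+# turn calls the MCP tool. MCP over SSE is a JSON-RPC session, not a REST route per
+# tool, so a plain POST to /tool/rag_retrieve would not work. Sketch assumes the
+# FastMCP 2.x client (fastmcp.Client + SSETransport with custom headers).
+import asyncio
+from fastmcp import Client
+from fastmcp.client.transports import SSETransport
+
+def query_project_rag(query: str, scope: str = "all", k: int = 5):
+    async def _call():
+        transport = SSETransport(
+            "http://localhost:7777/sse",
+            headers={"Authorization": "Bearer gpt4-token"},
+        )
+        async with Client(transport) as client:
+            return await client.call_tool(
+                "rag_retrieve", {"query": query, "scope": scope, "k": k}
+            )
+    return asyncio.run(_call())
+
+# Register query_project_rag as an OpenAI function tool; return its result to the model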
+```
+
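+To register the adapter as a function tool, the model needs a JSON-schema description of its arguments. Below is a sketch in the Chat Completions `tools` format that mirrors `rag_retrieve`'s parameters from §6.3. `parse_tool_args` is a hypothetical helper. Model-supplied arguments arrive as a JSON string, and this helper re-applies the server's limits to them before the MCP tool is called.
+
+```python
+import json
+
+SCOPES = ["all", "session_log", "gotcha", "memory", "skill", "agent", "doc"]
+
+RAG_RETRIEVE_TOOL = {
+    "type": "function",
+    "function": {
+        "name": "rag_retrieve",
+        "description": "Semantic search over project MD context (Vietnamese or English).",
+        "parameters": {
+            "type": "object",
+            "properties": {
+                "query": {"type": "string", "description": "Search query"},
+                "scope": {"type": "string", "enum": SCOPES},
+                "k": {"type": "integer", "minimum": 1, "maximum": 15},
+            },
+            "required": ["query"],
+        },
+    },
+}
+
+
+def parse_tool_args(arguments_json: str) -> dict:
+    """Validate and clamp model-supplied arguments the same way the server does."""
+    args = json.loads(arguments_json)
+    scope = args.get("scope", "all")
+    if scope not in SCOPES:
+        raise ValueError(f"unknown scope: {scope!r}")
+    k = min(max(int(args.get("k", 5)), 1), 15)
+    return {"query": str(args["query"]), "scope": scope, "k": k}
+```
+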
+#### Continue.dev / custom agent
+```yaml
+# config.yaml (key names are illustrative; check Continue's MCP config schema)
+mcp_servers:
+  - name: project-rag
+    transport: sse
+    url: http://localhost:7777/sse
+    auth_token: custom-agent-token
+```
+
+### 8.4 Security model multi-AI
+
+| Concern | Mitigation |
+|---|---|
+| Token leak | Rotate quarterly, store in env vars |
+| Rate limit abuse | 30 req/min/token default, tune per client |
+| Read-only enforcement | MCP server expose only `rag_retrieve` + `rag_stats` (no write tools) |
+| Audit log | Log every query: timestamp + client_token + query + result_count |
+| Cross-project leak | Per-collection access control (future enhancement) |
+
+### 8.5 Cost considerations multi-AI
+
+```
+Single Claude Code client (current):
+  Voyage cost: ~$0.20/month (low query volume)
+  Qdrant: free local
+
+4 AI clients heavy use (Claude Code + Desktop + Cursor + GPT-4):
+  Voyage cost: ~$2-5/month (higher query volume)
+  Network bandwidth: minimal (~100KB/query response)
+  CPU: Qdrant + Voyage embed call ~100ms total
+  
+→ Multi-AI access scales linearly with query volume, not with infrastructure cost.
+```
+
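+To sanity-check these ranges, split the spend into query embeddings and re-index embeddings. The sketch below uses the $0.18 per million tokens price quoted in §2. Query volumes and token counts are inputs you measure, not known values, and `batch_discount` is whatever the batch API actually offers.
+
+```python
+def monthly_query_cost(queries_per_day: float, tokens_per_query: float,
+                       price_per_million: float = 0.18, days: int = 30) -> float:
+    """USD per month for embedding queries."""
+    return queries_per_day * tokens_per_query * days * price_per_million / 1_000_000
+
+
+def reindex_cost(tokens: float, price_per_million: float = 0.18,
+                 batch_discount: float = 0.0) -> float:
+    """USD for (re-)embedding `tokens` document tokens once."""
+    return tokens * price_per_million / 1_000_000 * (1 - batch_discount)
+```
+
+For example, 1,000 queries/day at ~30 tokens each is 0.9M tokens/month, about $0.16. One full re-embed of 1M MD tokens is about $0.18. With inputs like these, queries stay cheap, so a month in the $2-5 range would be driven mostly by how often large files are re-indexed.
+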
+### 8.6 Recommend rollout
+
+```
+Phase 1 (Week 1-4): Single client (Claude Code only)
+  → Validate quality + cost baseline
+  
+Phase 2 (Month 2+): Add Claude Desktop if mobile/casual access is needed
+  → Same auth, shared collection
+
+Phase 3 (Month 3+): Add Cursor IDE if working across multiple IDEs
+  → Verify no cross-tool conflicts
+
+Phase 4 (Future): GPT-4 / custom agent integration if needed
+  → Custom adapter + strict auth
+```
+
+---
+
+## 9. Timeline rollout
+
+### Hour-by-hour breakdown (~10-14h dedicated session)
+
+| Hour | Task | Effort |
+|---|---|---|
+| **1-2** | Pre-flight: disk cleanup + Voyage signup + Qdrant binary + Python deps install | ~2h |
+| **3-4** | Write `scripts/rag-indexer.py` + run initial embed | ~2h |
+| **5** | Verify Qdrant collection + manual query sanity check | ~1h |
+| **6-7** | Write `scripts/rag-mcp-server.py` + register it in `.mcp.json` | ~2h |
+| **8** | Test rag_retrieve via Claude Code (main agent solo) | ~1h |
+| **9-10** | Update 4 agent .md frontmatter + system prompt sections | ~2h |
+| **11** | Setup pre-commit hook + audit logging | ~1h |
+| **12-14** | Buffer + trial of 10-15 queries to measure quality + cost | ~3h |
+
+### Trial 4-week plan
+
+```
+Week 1: Pilot a single project (the smaller of the 2)
+  - Day 1-2: Setup + initial index
+  - Day 3-7: Active use + measure baseline metrics
+  - Deliverable: rag-audit-weekly-W1.md
+
+Week 2: Roll out 2nd project
+  - Day 1: Set up a separate Qdrant collection
+  - Day 2-7: Measure dual-project use
+  - Deliverable: rag-audit-weekly-W2.md
+
+Week 3: 4-agent integration
+  - Day 1-2: Update the 4 agent .md files with the rag_retrieve tool
+  - Day 3-7: Measure shared-cache benefit on multi-agent tasks
+  - Deliverable: rag-audit-weekly-W3.md
+
+Week 4: Decision gate (keep / tune / upgrade B / rollback)
+  - Day 1-2: Compile metrics
+  - Day 3: Decision meeting (bro + main agent)
+  - Day 4-7: Apply decision (tune embedding/chunking OR upgrade Option B OR rollback)
+  - Deliverable: rag-audit-monthly-M1.md + decision doc
+```
+
+### Decision gate Week 4
+
+```
+PASS criteria (continue + tune):
+  ✅ Quality recall > 80% on 30 query benchmark
+  ✅ Cost < $5/month total (Voyage + storage)
+  ✅ Session lifespan increases > 30% (heavy session)
+  ✅ Multi-agent shared cache hit > 60%
+  ✅ Retrieval miss critical < 10% queries
+  ✅ Storage < 1GB per project
+
+TUNE criteria (continue + adjust):
+  ⚠️ Quality 70-80% → tune chunking or upgrade embedding
+  ⚠️ Cost $5-10/mo → audit query patterns, reduce k
+  ⚠️ Session lifespan increases < 30% → audit blanket effectiveness
+
+ROLLBACK criteria (archive RAG):
+  ❌ Quality < 70%
+  ❌ Cost > $10/mo recurring
+  ❌ Session lifespan does NOT increase, or decreases
+  ❌ Main agent frequently complains of "missing context"
+  ❌ Storage > 5GB per project
+```
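+The Week 4 gate can be written as a small function, so the decision is reproducible from the compiled metrics. This is a minimal sketch with these assumptions:
+
+- Any single ROLLBACK criterion wins.
+- PASS requires every PASS criterion.
+- Everything else is TUNE. That includes cases the plan does not list explicitly, such as storage between 1GB and 5GB or a shared-cache hit rate at or below 60%.
+- "Cost > $10/mo recurring" is approximated here by a single month's figure.
+
+`TrialMetrics` and `decide` are hypothetical names.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class TrialMetrics:
+    recall: float              # 0..1 on the 30-query benchmark
+    monthly_cost_usd: float    # Voyage + storage
+    lifespan_gain: float       # relative heavy-session lifespan change, 0.35 = +35%
+    shared_cache_hit: float    # 0..1 across agents
+    critical_miss_rate: float  # 0..1 of queries
+    storage_gb: float          # per project
+    frequent_miss_reports: bool = False  # main agent keeps reporting "missing context"
+
+
+def decide(m: TrialMetrics) -> str:
+    # Any single ROLLBACK criterion is enough to archive RAG.
+    if (m.recall < 0.70 or m.monthly_cost_usd > 10 or m.lifespan_gain <= 0
+            or m.frequent_miss_reports or m.storage_gb > 5):
+        return "ROLLBACK"
+    # PASS requires every criterion to hold at once.
+    if (m.recall > 0.80 and m.monthly_cost_usd < 5 and m.lifespan_gain > 0.30
+            and m.shared_cache_hit > 0.60 and m.critical_miss_rate < 0.10
+            and m.storage_gb < 1):
+        return "PASS"
+    # Everything in between: keep RAG and tune.
+    return "TUNE"
+```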
+
+---
+
+## 10. Caveats + risks
+
+### 10.1 Beta features risk
+
+| Feature | Status | Mitigation |
+|---|---|---|
+| Anthropic Memory tool | Beta `context-management-2025-06-27` | Defer until GA; keep using the current MEMORY.md |
+| Anthropic Files API | Beta `files-api-2025-04-14` | Optional add-on, RAG primary |
+| Extended 1h prompt cache | Beta `extended-cache-ttl-2025-04-11` | Use the 5min default; opt in to 1h for heavy sessions |
+| Voyage AI API | Stable | Production OK |
+| Qdrant local | Stable | Production OK |
+| FastMCP | Stable v2+ | Production OK |
+
+### 10.2 Storage concerns
+
+```
+Bro's current disk: 911/954 GB used = 96% full (43GB free)
+
+RAG storage budget:
+  Qdrant binary: ~50MB
+  Per project index: ~200-500MB (depends on MD volume)
+  Backup snapshots: ~500MB
+  Logs + audit: ~100MB
+  
+Per project total: ~1GB
+2 projects total: ~2GB
++ buffer 1GB
+= 3GB recommend free space
+
+→ Clean up BEFORE setup: target 5GB+ free
+```
+
+**Cleanup priorities:**
+- `node_modules` in old projects
+- .NET `bin/obj` artifacts
+- Docker images (`docker system prune -a`)
+- Browser caches (Chrome/Edge, commonly ~5GB)
+- `%LOCALAPPDATA%` caches (NuGet, dotnet)
+- Unused Downloads / Videos
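+After cleanup, a quick pre-flight check can confirm the 5GB target before starting the setup session. This is a minimal sketch using the standard library's `shutil.disk_usage`. It assumes the threshold should be read in GiB (2**30 bytes), the unit Windows Explorer labels "GB". The function names are hypothetical.
+
+```python
+import shutil
+
+TARGET_FREE_GIB = 5.0  # "target 5GB+ free" from the storage budget above
+
+
+def free_gib(path: str = ".") -> float:
+    """Free space on the volume holding `path`, in GiB (2**30 bytes)."""
+    return shutil.disk_usage(path).free / 2**30
+
+
+def preflight_ok(path: str = ".", target_gib: float = TARGET_FREE_GIB) -> bool:
+    """True when the volume has at least `target_gib` free."""
+    return free_gib(path) >= target_gib
+
+
+if __name__ == "__main__":
+    print(f"free: {free_gib():.1f} GiB, ready for RAG setup: {preflight_ok()}")
+```
+
+Point `path` at the drive that will hold the Qdrant storage and snapshots, which may not be the drive the script runs from.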
+
+### 10.3 Quality monitoring
+
+| Risk | Indicator | Action |
+|---|---|---|
+| Chunking breaks narrative | Main agent reports "missing context" | Review chunk strategy, tune |
+| Embedding drift | Recall drops > 10pp on benchmark | Full re-embed, check Voyage updates |
+| Stale index | Committed files not yet re-indexed | Force full re-index, check hook |
+| Poor query phrasing | Low precision on simple queries | Main agent refines query patterns |
+| Cross-language mismatch | Vietnamese query misses English content | Multilingual reranker or query expansion |
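+For the "Stale index" row, one way to find committed files that were never re-indexed is to keep a manifest of content hashes from the last index run and diff it against the working tree. This is a minimal sketch that assumes a JSON manifest mapping relative path → SHA-256. The plan does not say how `scripts/rag-indexer.py` tracks state, so the manifest format and function names are hypothetical.
+
+```python
+import hashlib
+import json
+from pathlib import Path
+
+
+def hash_md_files(root: Path) -> dict[str, str]:
+    """Map each *.md file (relative POSIX path) under root to its SHA-256 hex digest."""
+    return {
+        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
+        for p in sorted(root.rglob("*.md"))
+        if p.is_file()
+    }
+
+
+def stale_files(root: Path, manifest_path: Path) -> dict[str, list[str]]:
+    """Compare current hashes with the manifest written at the last index run."""
+    indexed = json.loads(manifest_path.read_text(encoding="utf-8")) if manifest_path.exists() else {}
+    current = hash_md_files(root)
+    return {
+        "added": sorted(set(current) - set(indexed)),
+        "changed": sorted(p for p in current.keys() & indexed.keys() if current[p] != indexed[p]),
+        "deleted": sorted(set(indexed) - set(current)),
+    }
+
+
+def write_manifest(root: Path, manifest_path: Path) -> None:
+    """Call after a successful (re-)index so the next check starts clean."""
+    manifest_path.write_text(json.dumps(hash_md_files(root), indent=2), encoding="utf-8")
+```
+
+The hook, or a weekly audit, can treat any non-empty list as "force re-index".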
+
+### 10.4 Fallback strategy
+
+```
+When RAG fails / quality drops:
+  Layer 1: Main agent falls back to Reading the full file (existing lazy pattern still works)
+  Layer 2: Main agent blanket-loads the critical file directly
+  Layer 3: Roll back to a Qdrant snapshot (weekly backup)
+  Layer 4: Full re-index from scratch (~15 min)
+  Layer 5: Archive RAG, return to the current lazy pattern (ultimate fallback)
+```
+
+The main agent's 120K blanket load is NOT lost when RAG fails → graceful degradation.
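+Layer 1 is the only layer that runs per query. Layers 2-5 are operational steps. Layer 1 can be sketched as a wrapper that tries RAG first and degrades to a full-file read on error or when too few chunks come back. `rag_retrieve` and `read_full_file` are passed in as hypothetical stand-ins for the MCP tool call and the existing Read path.
+
+```python
+import logging
+from typing import Callable, Sequence
+
+log = logging.getLogger("rag-fallback")
+
+
+def retrieve_with_fallback(
+    query: str,
+    rag_retrieve: Callable[[str], Sequence[str]],
+    read_full_file: Callable[[str], str],
+    candidate_path: str,
+    min_results: int = 1,
+) -> tuple[str, list[str]]:
+    """Try RAG first; on error or too few chunks, fall back to reading the whole file (Layer 1)."""
+    try:
+        chunks = list(rag_retrieve(query))
+        if len(chunks) >= min_results:
+            return "rag", chunks
+        log.warning("RAG returned %d chunks for %r; falling back to full file", len(chunks), query)
+    except Exception as exc:  # server down, timeout, missing collection, ...
+        log.warning("RAG failed (%s); falling back to full file", exc)
+    return "full_file", [read_full_file(candidate_path)]
+```
+
+The returned source label also feeds the "miss context" tracking in section 10.3, since every `full_file` result is a retrieval that RAG could not serve.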
+
+### 10.5 Vietnamese-English mix considerations
+
+```
+Voyage-3-large claims multilingual coverage of 26 languages.
+No Vietnamese-specific benchmark is public.
+
+Risk: mixed Vietnamese-English technical jargon may miss synonyms.
+  Example: "im lặng 403" (Vietnamese for "silent 403") vs "silent 403": are the vectors close?
+  
+Mitigation:
+  - Test 10-20 mixed Vietnamese-English queries in the audit benchmark
+  - If recall is low → consider voyage-multilingual-2 as a backup
+  - Or add query expansion / contextual chunk enrichment (Anthropic Contextual Retrieval pattern)
+```
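+The "are the vectors close?" question can be checked directly. Embed each Vietnamese/English pair and flag pairs whose cosine similarity falls below a threshold. In this sketch:
+
+- `embed` is any callable mapping a list of texts to a list of vectors. In practice it might be a thin wrapper around the Voyage Python client's `embed` call, whose result exposes `.embeddings`.
+- The 0.8 threshold is an arbitrary assumption to calibrate against known-good pairs.
+
+```python
+import math
+from typing import Callable, Sequence
+
+Vector = Sequence[float]
+
+
+def cosine(a: Vector, b: Vector) -> float:
+    """Cosine similarity; 0.0 if either vector is all zeros."""
+    dot = sum(x * y for x, y in zip(a, b))
+    na = math.sqrt(sum(x * x for x in a))
+    nb = math.sqrt(sum(y * y for y in b))
+    return dot / (na * nb) if na and nb else 0.0
+
+
+def check_pairs(
+    pairs: Sequence[tuple[str, str]],
+    embed: Callable[[list[str]], Sequence[Vector]],
+    threshold: float = 0.8,  # assumption: tune per model
+) -> list[tuple[str, str, float]]:
+    """Return (vi, en, score) for each pair whose similarity falls below threshold."""
+    texts = [t for pair in pairs for t in pair]  # vi0, en0, vi1, en1, ...
+    vectors = embed(texts)  # one batched call
+    misses = []
+    for i, (vi, en) in enumerate(pairs):
+        score = cosine(vectors[2 * i], vectors[2 * i + 1])
+        if score < threshold:
+            misses.append((vi, en, round(score, 3)))
+    return misses
+```
+
+Raw similarity scores are not comparable across models. When comparing voyage-3-large with voyage-multilingual-2, recall on the same benchmark remains the deciding number.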
+
+---
+
+## 11. Success metrics
+
+### 11.1 Quality metrics
+
+| Metric | Target | Measurement |
+|---|---:|---|
+| Recall avg (30 query benchmark) | > 80% | Manual score weekly |
+| Precision avg | > 75% | Manual score weekly |
+| Retrieval miss critical rate | < 10% | Main agent reports, cumulative |
+| Cross-language query recall | > 70% | Mixed Vietnamese-English benchmark |
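+Once the weekly manual relevance judgments exist, averaging them is mechanical. This sketch makes three assumptions:
+
+- Each benchmark query has a gold set of relevant chunk ids.
+- Recall is measured against the top-k list the server returned.
+- The "critical miss" proxy is the share of queries with zero relevant hits. That proxy is an assumption; the plan measures misses through main-agent reports.
+
+```python
+def recall_precision(retrieved: list[str], relevant: set[str]) -> tuple[float, float]:
+    """Recall and precision of one query's retrieved chunk ids against its gold set."""
+    unique = list(dict.fromkeys(retrieved))  # ignore duplicate ids, keep order
+    hits = sum(1 for doc_id in unique if doc_id in relevant)
+    recall = hits / len(relevant) if relevant else 1.0
+    precision = hits / len(unique) if unique else 0.0
+    return recall, precision
+
+
+def benchmark(results: dict[str, list[str]], gold: dict[str, set[str]]) -> tuple[float, float, float]:
+    """Average recall, average precision, and share of queries with zero relevant hits."""
+    if not gold:
+        raise ValueError("empty gold set")
+    scores = [recall_precision(results.get(q, []), rel) for q, rel in gold.items()]
+    n = len(scores)
+    avg_recall = sum(r for r, _ in scores) / n
+    avg_precision = sum(p for _, p in scores) / n
+    miss_rate = sum(1 for r, _ in scores if r == 0.0) / n
+    return avg_recall, avg_precision, miss_rate
+```
+
+Running the Vietnamese-English subset through the same function gives the cross-language recall row.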
+
+### 11.2 Cost metrics
+
+| Metric | Target | Measurement |
+|---|---:|---|
+| Voyage monthly spend | < $5 | Voyage dashboard |
+| Total RAG infra cost | < $10/month | Sum tools |
+| Cost per query | < $0.001 | Calculated |
+| Disk usage per project | < 1GB | `du -sh` |
+
+### 11.3 Performance metrics
+
+| Metric | Target | Measurement |
+|---|---:|---|
+| Query latency (P50) | < 200ms | MCP server log |
+| Query latency (P99) | < 500ms | MCP server log |
+| Re-index lag (post-commit) | < 30s | Pre-commit hook timing |
+| Cache hit rate (multi-agent) | > 60% | Custom metric |
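+P50/P99 come from the latencies recorded in the MCP server log. The log format is not specified in this plan, so the sketch below assumes the latencies have already been parsed into milliseconds. It uses the nearest-rank percentile definition.
+
+```python
+import math
+
+
+def percentile(samples: list[float], pct: float) -> float:
+    """Nearest-rank percentile (pct in (0, 100]) of latency samples in ms."""
+    if not samples:
+        raise ValueError("no samples")
+    ordered = sorted(samples)
+    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
+    return ordered[max(rank, 1) - 1]
+```
+
+With only a few dozen queries per week, P99 is effectively the slowest query. Treat it as a rough signal until volume grows.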
+
+### 11.4 Capacity metrics
+
+| Metric | Target | Measurement |
+|---|---:|---|
+| Productive session lifespan | +50% vs lazy | Time tracker |
+| Tasks before lost-in-middle | > 35 | Task counter |
+| Heavy-session tokens | -20% vs lazy | Anthropic dashboard |
+| Multi-agent overlap saving | > 50K/session | Cumulative calc |
+
+### 11.5 Multi-AI client metrics
+
+| Metric | Target | Measurement |
+|---|---:|---|
+| Active clients | ≥ 1 stable | Audit log |
+| Per-client query volume | Track baseline | Audit log per client |
+| Cross-client conflict | 0 | Bug reports |
+
+---
+
+## 12. Future enhancements
+
+### 12.1 Phase 2 (after Week 4 validation)
+
+| Enhancement | Effort | Benefit |
+|---|---|---|
+| Upgrade to Option B (drop blanket 30-40K) | 1 session | +15% token saving |
+| Anthropic Memory tool integration | 2-3h | Native cross-conversation memory |
+| Files API integration | 2-3h | Reduce blanket re-upload cost |
+| Enable citations | 1h | RAG quality trace |
+
+### 12.2 Phase 3 (Month 2-3)
+
+| Enhancement | Effort | Benefit |
+|---|---|---|
+| Hybrid BM25 + vector search (Contextual Retrieval) | 4-6h | 49% fewer failed retrievals, 67% with reranking (Anthropic doc) |
+| Multi-project namespace | 2-3h | Cross-project query with strict isolation |
+| Reranker model (Cohere rerank-3) | 2-3h | +10-20% precision |
+| Custom Streamlit audit dashboard | 4-5h | Visual quality monitoring |
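+Hybrid search needs a way to merge the BM25 ranking with the vector ranking. Reciprocal rank fusion (RRF) is one common, parameter-light option. It may not be exactly what the Anthropic cookbook uses. This sketch assumes both retrievers already return ranked lists of chunk ids. It uses k=60, the constant from the original RRF paper.
+
+```python
+from collections import defaultdict
+
+
+def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
+    """Fuse several ranked lists of chunk ids into one ranking (best first)."""
+    scores: dict[str, float] = defaultdict(float)
+    for ranking in rankings:
+        for rank, doc_id in enumerate(ranking, start=1):
+            scores[doc_id] += 1.0 / (k + rank)  # higher rank in any list adds more
+    # Highest fused score first; ties broken by id for determinism.
+    return sorted(scores, key=lambda d: (-scores[d], d))
+```
+
+RRF uses only rank positions, so BM25 scores and cosine similarities never need to be normalized onto the same scale. A reranker would then reorder the fused top-k.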
+
+### 12.3 Phase 4 (Quarter 2+)
+
+| Enhancement | Effort | Benefit |
+|---|---|---|
+| Replace Voyage with Anthropic native embedding (if GA) | 2-3h | Reduce vendor count |
+| Auto-tuning chunking (LLM-aided) | 1 week | Better retrieval quality |
+| Federated multi-machine setup | 1 week | Team usage |
+| Time-series analytics on retrieval patterns | 1 week | Insights |
+
+### 12.4 Defer indefinitely (over-engineering)
+
+- ❌ LangChain / LlamaIndex framework (heavy abstraction)
+- ❌ Self-host LLM (cost > value)
+- ❌ Custom embedding model fine-tuning (effort > value)
+- ❌ Full text + vector hybrid index (use Voyage Reranker instead)
+
+---
+
+## 📚 References + tools
+
+### Anthropic official
+- [Memory tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool)
+- [Prompt caching guide](https://platform.claude.com/docs/en/build-with-claude/prompt-caching)
+- [Files API](https://platform.claude.com/docs/en/build-with-claude/files)
+- [Contextual Retrieval cookbook](https://platform.claude.com/cookbook/capabilities-contextual-embeddings-guide)
+- [Effective context engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)
+- [Agent SDK overview](https://code.claude.com/docs/en/agent-sdk/overview)
+
+### Tools docs
+- [Qdrant docs](https://qdrant.tech/documentation/)
+- [Voyage AI pricing](https://docs.voyageai.com/docs/pricing)
+- [FastMCP](https://github.com/jlowin/fastmcp)
+- [MCP servers list](https://github.com/modelcontextprotocol/servers)
+
+### Project memory
+- `feedback_md_compact_narrative.md` (§6.5 rule — KEEP narrative)
+- `feedback_multi_agent_setup.md` (4-agent discipline)
+- `feedback_drastic_refactor_scope.md` (RAG setup = dedicated session)
+- `feedback_uat_skip_verify.md` (Phase 9 UAT mode)
+
+---
+
+## ✅ Pre-implementation checklist
+
+```
+☐ Bro to confirm 3 items:
+  ☐ Paths of the 2 projects (so Investigator can audit MD inventory pre-flight)
+  ☐ Stack of the 2 projects (BE: .NET/Node/Python? FE: React/Vue?)
+  ☐ Pilot project choice (the smaller of the 2)
+  
+☐ Bro to prepare environment:
+  ☐ Disk cleanup to 5GB+ free (current 911/954 = 96% full)
+  ☐ Voyage AI account signup + API key
+  ☐ Python 3.10+ installed
+  ☐ Git installed (for pre-commit hook)
+  
+☐ Bro to schedule a dedicated session:
+  ☐ 10-14h block on one weekend day (per memory rule feedback_drastic_refactor_scope)
+  ☐ Reserve ~30% of weekly cap for RAG setup spawn cost
+  
+☐ Bro to review plan:
+  ☐ Read this file in full
+  ☐ Confirm scope blanket vs RAG store match needs
+  ☐ Confirm tool stack acceptable
+  ☐ Approve Week 1-4 trial timeline
+```
+
+---
+
+## 📝 Notes — keep updated
+
+- **2026-05-12 turn 1:** Plan saved after S21 turn 1 finalized cicd-monitor. Cross-project reference for bro's 2 future projects with > 1M MD tokens each. SOLUTION_ERP baseline ~354K MD (RAG not needed yet, deferred).
+- **Status:** 📝 PLAN ONLY, not yet implemented
+- **Next trigger:** Bro confirms 3 items → spawn 🔵 Investigator to audit MD inventory of the 2 projects → refine the blanket list for each project