diff --git a/docs/rag-setup-plan.md b/docs/rag-setup-plan.md
new file mode 100644
index 0000000..0481ad5
--- /dev/null
+++ b/docs/rag-setup-plan.md
@@ -0,0 +1,1223 @@
+# RAG Setup Plan: Cross-project reference
+
+> **Purpose:** Plan for setting up Hybrid RAG (Option A) for projects whose MD context exceeds 1M tokens. Applicable across projects: SOLUTION_ERP is the baseline reference, and the owner's 2 future projects will apply this pattern.
+> **Last updated:** 2026-05-12 (Session 21 turn 1+)
+> **Status:** 📝 Plan saved, not yet implemented; target is a Week 1-4 trial on the 2 future projects
+> **Owner:** pqhuy1987@gmail.com + Claude (main agent + 4 sub-agents)
+
+---
+
+## 📋 Table of Contents
+
+1. [Context + Why](#1-context--why)
+2. [Architecture overview](#2-architecture-overview)
+3. [BLANKET load list (~100K tokens, 28%)](#3-blanket-load-list)
+4. [RAG store list (~254K tokens, 72%)](#4-rag-store-list)
+5. [Tool stack recommend](#5-tool-stack-recommend)
+6. [Setup scripts (copy-paste ready)](#6-setup-scripts)
+7. [Audit procedure (3-tier cadence)](#7-audit-procedure)
+8. [Multi-AI client access](#8-multi-ai-client-access)
+9. [Timeline rollout (~10-14h dedicated)](#9-timeline-rollout)
+10. [Caveats + risks](#10-caveats--risks)
+11. [Success metrics + decision gate](#11-success-metrics)
+12. [Future enhancements](#12-future-enhancements)
+
+---
+
+## 1. 
Context + Why
+
+### Problem statement
+
+```
+Current lazy blanket pattern (main agent + 4 agents):
+  - Main agent carries ~120K MD upfront (35% of project)
+  - Lazy Read when needed: the main agent GUESSES which files are relevant
+  - 4 agents, each spawn ~188K cache WRITE
+  - Heavy session ~700K effective billed
+  - Lost-in-middle threshold reached after ~5.75h productive
+
+Scale-up to 2 projects > 1M MD tokens each:
+  ❌ Blanket is NOT feasible (exceeds the 1M context cap)
+  ❌ Lazy Read recall ~30-60% (main agent misses files it does not think of)
+  ❌ 4 agents duplicate Reads of the same files (cumulative ~240K wasted)
+  ❌ Vietnamese-English synonym misses (grep keyword only)
+  ❌ Cross-project context impossible without manual switching
+```
+
+### Solution
+
+**Hybrid RAG Option A**: blanket the critical context, retrieve the rest on demand:
+
+```
+KEEP blanket:   ~100K static (core stable + current state + agent + skills + memory critical)
+ADD RAG layer:  remaining ~70% of MD accessible via semantic retrieval
+SHARE cache:    4 agents reuse retrieved chunks (multi-agent leverage)
+```
+
+### Benefits confirmed in earlier analysis sessions
+
+| Metric | Lazy current | Option A | Δ |
+|---|---|---|---|
+| Quality recall | 30-60% | **85%** | **+25-55pp** |
+| Heavy session token | 700K | **560K** | -20% |
+| Session productive hours | 5.75h | **7.6h** | **+1.85h** |
+| Tasks before lost-in-middle | ~23 | **~38** | **+65%** |
+| Net successful tasks/session | 25 | **50** | **2×** |
+| Multi-agent shared cache | ❌ | **✅ 60-90% cache hit** | leverage real |
+| Vietnamese-English semantic search | ❌ grep only | **✅ Voyage multilingual** | unlock |
+| Scale > 1M MD | ❌ break | **✅ work** | **enable** |
+
+### Trade-off
+
+- ⚠️ Setup cost: ~10-14h dedicated session (one-time investment)
+- ⚠️ Maintenance: ~30 min/week audit
+- ⚠️ Beta features (Memory tool, Files API): breaking changes are possible
+- ⚠️ Retrieval miss risk ~5-10% (mitigated by citations + fallback Read)
+- ⚠️ Voyage API cost: ~$0.36 initial embed + ~$0.20/month delta
+
+---
+
+## 2. 
Architecture overview
+
+```
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 1: Static blanket (cache hot, 5min-1h TTL)            │
+├─────────────────────────────────────────────────────────────┤
+│ Main agent + 4 sub-agents auto-inject ~100K core context:   │
+│   • rules.md, architecture.md, CLAUDE.md, PROJECT-MAP       │
+│   • STATUS top 100 lines, HANDOFF top 150 lines             │
+│   • 5 agent .md (README + 4 agent identity)                 │
+│   • 6 SKILL.md descriptions (auto-inject)                   │
+│   • 6 critical cross-cutting memory entries                 │
+└─────────────────────────────────────────────────────────────┘
+                              ↓
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 2: Vector DB retrieve on-demand                       │
+├─────────────────────────────────────────────────────────────┤
+│ Qdrant local (~50MB binary, ~200MB index per project):      │
+│   • Session logs cumulative (49% MD, biggest)               │
+│   • Gotchas detail (chunk per entry)                        │
+│   • Archives + Recently Done + Migration-todos              │
+│   • Flows + Database guides                                 │
+│   • SKILL.md detail (descriptions already in blanket)       │
+│   • Memory entries non-critical                             │
+│   • Guides ops conditional                                  │
+└─────────────────────────────────────────────────────────────┘
+                              ↑
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 3: Embedding service (Voyage AI cloud)                │
+├─────────────────────────────────────────────────────────────┤
+│ voyage-3-large multilingual 26 lang (good Viet-English):    │
+│   • Index time: embed chunks → vectors (one-time + delta)   │
+│   • Query time: embed query → search Qdrant top-K           │
+│   • Cost: $0.18/M tokens, ~$0.36 init + ~$0.20/month        │
+└─────────────────────────────────────────────────────────────┘
+                              ↕
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 4: MCP retriever server (FastMCP Python)              │
+├─────────────────────────────────────────────────────────────┤
+│ Tool exposed: rag_retrieve(query, scope, k, time_range)     │
+│ Transport: stdio (Claude Code) or HTTP/SSE (multi-AI)       │
+│ Auth: API key per client (multi-AI 
mode)                    │
+└─────────────────────────────────────────────────────────────┘
+                              ↕
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 5: Multi-AI clients                                   │
+├─────────────────────────────────────────────────────────────┤
+│ Claude Code (main agent + 4 agents): primary                │
+│ Claude Desktop: secondary                                   │
+│ GPT-4 / Cursor / Continue / Custom agent: optional          │
+└─────────────────────────────────────────────────────────────┘
+                              ↑
+┌─────────────────────────────────────────────────────────────┐
+│ LAYER 6: Re-index pipeline                                  │
+├─────────────────────────────────────────────────────────────┤
+│ Pre-commit hook: delta re-index changed MD                  │
+│ Weekly full re-index: catch misses (Saturday off-peak)      │
+│ Batch API 50% discount for mass re-index                    │
+└─────────────────────────────────────────────────────────────┘
+```
+
+### Index-time flow (one-time init + delta)
+
+```
+1. Walk filesystem → docs/ + .claude/ + memory/
+2. Chunk adaptively by doc_type (custom Python chunker)
+3. Batch embed via Voyage API (128 chunks/batch)
+4. Upsert into Qdrant with metadata (source, doc_type, project, last_modified)
+5. Total init: ~10-15 min for 1M MD tokens
+```
+
+### Query-time flow (every spawn of the main agent or a sub-agent)
+
+```
+1. Main agent/agent: rag_retrieve("query keyword", scope, k)
+2. MCP server: embed query → Voyage API (~100ms)
+3. MCP server: Qdrant search top-K (~50ms local)
+4. MCP server: return chunks with metadata + score
+5. Total: ~150-200ms per query (network-bound)
+6. Cache: subsequent same query → ~10ms (cache hit)
+```
+
+---
+
+## 3. BLANKET load list
+
+> **Total: ~100K tokens (28% project MD)**
+> Auto-loaded on every spawn of the main agent + 4 agents.
+
+### A. 
Core stable docs (~30K, rarely change)
+
+| File | Token | Why blanket |
+|---|---:|---|
+| `docs/rules.md` | ~7K | Stable coding conventions, referenced by every task |
+| `CLAUDE.md` (root pointer) | ~3K | Auto-inject system reminder |
+| `docs/CLAUDE.md` | ~3K | Tech stack overview baseline |
+| `docs/architecture.md` | ~7K | 4-layer Clean Arch baseline |
+| `docs/PROJECT-MAP.md` | ~3K | Navigation map |
+| `docs/workflow-contract.md` | ~4K | 9-phase state machine, core of the Contract domain |
+| `docs/forms-spec.md` | ~3K | 8-form catalog, domain knowledge |
+
+### B. Current state (~25K, known to the main agent directly, no retrieval needed)
+
+| File | Strategy | Token |
+|---|---|---:|
+| `docs/STATUS.md` **top 100 lines** | Current phase + In Progress + top 1-2 Recently Done | ~15K |
+| `docs/HANDOFF.md` **top 150 lines** | Last updated + TL;DR latest session + next priority | ~10K |
+
+→ **Dropped from blanket:** STATUS Recently Done rows beyond the latest 5 (retrieve if needed), HANDOFF TL;DR entries older than 1 week.
+
+### C. Agent infrastructure (~25K, agent identity stable)
+
+| File | Token |
+|---|---:|
+| `.claude/agents/README.md` | ~5K |
+| `.claude/agents/investigator.md` | ~3.5K |
+| `.claude/agents/implementer.md` | ~4K |
+| `.claude/agents/reviewer.md` | ~3.5K |
+| `.claude/agents/cicd-monitor.md` | ~5K |
+| `.claude/agent-memory/{4 agents}/MEMORY.md` auto-inject 25KB first 200 lines | ~4K total |
+
+### D. Skills descriptions (~5K, auto-inject, not the full SKILL.md)
+
+| File | Strategy | Token |
+|---|---|---:|
+| `.claude/skills/README.md` | Full | ~2.5K |
+| 6 SKILL.md descriptions | Auto-inject by Claude Code | ~1K total |
+| 6 SKILL.md detail | **NOT blanket** → RAG retrieve when triggered | n/a |
+
+### E. 
Memory user-level critical (~15K)
+
+| File | Token | Why critical |
+|---|---:|---|
+| `project_solution_erp.md` | ~3.5K | Project overview narrative |
+| `feedback_md_compact_narrative.md` (§6.5) | ~2K | Core rule for all doc work |
+| `feedback_uat_skip_verify.md` | ~2K | Phase 9 current mode rule |
+| `feedback_multi_agent_setup.md` | ~3K | 4-agent discipline |
+| `feedback_per_chunk_commit.md` | ~2K | Implementer pattern reusable |
+| `feedback_audit_reuse_before_clone.md` | ~2K | Investigator natural pattern |
+
+→ **Dropped from blanket:** the remaining 11 memory entries (retrieve when a pattern is triggered).
+
+### TOTAL BLANKET ≈ 100K tokens
+
+---
+
+## 4. RAG store list
+
+> **Total: ~254K tokens (72% project MD)**
+> Indexed into Qdrant, retrieved on demand.
+
+### F. Session logs (~150K, biggest, 49% MD)
+
+```
+Path: docs/changelog/sessions/*.md (41+ files growing)
+Chunk strategy: 1 file = 1 chunk (preserve narrative §6.5)
+Metadata:
+  - session_date: extracted from filename
+  - phase: extracted from content
+  - topic: extracted from H1
+  - commit_sha_range: extracted from "Commits:" line
+  - doc_type: "session_log"
+Scope filter: time_range="last_week|last_month|last_quarter|all"
+```
+
+### G. Gotchas (~9K, lookup per debug)
+
+```
+Path: docs/gotchas.md (44+ entries)
+Chunk strategy: split per "### N. ..." numbered heading
+Metadata:
+  - gotcha_id: integer
+  - category: extracted from content (tech/EF/Workflow/CICD/Security/...)
+  - doc_type: "gotcha"
+Scope filter: scope="gotcha"
+```
+
+### H. 
Archives + Recently Done (~75K)
+
+| File | Strategy | Token |
+|---|---|---:|
+| `docs/STATUS.md` rest beyond top 100 | Per H2 section | ~8K |
+| `docs/HANDOFF.md` rest beyond top 150 | Per H2 section | ~21K |
+| `docs/changelog/migration-todos.md` | Per H3 task | ~18K |
+| `docs/changelog/recently-done-archive-*.md` | Per H3 phase | ~6K |
+| `docs/_archive/forms-spec-raw.md` | Full file (cold archive) | ~23K |
+| `docs/_archive/workflow-raw.md` | Full file (cold archive) | ~4K |
+
+### I. Flows + Database (~17K, conditional per task)
+
+| File | Token | Retrieve when |
+|---|---:|---|
+| `docs/flows/README.md` | ~1K | Index, when a flow is needed |
+| `docs/flows/auth-flow.md` | ~1K | Auth task |
+| `docs/flows/permission-flow.md` | ~1.5K | Permission task |
+| `docs/flows/contract-creation-flow.md` | ~1.5K | Contract task |
+| `docs/flows/contract-approval-flow.md` | ~1.5K | Approval task |
+| `docs/flows/form-render-flow.md` | ~1K | Form task |
+| `docs/flows/sla-expiry-flow.md` | ~1K | SLA task |
+| `docs/database/database-guide.md` | ~3K | Schema task |
+| `docs/database/schema-diagram.md` | ~12K | ERD task |
+
+### J. SKILL.md detail (~40K, retrieve when a skill is triggered)
+
+| File | Token |
+|---|---:|
+| `.claude/skills/contract-workflow/SKILL.md` | ~7K |
+| `.claude/skills/form-engine/SKILL.md` | ~5K |
+| `.claude/skills/permission-matrix/SKILL.md` | ~5K |
+| `.claude/skills/dependency-audit-erp/SKILL.md` | ~5K |
+| `.claude/skills/ef-core-migration/SKILL.md` | ~5.5K |
+| `.claude/skills/iis-deploy-runbook/SKILL.md` | ~6K |
+
+### K. Guides ops conditional (~10K)
+
+| File | Token | Retrieve when |
+|---|---:|---|
+| `docs/guides/deployment-iis.md` | ~2.5K | Deploy task |
+| `docs/guides/cicd.md` | ~2K | CI/CD task |
+| `docs/guides/security-checklist.md` | ~2K | Security audit |
+| `docs/guides/vps-setup.md` | ~2.5K | VPS setup |
+| `docs/guides/runbook.md` | ~1K | Ops debug |
+
+### L. 
Memory entries non-critical (~50K, pattern lookup)
+
+```
+Remaining 11 memory entries (user-level):
+  - feedback_n_stage_workflow_pattern.md (DEPRECATED post-Mig 21)
+  - feedback_designtime_runtime_db.md
+  - feedback_drastic_refactor_scope.md
+  - feedback_cron_monthly_limitation.md
+  - feedback_user_manual_style.md
+  - feedback_node_cicd.md
+  - feedback_unittest_timing.md
+  - feedback_responsive_laptop_breakpoint.md
+  - feedback_service_hook_vs_endpoint.md
+  - reference_session_prompts.md
+  - MEMORY.md index
+```
+
+### M. Audit logs (~2K, growing)
+
+```
+docs/changelog/skill-audit-{YYYY-MM}.md (monthly audit log)
+```
+
+### TOTAL RAG STORE ≈ 254K tokens
+
+---
+
+## 5. Tool stack recommend
+
+| Component | Tool | Reason | Cost |
+|---|---|---|---|
+| **Vector DB** | **Qdrant local** | Rust binary 50MB, no Docker, fast, metadata filtering, admin UI | $0 |
+| **Embedding** | **Voyage-3-large** | Anthropic partner, multilingual 26 lang, no GPU needed | $0.18/M (~$0.36 init) |
+| **MCP server framework** | **FastMCP Python** | Pythonic MCP framework (v1 was merged into the official MCP Python SDK; `fastmcp` 2.x is the standalone successor), ~100 LOC, auto schema | $0 |
+| **Chunking** | **Custom Python adaptive** | ~50 LOC, transparent, §6.5 compliant | $0 |
+| **Re-index pipeline** | **Pre-commit hook** | Native git, ~10 LOC bash | $0 |
+| **Monitoring** | **Qdrant Dashboard + custom audit** | Built-in UI port 6333 | $0 |
+| **Auth (multi-AI)** | **Bearer token + rate limit** | Custom middleware ~30 LOC | $0 |
+| **Batch re-index** | **Voyage Batch API** | 50% discount for mass re-embed | -50% |
+
+### Rejected stack + reasons
+
+| Alternative | Reason rejected |
+|---|---|
+| Chroma vector DB | Python ecosystem, slower than Qdrant Rust |
+| pgvector | Needs a PostgreSQL setup, overhead |
+| OpenAI text-embedding-3-small | Vietnamese quality weaker than Voyage |
+| BGE-M3 local | Needs a GPU >= 4GB (Intel Iris Xe is not enough) |
+| LangChain / LlamaIndex | Heavy abstraction, black-box hard to debug, chunker does not follow §6.5 |
+| TypeScript MCP SDK | More verbose than Python FastMCP |
+| 
Pinecone cloud | Paid + vendor lock-in, that scale is not needed |
+
+---
+
+## 6. Setup scripts
+
+### 6.1 `requirements.txt`
+
+```text
+fastmcp>=2.0
+voyageai>=0.3
+qdrant-client>=1.12
+python-frontmatter>=1.1
+```
+
+### 6.2 `scripts/rag-indexer.py` (~120 LOC)
+
+```python
+"""
+RAG Indexer: embed MD files and upsert them into Qdrant.
+
+Usage:
+    python rag-indexer.py                       # full index
+    python rag-indexer.py --files "a.md b.md"   # delta re-index
+"""
+import glob
+import os
+import re
+import sys
+import uuid
+
+from voyageai import Client
+from qdrant_client import QdrantClient
+from qdrant_client.models import Distance, VectorParams, PointStruct
+
+# Embedded local mode: only one process can open this folder at a time and no
+# HTTP port is served. If the MCP server must stay up while indexing, or the
+# dashboard/curl checks in section 7 are needed, run the Qdrant binary and
+# switch both scripts to QdrantClient(url="http://localhost:6333").
+QDRANT_PATH = "./rag-data/qdrant"
+COLLECTION = "project_md"  # rename per project
+EMBED_MODEL = "voyage-3-large"
+DIM = 1024
+
+voyage = Client(api_key=os.environ["VOYAGE_API_KEY"])
+qdrant = QdrantClient(path=QDRANT_PATH)
+
+def chunk_file(path: str) -> list[dict]:
+    """Adaptive chunking by doc type."""
+    with open(path, encoding="utf-8") as f:
+        content = f.read()
+    norm = path.replace("\\", "/")  # Windows separators would defeat the checks below
+    base = {"source": norm, "size_chars": len(content)}
+
+    if "/changelog/sessions/" in norm:
+        return [{**base, "content": content, "doc_type": "session_log"}]
+
+    if norm.endswith("gotchas.md"):
+        # One capture group → [preamble, id1, body1, id2, body2, ...]
+        entries = re.split(r"^### (\d+)\.", content, flags=re.M)
+        return [
+            {**base, "content": f"### {entries[i]}.{entries[i+1]}",
+             "doc_type": "gotcha", "entry_id": int(entries[i])}
+            for i in range(1, len(entries), 2)
+        ]
+
+    if "/skills/" in norm:
+        return [{**base, "content": content, "doc_type": "skill"}]
+
+    if "/agents/" in norm:
+        return [{**base, "content": content, "doc_type": "agent"}]
+
+    if norm.endswith("MEMORY.md") or "/memory/" in norm:
+        return [{**base, "content": content, "doc_type": "memory"}]
+
+    # Default: split per H2 heading
+    sections = re.split(r"^## ", content, flags=re.M)
+    return [
+        {**base, "content": ("## " + s) if i > 0 else s,
+         "doc_type": "doc", "section_idx": i}
+        for i, s in enumerate(sections) if len(s.strip()) > 200
+    ]
+
+def main(files: list[str] | None = None):
+    # Init collection (idempotent)
+    if not qdrant.collection_exists(COLLECTION):
+        qdrant.create_collection(
+            COLLECTION,
+            vectors_config=VectorParams(size=DIM, distance=Distance.COSINE)
+        )
+
+    # Determine paths
+    if files:
+        paths = files
+    else:
+        paths = (
+            glob.glob("docs/**/*.md", recursive=True) +
+            glob.glob(".claude/**/*.md", recursive=True)
+        )
+    paths = [p for p in paths
+             if "node_modules" not in p and "_user-guide" not in p]
+
+    # Chunk
+    chunks = []
+    for path in paths:
+        try:
+            chunks.extend(chunk_file(path))
+        except Exception as e:
+            print(f"Skip {path}: {e}")
+    print(f"Chunking: {len(chunks)} chunks from {len(paths)} files")
+    if not chunks:
+        return
+
+    # Batch embed (Voyage max 128 inputs/batch; lower the batch size if large
+    # chunks push a request over the per-request token limit)
+    texts = [c["content"] for c in chunks]
+    embeddings = []
+    for i in range(0, len(texts), 128):
+        batch = texts[i:i+128]
+        result = voyage.embed(batch, model=EMBED_MODEL, input_type="document")
+        embeddings.extend(result.embeddings)
+        print(f"Embedded {i+len(batch)}/{len(texts)}")
+
+    # Upsert. Qdrant replaces points by id, so ids must be stable across runs:
+    # built-in hash() is salted per process, and gotcha chunks carry entry_id
+    # (not section_idx), so derive a UUID from source + chunk key instead.
+    points = [
+        PointStruct(
+            id=str(uuid.uuid5(
+                uuid.NAMESPACE_URL,
+                f"{c['source']}#{c.get('entry_id', c.get('section_idx', 0))}"
+            )),
+            vector=emb,
+            payload=c
+        )
+        for c, emb in zip(chunks, embeddings)
+    ]
+    qdrant.upsert(collection_name=COLLECTION, points=points)
+    print(f"Indexed {len(points)} chunks → Qdrant")
+
+if __name__ == "__main__":
+    files = sys.argv[2].split() if len(sys.argv) > 2 and sys.argv[1] == "--files" else None
+    main(files)
+```
+
+### 6.3 `scripts/rag-mcp-server.py` (~80 LOC)
+
+```python
+"""
+MCP retriever server: exposes the rag_retrieve tool to Claude Code + agents.
+
+Run: python rag-mcp-server.py                (stdio default)
+     python rag-mcp-server.py --http :7777   (HTTP/SSE for multi-AI)
+"""
+import os, sys
+from fastmcp import FastMCP
+from voyageai import Client
+from qdrant_client import QdrantClient
+from qdrant_client.models import Filter, FieldCondition, MatchValue
+
+mcp = FastMCP("project-rag")
+voyage = Client(api_key=os.environ["VOYAGE_API_KEY"])
+# Embedded local mode locks this folder to one process; stop the server before
+# running the indexer, or move both scripts to a Qdrant server URL.
+qdrant = QdrantClient(path="./rag-data/qdrant")
+COLLECTION = "project_md"
+
+@mcp.tool()
+def rag_retrieve(
+    query: str,
+    scope: str = "all",
+    k: int = 5
+) -> list[dict]:
+    """
+    Semantic search over MD context.
+
+    Args:
+        query: Search query (Vietnamese or English, mixing is OK)
+        scope: Filter by doc_type:
+            "all" | "session_log" | "gotcha" | "memory" |
+            "skill" | "agent" | "doc"
+        k: Top chunks to return (1-15, default 5)
+
+    Returns:
+        List[dict] with keys: content, source, doc_type, score
+
+    Use cases:
+        - Historical session log: rag_retrieve("Mig 26 V2", scope="session_log")
+        - Gotcha lookup: rag_retrieve("silent 403", scope="gotcha")
+        - Pattern reuse: rag_retrieve("audit clone", scope="memory")
+        - Cross-section: rag_retrieve("query", scope="all", k=10)
+    """
+    k = min(max(k, 1), 15)
+
+    # Embed query
+    query_vec = voyage.embed(
+        [query], model="voyage-3-large", input_type="query"
+    ).embeddings[0]
+
+    # Filter
+    query_filter = None
+    if scope != "all":
+        query_filter = Filter(
+            must=[FieldCondition(key="doc_type", match=MatchValue(value=scope))]
+        )
+
+    # Search (query_points supersedes the deprecated search() method)
+    results = qdrant.query_points(
+        collection_name=COLLECTION,
+        query=query_vec,
+        query_filter=query_filter,
+        limit=k
+    ).points
+
+    return [
+        {
+            "content": r.payload["content"][:3000],  # truncate huge chunks
+            "source": r.payload["source"],
+            "doc_type": r.payload["doc_type"],
+            "score": round(r.score, 3)
+        }
+        for r in results
+    ]
+
+@mcp.tool()
+def rag_stats() -> dict:
+    """Return collection stats (for audit)."""
+    info = qdrant.get_collection(COLLECTION)
+    return {
+        "total_chunks": info.points_count,
+        "vector_dim": info.config.params.vectors.size,
+        "distance": info.config.params.vectors.distance.value,
+        # optimizer_status reports index health, not an indexing timestamp
+        "optimizer_status": str(info.optimizer_status),
+    }
+
+if __name__ == "__main__":
+    # Default: stdio mode for Claude Code
+    # HTTP/SSE mode: python rag-mcp-server.py --http :7777
+    if "--http" in sys.argv:
+        port = int(sys.argv[sys.argv.index("--http") + 1].lstrip(":"))
+        mcp.run(transport="sse", port=port)
+    else:
+        mcp.run()  # stdio default
+```
+
+### 6.4 Register the server (`.mcp.json`)
+
+Claude Code reads project-scoped MCP servers from `.mcp.json` at the repo root (not `.claude/settings.json`); `${VAR}` expands environment variables.
+
+```jsonc
+{
+  "mcpServers": {
+    "project-rag": {
+      "command": "python",
+      "args": ["scripts/rag-mcp-server.py"],
+      "env": {
+        "VOYAGE_API_KEY": "${VOYAGE_API_KEY}"
+      }
+    }
+  }
+}
+```
+
+### 6.5 Pre-commit hook
+
+```bash
+#!/bin/sh
+# .git/hooks/pre-commit
+# Re-index changed MD files
+changed_md=$(git diff --cached --name-only --diff-filter=AMR | grep -E "\.md$")
+if [ -n "$changed_md" ]; then
+  echo "RAG re-indexing $(echo "$changed_md" | wc -l) MD files..."
+  python scripts/rag-indexer.py --files "$changed_md"
+fi
+```
+
+### 6.6 Agent .md frontmatter update
+
+```yaml
+# Add the tool to each .claude/agents/{agent}.md:
+tools: [Read, Grep, Glob, Bash, mcp__project-rag__rag_retrieve, ...]
+```
+
+Add to the system prompt section:
+
+```markdown
+## RAG retriever usage (rag_retrieve tool)
+
+**WHEN to use:**
+- Historical session log lookup (> 1 week old)
+- Gotcha pattern matching while debugging
+- Memory pattern reuse ("clone X to Y")
+- Cross-section semantic search
+
+**WHEN to use Read instead:**
+- Current state (STATUS + HANDOFF top): already blanket loaded
+- Active file editing (needs the full file)
+- Architecture review (stable docs, blanket)
+
+**Query examples:**
+- rag_retrieve("silent 403 non-admin", scope="gotcha", k=3)
+- rag_retrieve("PE V2 wire pattern", scope="session_log", k=5)
+- rag_retrieve("audit reuse clone", scope="memory", k=3)
+```
+
+---
+
+## 7. 
Audit procedure
+
+### 7.1 Weekly quick audit (~30 min, every Saturday)
+
+**Goal:** Check health + cost trend every week.
+
+**Checklist:**
+
+```bash
+# 1. Index health (needs Qdrant running in server mode; the embedded
+#    path= client does not serve an HTTP port)
+curl http://localhost:6333/collections/project_md
+# Verify: points_count grows + status="green"
+
+# 2. Re-index lag
+git log --since="1 week ago" --name-only --pretty=format: | grep -E "\.md$" | sort -u | wc -l
+python -c "
+from qdrant_client import QdrantClient
+q = QdrantClient(path='./rag-data/qdrant')
+# Check that the indexed sources cover the changed files listed above
+pts, _ = q.scroll('project_md', limit=10000, with_payload=['source'])
+print(len({p.payload['source'] for p in pts}), 'indexed sources')
+"
+
+# 3. Voyage cost
+# Visit voyageai.com dashboard → check last 7 days usage
+# Target: <$1/week steady state
+
+# 4. Random query quality (5 manual queries)
+# Sample queries:
+#   - "Recent Mig" → expect session log top
+#   - "silent 403" → expect gotcha #44 top
+#   - "audit reuse" → expect memory entry top
+# Score: 1-5 per query (relevant chunks in top-5)
+
+# 5. Storage size
+du -sh ./rag-data/
+# Target: <500MB per project
+```
+
+**Log:** `docs/changelog/rag-audit-weekly-{YYYY-WW}.md` (1 page)
+
+### 7.2 Monthly deep audit (~2-3h, start of each month)
+
+**Goal:** Quality benchmark + chunking review + stale cleanup.
+
+**Checklist:**
+
+```python
+# 1. Quality benchmark: 30-query test set
+test_queries = [
+    # Categories: state, historical, debug, pattern, cross-stack
+    ("Phase hiện tại", "doc"),
+    ("Mig 26 PE Level Opinions UPSERT", "session_log"),
+    ("silent 403 non-admin Forbidden", "gotcha"),
+    ("audit reuse trước clone B từ A", "memory"),
+    # ... 30 total covering all scopes
+]
+
+results = []
+for query, expected_scope in test_queries:
+    retrieved = rag_retrieve(query, k=10)
+    # Manual score:
+    #   - Recall: % of expected sources in top-10
+    #   - Precision: % of retrieved chunks actually relevant
+    results.append({"query": query, "recall": ..., "precision": ...})
+
+# Target: avg recall > 80%, precision > 75%
+
+# 2. 
Chunking review: sample 10 random chunks
+# Check: are chunks cut mid-narrative (violates §6.5)?
+# Action: tune the chunker if issues are found
+
+# 3. Stale audit
+# Files not re-indexed for > 14 days → flag
+# Files deleted from the repo but still in Qdrant → cleanup
+
+# 4. Cost trend
+# Monthly Voyage spend vs target
+# Target: <$3/month steady
+
+# 5. Capacity check
+# Total chunks vs disk space projection
+# Project growing significantly (>20% MoM) → plan scale
+```
+
+**Log:** `docs/changelog/rag-audit-monthly-{YYYY-MM}.md` (2-3 pages)
+
+### 7.3 Quarterly major audit (~4-6h, every quarter)
+
+**Goal:** Strategic review + major upgrades.
+
+**Checklist:**
+
+1. **Embedding model upgrade decision**
+   - Has Voyage released a new model? Test side-by-side with voyage-3-large
+   - Quality benchmark on the 30-query test set
+   - Decision: upgrade if recall improves by +5pp
+
+2. **Chunking strategy iteration**
+   - Review 50 random chunks
+   - Identify patterns: wrong cuts, missing overlap, missing metadata
+   - Tune chunker code → full re-index
+
+3. **Collection rebuild from scratch**
+   - Backup current → drop collection → re-index all
+   - Purpose: clean orphan chunks + apply new chunking
+   - Effort: ~30 min for 1M MD
+
+4. **Multi-AI client access audit**
+   - Active clients (Claude Code / Desktop / GPT / Cursor)
+   - Per-client query volume + token spend
+   - Security: rotate auth tokens, review rate limits
+
+5. **Cross-project namespace audit** (if multi-project)
+   - Project isolation working correctly?
+   - Cross-project query intentional vs accidental?
+   - Adjust metadata filter rules
+
+**Log:** `docs/changelog/rag-audit-quarterly-{YYYY-Q}.md` (5-10 pages)
+
+### 7.4 Trigger-based audit (ad-hoc)
+
+| Trigger | Action |
+|---|---|
+| Critical retrieval miss (reported by the main agent) | Audit why the relevant chunk was missed + tune |
+| Cost spike >50% MoM | Audit query patterns + rate limit clients |
+| Re-index hang >1h | Audit indexer logs + Qdrant health |
+| Quality regression observed by the main agent | Spot-check + bring the monthly audit forward |
+| New project added | Setup namespace + initial index audit |
+
+---
+
+## 8. Multi-AI client access
+
+### 8.1 MCP protocol: client-agnostic
+
+MCP (Model Context Protocol) is a **standard protocol**. Any AI client that supports MCP can consume the same server:
+
+```
+              Qdrant (single source)
+                       ↓
+           MCP server :7777 (HTTP/SSE)
+          ↙        ↓        ↓        ↘
+ Claude Code    Claude    Cursor    GPT-4 +
+                Desktop    IDE      custom adapter
+```
+
+### 8.2 Transport modes
+
+| Mode | Use case | Setup |
+|---|---|---|
+| **stdio** | Single client (Claude Code local), default | `python rag-mcp-server.py` |
+| **HTTP/SSE** | Multi-client (network access) | `python rag-mcp-server.py --http :7777` |
+| **WebSocket** | Bi-directional (rare) | Custom config |
+
+### 8.3 Setup multi-AI mode
+
+**Step 1: Run the MCP server in HTTP mode**
+
+```bash
+# Terminal 1: MCP server (keep running)
+export VOYAGE_API_KEY="pa-xxxx"
+python scripts/rag-mcp-server.py --http :7777
+
+# Server endpoint: http://localhost:7777/sse
+```
+
+**Step 2: Add auth middleware (recommended for multi-client)**
+
+```python
+# Update rag-mcp-server.py
+# NOTE: `fastmcp.middleware.bearer_auth` is a placeholder name, not a confirmed
+# FastMCP API; map it to the token-verification and rate-limiting hooks that
+# the installed FastMCP version actually provides.
+from fastmcp import FastMCP
+from fastmcp.middleware import bearer_auth
+
+ALLOWED_TOKENS = {
+    "claude-code-token": "claude-code-primary",
+    "claude-desktop-token": "claude-desktop-secondary",
+    "gpt4-token": "gpt4-cursor-integration",
+    "custom-agent-token": "custom-research-agent",
+}
+
+mcp = FastMCP("project-rag", middleware=[
+    bearer_auth(tokens=ALLOWED_TOKENS, rate_limit_per_minute=30)
+])
+```
+
+**Step 3: Register per-client config**
+
+#### Claude Code (main agent + 4 agents)
+```jsonc
+// 
.mcp.json (repo root; Claude Code does not read MCP servers from .claude/settings.json)
+{
+  "mcpServers": {
+    "project-rag": {
+      "type": "sse",
+      "url": "http://localhost:7777/sse",
+      "headers": {
+        "Authorization": "Bearer claude-code-token"
+      }
+    }
+  }
+}
+```
+
+#### Claude Desktop
+```jsonc
+// claude_desktop_config.json
+// Assumption to verify: Desktop's config file has historically accepted only
+// stdio (command/args) servers, so a local stdio-to-SSE bridge may be needed
+// in place of the direct URL entry below.
+{
+  "mcpServers": {
+    "project-rag": {
+      "transport": "sse",
+      "url": "http://localhost:7777/sse",
+      "headers": {
+        "Authorization": "Bearer claude-desktop-token"
+      }
+    }
+  }
+}
+```
+
+#### Cursor IDE
+```jsonc
+// .cursor/mcp.json
+// With the auth middleware enabled, this client also needs its own token.
+{
+  "mcpServers": {
+    "project-rag": {
+      "url": "http://localhost:7777/sse"
+    }
+  }
+}
+```
+
+#### GPT-4 via custom adapter
+```python
+# Use OpenAI Assistants API + custom function calling.
+# NOTE: /tool/rag_retrieve is a hypothetical REST shim. The MCP SSE transport
+# does not expose plain POST routes per tool, so this assumes a small HTTP
+# wrapper (or an MCP client library) sits in front of the server.
+import requests
+
+def query_project_rag(query: str, scope: str = "all", k: int = 5):
+    response = requests.post(
+        "http://localhost:7777/tool/rag_retrieve",
+        headers={"Authorization": "Bearer gpt4-token"},
+        json={"query": query, "scope": scope, "k": k}
+    )
+    return response.json()
+
+# Register as OpenAI function tool
+```
+
+#### Continue.dev / custom agent
+```yaml
+# config.yaml (illustrative schema; check the client's own MCP config format)
+mcp_servers:
+  - name: project-rag
+    transport: sse
+    url: http://localhost:7777/sse
+    auth_token: custom-agent-token
+```
+
+### 8.4 Security model multi-AI
+
+| Concern | Mitigation |
+|---|---|
+| Token leak | Rotate quarterly, store in env vars |
+| Rate limit abuse | 30 req/min/token default, tune per client |
+| Read-only enforcement | MCP server exposes only `rag_retrieve` + `rag_stats` (no write tools) |
+| Audit log | Log every query: timestamp + client_token + query + result_count |
+| Cross-project leak | Per-collection access control (future enhancement) |
+
+### 8.5 Cost considerations multi-AI
+
+```
+Single Claude Code client (current):
+  Voyage cost: ~$0.20/month (low query volume)
+  Qdrant: free local
+
+4 AI clients heavy use (Claude Code + Desktop + Cursor + GPT-4):
+  Voyage cost: ~$2-5/month (higher query volume)
+  Network bandwidth: minimal (~100KB/query 
response)
+  CPU: Qdrant + Voyage embed call ~100ms total
+
+→ Multi-AI access cost scales linearly with query volume, not with infrastructure.
+```
+
+### 8.6 Recommended rollout
+
+```
+Phase 1 (Week 1-4): Single client (Claude Code only)
+  → Validate quality + cost baseline
+
+Phase 2 (Month 2+): Add Claude Desktop if mobile/casual access is needed
+  → Same auth, shared collection
+
+Phase 3 (Month 3+): Add Cursor IDE if working across multiple IDEs
+  → Verify no cross-tool conflicts
+
+Phase 4 (Future): GPT-4 / custom agent integration if needed
+  → Custom adapter + strict auth
+```
+
+---
+
+## 9. Timeline rollout
+
+### Hour-by-hour breakdown (~10-14h dedicated session)
+
+| Hour | Task | Effort |
+|---|---|---|
+| **1-2** | Setup pre-flight: disk cleanup + Voyage signup + Python deps install | ~2h |
+| **3-4** | Write `scripts/rag-indexer.py` + run initial embed | ~2h |
+| **5** | Verify Qdrant collection + manual query sanity check | ~1h |
+| **6-7** | Write `scripts/rag-mcp-server.py` + register it in the Claude Code MCP config | ~2h |
+| **8** | Test rag_retrieve through Claude Code (main agent solo) | ~1h |
+| **9-10** | Update 4 agent .md frontmatter + system prompt sections | ~2h |
+| **11** | Setup pre-commit hook + audit logging | ~1h |
+| **12-14** | Buffer + trial of 10-15 queries to measure quality + cost | ~3h |
+
+### Trial 4-week plan
+
+```
+Week 1: Pilot single project (smaller of 2)
+  - Day 1-2: Setup + initial index
+  - Day 3-7: Active use + measure baseline metrics
+  - Deliverable: rag-audit-weekly-W1.md
+
+Week 2: Roll out 2nd project
+  - Day 1: Setup separate Qdrant collection
+  - Day 2-7: Dual-project use, measure
+  - Deliverable: rag-audit-weekly-W2.md
+
+Week 3: 4-agent integration
+  - Day 1-2: Update 4 agent .md with the rag_retrieve tool
+  - Day 3-7: Multi-agent tasks, measure shared cache benefit
+  - Deliverable: rag-audit-weekly-W3.md
+
+Week 4: Decision gate (keep / tune / upgrade B / rollback)
+  - Day 1-2: Compile metrics
+  - Day 3: Decision meeting (owner + main agent)
+  - Day 4-7: Apply 
decision (tune embedding/chunking OR upgrade Option B OR rollback)
+  - Deliverable: rag-audit-monthly-M1.md + decision doc
+```
+
+### Decision gate Week 4
+
+```
+PASS criteria (continue as-is):
+  ✅ Quality recall > 80% on 30 query benchmark
+  ✅ Cost < $5/month total (Voyage + storage)
+  ✅ Session lifespan up > 30% (heavy session)
+  ✅ Multi-agent shared cache hit > 60%
+  ✅ Critical retrieval misses < 10% of queries
+  ✅ Storage < 1GB per project
+
+TUNE criteria (continue + adjust):
+  ⚠️ Quality 70-80% → tune chunking or upgrade embedding
+  ⚠️ Cost $5-10/mo → audit query patterns, reduce k
+  ⚠️ Session lifespan up < 30% → audit blanket effectiveness
+
+ROLLBACK criteria (archive RAG):
+  ❌ Quality < 70%
+  ❌ Cost > $10/mo recurring
+  ❌ Session lifespan does NOT increase, or decreases
+  ❌ Main agent frequently reports "missing context"
+  ❌ Storage > 5GB per project
+```
+
+---
+
+## 10. Caveats + risks
+
+### 10.1 Beta features risk
+
+| Feature | Status | Mitigation |
+|---|---|---|
+| Anthropic Memory tool | Beta `context-management-2025-06-27` | Defer until GA, use the current MEMORY.md |
+| Anthropic Files API | Beta `files-api-2025-04-14` | Optional add-on, RAG primary |
+| Extended 1h prompt cache | Beta `extended-cache-ttl-2025-04-11` | Use 5min default, opt in to 1h for heavy sessions |
+| Voyage AI API | Stable | Production OK |
+| Qdrant local | Stable | Production OK |
+| FastMCP | Stable v2+ | Production OK |
+
+### 10.2 Storage concerns
+
+```
+Owner's machine currently: 911/954 GB used = 96% full (43GB free)
+
+RAG storage budget:
+  Qdrant binary:         ~50MB
+  Per project index:     ~200-500MB (depends on MD volume)
+  Backup snapshots:      ~500MB
+  Logs + audit:          ~100MB
+
+Per project total:  ~1GB
+2 projects total:   ~2GB
++ buffer 1GB
+= 3GB recommended free space
+
+→ Clean up BEFORE setup: target 5GB+ free
+```
+
+**Cleanup priorities:**
+- `node_modules` of old projects
+- `.NET bin/obj` artifacts
+- Docker images (`docker system prune -a`)
+- Browser caches (Chrome/Edge ~5GB common)
+- `%LOCALAPPDATA%` caches 
(NuGet, dotnet) +- Downloads / Videos không dùng + +### 10.3 Quality monitoring + +| Risk | Indicator | Action | +|---|---|---| +| Chunking break narrative | Em main report "miss context" | Review chunk strategy, tune | +| Embedding drift | Recall drop > 10pp benchmark | Re-embed full, check Voyage updates | +| Stale index | Files commit chưa re-index | Force re-index full, check hook | +| Query phrasing kém | Low precision on simple queries | Em main refine query patterns | +| Cross-language mismatch | Vietnamese query miss English content | Multilingual reranker hoặc query expansion | + +### 10.4 Fallback strategy + +``` +Khi RAG fail / quality drop: + Layer 1: Em main fallback to Read full file (existing lazy pattern still works) + Layer 2: Em main blanket load critical file directly + Layer 3: Rollback Qdrant snapshot (weekly backup) + Layer 4: Full re-index từ scratch (~15 phút) + Layer 5: Archive RAG, return lazy current pattern (ultimate fallback) +``` + +Em main blanket 120K KHÔNG bị mất khi RAG fail → graceful degradation. + +### 10.5 Vietnamese-English mix considerations + +``` +Voyage-3-large multilingual claim 26 lang coverage. +Vietnamese explicit benchmark KHÔNG public. + +Risk: technical jargon Việt-Anh mix có thể miss synonym. + Ví dụ: "im lặng 403" vs "silent 403" — vector có gần nhau không? + +Mitigation: + - Test 10-20 Việt-Anh mix queries trong audit benchmark + - Nếu recall low → consider voyage-multilingual-2 backup + - Hoặc add query expansion (Anthropic Contextual Retrieval pattern) +``` + +--- + +## 11. 
Success metrics + +### 11.1 Quality metrics + +| Metric | Target | Measurement | +|---|---:|---| +| Recall avg (30 query benchmark) | > 80% | Manual score weekly | +| Precision avg | > 75% | Manual score weekly | +| Retrieval miss critical rate | < 10% | Em main report cumulative | +| Cross-language query recall | > 70% | Việt-Anh mix benchmark | + +### 11.2 Cost metrics + +| Metric | Target | Measurement | +|---|---:|---| +| Voyage monthly spend | < $5 | Voyage dashboard | +| Total RAG infra cost | < $10/month | Sum tools | +| Cost per query | < $0.001 | Calculated | +| Disk usage per project | < 1GB | `du -sh` | + +### 11.3 Performance metrics + +| Metric | Target | Measurement | +|---|---:|---| +| Query latency (P50) | < 200ms | MCP server log | +| Query latency (P99) | < 500ms | MCP server log | +| Re-index lag (post-commit) | < 30s | Pre-commit hook timing | +| Cache hit rate (multi-agent) | > 60% | Custom metric | + +### 11.4 Capacity metrics + +| Metric | Target | Measurement | +|---|---:|---| +| Session lifespan productive | +50% vs lazy | Time tracker | +| Tasks before lost-in-middle | > 35 | Task counter | +| Heavy session token | -20% vs lazy | Anthropic dashboard | +| Multi-agent overlap saving | > 50K/session | Cumulative calc | + +### 11.5 Multi-AI client metrics + +| Metric | Target | Measurement | +|---|---:|---| +| Active clients | ≥ 1 stable | Audit log | +| Per-client query volume | Track baseline | Audit log per client | +| Cross-client conflict | 0 | Bug reports | + +--- + +## 12. 
Future enhancements + +### 12.1 Phase 2 (after Week 4 validation) + +| Enhancement | Effort | Benefit | +|---|---|---| +| Upgrade Option B (drop blanket 30-40K) | 1 session | Saving +15% tokens | +| Anthropic Memory tool integration | 2-3h | Native cross-conversation memory | +| Files API integration | 2-3h | Reduce blanket re-upload cost | +| Citations enable | 1h | RAG quality trace | + +### 12.2 Phase 3 (Month 2-3) + +| Enhancement | Effort | Benefit | +|---|---|---| +| Hybrid BM25 + vector search (Contextual Retrieval) | 4-6h | +49-67% recall (Anthropic doc) | +| Multi-project namespace | 2-3h | Cross-project query với strict isolation | +| Reranker model (Cohere rerank-3) | 2-3h | +10-20% precision | +| Custom Streamlit audit dashboard | 4-5h | Visual quality monitoring | + +### 12.3 Phase 4 (Quarter 2+) + +| Enhancement | Effort | Benefit | +|---|---|---| +| Replace Voyage với Anthropic native embedding (if GA) | 2-3h | Reduce vendor count | +| Auto-tuning chunking (LLM-aided) | 1 week | Quality+ | +| Federated multi-machine setup | 1 week | Team usage | +| Time-series analytics on retrieval patterns | 1 week | Insights | + +### 12.4 Defer indefinitely (over-engineering) + +- ❌ LangChain / LlamaIndex framework (heavy abstraction) +- ❌ Self-host LLM (cost > value) +- ❌ Custom embedding model fine-tuning (effort > value) +- ❌ Full text + vector hybrid index (use Voyage Reranker instead) + +--- + +## 📚 References + tools + +### Anthropic official +- [Memory tool docs](https://platform.claude.com/docs/en/agents-and-tools/tool-use/memory-tool) +- [Prompt caching guide](https://platform.claude.com/docs/en/build-with-claude/prompt-caching) +- [Files API](https://platform.claude.com/docs/en/build-with-claude/files) +- [Contextual Retrieval cookbook](https://platform.claude.com/cookbook/capabilities-contextual-embeddings-guide) +- [Effective context engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) +- [Agent SDK 
overview](https://code.claude.com/docs/en/agent-sdk/overview) + +### Tools docs +- [Qdrant docs](https://qdrant.tech/documentation/) +- [Voyage AI pricing](https://docs.voyageai.com/docs/pricing) +- [FastMCP](https://github.com/jlowin/fastmcp) +- [MCP servers list](https://github.com/modelcontextprotocol/servers) + +### Project memory +- `feedback_md_compact_narrative.md` (§6.5 rule — KEEP narrative) +- `feedback_multi_agent_setup.md` (4-agent discipline) +- `feedback_drastic_refactor_scope.md` (RAG setup = dedicated session) +- `feedback_uat_skip_verify.md` (Phase 9 UAT mode) + +--- + +## ✅ Pre-implementation checklist + +``` +☐ Bro confirm 3 thông tin: + ☐ 2 dự án path (để Investigator audit MD inventory pre-flight) + ☐ Stack 2 dự án (BE: .NET/Node/Python? FE: React/Vue?) + ☐ Pilot project chọn (smaller in 2) + +☐ Bro prepare environment: + ☐ Disk cleanup 5GB+ free (current 911/954 = 96% full) + ☐ Voyage AI account signup + API key + ☐ Python 3.10+ installed + ☐ Git installed (cho pre-commit hook) + +☐ Bro schedule dedicated session: + ☐ 10-14h block 1 ngày cuối tuần (memory feedback_drastic_refactor_scope rule) + ☐ Reserve weekly cap ~30% cho RAG setup spawn cost + +☐ Bro review plan: + ☐ Read full this file + ☐ Confirm scope blanket vs RAG store match needs + ☐ Confirm tool stack acceptable + ☐ Approve Week 1-4 trial timeline +``` + +--- + +## 📝 Notes — keep updated + +- **2026-05-12 turn 1:** Plan saved sau S21 turn 1 chốt cicd-monitor. Cross-project reference cho 2 dự án future bro > 1M MD. SOLUTION_ERP baseline ~354K MD (chưa cần RAG, defer). +- **Status:** 📝 PLAN ONLY — chưa implement +- **Next trigger:** Bro confirm 3 thông tin → spawn 🔵 Investigator audit MD inventory 2 dự án → tinh chỉnh blanket list cho từng project
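
The Week 4 decision gate in section 9 reduces to a handful of threshold checks, which can be scripted so the monthly audit doc records a reproducible verdict. Below is a minimal Python sketch, assuming the metrics have already been collected by hand. The `GateMetrics` record, its field names, and the `decide` function are hypothetical and not part of any script named in this plan. The way criteria combine is also an assumption: any ROLLBACK condition wins, PASS requires every PASS condition, and everything else is treated as TUNE.

```python
from dataclasses import dataclass


@dataclass
class GateMetrics:
    """Hypothetical record of Week 4 measurements (field names are assumptions)."""
    recall_pct: float          # average recall on the 30-query benchmark
    monthly_cost_usd: float    # Voyage + storage
    lifespan_gain_pct: float   # heavy-session lifespan change vs the lazy baseline
    cache_hit_pct: float       # multi-agent shared cache hit rate
    critical_miss_pct: float   # share of queries with a critical retrieval miss
    storage_gb: float          # disk usage per project
    frequent_miss_complaints: bool = False  # subjective "miss context" signal


def decide(m: GateMetrics) -> str:
    """Return 'ROLLBACK', 'PASS', or 'TUNE' using the section 9 thresholds."""
    # Assumption: a single ROLLBACK condition is enough to archive RAG.
    rollback = (
        m.recall_pct < 70
        or m.monthly_cost_usd > 10
        or m.lifespan_gain_pct <= 0
        or m.frequent_miss_complaints
        or m.storage_gb > 5
    )
    if rollback:
        return "ROLLBACK"
    # Assumption: PASS needs every PASS criterion to hold at once.
    passed = (
        m.recall_pct > 80
        and m.monthly_cost_usd < 5
        and m.lifespan_gain_pct > 30
        and m.cache_hit_pct > 60
        and m.critical_miss_pct < 10
        and m.storage_gb < 1
    )
    return "PASS" if passed else "TUNE"


if __name__ == "__main__":
    # Illustrative values only, not measurements.
    example = GateMetrics(85, 0.20, 32, 65, 7.5, 0.8)
    print(decide(example))
```

Checking ROLLBACK first keeps a single bad signal (for example storage above 5GB) from being masked by otherwise good numbers; whether that ordering matches the intended decision meeting is for bro and the main agent to confirm.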