Comprehensive plan for the owner's 2 future projects with > 1M tokens of MD context (SOLUTION_ERP is the baseline reference; no implementation needed yet because 354K < threshold). 12 sections: 1. Context + Why (problem + solution + benefits table) 2. Architecture overview (6-layer diagram: blanket + Qdrant + Voyage + MCP + multi-AI + re-index) 3. BLANKET load list ~100K (28%), 5 categories: core stable + current state top + agent infra + skills desc + memory critical 4. RAG store list ~254K (72%), 8 categories: session logs (49%) + gotchas + archives + flows/database + skills detail + memory non-critical + guides + audit 5. Recommended tool stack: Qdrant + Voyage-3-large + FastMCP Python + custom chunker + pre-commit hook 6. Setup scripts, copy-paste ready (~250 LOC Python total: indexer + MCP server + settings + hook + agent .md update) 7. Audit procedure with 3-tier cadence: weekly quick (~30min) + monthly deep (~2-3h) + quarterly major (~4-6h) + trigger-based ad-hoc 8. Multi-AI client access: MCP is protocol agnostic, stdio/HTTP/SSE transport, bearer auth + rate limit, setup per client (Claude Code/Desktop/Cursor/GPT-4) 9. Timeline rollout: 10-14h dedicated session + 4-week trial plan + decision gate PASS/TUNE/ROLLBACK criteria 10. Caveats + risks: beta features + storage 96% full warning + quality monitoring + graceful fallback 11. Success metrics: quality (recall >80%, precision >75%) + cost (<$5/mo) + performance (P50<200ms) + capacity (+50% session lifespan) + multi-AI 12. Future enhancements: Phase 2 (Memory tool + Files API) → Phase 3 (Contextual Retrieval + multi-project) → defer over-engineering Status: PLAN ONLY, not yet implemented. Next trigger: the owner confirms 3 items (paths of the 2 projects + stack + pilot choice) → spawn Investigator to audit the MD inventory as a pre-flight. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
RAG Setup Plan — Cross-project reference
Purpose: Plan for setting up Hybrid RAG (Option A) on projects whose MD context exceeds 1M tokens. Applicable across projects: SOLUTION_ERP serves as the baseline reference, and the owner's 2 future projects will apply this pattern.
Last updated: 2026-05-12 (Session 21 turn 1+)
Status: 📝 Plan saved, not yet implemented; target is a Week 1-4 trial on the 2 future projects
Owner: pqhuy1987@gmail.com + Claude (main agent + 4 sub-agents)
📋 Table of Contents
- Context + Why
- Architecture overview
- BLANKET load list (~100K tokens, 28%)
- RAG store list (~254K tokens, 72%)
- Tool stack recommend
- Setup scripts (copy-paste ready)
- Audit procedure (3-tier cadence)
- Multi-AI client access
- Timeline rollout (~10-14h dedicated)
- Caveats + risks
- Success metrics + decision gate
- Future enhancements
1. Context + Why
Problem statement
The current lazy blanket pattern (main agent + 4 agents):
- The main agent carries ~120K of MD upfront (35% of the project)
- Lazy Read on demand: the main agent has to GUESS which files are relevant
- Each of the 4 agents incurs ~188K of cache WRITE per spawn
- A heavy session bills ~700K effective tokens
- The lost-in-middle threshold is reached after ~5.75h of productive work
Scale-up to 2 projects > 1M MD tokens each:
❌ Blanket loading is NOT feasible (exceeds the 1M context cap)
❌ Lazy Read recall is ~30-60% (the main agent misses files it does not think of)
❌ The 4 agents duplicate Reads of the same files (~240K wasted cumulatively)
❌ Vietnamese-English synonyms are missed (grep matches keywords only)
❌ Cross-project context is impossible without manual switching
Solution
Hybrid RAG Option A: blanket-load the critical docs and retrieve the rest on demand:
KEEP blanket: ~100K static (core stable + current state + agent + skills + memory critical)
ADD RAG layer: the remaining ~72% of MD stays accessible via semantic retrieval
SHARE cache: the 4 agents reuse retrieved chunks (multi-agent leverage)
Benefits (conclusions from earlier analysis sessions)
| Metric | Lazy current | Option A | Δ |
|---|---|---|---|
| Quality recall | 30-60% | 85% | +25-55pp |
| Heavy session token | 700K | 560K | -20% |
| Session productive hours | 5.75h | 7.6h | +1.85h |
| Tasks before lost-in-middle | ~23 | ~38 | +65% |
| Net successful tasks/session | 25 | 50 | 2× |
| Multi-agent shared cache | ❌ | ✅ 60-90% cache hit | leverage real |
| Vietnamese-English semantic search | ❌ grep only | ✅ Voyage multilingual | unlock |
| Scale > 1M MD | ❌ break | ✅ work | enable |
Trade-off
- ⚠️ Setup cost: ~10-14h dedicated session (one-time investment)
- ⚠️ Maintenance: ~30 min/week of audit
- ⚠️ Beta features (Memory tool, Files API): breaking changes are possible
- ⚠️ Retrieval miss risk ~5-10% (mitigated by citations + fallback Read)
- ⚠️ Voyage API cost: ~$0.36 initial embed + ~$0.20/month delta
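The Voyage cost figure in the list above is plain arithmetic. A minimal sketch, assuming the $0.18 per million tokens rate quoted later in this plan and roughly 1M MD tokens per project; the function name `embed_cost_usd` is hypothetical and exists only for this illustration:

```python
PRICE_PER_M_TOKENS = 0.18  # USD per 1M tokens, the rate quoted in this plan (assumption)


def embed_cost_usd(tokens: int, price_per_m: float = PRICE_PER_M_TOKENS) -> float:
    """Return the embedding cost in USD for a given number of tokens."""
    return round(tokens / 1_000_000 * price_per_m, 4)


# Two projects of ~1M MD tokens each, embedded once at init time.
initial = embed_cost_usd(2 * 1_000_000)
print(f"initial embed: ${initial:.2f}")
```

The same function can be applied to the monthly delta once the number of re-embedded tokens per month is known from the audit logs.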
2. Architecture overview
┌─────────────────────────────────────────────────────────────┐
│ LAYER 1 — Static blanket (cache hot, 5min-1h TTL) │
├─────────────────────────────────────────────────────────────┤
│ Main agent + 4 sub-agents auto-inject ~100K core context: │
│ • rules.md, architecture.md, CLAUDE.md, PROJECT-MAP │
│ • STATUS top 100 lines, HANDOFF top 150 lines │
│ • 5 agent .md (README + 4 agent identity) │
│ • 6 SKILL.md descriptions (auto-inject) │
│ • 6 critical cross-cutting memory entries │
└─────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────┐
│ LAYER 2 — Vector DB retrieve on-demand │
├─────────────────────────────────────────────────────────────┤
│ Qdrant local (~50MB binary, ~200MB index per project): │
│ • Session logs cumulative (49% MD, biggest) │
│ • Gotchas detail (chunk per entry) │
│ • Archives + Recently Done + Migration-todos │
│ • Flows + Database guides │
│ • SKILL.md detail (descriptions already in blanket) │
│ • Memory entries non-critical │
│ • Guides ops conditional │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ LAYER 3 — Embedding service (Voyage AI cloud) │
├─────────────────────────────────────────────────────────────┤
│ voyage-3-large multilingual 26 lang (good for Vietnamese-English): │
│ • Index time: embed chunks → vectors (one-time + delta) │
│ • Query time: embed query → search Qdrant top-K │
│ • Cost: $0.18/M tokens, ~$0.36 init + ~$0.20/month │
└─────────────────────────────────────────────────────────────┘
↕
┌─────────────────────────────────────────────────────────────┐
│ LAYER 4 — MCP retriever server (FastMCP Python) │
├─────────────────────────────────────────────────────────────┤
│ Tool exposed: rag_retrieve(query, scope, k, time_range) │
│ Transport: stdio (Claude Code) or HTTP/SSE (multi-AI) │
│ Auth: API key per client (multi-AI mode) │
└─────────────────────────────────────────────────────────────┘
↕
┌─────────────────────────────────────────────────────────────┐
│ LAYER 5 — Multi-AI clients │
├─────────────────────────────────────────────────────────────┤
│ Claude Code (main agent + 4 agents): primary │
│ Claude Desktop — secondary │
│ GPT-4 / Cursor / Continue / Custom agent — optional │
└─────────────────────────────────────────────────────────────┘
↑
┌─────────────────────────────────────────────────────────────┐
│ LAYER 6 — Re-index pipeline │
├─────────────────────────────────────────────────────────────┤
│ Pre-commit hook: delta re-index changed MD │
│ Weekly full re-index: catch missed (Saturday off-peak) │
│ Batch API 50% discount for mass re-index │
└─────────────────────────────────────────────────────────────┘
Index-time flow (one-time init + delta)
1. Walk filesystem → docs/ + .claude/ + memory/
2. Chunk adaptively by doc_type (custom Python chunker)
3. Batch embed via Voyage API (128 chunks/batch)
4. Upsert into Qdrant with metadata (source, doc_type, project, last_modified)
5. Total init: ~10-15 min for 1M MD tokens
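Step 3 of the index-time flow groups chunks into fixed-size batches before each embedding request. A minimal sketch of that grouping, assuming the 128-item batch size stated above; the helper name `batched` is an illustration and is not part of any library used in this plan:

```python
from typing import Iterator, Sequence


def batched(items: Sequence[str], size: int = 128) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


chunks = [f"chunk {i}" for i in range(300)]
sizes = [len(batch) for batch in batched(chunks)]
print(sizes)  # [128, 128, 44]
```

Keeping the batch size as a parameter makes it easy to lower it if the embedding API rejects a batch for being too large.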
Query-time flow (each spawn of the main agent or a sub-agent)
1. Main agent/agent: rag_retrieve("query keyword", scope, k)
2. MCP server: embed query → Voyage API (~100ms)
3. MCP server: Qdrant search top-K (~50ms local)
4. MCP server: return chunks with metadata + score
5. Total: ~150-200ms per query (network-bound)
6. Cache: subsequent same query → ~10ms (cache hit)
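Step 6 assumes a query cache, which the setup scripts in section 6 do not implement. One possible sketch is an in-memory LRU cache around the query embedding call. Here `fake_embed` is a hypothetical stand-in for the remote Voyage request so the example runs without network access, and `cached_query_vector` is an illustrative name:

```python
from functools import lru_cache

CALLS = {"count": 0}  # counts how often the (fake) remote call is made


def fake_embed(text: str) -> tuple[float, ...]:
    """Stand-in for the remote embedding call (hypothetical, no network)."""
    CALLS["count"] += 1
    return (float(len(text)), float(sum(map(ord, text)) % 97))


@lru_cache(maxsize=512)
def cached_query_vector(query: str) -> tuple[float, ...]:
    """Embed a query once; identical query strings are then served from memory."""
    return fake_embed(query)


cached_query_vector("silent 403")
cached_query_vector("silent 403")  # second call is a cache hit
print(CALLS["count"], cached_query_vector.cache_info().hits)
```

The cache key is the exact query string, and the cache lives inside one server process, so it only helps multiple agents when they share the same long-running MCP server.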
3. BLANKET load list
Total: ~100K tokens (28% of project MD). Auto-loaded on every spawn of the main agent + 4 agents.
A. Core stable docs (~30K, rarely change)
| File | Token | Reason for blanket |
|---|---|---|
| docs/rules.md | ~7K | Stable coding conventions, referenced by every task |
| CLAUDE.md (root pointer) | ~3K | Auto-injected system reminder |
| docs/CLAUDE.md | ~3K | Tech stack overview baseline |
| docs/architecture.md | ~7K | 4-layer Clean Arch baseline |
| docs/PROJECT-MAP.md | ~3K | Navigation map |
| docs/workflow-contract.md | ~4K | 9-phase state machine, core of the Contract domain |
| docs/forms-spec.md | ~3K | Catalog of 8 forms (domain knowledge) |
B. Current state (~25K, known directly by the main agent, no retrieval needed)
| File | Strategy | Token |
|---|---|---|
| docs/STATUS.md top 100 lines | Current phase + In Progress + top 1-2 Recently Done | ~15K |
| docs/HANDOFF.md top 150 lines | Last updated + TL;DR of latest session + next priority | ~10K |
→ Dropped from blanket: STATUS Recently Done rows beyond the newest 5 (retrieve if needed), HANDOFF TL;DR entries older than 1 week.
C. Agent infrastructure (~25K, agent identity is stable)
| File | Token |
|---|---|
| .claude/agents/README.md | ~5K |
| .claude/agents/investigator.md | ~3.5K |
| .claude/agents/implementer.md | ~4K |
| .claude/agents/reviewer.md | ~3.5K |
| .claude/agents/cicd-monitor.md | ~5K |
| .claude/agent-memory/{4 agents}/MEMORY.md (auto-inject, first 200 lines / 25KB) | ~4K total |
D. Skills descriptions (~5K, auto-injected; not the full SKILL.md)
| File | Strategy | Token |
|---|---|---|
| .claude/skills/README.md | Full | ~2.5K |
| 6 SKILL.md descriptions | Auto-inject by Claude Code | ~1K total |
| 6 SKILL.md detail | NOT in blanket → RAG retrieve when triggered | n/a |
E. Memory user-level critical (~15K)
| File | Token | Why critical |
|---|---|---|
| project_solution_erp.md | ~3.5K | Project overview narrative |
| feedback_md_compact_narrative.md (§6.5) | ~2K | Core rule for all doc work |
| feedback_uat_skip_verify.md | ~2K | Phase 9 current mode rule |
| feedback_multi_agent_setup.md | ~3K | 4-agent discipline |
| feedback_per_chunk_commit.md | ~2K | Implementer pattern reusable |
| feedback_audit_reuse_before_clone.md | ~2K | Investigator natural pattern |
→ Dropped from blanket: the remaining 11 memory entries (retrieve when a pattern is triggered).
TOTAL BLANKET ≈ 100K tokens
4. RAG store list
Total: ~254K tokens (72% of project MD). Indexed into Qdrant, retrieved on demand.
F. Session logs (~150K — biggest, 49% MD)
Path: docs/changelog/sessions/*.md (41+ files growing)
Chunk strategy: 1 file = 1 chunk (preserve narrative §6.5)
Metadata:
- session_date: extracted from filename
- phase: extracted from content
- topic: extracted from H1
- commit_sha_range: extracted from "Commits:" line
- doc_type: "session_log"
Scope filter: time_range="last_week|last_month|last_quarter|all"
G. Gotchas (~9K — lookup per debug)
Path: docs/gotchas.md (44+ entries)
Chunk strategy: split per "### N. ..." numbered heading
Metadata:
- gotcha_id: integer
- category: extracted from content (tech/EF/Workflow/CICD/Security/...)
- doc_type: "gotcha"
Scope filter: scope="gotcha"
H. Archives + Recently Done (~75K)
| File | Strategy | Token |
|---|---|---|
| docs/STATUS.md rest beyond top 100 | Per H2 section | ~8K |
| docs/HANDOFF.md rest beyond top 150 | Per H2 section | ~21K |
| docs/changelog/migration-todos.md | Per H3 task | ~18K |
| docs/changelog/recently-done-archive-*.md | Per H3 phase | ~6K |
| docs/_archive/forms-spec-raw.md | Full file (cold archive) | ~23K |
| docs/_archive/workflow-raw.md | Full file (cold archive) | ~4K |
I. Flows + Database (~17K, task-conditional)
| File | Token | When to retrieve |
|---|---|---|
| docs/flows/README.md | ~1K | Index when a flow is needed |
| docs/flows/auth-flow.md | ~1K | Auth tasks |
| docs/flows/permission-flow.md | ~1.5K | Permission tasks |
| docs/flows/contract-creation-flow.md | ~1.5K | Contract tasks |
| docs/flows/contract-approval-flow.md | ~1.5K | Approval tasks |
| docs/flows/form-render-flow.md | ~1K | Form tasks |
| docs/flows/sla-expiry-flow.md | ~1K | SLA tasks |
| docs/database/database-guide.md | ~3K | Schema tasks |
| docs/database/schema-diagram.md | ~12K | ERD tasks |
J. SKILL.md detail (~40K, retrieved when a skill is triggered)
| File | Token |
|---|---|
| .claude/skills/contract-workflow/SKILL.md | ~7K |
| .claude/skills/form-engine/SKILL.md | ~5K |
| .claude/skills/permission-matrix/SKILL.md | ~5K |
| .claude/skills/dependency-audit-erp/SKILL.md | ~5K |
| .claude/skills/ef-core-migration/SKILL.md | ~5.5K |
| .claude/skills/iis-deploy-runbook/SKILL.md | ~6K |
K. Guides ops conditional (~10K)
| File | Token | When to retrieve |
|---|---|---|
| docs/guides/deployment-iis.md | ~2.5K | Deploy tasks |
| docs/guides/cicd.md | ~2K | CI/CD tasks |
| docs/guides/security-checklist.md | ~2K | Security audit |
| docs/guides/vps-setup.md | ~2.5K | VPS setup |
| docs/guides/runbook.md | ~1K | Ops debug |
L. Memory entries non-critical (~50K, pattern lookup)
The remaining 11 memory entries (user-level):
- feedback_n_stage_workflow_pattern.md (DEPRECATED post-Mig 21)
- feedback_designtime_runtime_db.md
- feedback_drastic_refactor_scope.md
- feedback_cron_monthly_limitation.md
- feedback_user_manual_style.md
- feedback_node_cicd.md
- feedback_unittest_timing.md
- feedback_responsive_laptop_breakpoint.md
- feedback_service_hook_vs_endpoint.md
- reference_session_prompts.md
- MEMORY.md index
M. Audit logs (~2K, grow)
docs/changelog/skill-audit-{YYYY-MM}.md (monthly audit log)
TOTAL RAG STORE ≈ 254K tokens
5. Tool stack recommend
| Component | Tool | Reason | Cost |
|---|---|---|---|
| Vector DB | Qdrant local | Rust binary 50MB, no Docker, fast, metadata filtering, admin UI | $0 |
| Embedding | Voyage-3-large | Anthropic partner, multilingual 26 lang, no GPU needed | $0.18/M (~$0.36 init) |
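| MCP server framework | FastMCP Python | Python MCP framework (the standalone fastmcp 2.x package pinned in requirements.txt; the official MCP Python SDK bundles the older FastMCP 1.0), ~100 LOC, auto schema | $0 |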
| Chunking | Custom Python adaptive | ~50 LOC, transparent, §6.5 compliant | $0 |
| Re-index pipeline | Pre-commit hook | Native git, ~10 LOC bash | $0 |
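| Monitoring | Qdrant Dashboard + custom audit | Built-in UI on port 6333 (only when Qdrant runs as a server; the embedded path= mode used by the scripts exposes no HTTP port) | $0 |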
| Auth (multi-AI) | Bearer token + rate limit | Custom middleware ~30 LOC | $0 |
| Batch re-index | Voyage Batch API | 50% discount for mass re-embed | -50% |
Rejected stack + reasons
| Alternative | Reason rejected |
|---|---|
| Chroma vector DB | Python ecosystem, slower than Qdrant Rust |
| pgvector | Requires a PostgreSQL setup, overhead |
| OpenAI text-embedding-3-small | Vietnamese quality is worse than Voyage |
| BGE-M3 local | Needs a GPU >= 4GB (Intel Iris Xe is not sufficient) |
| LangChain / LlamaIndex | Heavy abstraction, black-box and hard to debug, chunker does not follow §6.5 |
| TypeScript MCP SDK | More verbose than Python FastMCP |
| Pinecone cloud | Paid + vendor lock-in, that scale is not needed |
6. Setup scripts
6.1 requirements.txt
fastmcp>=2.0
voyageai>=0.3
qdrant-client>=1.12
python-frontmatter>=1.1
6.2 scripts/rag-indexer.py (~120 LOC)
"""
RAG Indexer: embed MD files and upsert them into Qdrant.
Usage:
    python rag-indexer.py                       # full index
    python rag-indexer.py --files "a.md b.md"   # delta re-index
"""
import glob
import os
import re
import sys
import uuid

from voyageai import Client
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

QDRANT_PATH = "./rag-data/qdrant"
COLLECTION = "project_md"  # rename per project
EMBED_MODEL = "voyage-3-large"
DIM = 1024
BATCH_SIZE = 128

voyage = Client(api_key=os.environ["VOYAGE_API_KEY"])
qdrant = QdrantClient(path=QDRANT_PATH)


def chunk_file(path: str) -> list[dict]:
    """Adaptive chunking by doc type."""
    with open(path, encoding="utf-8") as f:
        content = f.read()
    # glob returns backslashes on Windows; normalize so the checks below match.
    path = path.replace("\\", "/")
    base = {"source": path, "size_chars": len(content)}
    if "/changelog/sessions/" in path:
        return [{**base, "content": content, "doc_type": "session_log"}]
    if path.endswith("gotchas.md"):
        # re.split with a capture group yields [preamble, id, body, id, body, ...]
        entries = re.split(r"^### (\d+)\.", content, flags=re.M)
        return [
            {**base, "content": f"### {entries[i]}.{entries[i+1]}",
             "doc_type": "gotcha", "entry_id": int(entries[i])}
            for i in range(1, len(entries), 2)
        ]
    if "/skills/" in path:
        return [{**base, "content": content, "doc_type": "skill"}]
    if "/agents/" in path:
        return [{**base, "content": content, "doc_type": "agent"}]
    if path.endswith("MEMORY.md") or "/memory/" in path:
        return [{**base, "content": content, "doc_type": "memory"}]
    # Default: split per H2 heading, skipping near-empty sections
    sections = re.split(r"^## ", content, flags=re.M)
    return [
        {**base, "content": ("## " + s) if i > 0 else s,
         "doc_type": "doc", "section_idx": i}
        for i, s in enumerate(sections) if len(s.strip()) > 200
    ]


def chunk_id(chunk: dict) -> str:
    """Deterministic point id, so a re-index overwrites the same point.

    The built-in hash() is salted per process for strings, so it is not
    stable across runs; gotcha chunks also need entry_id in the key,
    otherwise every entry of gotchas.md would share one id.
    """
    key = chunk.get("entry_id", chunk.get("section_idx", 0))
    return str(uuid.uuid5(uuid.NAMESPACE_URL, f"{chunk['source']}#{key}"))


def main(files: list[str] | None = None):
    # Init collection (idempotent)
    if not qdrant.collection_exists(COLLECTION):
        qdrant.create_collection(
            COLLECTION,
            vectors_config=VectorParams(size=DIM, distance=Distance.COSINE)
        )
    # Determine paths
    if files:
        paths = files
    else:
        paths = (
            glob.glob("docs/**/*.md", recursive=True) +
            glob.glob(".claude/**/*.md", recursive=True)
        )
    paths = [p for p in paths
             if "node_modules" not in p and "_user-guide" not in p]
    # Chunk
    chunks = []
    for path in paths:
        try:
            chunks.extend(chunk_file(path))
        except Exception as e:
            print(f"Skip {path}: {e}")
    print(f"Chunking: {len(chunks)} chunks from {len(paths)} files")
    if not chunks:
        return
    # Batch embed. The API may also cap total tokens per request, so lower
    # BATCH_SIZE if a batch of large chunks is rejected.
    texts = [c["content"] for c in chunks]
    embeddings = []
    for i in range(0, len(texts), BATCH_SIZE):
        batch = texts[i:i + BATCH_SIZE]
        result = voyage.embed(batch, model=EMBED_MODEL, input_type="document")
        embeddings.extend(result.embeddings)
        print(f"Embedded {i + len(batch)}/{len(texts)}")
    # Upsert (Qdrant replaces points that have the same id)
    points = [
        PointStruct(id=chunk_id(c), vector=emb, payload=c)
        for c, emb in zip(chunks, embeddings)
    ]
    qdrant.upsert(collection_name=COLLECTION, points=points)
    print(f"Indexed {len(points)} chunks → Qdrant")


if __name__ == "__main__":
    files = sys.argv[2].split() if len(sys.argv) > 2 and sys.argv[1] == "--files" else None
    main(files)
6.3 scripts/rag-mcp-server.py (~80 LOC)
"""
MCP retriever server: exposes the rag_retrieve tool to Claude Code + agents.
Run: python rag-mcp-server.py              (stdio default)
     python rag-mcp-server.py --http :7777 (HTTP/SSE for multi-AI)
"""
import os
import sys

from fastmcp import FastMCP
from voyageai import Client
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

mcp = FastMCP("project-rag")
voyage = Client(api_key=os.environ["VOYAGE_API_KEY"])
# Embedded (path=) mode locks the storage folder for one process at a time,
# so the indexer cannot open it while this server is running.
qdrant = QdrantClient(path="./rag-data/qdrant")
COLLECTION = "project_md"


@mcp.tool()
def rag_retrieve(
    query: str,
    scope: str = "all",
    k: int = 5
) -> list[dict]:
    """
    Semantic search over MD context.
    Args:
        query: Search query (Vietnamese or English, mixing is OK)
        scope: Filter by doc_type:
            "all" | "session_log" | "gotcha" | "memory" |
            "skill" | "agent" | "doc"
        k: Top chunks to return (1-15, default 5)
    Returns:
        List[dict] with keys: content, source, doc_type, score
    Use cases:
        - Historical session log: rag_retrieve("Mig 26 V2", scope="session_log")
        - Gotcha lookup: rag_retrieve("silent 403", scope="gotcha")
        - Pattern reuse: rag_retrieve("audit clone", scope="memory")
        - Cross-section: rag_retrieve("query", scope="all", k=10)
    """
    k = min(max(k, 1), 15)
    # Embed query
    query_vec = voyage.embed(
        [query], model="voyage-3-large", input_type="query"
    ).embeddings[0]
    # Filter
    query_filter = None
    if scope != "all":
        query_filter = Filter(
            must=[FieldCondition(key="doc_type", match=MatchValue(value=scope))]
        )
    # Search (query_points replaces the deprecated search() method)
    results = qdrant.query_points(
        collection_name=COLLECTION,
        query=query_vec,
        query_filter=query_filter,
        limit=k
    ).points
    return [
        {
            "content": r.payload["content"][:3000],  # truncate huge chunks
            "source": r.payload["source"],
            "doc_type": r.payload["doc_type"],
            "score": round(r.score, 3)
        }
        for r in results
    ]


@mcp.tool()
def rag_stats() -> dict:
    """Return collection stats (for audit)."""
    info = qdrant.get_collection(COLLECTION)
    return {
        "total_chunks": info.points_count,
        "vector_dim": info.config.params.vectors.size,
        "distance": info.config.params.vectors.distance.value,
        # Optimizer state, not a timestamp; stringified so it serializes.
        "optimizer_status": str(info.optimizer_status),
    }


if __name__ == "__main__":
    # Default: stdio mode for Claude Code
    # HTTP/SSE mode: python rag-mcp-server.py --http :7777
    if "--http" in sys.argv:
        port = int(sys.argv[sys.argv.index("--http") + 1].lstrip(":"))
        mcp.run(transport="sse", port=port)
    else:
        mcp.run()  # stdio default
6.4 .claude/settings.json register
{
  "mcpServers": {
    "project-rag": {
      "command": "python",
      "args": ["scripts/rag-mcp-server.py"],
      "cwd": "${workspaceFolder}",
      "env": {
        "VOYAGE_API_KEY": "${env:VOYAGE_API_KEY}"
      }
    }
  }
}
6.5 Pre-commit hook
#!/bin/sh
# .git/hooks/pre-commit
# Re-index changed MD files
changed_md=$(git diff --cached --name-only --diff-filter=AMR | grep -E "\.md$")
if [ -n "$changed_md" ]; then
    echo "RAG re-indexing $(echo "$changed_md" | wc -l) MD files..."
    python scripts/rag-indexer.py --files "$changed_md"
fi
6.6 Agent .md frontmatter update
# Add the tool to each .claude/agents/{agent}.md:
tools: [Read, Grep, Glob, Bash, mcp__project-rag__rag_retrieve, ...]
Add this section to the system prompt:
## RAG retriever usage (rag_retrieve tool)
**WHEN to use:**
- Historical session log lookup (older than 1 week)
- Gotcha pattern matching debug
- Memory pattern reuse "clone X to Y"
- Cross-section semantic search
**WHEN to use Read instead:**
- Current state (STATUS + HANDOFF top), already blanket loaded
- Active file editing (needs the full file)
- Architecture review (stable docs, blanket)
**Query examples:**
- rag_retrieve("silent 403 non-admin", scope="gotcha", k=3)
- rag_retrieve("PE V2 wire pattern", scope="session_log", k=5)
- rag_retrieve("audit reuse clone", scope="memory", k=3)
7. Audit procedure
7.1 Weekly quick audit (~30 min, every Saturday)
Goal: check health + cost trend every week.
Checklist:
# 1. Index health
# (This HTTP check needs Qdrant running as a server on port 6333; with the
#  embedded path= mode used by the scripts, call the rag_stats tool instead.)
curl http://localhost:6333/collections/project_md
# Verify: points_count increases + status="green"
# 2. Re-index lag
git log --since="1 week ago" --name-only --pretty=format: | grep -E "\.md$" | sort -u | wc -l
python -c "
from qdrant_client import QdrantClient
q = QdrantClient(path='./rag-data/qdrant')
# Check that the indexed sources match the changed files
"
# 3. Voyage cost
# Visit voyageai.com dashboard → check last 7 days usage
# Target: <$1/week steady state
# 4. Random query quality (5 manual queries)
# Sample queries:
# - "Recent Mig" → expect session log top
# - "silent 403" → expect gotcha #44 top
# - "audit reuse" → expect memory entry top
# Score: 1-5 per query (relevant chunks in top-5)
# 5. Storage size
du -sh ./rag-data/
# Target: <500MB per project
Log: docs/changelog/rag-audit-weekly-{YYYY-WW}.md (1 page)
7.2 Monthly deep audit (~2-3h, start of each month)
Goal: quality benchmark + chunking review + stale cleanup.
Checklist:
# 1. Quality benchmark: 30-query test set
test_queries = [
    # Categories: state, historical, debug, pattern, cross-stack
    ("Phase hiện tại", "doc"),  # Vietnamese test query: "current phase"
    ("Mig 26 PE Level Opinions UPSERT", "session_log"),
    ("silent 403 non-admin Forbidden", "gotcha"),
    ("audit reuse trước clone B từ A", "memory"),  # "audit reuse before cloning B from A"
    # ... 30 total covering all scopes
]
results = []
for query, expected_scope in test_queries:
    retrieved = rag_retrieve(query, k=10)
    # Manual score:
    # - Recall: % of expected sources in top-10
    # - Precision: % of retrieved chunks actually relevant
    results.append({"query": query, "recall": ..., "precision": ...})
# Target: avg recall > 80%, precision > 75%
# 2. Chunking review: sample 10 random chunks
# Check: is any chunk cut in the middle of a narrative (violates §6.5)?
# Action: tune the chunker if issues are found
# 3. Stale audit
# Files not re-indexed for > 14 days → flag
# Files deleted from the repo but still in Qdrant → cleanup
# 4. Cost trend
# Monthly Voyage spend vs target
# Target: <$3/month steady
# 5. Capacity check
# Total chunks vs disk space projection
# If the project grows significantly (>20% MoM) → plan scale
Log: docs/changelog/rag-audit-monthly-{YYYY-MM}.md (2-3 pages)
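The recall and precision placeholders in the benchmark loop above can be computed from manually labeled sources. A minimal sketch, assuming each test query has a hand-written set of expected source paths and that relevance of retrieved sources is judged by hand; the function names `recall_at_k` and `precision_at_k` are illustrative:

```python
def recall_at_k(expected: set[str], retrieved: list[str]) -> float:
    """Fraction of expected sources that appear among the retrieved sources."""
    if not expected:
        return 1.0
    return len(expected & set(retrieved)) / len(expected)


def precision_at_k(relevant: set[str], retrieved: list[str]) -> float:
    """Fraction of retrieved sources that were judged relevant."""
    if not retrieved:
        return 0.0
    return sum(1 for s in retrieved if s in relevant) / len(retrieved)


retrieved = ["docs/gotchas.md", "docs/rules.md", "docs/flows/auth-flow.md"]
print(recall_at_k({"docs/gotchas.md"}, retrieved))
print(precision_at_k({"docs/gotchas.md", "docs/flows/auth-flow.md"}, retrieved))
```

Averaging both values over the 30 queries gives the numbers to compare against the > 80% recall and > 75% precision targets.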
7.3 Quarterly major audit (~4-6h, every quarter)
Goal: strategic review + major upgrades.
Checklist:
- Embedding model upgrade decision
  - Has Voyage released a new model? Test it side-by-side with voyage-3-large
  - Quality benchmark on the 30-query test set
  - Decision: upgrade if recall improves by 5pp or more
- Chunking strategy iteration
  - Review 50 random chunks
  - Identify patterns: wrong cuts, missing overlap, missing metadata
  - Tune chunker code → full re-index
- Collection rebuild from scratch
  - Backup current → drop collection → re-index all
  - Purpose: clean orphan chunks + apply new chunking
  - Effort: ~30 min for 1M MD
- Multi-AI client access audit
  - Active clients (Claude Code / Desktop / GPT / Cursor)
  - Per-client query volume + token spend
  - Security: rotate auth tokens, review rate limits
- Cross-project namespace audit (if multi-project)
  - Project isolation working correctly?
  - Cross-project query intentional vs accidental?
  - Adjust metadata filter rules
Log: docs/changelog/rag-audit-quarterly-{YYYY-Q}.md (5-10 pages)
7.4 Trigger-based audit (ad-hoc)
| Trigger | Action |
|---|---|
| Critical retrieval miss (reported by the main agent) | Audit why the relevant chunk was missed + tune |
| Cost spike >50% MoM | Audit query patterns + rate limit clients |
| Re-index hang >1h | Audit indexer logs + Qdrant health |
| Quality regression observed by the main agent | Spot-check + bring the monthly audit forward |
| New project added | Setup namespace + initial index audit |
8. Multi-AI client access
8.1 MCP protocol — agnostic
MCP (Model Context Protocol) is a standard protocol. Any AI client that supports MCP can consume the same server:
                 Qdrant (single source)
                           ↓
               MCP server :7777 (HTTP/SSE)
            ↙          ↓          ↓          ↘
    Claude Code     Claude      Cursor     GPT-4 +
                    Desktop      IDE    custom adapter
8.2 Transport modes
| Mode | Use case | Setup |
|---|---|---|
| stdio | Single client (Claude Code local) — default | python rag-mcp-server.py |
| HTTP/SSE | Multi-client (network access) | python rag-mcp-server.py --http :7777 |
| WebSocket | Bi-directional (rare) | Custom config |
8.3 Setup multi-AI mode
Step 1: Run MCP server HTTP mode
# Terminal 1: MCP server (keep running)
export VOYAGE_API_KEY="pa-xxxx"
python scripts/rag-mcp-server.py --http :7777
# Server endpoint: http://localhost:7777/sse
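Step 2: Add auth middleware (recommended for multi-client)
# Update rag-mcp-server.py
# NOTE: illustrative pseudocode. The bearer_auth helper and its parameters are
# placeholders; check the FastMCP authentication and middleware docs for the
# actual API of the installed version before using this.
from fastmcp import FastMCP
from fastmcp.middleware import bearer_auth
ALLOWED_TOKENS = {
    "claude-code-token": "claude-code-primary",
    "claude-desktop-token": "claude-desktop-secondary",
    "gpt4-token": "gpt4-cursor-integration",
    "custom-agent-token": "custom-research-agent",
}
mcp = FastMCP("project-rag", middleware=[
    bearer_auth(tokens=ALLOWED_TOKENS, rate_limit_per_minute=30)
])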
Step 3: Register per-client config
Claude Code (main agent + 4 agents)
// .claude/settings.json
{
"mcpServers": {
"project-rag": {
"transport": "sse",
"url": "http://localhost:7777/sse",
"headers": {
"Authorization": "Bearer claude-code-token"
}
}
}
}
Claude Desktop
// claude_desktop_config.json
{
"mcpServers": {
"project-rag": {
"transport": "sse",
"url": "http://localhost:7777/sse",
"headers": {
"Authorization": "Bearer claude-desktop-token"
}
}
}
}
Cursor IDE
// .cursor/settings.json
{
"mcp.servers": {
"project-rag": {
"transport": "sse",
"url": "http://localhost:7777/sse"
}
}
}
GPT-4 via custom adapter
# Use OpenAI Assistants API + custom function calling
# NOTE: the /tool/rag_retrieve REST route below is an assumption; an MCP SSE
# server does not expose it by default, so a thin HTTP wrapper would be needed.
import requests

def query_project_rag(query: str, scope: str = "all", k: int = 5):
    response = requests.post(
        "http://localhost:7777/tool/rag_retrieve",
        headers={"Authorization": "Bearer gpt4-token"},
        json={"query": query, "scope": scope, "k": k}
    )
    return response.json()
# Register as OpenAI function tool
Continue.dev / custom agent
# config.yaml
mcp_servers:
- name: project-rag
transport: sse
url: http://localhost:7777/sse
auth_token: custom-agent-token
8.4 Security model multi-AI
| Concern | Mitigation |
|---|---|
| Token leak | Rotate quarterly, store in env vars |
| Rate limit abuse | 30 req/min/token default, tune per client |
| Read-only enforcement | MCP server expose only rag_retrieve + rag_stats (no write tools) |
| Audit log | Log every query: timestamp + client_token + query + result_count |
| Cross-project leak | Per-collection access control (future enhancement) |
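The "30 req/min/token" mitigation in the table above can be enforced with a small sliding-window counter. This is a sketch only, independent of any MCP framework; the class name `SlidingWindowLimiter` is hypothetical, and the caller is assumed to pass a monotonic timestamp such as `time.monotonic()`:

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each token."""

    def __init__(self, limit: int = 30, window: float = 60.0):
        self.limit = limit
        self.window = window
        self._hits: dict[str, deque] = defaultdict(deque)

    def allow(self, token: str, now: float) -> bool:
        """Record the request and return True, or return False if over the limit."""
        hits = self._hits[token]
        # Drop timestamps that have left the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


limiter = SlidingWindowLimiter(limit=30, window=60.0)
print(limiter.allow("claude-code-token", now=0.0))
```

Passing the clock in as an argument keeps the limiter easy to test; rejected requests are not recorded, so a blocked client recovers as soon as old requests age out.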
8.5 Cost considerations multi-AI
Single Claude Code client (current):
Voyage cost: ~$0.20/month (low query volume)
Qdrant: free local
4 AI clients heavy use (Claude Code + Desktop + Cursor + GPT-4):
Voyage cost: ~$2-5/month (higher query volume)
Network bandwidth: minimal (~100KB/query response)
CPU: Qdrant + Voyage embed call ~100ms total
→ Multi-AI access cost scales linearly with query volume, not with infrastructure.
8.6 Recommend rollout
Phase 1 (Week 1-4): Single client (Claude Code only)
→ Validate quality + cost baseline
Phase 2 (Month 2+): Add Claude Desktop if mobile/casual access is needed
→ Same auth, share collection
Phase 3 (Month 3+): Add Cursor IDE if working across multiple IDEs
→ Verify no cross-tool conflicts
Phase 4 (Future): GPT-4 / custom agent integration if needed
→ Custom adapter + strict auth
9. Timeline rollout
Hour-by-hour breakdown (~10-14h dedicated session)
| Hour | Task | Effort |
|---|---|---|
| 1-2 | Setup pre-flight: disk cleanup + Voyage signup + Python deps install | ~2h |
| 3-4 | Write scripts/rag-indexer.py + run initial embed | ~2h |
| 5 | Verify Qdrant collection + manual query sanity check | ~1h |
| 6-7 | Write scripts/rag-mcp-server.py + register .claude/settings.json | ~2h |
| 8 | Test rag_retrieve via Claude Code (main agent solo) | ~1h |
| 9-10 | Update 4 agent .md frontmatter + system prompt sections | ~2h |
| 11 | Setup pre-commit hook + audit logging | ~1h |
| 12-14 | Buffer + trial 10-15 query measure quality + cost | ~3h |
Trial 4-week plan
Week 1: Pilot single project (smaller of 2)
- Day 1-2: Setup + initial index
- Day 3-7: Active use + measure baseline metrics
- Deliverable: rag-audit-weekly-W1.md
Week 2: Roll out 2nd project
- Day 1: Setup separate Qdrant collection
- Day 2-7: Dual-project use measure
- Deliverable: rag-audit-weekly-W2.md
Week 3: 4-agent integration
- Day 1-2: Update 4 agent .md with the rag_retrieve tool
- Day 3-7: Multi-agent task measure shared cache benefit
- Deliverable: rag-audit-weekly-W3.md
Week 4: Decision gate (keep / tune / upgrade B / rollback)
- Day 1-2: Compile metrics
- Day 3: Decision meeting (owner + main agent)
- Day 4-7: Apply decision (tune embedding/chunking OR upgrade Option B OR rollback)
- Deliverable: rag-audit-monthly-M1.md + decision doc
Decision gate Week 4
PASS criteria (continue + tune):
✅ Quality recall > 80% on 30 query benchmark
✅ Cost < $5/month total (Voyage + storage)
✅ Session lifespan increases > 30% (heavy session)
✅ Multi-agent shared cache hit > 60%
✅ Retrieval miss critical < 10% queries
✅ Storage < 1GB per project
TUNE criteria (continue + adjust):
⚠️ Quality 70-80% → tune chunking or upgrade embedding
⚠️ Cost $5-10/mo → audit query patterns, reduce k
⚠️ Session lifespan increases < 30% → audit blanket effectiveness
ROLLBACK criteria (archive RAG):
❌ Quality < 70%
❌ Cost > $10/mo recurring
❌ Session lifespan does NOT increase, or decreases
❌ Main agent frequently complains about "miss context"
❌ Storage > 5GB per project
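The quantitative thresholds above can be encoded as a small decision function so the Week 4 outcome is reproducible. This is a sketch under stated assumptions: the `TrialMetrics` fields and the `decide` function are hypothetical names, the qualitative "miss context" complaint criterion is not modeled, and any result that is neither a clear PASS nor a ROLLBACK is treated as TUNE:

```python
from dataclasses import dataclass


@dataclass
class TrialMetrics:
    recall_pct: float         # avg recall on the 30-query benchmark
    cost_usd_month: float     # Voyage + storage
    lifespan_gain_pct: float  # session lifespan change vs the lazy baseline
    cache_hit_pct: float      # multi-agent shared cache hit rate
    critical_miss_pct: float  # share of queries with a critical miss
    storage_gb: float         # per project


def decide(m: TrialMetrics) -> str:
    """Map Week 4 metrics to PASS / TUNE / ROLLBACK using the listed thresholds."""
    if (m.recall_pct < 70 or m.cost_usd_month > 10
            or m.lifespan_gain_pct <= 0 or m.storage_gb > 5):
        return "ROLLBACK"
    if (m.recall_pct > 80 and m.cost_usd_month < 5
            and m.lifespan_gain_pct > 30 and m.cache_hit_pct > 60
            and m.critical_miss_pct < 10 and m.storage_gb < 1):
        return "PASS"
    return "TUNE"


print(decide(TrialMetrics(85, 3.0, 35, 65, 5, 0.8)))
```

Checking the rollback conditions first means a single hard failure cannot be masked by otherwise good numbers.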
10. Caveats + risks
10.1 Beta features risk
| Feature | Status | Mitigation |
|---|---|---|
| Anthropic Memory tool | Beta context-management-2025-06-27 | Defer until GA, use MEMORY.md current |
| Anthropic Files API | Beta files-api-2025-04-14 | Optional add-on, RAG primary |
| Extended 1h prompt cache | Beta extended-cache-ttl-2025-04-11 | Use 5min default, opt in to 1h for heavy sessions |
| Voyage AI API | Stable | Production OK |
| Qdrant local | Stable | Production OK |
| FastMCP | Stable v2+ | Production OK |
10.2 Storage concerns
Owner's disk currently: 911/954 GB used = 96% full (43GB free)
RAG storage budget:
Qdrant binary: ~50MB
Per project index: ~200-500MB (depend MD volume)
Backup snapshots: ~500MB
Logs + audit: ~100MB
Per project total: ~1GB
2 projects total: ~2GB
+ buffer 1GB
= 3GB recommend free space
→ Clean up BEFORE setup: target 5GB+ free
Cleanup priorities:
- node_modules of old projects
- .NET bin/obj artifacts
- Docker images (docker system prune -a)
- Browser caches (Chrome/Edge ~5GB common)
- %LOCALAPPDATA% caches (NuGet, dotnet)
- Unused Downloads / Videos
10.3 Quality monitoring
| Risk | Indicator | Action |
|---|---|---|
| Chunking breaks narrative | Main agent reports "miss context" | Review chunk strategy, tune |
| Embedding drift | Recall drop > 10pp benchmark | Re-embed full, check Voyage updates |
| Stale index | Committed files not yet re-indexed | Force re-index full, check hook |
| Poor query phrasing | Low precision on simple queries | Main agent refines query patterns |
| Cross-language mismatch | Vietnamese query misses English content | Multilingual reranker or query expansion |
10.4 Fallback strategy
When RAG fails or quality drops:
Layer 1: Em main fallback to Read full file (existing lazy pattern still works)
Layer 2: Em main blanket load critical file directly
Layer 3: Rollback Qdrant snapshot (weekly backup)
Layer 4: Full re-index from scratch (~15 minutes)
Layer 5: Archive RAG, return lazy current pattern (ultimate fallback)
Em main's 120K blanket is NOT lost when RAG fails → graceful degradation.
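Layer 1 of the fallback chain could look roughly like the sketch below. Everything here is hypothetical: `rag_search` stands in for whatever callable wraps the MCP retrieval tool and is assumed to return `(text, score)` pairs, and `min_score` is an illustrative cutoff, not a tuned value.

```python
from pathlib import Path
from typing import Callable

# Hypothetical shape of a retrieval result: (chunk text, similarity score).
Hit = tuple[str, float]


def retrieve_with_fallback(
    query: str,
    rag_search: Callable[[str], list[Hit]],
    source_file: str,
    min_score: float = 0.5,
) -> tuple[str, str]:
    """Try RAG first; fall back to reading the full file (Layer 1)."""
    try:
        hits = rag_search(query)
    except Exception:
        # Any retrieval error (server down, index missing) triggers fallback.
        hits = []
    good = [text for text, score in hits if score >= min_score]
    if good:
        return "rag", "\n".join(good)
    # The existing lazy pattern: read the whole file.
    return "full_file", Path(source_file).read_text(encoding="utf-8")
```

Returning the source label alongside the text makes it possible to count how often the fallback fires, which feeds the quality monitoring in 10.3.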
10.5 Vietnamese-English mix considerations
Voyage-3-large claims multilingual coverage of 26 languages.
No explicit Vietnamese benchmark is public.
Risk: mixed Vietnamese-English technical jargon may miss synonyms.
Example: "im lặng 403" vs "silent 403": are the two vectors close to each other?
Mitigation:
- Test 10-20 mixed Vietnamese-English queries in the audit benchmark
- If recall is low → consider voyage-multilingual-2 as backup
- Or add query expansion (Anthropic Contextual Retrieval pattern)
11. Success metrics
11.1 Quality metrics
| Metric | Target | Measurement |
|---|---|---|
| Recall avg (30 query benchmark) | > 80% | Manual score weekly |
| Precision avg | > 75% | Manual score weekly |
| Retrieval miss critical rate | < 10% | Em main report cumulative |
| Cross-language query recall | > 70% | Mixed Vietnamese-English benchmark |
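The recall and precision targets above can be computed per query and averaged over the benchmark. This is a minimal sketch that assumes each benchmark query has a hand-labeled set of relevant chunk IDs; the plan itself specifies manual scoring, so the function names and the ID-based labeling are assumptions.

```python
def recall_precision(retrieved: list[str], relevant: list[str]) -> tuple[float, float]:
    """Recall and precision for one query, based on chunk IDs."""
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    recall = hits / len(relevant_set) if relevant_set else 0.0
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    return recall, precision


def benchmark_average(results: list[tuple[list[str], list[str]]]) -> tuple[float, float]:
    """Average recall and precision over (retrieved, relevant) pairs."""
    if not results:
        raise ValueError("benchmark has no queries")
    pairs = [recall_precision(ret, rel) for ret, rel in results]
    avg_recall = sum(r for r, _ in pairs) / len(pairs)
    avg_precision = sum(p for _, p in pairs) / len(pairs)
    return avg_recall, avg_precision
```

Running the same function over only the mixed Vietnamese-English subset gives the cross-language recall figure separately from the overall average.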
11.2 Cost metrics
| Metric | Target | Measurement |
|---|---|---|
| Voyage monthly spend | < $5 | Voyage dashboard |
| Total RAG infra cost | < $10/month | Sum tools |
| Cost per query | < $0.001 | Calculated |
| Disk usage per project | < 1GB | du -sh |
11.3 Performance metrics
| Metric | Target | Measurement |
|---|---|---|
| Query latency (P50) | < 200ms | MCP server log |
| Query latency (P99) | < 500ms | MCP server log |
| Re-index lag (post-commit) | < 30s | Pre-commit hook timing |
| Cache hit rate (multi-agent) | > 60% | Custom metric |
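The latency targets above can be checked from a list of per-query timings. The sketch below assumes the MCP server log has already been parsed into milliseconds; it uses the nearest-rank percentile definition, which is one of several common conventions, and the function names are hypothetical.

```python
import math


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value covering pct% of samples."""
    if not values:
        raise ValueError("no latency samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_ok(latencies_ms: list[float], p50_target: float = 200.0,
               p99_target: float = 500.0) -> bool:
    """True when both P50 and P99 are under their targets."""
    return (
        percentile(latencies_ms, 50) < p50_target
        and percentile(latencies_ms, 99) < p99_target
    )
```

With few samples P99 is effectively the maximum, so the check only becomes meaningful once the log holds at least a few hundred queries.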
11.4 Capacity metrics
| Metric | Target | Measurement |
|---|---|---|
| Session lifespan productive | +50% vs lazy | Time tracker |
| Tasks before lost-in-middle | > 35 | Task counter |
| Heavy session token | -20% vs lazy | Anthropic dashboard |
| Multi-agent overlap saving | > 50K/session | Cumulative calc |
11.5 Multi-AI client metrics
| Metric | Target | Measurement |
|---|---|---|
| Active clients | ≥ 1 stable | Audit log |
| Per-client query volume | Track baseline | Audit log per client |
| Cross-client conflict | 0 | Bug reports |
12. Future enhancements
12.1 Phase 2 (after Week 4 validation)
| Enhancement | Effort | Benefit |
|---|---|---|
| Upgrade Option B (drop blanket 30-40K) | 1 session | Saving +15% tokens |
| Anthropic Memory tool integration | 2-3h | Native cross-conversation memory |
| Files API integration | 2-3h | Reduce blanket re-upload cost |
| Citations enable | 1h | RAG quality trace |
12.2 Phase 3 (Month 2-3)
| Enhancement | Effort | Benefit |
|---|---|---|
| Hybrid BM25 + vector search (Contextual Retrieval) | 4-6h | Failed retrievals reduced 49% (67% with reranking), per Anthropic doc |
| Multi-project namespace | 2-3h | Cross-project query with strict isolation |
| Reranker model (Cohere rerank-3) | 2-3h | +10-20% precision |
| Custom Streamlit audit dashboard | 4-5h | Visual quality monitoring |
12.3 Phase 4 (Quarter 2+)
| Enhancement | Effort | Benefit |
|---|---|---|
| Replace Voyage with Anthropic native embedding (if GA) | 2-3h | Reduce vendor count |
| Auto-tuning chunking (LLM-aided) | 1 week | Quality+ |
| Federated multi-machine setup | 1 week | Team usage |
| Time-series analytics on retrieval patterns | 1 week | Insights |
12.4 Defer indefinitely (over-engineering)
- ❌ LangChain / LlamaIndex framework (heavy abstraction)
- ❌ Self-host LLM (cost > value)
- ❌ Custom embedding model fine-tuning (effort > value)
- ❌ Full text + vector hybrid index (use Voyage Reranker instead)
📚 References + tools
Anthropic official
- Memory tool docs
- Prompt caching guide
- Files API
- Contextual Retrieval cookbook
- Effective context engineering
- Agent SDK overview
Tools docs
Project memory
- `feedback_md_compact_narrative.md` (§6.5 rule: KEEP narrative)
- `feedback_multi_agent_setup.md` (4-agent discipline)
- `feedback_drastic_refactor_scope.md` (RAG setup = dedicated session)
- `feedback_uat_skip_verify.md` (Phase 9 UAT mode)
✅ Pre-implementation checklist
☐ Bro confirms 3 items:
  ☐ Paths of the 2 projects (so Investigator can audit the MD inventory pre-flight)
  ☐ Stack of the 2 projects (BE: .NET/Node/Python? FE: React/Vue?)
  ☐ Pilot project choice (the smaller of the 2)
☐ Bro prepares environment:
  ☐ Disk cleanup to 5GB+ free (current 911/954 = 96% full)
  ☐ Voyage AI account signup + API key
  ☐ Python 3.10+ installed
  ☐ Git installed (for the pre-commit hook)
☐ Bro schedules dedicated session:
  ☐ 10-14h block on one weekend day (memory feedback_drastic_refactor_scope rule)
  ☐ Reserve ~30% of weekly cap for RAG setup spawn cost
☐ Bro reviews plan:
  ☐ Read this file in full
  ☐ Confirm blanket vs RAG store scope matches needs
  ☐ Confirm tool stack is acceptable
  ☐ Approve Week 1-4 trial timeline
📝 Notes — keep updated
- 2026-05-12 turn 1: Plan saved after S21 turn 1 finalized cicd-monitor. Cross-project reference for Bro's 2 future projects with > 1M tokens of MD. SOLUTION_ERP baseline is ~354K MD (RAG not needed yet, deferred).
- Status: 📝 PLAN ONLY, not yet implemented
- Next trigger: Bro confirms 3 items → spawn 🔵 Investigator to audit the MD inventory of the 2 projects → fine-tune the blanket list for each project