Sau S21 turn 1 chốt cicd-monitor, bro clarify 5 dự án future > 1M MD tokens → discussion deep ~15 turn về RAG infrastructure. Em main solo (no SOLUTION_ERP sub-agent spawn), delegate claude-code-guide × 2 research Anthropic + community practice. Quyết định chốt: - Cách A defensive (giữ blanket 120K em main + RAG retrieve supplement) - Bỏ Cách B aggressive (cắt 60-70% blanket) — vi phạm priority em main control flow strong - Industry-validated cross 4 Anthropic blog + 5 community tools (Cursor/Continue/Cline/Aider all hybrid) - 3-layer pattern Phase 1-3 incremental rollout (vector → +BM25 → +reranking, recall ~70% → ~92%) - Stack: Voyage-3-large + Qdrant local + FastMCP Python + Streamlit dashboard Multi-agent cost reality clarify (post-S21 t2): - Em main blanket: ~120K - 4 sub-agents spawn cumulative: ~400K - Total billed heavy session: ~560K Cách A vs ~700K lazy - Saving -20% từ multi-agent shared cache 70-90% - Anthropic acknowledge 8-10× multiplier multi-agent Files updated: - docs/STATUS.md (Last updated S21 turn 2 + Recently Done row top) - docs/HANDOFF.md (TL;DR Session 21 turn 2 section + Last updated) - docs/rag-setup-plan.md (+Section 13 multi-agent cost reality + Section 14 3-layer hybrid Phase 1-3, +355 LOC) - docs/changelog/sessions/2026-05-12-1800-s21-turn2-rag-planning.md (new session log) Memory user-level update (outside repo, separate update): - feedback_rag_hybrid_pattern.md (NEW cross-project pattern reusable) - MEMORY.md index (+1 entry pointer) Plan I NEW deferred — trigger bro confirm 5 dự án path + stack + pilot + Voyage API + disk cleanup → dedicated session 10-14h weekend (per feedback_drastic_refactor_scope rule). Stats: - 17 memory entries (+1 RAG hybrid) - 1 plan file rag-setup-plan.md (1500 LOC final) - 4 sub-agents seeds-only unchanged - 81 test unchanged - 4 commits S21 cumulative (f1c61c9+3a34831+1f8e9af+ this) CI skip per path filter (all .md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
12 KiB
Session 21 turn 2 — RAG Hybrid setup planning + Cách A validation deep dive
Date: 2026-05-12 (tiếp S21 turn 1 từ 0030 sáng — sang sáng-chiều-tối discussion deep RAG)
Dev: Claude (Opus 4.7 1M Max — em main solo, no SOLUTION_ERP sub-agent spawn)
Base commit: 3a34831 (S21 turn 1 chốt cicd-monitor)
Commits: 1f8e9af (RAG plan save) + this chốt (2 commit S21 turn 2)
Bối cảnh
Sau S21 turn 1 chốt cicd-monitor (4 sub-agents seeds-only), bro đặt câu hỏi về RAG infrastructure cho 5 dự án future > 1M MD context. Cuộc thảo luận deep ~15+ turn covering:
- RAG fundamentals + Vector DB role
- Embedding model "AI nhúng" + Voyage AI cost mechanics
- Multi-project shared architecture (5 projects)
- Audit procedure 3-tier + change tracking SQLite
- UI/UX Streamlit dashboard 7 pages
- Cách A defensive (giữ blanket) vs Cách B aggressive (cắt 60-70%)
- Reasoning depth comparison (lazy current vs Cách A vs Cách B)
- Industry validation via claude-code-guide research
- Multi-agent cumulative cost reality (4 agents → ~520K cumulative blanket)
- 3-layer hybrid pattern (Anthropic Contextual Retrieval: embeddings + BM25 + reranking)
Deliverables
File mới — docs/rag-setup-plan.md (commit 1f8e9af, 1223 LOC)
Cross-project reference plan với 12 section comprehensive:
- Context + Why
- Architecture overview (6-layer diagram)
- BLANKET load list (~100K, 28% MD)
- RAG store list (~254K, 72% MD)
- Tool stack recommend
- Setup scripts copy-paste ready (~250 LOC Python)
- Audit procedure 3-tier (weekly/monthly/quarterly)
- Multi-AI client access (Claude Code + Desktop + Cursor + GPT-4)
- Timeline rollout 10-14h dedicated session
- Caveats + risks
- Success metrics + decision gate
- Future enhancements
File extend S21 turn 2 (this chốt commit)
Add 2 sections vào rag-setup-plan.md:
- Section 13: Multi-agent cumulative cost reality (Anthropic 8-10× warning)
- Section 14: 3-layer hybrid RAG upgrade path (Phase 1-3 Anthropic Contextual Retrieval)
Quyết định chốt — Cách A vs Cách B
Chọn Cách A (defensive hybrid) ⭐
Blanket: GIỮ NGUYÊN ~120K em main (35% MD)
RAG: ADD as supplement (retrieve on-demand)
Multi-agent: 4 sub-agents share retrieve cache
Sub-agent spawn blanket: ~80-100K each (auto-inject + skills + spec)
Cumulative blanket 5 entities: ~520K
Heavy session billed: ~560K (saving 20% vs lazy)
Why Cách A (priority bro: em main control flow strong):
- ✅ State ownership strong — em main biết direct project state
- ✅ Decision quality 90% (vs Cách B 75-80% do fragmentation)
- ✅ Wall-clock per task 12 phút (vs Cách B 16 phút)
- ✅ UX smooth — em response fast direct cho state question
- ✅ Risk-averse — graceful degradation nếu RAG fail (blanket fallback)
- ✅ Multi-agent leverage cache hit 70-90% common queries
- ✅ Quality recall +25-55pp (5-15 sources cross-validated vs lazy 1-3)
Bỏ Cách B (aggressive cut)
Blanket: CẮT MẠNH 60-70% (40-50K còn lại)
RAG: PRIMARY access mechanism cho mọi thứ
Why bỏ:
- ❌ Vi phạm priority "em main control flow strong"
- ❌ State ownership weak — phải retrieve mỗi câu state question
- ❌ UX latency +1-2s per state Q
- ❌ Decision quality 75-80% do reasoning fragmentation
- ❌ Risk severe nếu RAG fail (em main ngơ ngác)
- ❌ Anthropic research warn: "context rot inevitable cutting aggressively"
- ❌ Cascade retrieve problem (1 task → 2-3 retrieves)
Industry validation via claude-code-guide research
Spawn 2 lần claude-code-guide agent research (NOT SOLUTION_ERP sub-agents):
Round 1: Anthropic setup inventory (10 features)
- Memory tool beta (
content-management-2025-06-27) - Prompt caching extensions (5min/1h beta)
- Files API beta (
files-api-2025-04-14) - Citations stable
- MCP servers official + community (9,400+ in 2026)
- Voyage AI embedding partnership
- Context compaction tool
- Claude Agent SDK orchestration
- Batch API 50% discount
- RAG best practices Anthropic official
Round 2: Industry practice validation
5/5 dimensions Cách A fit Anthropic explicit recommend:
| Dimension | Bro setup | Anthropic pattern |
|---|---|---|
| Context approach | Hybrid blanket+RAG | ✅ Recommended explicit |
| Sub-agent count | 4 | ✅ "3-5 optimal" |
| MD scale | 5 project > 1M | ✅ "Use RAG khi >200K" |
| Stack | Qdrant+Voyage+MCP | ✅ Production validated |
| Coordination | Em main + agents | ✅ "Coordinator+workers" |
Source 4 Anthropic blog posts:
- "Effective Context Engineering for AI Agents" (2025)
- "Contextual Retrieval" (Sept 2024 flagship)
- "Effective Harnesses for Long-Running Agents"
- "Multi-Agent Coordination Patterns"
Community consensus (Tier 1 tools all Hybrid):
- Cursor IDE
@codebaseindexing - Continue.dev MCP transport
- Cline / Roo-Cline filesystem + AST + dynamic context
- Aider code-as-graph
- Sourcegraph Cody graph-aware
→ ZERO tools adopt aggressive Cách B pattern. ALL evolve toward Cách A hybrid.
3-layer hybrid pattern (Anthropic Contextual Retrieval Sept 2024)
Layer 1: Embeddings (Voyage-3-large)
→ Semantic + synonym + multilingual catch
Performance: baseline ~50% recall
+ Contextual prefix (Haiku-generated context):
→ +35% improvement = ~67% recall
Layer 2: BM25 (bm25s Python lib free)
→ Exact identifier + technical terms catch
+ Layer 1 = ~75% recall
Layer 3: Reranking (Voyage rerank-2)
→ Cross-attention deep relevance
+ Layer 1+2 = ~85% recall
Phase rollout incremental:
| Phase | Layer | Recall | Cost/month |
|---|---|---|---|
| Phase 1 (Week 1-4) | Layer 1 vector only | ~70% | ~$1.50 |
| Phase 2 (Month 2) | + Layer 2 BM25 | ~78% | ~$1.50 (BM25 free local) |
| Phase 3 (Month 3) | + Layer 3 + Contextual | ~92% | ~$4-5 |
Multi-agent cost reality (Anthropic warn 8-10× multiplier)
Per entity blanket:
Em main: ~120K
Sub-agent each spawn: ~80-100K (auto-inject baseline + skills + spec)
Cumulative blanket 5 entities = ~520K
Heavy session full 4-agent spawn:
Lazy current: ~700K effective billed
Cách A: ~560K (-20% saving from multi-agent shared cache)
Cost multiplier vs solo em main: ~8-10×
Anthropic acknowledged: "Expect 3-10× token multiplier"
Saving Cách A breakdown (-140K):
- Em main lazy Read → retrieve: -25K
- 4 agents lazy Read → cached retrieve: -160K (share cache 70-90%)
- Reasoning streamlined: -20K
- Plus +60K retrieve cost added
- Net: -145K ≈ -20% per heavy session
Stack validated
| Component | Tool | Reason |
|---|---|---|
| Vector DB | Qdrant local | Rust binary 50MB, agent-native 2026 leader |
| Embedding | Voyage-3-large | Anthropic partner, multilingual 26 lang, $0.18/M |
| MCP server | FastMCP Python | Official Anthropic SDK |
| Chunking | Custom adaptive Python | §6.5 compliant, transparent |
| Tracking | SQLite local | Event log + audit + cost analytics |
| Dashboard | Streamlit custom | 7 pages multi-project |
| Re-index | Pre-commit hook | Native git, delta on commit |
Total cost 5 projects: ~$1.50-5/month depending Phase. ~$0.50 initial embed.
Em main solo S21 turn 2 (no SOLUTION_ERP sub-agent spawn)
Spawn này session:
✅ claude-code-guide × 2 (generic agent for Anthropic research)
❌ Investigator / Implementer / Reviewer / CI/CD Monitor (vẫn seeds-only)
Em main solo qua context paste + Write file + research delegate.
Skills check
6 skills hiện tại unchanged. Decision KHÔNG add skill mới cho RAG vì:
- RAG là decision/architectural pattern, không phải workflow project-specific
- Cross-project applicable → memory entry phù hợp hơn skill
- Per rule §9.5 anti-pattern "viết skill chỉ để có thêm"
- Defer skill creation sau Phase 1 trial validate
Tests
Unit test 81 unchanged (0 test added — pure planning, không code change).
Memory entry mới
feedback_rag_hybrid_pattern.md (NEW — cross-project pattern reusable):
- Decision Cách A rationale (control flow priority)
- Multi-agent cost reality (8-10× multiplier)
- 3-layer hybrid pattern Phase 1-3 incremental rollout
- Stack validated (Voyage + Qdrant + FastMCP)
- When to apply / when NOT apply triggers
- Anti-patterns documented
- Anthropic 4 blog cross-ref
Verify chain
| Check | Status |
|---|---|
| dotnet build | Không chạy (no .cs change) |
| dotnet test | Không chạy (no test added — pure docs) |
| npm build | Không chạy (no FE change) |
| Push origin | Pending end of turn |
| CI Gitea Actions | Skip per path filter .md |
| IIS prod deploy | KHÔNG xảy ra (CI skip, expected) |
Docs updates
- ✅
docs/STATUS.md— Last updated S21 turn 2 + Recently Done row top - ✅
docs/HANDOFF.md— TL;DR Session 21 turn 2 section + Last updated - ✅
docs/rag-setup-plan.md— extend +Section 13 (cost reality) +Section 14 (3-layer) - ✅
docs/changelog/sessions/2026-05-12-1800-s21-turn2-rag-planning.md— file này - ✅ Memory user-level new:
feedback_rag_hybrid_pattern.md - ✅ Memory user-level:
MEMORY.mdindex + 1 entry pointer - ⏭ KHÔNG đụng: rules.md / architecture.md / gotchas.md / database/* / flows/* / skills/* / CLAUDE.md (no real change cho 8 file này)
- ⏭ KHÔNG flush 4 sub-agent MEMORY.md (chưa spawn, per §6.5 KHÔNG add noise)
Handoff Session 21 turn 3+
Plan I NEW — RAG Setup Implementation
Trigger: Bro confirm 5 dự án path + stack + pilot choice + Voyage API key + disk cleanup 5-8GB free.
Schedule: Dedicated session 10-14h weekend (per memory feedback_drastic_refactor_scope rule).
Phases:
- Phase 1 (Week 1-4): Layer 1 vector embeddings only — ~70% recall — ~$1.50/mo
- Phase 2 (Month 2): + Layer 2 BM25 hybrid — ~78% recall — ~$1.50/mo
- Phase 3 (Month 3): + Layer 3 Reranking + Contextual — ~92% recall — ~$4-5/mo
Pre-flight task: Spawn 🔵 Investigator audit MD inventory 5 dự án parallel → tinh chỉnh blanket list per project.
Plan B Contract V2 wire (vẫn pending S21 turn 1)
- Trial Week 1 multi-agent kick-off SOLUTION_ERP
- 6 tasks (Mig 28+29 + Service + Controller + FE × 2 + Pin V2)
- 4 sub-agents pipeline coordinate (lần đầu spawn 4 agents thật)
Plan C Test gap fill (vẫn pending)
Bundle Chunk E Plan B — 5 test pending:
- B4 silent 403 regression (gotcha #44 vi phạm §7)
- V2 Service
ApproveV2AsyncUPSERT opinion - Section gộp Chunk C render
- Mig 25 PATCH
/user-selectable - Mig 27 PATCH
/api/menus/{key}
Plan D-F-G unchanged
- D: Hard blockers ops (UAT/SMTP/creds/backup) — BLOCKED chờ user
- F: Audit định kỳ 2026-06-01 (~3 tuần nữa, KHÔNG tự chạy)
- G: Multi-agent trial 4-week (post-S21 t1 + S21 t2 setup complete)
Stats cumulative S21 turn 2
| Metric | Trước S21 t2 | Sau S21 t2 | Δ |
|---|---|---|---|
| DB tables | 59 | 59 | 0 |
| Migrations | 27 | 27 | 0 |
| Endpoints | ~142 | ~142 | 0 |
| FE pages | 34 | 34 | 0 |
| Unit tests | 81 | 81 | 0 |
| Gotchas | 44 | 44 | 0 |
| Memory entries | 16 | 17 | +1 (RAG hybrid pattern) |
| Skills | 6 | 6 | 0 |
| Sub-agents | 4 seeds-only | 4 seeds-only | 0 (chưa spawn) |
| Commits S21 | 2 (f1c61c9 + 3a34831) |
4 | +2 (1f8e9af + this chốt) |
| MD plan files | 0 | 1 | +1 (rag-setup-plan.md 1223 LOC + 2 section extend) |
Cross-ref
- S21 turn 1 session log:
2026-05-12-0030-s21-cicd-monitor-add.md - Plan file:
docs/rag-setup-plan.md(1223 + extend ~300 LOC = ~1500 LOC) - Memory new:
feedback_rag_hybrid_pattern.md(cross-project reusable) - Industry research: claude-code-guide × 2 spawn agent reports
- 4 Anthropic blog cross-ref trong memory entry
Bài học chốt S21 turn 2
- Em main control flow strong là priority bro — quyết định Cách A defensive over Cách B aggressive
- Multi-agent cost realistic 8-10× solo — KHÔNG tránh được spawn baseline ~400K cumulative 4 agents
- Anthropic recommend 3-layer hybrid pattern — embeddings + BM25 + reranking compound effect
- Industry consensus = hybrid — Cursor + Continue + Cline + Aider all evolve toward hybrid
- Voyage Vietnamese quality cần verify Week 1 — voyage-3-large multilingual nhưng explicit Vietnamese benchmark chưa publish
- RAG setup = dedicated session 10-14h — per
feedback_drastic_refactor_scoperule - 5 projects scale workable — single Qdrant + per-project collection + ~$2-5/month cost