Files
solution-erp/docs/changelog/sessions/2026-05-12-1800-s21-turn2-rag-planning.md
pqhuy1987 0a3b747612 [CLAUDE] Docs: chốt Session 21 turn 2 — RAG Hybrid setup planning + Cách A validation
Sau S21 turn 1 chốt cicd-monitor, bro clarify 5 dự án future > 1M MD tokens → discussion deep ~15 turn về RAG infrastructure. Em main solo (no SOLUTION_ERP sub-agent spawn), delegate claude-code-guide × 2 research Anthropic + community practice.

Quyết định chốt:
- Cách A defensive (giữ blanket 120K em main + RAG retrieve supplement)
- Bỏ Cách B aggressive (cắt 60-70% blanket) — vi phạm priority em main control flow strong
- Industry-validated cross 4 Anthropic blog + 5 community tools (Cursor/Continue/Cline/Aider all hybrid)
- 3-layer pattern Phase 1-3 incremental rollout (vector → +BM25 → +reranking, recall ~70% → ~92%)
- Stack: Voyage-3-large + Qdrant local + FastMCP Python + Streamlit dashboard

Multi-agent cost reality clarify (post-S21 t2):
- Em main blanket: ~120K
- 4 sub-agents spawn cumulative: ~400K
- Total billed heavy session: ~560K Cách A vs ~700K lazy
- Saving -20% từ multi-agent shared cache 70-90%
- Anthropic acknowledge 8-10× multiplier multi-agent

Files updated:
- docs/STATUS.md (Last updated S21 turn 2 + Recently Done row top)
- docs/HANDOFF.md (TL;DR Session 21 turn 2 section + Last updated)
- docs/rag-setup-plan.md (+Section 13 multi-agent cost reality + Section 14 3-layer hybrid Phase 1-3, +355 LOC)
- docs/changelog/sessions/2026-05-12-1800-s21-turn2-rag-planning.md (new session log)

Memory user-level update (outside repo, separate update):
- feedback_rag_hybrid_pattern.md (NEW cross-project pattern reusable)
- MEMORY.md index (+1 entry pointer)

Plan I NEW deferred — trigger bro confirm 5 dự án path + stack + pilot + Voyage API + disk cleanup → dedicated session 10-14h weekend (per feedback_drastic_refactor_scope rule).

Stats:
- 17 memory entries (+1 RAG hybrid)
- 1 plan file rag-setup-plan.md (1500 LOC final)
- 4 sub-agents seeds-only unchanged
- 81 test unchanged
- 4 commits S21 cumulative (f1c61c9 + 3a34831 + 1f8e9af + this)

CI skip per path filter (all .md).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 18:50:28 +07:00

12 KiB
Raw Blame History

Session 21 turn 2 — RAG Hybrid setup planning + Cách A validation deep dive

Date: 2026-05-12 (tiếp S21 turn 1 từ 0030 sáng — sang sáng-chiều-tối discussion deep RAG) Dev: Claude (Opus 4.7 1M Max — em main solo, no SOLUTION_ERP sub-agent spawn) Base commit: 3a34831 (S21 turn 1 chốt cicd-monitor) Commits: 1f8e9af (RAG plan save) + this chốt (2 commit S21 turn 2)

Bối cảnh

Sau S21 turn 1 chốt cicd-monitor (4 sub-agents seeds-only), bro đặt câu hỏi về RAG infrastructure cho 5 dự án future > 1M MD context. Cuộc thảo luận deep ~15+ turn covering:

  1. RAG fundamentals + Vector DB role
  2. Embedding model "AI nhúng" + Voyage AI cost mechanics
  3. Multi-project shared architecture (5 projects)
  4. Audit procedure 3-tier + change tracking SQLite
  5. UI/UX Streamlit dashboard 7 pages
  6. Cách A defensive (giữ blanket) vs Cách B aggressive (cắt 60-70%)
  7. Reasoning depth comparison (lazy current vs Cách A vs Cách B)
  8. Industry validation via claude-code-guide research
  9. Multi-agent cumulative cost reality (4 agents → ~520K cumulative blanket)
  10. 3-layer hybrid pattern (Anthropic Contextual Retrieval: embeddings + BM25 + reranking)

Deliverables

File mới — docs/rag-setup-plan.md (commit 1f8e9af, 1223 LOC)

Cross-project reference plan với 12 section comprehensive:

  1. Context + Why
  2. Architecture overview (6-layer diagram)
  3. BLANKET load list (~100K, 28% MD)
  4. RAG store list (~254K, 72% MD)
  5. Tool stack recommend
  6. Setup scripts copy-paste ready (~250 LOC Python)
  7. Audit procedure 3-tier (weekly/monthly/quarterly)
  8. Multi-AI client access (Claude Code + Desktop + Cursor + GPT-4)
  9. Timeline rollout 10-14h dedicated session
  10. Caveats + risks
  11. Success metrics + decision gate
  12. Future enhancements

File extend S21 turn 2 (this chốt commit)

Add 2 sections vào rag-setup-plan.md:

  • Section 13: Multi-agent cumulative cost reality (Anthropic 8-10× warning)
  • Section 14: 3-layer hybrid RAG upgrade path (Phase 1-3 Anthropic Contextual Retrieval)

Quyết định chốt — Cách A vs Cách B

Chọn Cách A (defensive hybrid)

Blanket: GIỮ NGUYÊN ~120K em main (35% MD)
RAG: ADD as supplement (retrieve on-demand)
Multi-agent: 4 sub-agents share retrieve cache
Sub-agent spawn blanket: ~80-100K each (auto-inject + skills + spec)
Cumulative blanket 5 entities: ~520K
Heavy session billed: ~560K (saving 20% vs lazy)

Why Cách A (priority bro: em main control flow strong):

  1. State ownership strong — em main biết direct project state
  2. Decision quality 90% (vs Cách B 75-80% do fragmentation)
  3. Wall-clock per task 12 phút (vs Cách B 16 phút)
  4. UX smooth — em response fast direct cho state question
  5. Risk-averse — graceful degradation nếu RAG fail (blanket fallback)
  6. Multi-agent leverage cache hit 70-90% common queries
  7. Quality recall +25-55pp (5-15 sources cross-validated vs lazy 1-3)

Bỏ Cách B (aggressive cut)

Blanket: CẮT MẠNH 60-70% (40-50K còn lại)
RAG: PRIMARY access mechanism cho mọi thứ

Why bỏ:

  1. Vi phạm priority "em main control flow strong"
  2. State ownership weak — phải retrieve mỗi câu state question
  3. UX latency +1-2s per state Q
  4. Decision quality 75-80% do reasoning fragmentation
  5. Risk severe nếu RAG fail (em main ngơ ngác)
  6. Anthropic research warn: "context rot inevitable cutting aggressively"
  7. Cascade retrieve problem (1 task → 2-3 retrieves)

Industry validation via claude-code-guide research

Spawn 2 lần claude-code-guide agent research (NOT SOLUTION_ERP sub-agents):

Round 1: Anthropic setup inventory (10 features)

  • Memory tool beta (content-management-2025-06-27)
  • Prompt caching extensions (5min/1h beta)
  • Files API beta (files-api-2025-04-14)
  • Citations stable
  • MCP servers official + community (9,400+ in 2026)
  • Voyage AI embedding partnership
  • Context compaction tool
  • Claude Agent SDK orchestration
  • Batch API 50% discount
  • RAG best practices Anthropic official

Round 2: Industry practice validation

5/5 dimensions Cách A fit Anthropic explicit recommend:

Dimension Bro setup Anthropic pattern
Context approach Hybrid blanket+RAG Recommended explicit
Sub-agent count 4 "3-5 optimal"
MD scale 5 project > 1M "Use RAG khi >200K"
Stack Qdrant+Voyage+MCP Production validated
Coordination Em main + agents "Coordinator+workers"

Source 4 Anthropic blog posts:

  • "Effective Context Engineering for AI Agents" (2025)
  • "Contextual Retrieval" (Sept 2024 flagship)
  • "Effective Harnesses for Long-Running Agents"
  • "Multi-Agent Coordination Patterns"

Community consensus (Tier 1 tools all Hybrid):

  • Cursor IDE @codebase indexing
  • Continue.dev MCP transport
  • Cline / Roo-Cline filesystem + AST + dynamic context
  • Aider code-as-graph
  • Sourcegraph Cody graph-aware

ZERO tools adopt aggressive Cách B pattern. ALL evolve toward Cách A hybrid.

3-layer hybrid pattern (Anthropic Contextual Retrieval Sept 2024)

Layer 1: Embeddings (Voyage-3-large)
  → Semantic + synonym + multilingual catch
  Performance: baseline ~50% recall
  
+ Contextual prefix (Haiku-generated context):
  → +35% improvement = ~67% recall

Layer 2: BM25 (bm25s Python lib free)
  → Exact identifier + technical terms catch
  + Layer 1 = ~75% recall
  
Layer 3: Reranking (Voyage rerank-2)
  → Cross-attention deep relevance
  + Layer 1+2 = ~85% recall

Phase rollout incremental:

Phase Layer Recall Cost/month
Phase 1 (Week 1-4) Layer 1 vector only ~70% ~$1.50
Phase 2 (Month 2) + Layer 2 BM25 ~78% ~$1.50 (BM25 free local)
Phase 3 (Month 3) + Layer 3 + Contextual ~92% ~$4-5

Multi-agent cost reality (Anthropic warn 8-10× multiplier)

Per entity blanket:
  Em main: ~120K
  Sub-agent each spawn: ~80-100K (auto-inject baseline + skills + spec)
  
Cumulative blanket 5 entities = ~520K

Heavy session full 4-agent spawn:
  Lazy current:  ~700K effective billed
  Cách A:        ~560K (-20% saving from multi-agent shared cache)
  
Cost multiplier vs solo em main: ~8-10×
Anthropic acknowledged: "Expect 3-10× token multiplier"

Saving Cách A breakdown (-140K):

  • Em main lazy Read → retrieve: -25K
  • 4 agents lazy Read → cached retrieve: -160K (share cache 70-90%)
  • Reasoning streamlined: -20K
  • Plus +60K retrieve cost added
  • Net: -145K ≈ -20% per heavy session

Stack validated

Component Tool Reason
Vector DB Qdrant local Rust binary 50MB, agent-native 2026 leader
Embedding Voyage-3-large Anthropic partner, multilingual 26 lang, $0.18/M
MCP server FastMCP Python Official Anthropic SDK
Chunking Custom adaptive Python §6.5 compliant, transparent
Tracking SQLite local Event log + audit + cost analytics
Dashboard Streamlit custom 7 pages multi-project
Re-index Pre-commit hook Native git, delta on commit

Total cost 5 projects: ~$1.50-5/month depending Phase. ~$0.50 initial embed.

Em main solo S21 turn 2 (no SOLUTION_ERP sub-agent spawn)

Spawn này session:
  ✅ claude-code-guide × 2 (generic agent for Anthropic research)
  ❌ Investigator / Implementer / Reviewer / CI/CD Monitor (vẫn seeds-only)
  
Em main solo qua context paste + Write file + research delegate.

Skills check

6 skills hiện tại unchanged. Decision KHÔNG add skill mới cho RAG vì:

  • RAG là decision/architectural pattern, không phải workflow project-specific
  • Cross-project applicable → memory entry phù hợp hơn skill
  • Per rule §9.5 anti-pattern "viết skill chỉ để có thêm"
  • Defer skill creation sau Phase 1 trial validate

Tests

Unit test 81 unchanged (0 test added — pure planning, không code change).

Memory entry mới

feedback_rag_hybrid_pattern.md (NEW — cross-project pattern reusable):

  • Decision Cách A rationale (control flow priority)
  • Multi-agent cost reality (8-10× multiplier)
  • 3-layer hybrid pattern Phase 1-3 incremental rollout
  • Stack validated (Voyage + Qdrant + FastMCP)
  • When to apply / when NOT apply triggers
  • Anti-patterns documented
  • Anthropic 4 blog cross-ref

Verify chain

Check Status
dotnet build Không chạy (no .cs change)
dotnet test Không chạy (no test added — pure docs)
npm build Không chạy (no FE change)
Push origin Pending end of turn
CI Gitea Actions Skip per path filter .md
IIS prod deploy KHÔNG xảy ra (CI skip, expected)

Docs updates

  • docs/STATUS.md — Last updated S21 turn 2 + Recently Done row top
  • docs/HANDOFF.md — TL;DR Session 21 turn 2 section + Last updated
  • docs/rag-setup-plan.md — extend +Section 13 (cost reality) +Section 14 (3-layer)
  • docs/changelog/sessions/2026-05-12-1800-s21-turn2-rag-planning.md — file này
  • Memory user-level new: feedback_rag_hybrid_pattern.md
  • Memory user-level: MEMORY.md index + 1 entry pointer
  • ⏭ KHÔNG đụng: rules.md / architecture.md / gotchas.md / database/* / flows/* / skills/* / CLAUDE.md (no real change cho 8 file này)
  • ⏭ KHÔNG flush 4 sub-agent MEMORY.md (chưa spawn, per §6.5 KHÔNG add noise)

Handoff Session 21 turn 3+

Plan I NEW — RAG Setup Implementation

Trigger: Bro confirm 5 dự án path + stack + pilot choice + Voyage API key + disk cleanup 5-8GB free.

Schedule: Dedicated session 10-14h weekend (per memory feedback_drastic_refactor_scope rule).

Phases:

  • Phase 1 (Week 1-4): Layer 1 vector embeddings only — ~70% recall — ~$1.50/mo
  • Phase 2 (Month 2): + Layer 2 BM25 hybrid — ~78% recall — ~$1.50/mo
  • Phase 3 (Month 3): + Layer 3 Reranking + Contextual — ~92% recall — ~$4-5/mo

Pre-flight task: Spawn 🔵 Investigator audit MD inventory 5 dự án parallel → tinh chỉnh blanket list per project.

Plan B Contract V2 wire (vẫn pending S21 turn 1)

  • Trial Week 1 multi-agent kick-off SOLUTION_ERP
  • 6 tasks (Mig 28+29 + Service + Controller + FE × 2 + Pin V2)
  • 4 sub-agents pipeline coordinate (lần đầu spawn 4 agents thật)

Plan C Test gap fill (vẫn pending)

Bundle Chunk E Plan B — 5 test pending:

  • B4 silent 403 regression (gotcha #44 vi phạm §7)
  • V2 Service ApproveV2Async UPSERT opinion
  • Section gộp Chunk C render
  • Mig 25 PATCH /user-selectable
  • Mig 27 PATCH /api/menus/{key}

Plan D-F-G unchanged

  • D: Hard blockers ops (UAT/SMTP/creds/backup) — BLOCKED chờ user
  • F: Audit định kỳ 2026-06-01 (~3 tuần nữa, KHÔNG tự chạy)
  • G: Multi-agent trial 4-week (post-S21 t1 + S21 t2 setup complete)

Stats cumulative S21 turn 2

Metric Trước S21 t2 Sau S21 t2 Δ
DB tables 59 59 0
Migrations 27 27 0
Endpoints ~142 ~142 0
FE pages 34 34 0
Unit tests 81 81 0
Gotchas 44 44 0
Memory entries 16 17 +1 (RAG hybrid pattern)
Skills 6 6 0
Sub-agents 4 seeds-only 4 seeds-only 0 (chưa spawn)
Commits S21 2 (f1c61c9 + 3a34831) 4 +2 (1f8e9af + this chốt)
MD plan files 0 1 +1 (rag-setup-plan.md 1223 LOC + 2 section extend)

Cross-ref

  • S21 turn 1 session log: 2026-05-12-0030-s21-cicd-monitor-add.md
  • Plan file: docs/rag-setup-plan.md (1223 + extend ~300 LOC = ~1500 LOC)
  • Memory new: feedback_rag_hybrid_pattern.md (cross-project reusable)
  • Industry research: claude-code-guide × 2 spawn agent reports
  • 4 Anthropic blog cross-ref trong memory entry

Bài học chốt S21 turn 2

  1. Em main control flow strong là priority bro — quyết định Cách A defensive over Cách B aggressive
  2. Multi-agent cost realistic 8-10× solo — KHÔNG tránh được spawn baseline ~400K cumulative 4 agents
  3. Anthropic recommend 3-layer hybrid pattern — embeddings + BM25 + reranking compound effect
  4. Industry consensus = hybrid — Cursor + Continue + Cline + Aider all evolve toward hybrid
  5. Voyage Vietnamese quality cần verify Week 1 — voyage-3-large multilingual nhưng explicit Vietnamese benchmark chưa publish
  6. RAG setup = dedicated session 10-14h — per feedback_drastic_refactor_scope rule
  7. 5 projects scale workable — single Qdrant + per-project collection + ~$2-5/month cost