Sau S21 turn 1 chốt cicd-monitor, bro clarify 5 dự án future > 1M MD tokens → discussion deep ~15 turn về RAG infrastructure. Em main solo (no SOLUTION_ERP sub-agent spawn), delegate claude-code-guide × 2 research Anthropic + community practice. Quyết định chốt: - Cách A defensive (giữ blanket 120K em main + RAG retrieve supplement) - Bỏ Cách B aggressive (cắt 60-70% blanket) — vi phạm priority em main control flow strong - Industry-validated cross 4 Anthropic blog + 5 community tools (Cursor/Continue/Cline/Aider all hybrid) - 3-layer pattern Phase 1-3 incremental rollout (vector → +BM25 → +reranking, recall ~70% → ~92%) - Stack: Voyage-3-large + Qdrant local + FastMCP Python + Streamlit dashboard Multi-agent cost reality clarify (post-S21 t2): - Em main blanket: ~120K - 4 sub-agents spawn cumulative: ~400K - Total billed heavy session: ~560K Cách A vs ~700K lazy - Saving -20% từ multi-agent shared cache 70-90% - Anthropic acknowledge 8-10× multiplier multi-agent Files updated: - docs/STATUS.md (Last updated S21 turn 2 + Recently Done row top) - docs/HANDOFF.md (TL;DR Session 21 turn 2 section + Last updated) - docs/rag-setup-plan.md (+Section 13 multi-agent cost reality + Section 14 3-layer hybrid Phase 1-3, +355 LOC) - docs/changelog/sessions/2026-05-12-1800-s21-turn2-rag-planning.md (new session log) Memory user-level update (outside repo, separate update): - feedback_rag_hybrid_pattern.md (NEW cross-project pattern reusable) - MEMORY.md index (+1 entry pointer) Plan I NEW deferred — trigger bro confirm 5 dự án path + stack + pilot + Voyage API + disk cleanup → dedicated session 10-14h weekend (per feedback_drastic_refactor_scope rule). Stats: - 17 memory entries (+1 RAG hybrid) - 1 plan file rag-setup-plan.md (1500 LOC final) - 4 sub-agents seeds-only unchanged - 81 test unchanged - 4 commits S21 cumulative (f1c61c9+3a34831+1f8e9af+ this) CI skip per path filter (all .md). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
319 lines
12 KiB
Markdown
319 lines
12 KiB
Markdown
# Session 21 turn 2 — RAG Hybrid setup planning + Cách A validation deep dive
|
||
|
||
**Date:** 2026-05-12 (tiếp S21 turn 1 từ 0030 sáng — sang sáng-chiều-tối discussion deep RAG)
|
||
**Dev:** Claude (Opus 4.7 1M Max — em main solo, no SOLUTION_ERP sub-agent spawn)
|
||
**Base commit:** `3a34831` (S21 turn 1 chốt cicd-monitor)
|
||
**Commits:** `1f8e9af` (RAG plan save) + this chốt (2 commit S21 turn 2)
|
||
|
||
## Bối cảnh
|
||
|
||
Sau S21 turn 1 chốt cicd-monitor (4 sub-agents seeds-only), bro đặt câu hỏi về RAG infrastructure cho **5 dự án future > 1M MD context**. Cuộc thảo luận deep ~15+ turn covering:
|
||
|
||
1. RAG fundamentals + Vector DB role
|
||
2. Embedding model "AI nhúng" + Voyage AI cost mechanics
|
||
3. Multi-project shared architecture (5 projects)
|
||
4. Audit procedure 3-tier + change tracking SQLite
|
||
5. UI/UX Streamlit dashboard 7 pages
|
||
6. Cách A defensive (giữ blanket) vs Cách B aggressive (cắt 60-70%)
|
||
7. Reasoning depth comparison (lazy current vs Cách A vs Cách B)
|
||
8. Industry validation via claude-code-guide research
|
||
9. Multi-agent cumulative cost reality (4 agents → ~520K cumulative blanket)
|
||
10. 3-layer hybrid pattern (Anthropic Contextual Retrieval: embeddings + BM25 + reranking)
|
||
|
||
## Deliverables
|
||
|
||
### File mới — `docs/rag-setup-plan.md` (commit `1f8e9af`, 1223 LOC)
|
||
|
||
Cross-project reference plan với 12 section comprehensive:
|
||
|
||
1. Context + Why
|
||
2. Architecture overview (6-layer diagram)
|
||
3. BLANKET load list (~100K, 28% MD)
|
||
4. RAG store list (~254K, 72% MD)
|
||
5. Tool stack recommend
|
||
6. Setup scripts copy-paste ready (~250 LOC Python)
|
||
7. Audit procedure 3-tier (weekly/monthly/quarterly)
|
||
8. Multi-AI client access (Claude Code + Desktop + Cursor + GPT-4)
|
||
9. Timeline rollout 10-14h dedicated session
|
||
10. Caveats + risks
|
||
11. Success metrics + decision gate
|
||
12. Future enhancements
|
||
|
||
### File extend S21 turn 2 (this chốt commit)
|
||
|
||
Add 2 sections vào `rag-setup-plan.md`:
|
||
- Section 13: Multi-agent cumulative cost reality (Anthropic 8-10× warning)
|
||
- Section 14: 3-layer hybrid RAG upgrade path (Phase 1-3 Anthropic Contextual Retrieval)
|
||
|
||
## Quyết định chốt — Cách A vs Cách B
|
||
|
||
### Chọn **Cách A** (defensive hybrid) ⭐
|
||
|
||
```
|
||
Blanket: GIỮ NGUYÊN ~120K em main (35% MD)
|
||
RAG: ADD as supplement (retrieve on-demand)
|
||
Multi-agent: 4 sub-agents share retrieve cache
|
||
Sub-agent spawn blanket: ~80-100K each (auto-inject + skills + spec)
|
||
Cumulative blanket 5 entities: ~520K
|
||
Heavy session billed: ~560K (saving 20% vs lazy)
|
||
```
|
||
|
||
**Why Cách A (priority bro: em main control flow strong):**
|
||
1. ✅ State ownership strong — em main biết direct project state
|
||
2. ✅ Decision quality 90% (vs Cách B 75-80% do fragmentation)
|
||
3. ✅ Wall-clock per task 12 phút (vs Cách B 16 phút)
|
||
4. ✅ UX smooth — em response fast direct cho state question
|
||
5. ✅ Risk-averse — graceful degradation nếu RAG fail (blanket fallback)
|
||
6. ✅ Multi-agent leverage cache hit 70-90% common queries
|
||
7. ✅ Quality recall +25-55pp (5-15 sources cross-validated vs lazy 1-3)
|
||
|
||
### Bỏ **Cách B** (aggressive cut)
|
||
|
||
```
|
||
Blanket: CẮT MẠNH 60-70% (40-50K còn lại)
|
||
RAG: PRIMARY access mechanism cho mọi thứ
|
||
```
|
||
|
||
**Why bỏ:**
|
||
1. ❌ Vi phạm priority "em main control flow strong"
|
||
2. ❌ State ownership weak — phải retrieve mỗi câu state question
|
||
3. ❌ UX latency +1-2s per state Q
|
||
4. ❌ Decision quality 75-80% do reasoning fragmentation
|
||
5. ❌ Risk severe nếu RAG fail (em main ngơ ngác)
|
||
6. ❌ Anthropic research warn: "context rot inevitable cutting aggressively"
|
||
7. ❌ Cascade retrieve problem (1 task → 2-3 retrieves)
|
||
|
||
## Industry validation via claude-code-guide research
|
||
|
||
Spawn 2 lần claude-code-guide agent research (NOT SOLUTION_ERP sub-agents):
|
||
|
||
### Round 1: Anthropic setup inventory (10 features)
|
||
|
||
- Memory tool beta (`content-management-2025-06-27`)
|
||
- Prompt caching extensions (5min/1h beta)
|
||
- Files API beta (`files-api-2025-04-14`)
|
||
- Citations stable
|
||
- MCP servers official + community (9,400+ in 2026)
|
||
- Voyage AI embedding partnership
|
||
- Context compaction tool
|
||
- Claude Agent SDK orchestration
|
||
- Batch API 50% discount
|
||
- RAG best practices Anthropic official
|
||
|
||
### Round 2: Industry practice validation
|
||
|
||
**5/5 dimensions Cách A fit Anthropic explicit recommend:**
|
||
|
||
| Dimension | Bro setup | Anthropic pattern |
|
||
|---|---|---|
|
||
| Context approach | Hybrid blanket+RAG | ✅ Recommended explicit |
|
||
| Sub-agent count | 4 | ✅ "3-5 optimal" |
|
||
| MD scale | 5 project > 1M | ✅ "Use RAG khi >200K" |
|
||
| Stack | Qdrant+Voyage+MCP | ✅ Production validated |
|
||
| Coordination | Em main + agents | ✅ "Coordinator+workers" |
|
||
|
||
**Source 4 Anthropic blog posts:**
|
||
- "Effective Context Engineering for AI Agents" (2025)
|
||
- "Contextual Retrieval" (Sept 2024 flagship)
|
||
- "Effective Harnesses for Long-Running Agents"
|
||
- "Multi-Agent Coordination Patterns"
|
||
|
||
**Community consensus (Tier 1 tools all Hybrid):**
|
||
- Cursor IDE `@codebase` indexing
|
||
- Continue.dev MCP transport
|
||
- Cline / Roo-Cline filesystem + AST + dynamic context
|
||
- Aider code-as-graph
|
||
- Sourcegraph Cody graph-aware
|
||
|
||
→ **ZERO** tools adopt aggressive Cách B pattern. **ALL** evolve toward Cách A hybrid.
|
||
|
||
## 3-layer hybrid pattern (Anthropic Contextual Retrieval Sept 2024)
|
||
|
||
```
|
||
Layer 1: Embeddings (Voyage-3-large)
|
||
→ Semantic + synonym + multilingual catch
|
||
Performance: baseline ~50% recall
|
||
|
||
+ Contextual prefix (Haiku-generated context):
|
||
→ +35% improvement = ~67% recall
|
||
|
||
Layer 2: BM25 (bm25s Python lib free)
|
||
→ Exact identifier + technical terms catch
|
||
+ Layer 1 = ~75% recall
|
||
|
||
Layer 3: Reranking (Voyage rerank-2)
|
||
→ Cross-attention deep relevance
|
||
+ Layer 1+2 = ~85% recall
|
||
```
|
||
|
||
**Phase rollout incremental:**
|
||
|
||
| Phase | Layer | Recall | Cost/month |
|
||
|---|---|---|---|
|
||
| Phase 1 (Week 1-4) | Layer 1 vector only | ~70% | ~$1.50 |
|
||
| Phase 2 (Month 2) | + Layer 2 BM25 | ~78% | ~$1.50 (BM25 free local) |
|
||
| Phase 3 (Month 3) | + Layer 3 + Contextual | ~92% | ~$4-5 |
|
||
|
||
## Multi-agent cost reality (Anthropic warn 8-10× multiplier)
|
||
|
||
```
|
||
Per entity blanket:
|
||
Em main: ~120K
|
||
Sub-agent each spawn: ~80-100K (auto-inject baseline + skills + spec)
|
||
|
||
Cumulative blanket 5 entities = ~520K
|
||
|
||
Heavy session full 4-agent spawn:
|
||
Lazy current: ~700K effective billed
|
||
Cách A: ~560K (-20% saving from multi-agent shared cache)
|
||
|
||
Cost multiplier vs solo em main: ~8-10×
|
||
Anthropic acknowledged: "Expect 3-10× token multiplier"
|
||
```
|
||
|
||
**Saving Cách A breakdown (-140K):**
|
||
- Em main lazy Read → retrieve: -25K
|
||
- 4 agents lazy Read → cached retrieve: -160K (share cache 70-90%)
|
||
- Reasoning streamlined: -20K
|
||
- Plus +60K retrieve cost added
|
||
- Net: -145K ≈ -20% per heavy session
|
||
|
||
## Stack validated
|
||
|
||
| Component | Tool | Reason |
|
||
|---|---|---|
|
||
| **Vector DB** | Qdrant local | Rust binary 50MB, agent-native 2026 leader |
|
||
| **Embedding** | Voyage-3-large | Anthropic partner, multilingual 26 lang, $0.18/M |
|
||
| **MCP server** | FastMCP Python | Official Anthropic SDK |
|
||
| **Chunking** | Custom adaptive Python | §6.5 compliant, transparent |
|
||
| **Tracking** | SQLite local | Event log + audit + cost analytics |
|
||
| **Dashboard** | Streamlit custom | 7 pages multi-project |
|
||
| **Re-index** | Pre-commit hook | Native git, delta on commit |
|
||
|
||
**Total cost 5 projects:** ~$1.50-5/month depending Phase. ~$0.50 initial embed.
|
||
|
||
## Em main solo S21 turn 2 (no SOLUTION_ERP sub-agent spawn)
|
||
|
||
```
|
||
Spawn này session:
|
||
✅ claude-code-guide × 2 (generic agent for Anthropic research)
|
||
❌ Investigator / Implementer / Reviewer / CI/CD Monitor (vẫn seeds-only)
|
||
|
||
Em main solo qua context paste + Write file + research delegate.
|
||
```
|
||
|
||
## Skills check
|
||
|
||
6 skills hiện tại unchanged. Decision KHÔNG add skill mới cho RAG vì:
|
||
- RAG là decision/architectural pattern, không phải workflow project-specific
|
||
- Cross-project applicable → memory entry phù hợp hơn skill
|
||
- Per rule §9.5 anti-pattern "viết skill chỉ để có thêm"
|
||
- Defer skill creation sau Phase 1 trial validate
|
||
|
||
## Tests
|
||
|
||
Unit test 81 unchanged (0 test added — pure planning, không code change).
|
||
|
||
## Memory entry mới
|
||
|
||
**`feedback_rag_hybrid_pattern.md`** (NEW — cross-project pattern reusable):
|
||
- Decision Cách A rationale (control flow priority)
|
||
- Multi-agent cost reality (8-10× multiplier)
|
||
- 3-layer hybrid pattern Phase 1-3 incremental rollout
|
||
- Stack validated (Voyage + Qdrant + FastMCP)
|
||
- When to apply / when NOT apply triggers
|
||
- Anti-patterns documented
|
||
- Anthropic 4 blog cross-ref
|
||
|
||
## Verify chain
|
||
|
||
| Check | Status |
|
||
|---|---|
|
||
| dotnet build | Không chạy (no .cs change) |
|
||
| dotnet test | Không chạy (no test added — pure docs) |
|
||
| npm build | Không chạy (no FE change) |
|
||
| Push origin | Pending end of turn |
|
||
| CI Gitea Actions | Skip per path filter `.md` |
|
||
| IIS prod deploy | KHÔNG xảy ra (CI skip, expected) |
|
||
|
||
## Docs updates
|
||
|
||
- ✅ `docs/STATUS.md` — Last updated S21 turn 2 + Recently Done row top
|
||
- ✅ `docs/HANDOFF.md` — TL;DR Session 21 turn 2 section + Last updated
|
||
- ✅ `docs/rag-setup-plan.md` — extend +Section 13 (cost reality) +Section 14 (3-layer)
|
||
- ✅ `docs/changelog/sessions/2026-05-12-1800-s21-turn2-rag-planning.md` — file này
|
||
- ✅ Memory user-level new: `feedback_rag_hybrid_pattern.md`
|
||
- ✅ Memory user-level: `MEMORY.md` index + 1 entry pointer
|
||
- ⏭ KHÔNG đụng: rules.md / architecture.md / gotchas.md / database/* / flows/* / skills/* / CLAUDE.md (no real change cho 8 file này)
|
||
- ⏭ KHÔNG flush 4 sub-agent MEMORY.md (chưa spawn, per §6.5 KHÔNG add noise)
|
||
|
||
## Handoff Session 21 turn 3+
|
||
|
||
### Plan I NEW — RAG Setup Implementation
|
||
|
||
**Trigger:** Bro confirm 5 dự án path + stack + pilot choice + Voyage API key + disk cleanup 5-8GB free.
|
||
|
||
**Schedule:** Dedicated session 10-14h weekend (per memory `feedback_drastic_refactor_scope` rule).
|
||
|
||
**Phases:**
|
||
- Phase 1 (Week 1-4): Layer 1 vector embeddings only — ~70% recall — ~$1.50/mo
|
||
- Phase 2 (Month 2): + Layer 2 BM25 hybrid — ~78% recall — ~$1.50/mo
|
||
- Phase 3 (Month 3): + Layer 3 Reranking + Contextual — ~92% recall — ~$4-5/mo
|
||
|
||
**Pre-flight task:** Spawn 🔵 Investigator audit MD inventory 5 dự án parallel → tinh chỉnh blanket list per project.
|
||
|
||
### Plan B Contract V2 wire (vẫn pending S21 turn 1)
|
||
|
||
- Trial Week 1 multi-agent kick-off SOLUTION_ERP
|
||
- 6 tasks (Mig 28+29 + Service + Controller + FE × 2 + Pin V2)
|
||
- 4 sub-agents pipeline coordinate (lần đầu spawn 4 agents thật)
|
||
|
||
### Plan C Test gap fill (vẫn pending)
|
||
|
||
Bundle Chunk E Plan B — 5 test pending:
|
||
- B4 silent 403 regression (gotcha #44 vi phạm §7)
|
||
- V2 Service `ApproveV2Async` UPSERT opinion
|
||
- Section gộp Chunk C render
|
||
- Mig 25 PATCH `/user-selectable`
|
||
- Mig 27 PATCH `/api/menus/{key}`
|
||
|
||
### Plan D-F-G unchanged
|
||
|
||
- D: Hard blockers ops (UAT/SMTP/creds/backup) — BLOCKED chờ user
|
||
- F: Audit định kỳ 2026-06-01 (~3 tuần nữa, KHÔNG tự chạy)
|
||
- G: Multi-agent trial 4-week (post-S21 t1 + S21 t2 setup complete)
|
||
|
||
## Stats cumulative S21 turn 2
|
||
|
||
| Metric | Trước S21 t2 | Sau S21 t2 | Δ |
|
||
|---|---|---|---|
|
||
| DB tables | 59 | 59 | 0 |
|
||
| Migrations | 27 | 27 | 0 |
|
||
| Endpoints | ~142 | ~142 | 0 |
|
||
| FE pages | 34 | 34 | 0 |
|
||
| Unit tests | 81 | 81 | 0 |
|
||
| Gotchas | 44 | 44 | 0 |
|
||
| **Memory entries** | 16 | **17** | **+1** (RAG hybrid pattern) |
|
||
| Skills | 6 | 6 | 0 |
|
||
| Sub-agents | 4 seeds-only | 4 seeds-only | 0 (chưa spawn) |
|
||
| **Commits S21** | 2 (`f1c61c9` + `3a34831`) | **4** | **+2** (1f8e9af + this chốt) |
|
||
| **MD plan files** | 0 | **1** | **+1** (`rag-setup-plan.md` 1223 LOC + 2 section extend) |
|
||
|
||
## Cross-ref
|
||
|
||
- S21 turn 1 session log: `2026-05-12-0030-s21-cicd-monitor-add.md`
|
||
- Plan file: `docs/rag-setup-plan.md` (1223 + extend ~300 LOC = ~1500 LOC)
|
||
- Memory new: `feedback_rag_hybrid_pattern.md` (cross-project reusable)
|
||
- Industry research: claude-code-guide × 2 spawn agent reports
|
||
- 4 Anthropic blog cross-ref trong memory entry
|
||
|
||
## Bài học chốt S21 turn 2
|
||
|
||
1. **Em main control flow strong là priority bro** — quyết định Cách A defensive over Cách B aggressive
|
||
2. **Multi-agent cost realistic 8-10× solo** — KHÔNG tránh được spawn baseline ~400K cumulative 4 agents
|
||
3. **Anthropic recommend 3-layer hybrid pattern** — embeddings + BM25 + reranking compound effect
|
||
4. **Industry consensus = hybrid** — Cursor + Continue + Cline + Aider all evolve toward hybrid
|
||
5. **Voyage Vietnamese quality cần verify Week 1** — voyage-3-large multilingual nhưng explicit Vietnamese benchmark chưa publish
|
||
6. **RAG setup = dedicated session 10-14h** — per `feedback_drastic_refactor_scope` rule
|
||
7. **5 projects scale workable** — single Qdrant + per-project collection + ~$2-5/month cost
|