solution-erp/docs/changelog/sessions/2026-05-12-1800-s21-turn2-rag-planning.md

# Session 21 turn 2 — RAG Hybrid setup planning + Cách A validation deep dive

**Date:** 2026-05-12 (tiếp S21 turn 1 từ 0030 sáng — sang sáng-chiều-tối discussion deep RAG)
**Dev:** Claude (Opus 4.7 1M Max — em main solo, no SOLUTION_ERP sub-agent spawn)
**Base commit:** `3a34831` (S21 turn 1 chốt cicd-monitor)
**Commits:** `1f8e9af` (RAG plan save) + this chốt (2 commit S21 turn 2)

## Bối cảnh

Sau S21 turn 1 chốt cicd-monitor (4 sub-agents seeds-only), bro đặt câu hỏi về RAG infrastructure cho **5 dự án future > 1M MD context**. Cuộc thảo luận deep ~15+ turn covering:

1. RAG fundamentals + Vector DB role
2. Embedding model "AI nhúng" + Voyage AI cost mechanics
3. Multi-project shared architecture (5 projects)
4. Audit procedure 3-tier + change tracking SQLite
5. UI/UX Streamlit dashboard 7 pages
6. Cách A defensive (giữ blanket) vs Cách B aggressive (cắt 60-70%)
7. Reasoning depth comparison (lazy current vs Cách A vs Cách B)
8. Industry validation via claude-code-guide research
9. Multi-agent cumulative cost reality (4 agents → ~520K cumulative blanket)
10. 3-layer hybrid pattern (Anthropic Contextual Retrieval: embeddings + BM25 + reranking)

## Deliverables

### File mới — `docs/rag-setup-plan.md` (commit `1f8e9af`, 1223 LOC)

Cross-project reference plan với 12 section comprehensive:

1. Context + Why
2. Architecture overview (6-layer diagram)
3. BLANKET load list (~100K, 28% MD)
4. RAG store list (~254K, 72% MD)
5. Tool stack recommend
6. Setup scripts copy-paste ready (~250 LOC Python)
7. Audit procedure 3-tier (weekly/monthly/quarterly)
8. Multi-AI client access (Claude Code + Desktop + Cursor + GPT-4)
9. Timeline rollout 10-14h dedicated session
10. Caveats + risks
11. Success metrics + decision gate
12. Future enhancements

### File extend S21 turn 2 (this chốt commit)

Add 2 sections vào `rag-setup-plan.md`:
- Section 13: Multi-agent cumulative cost reality (Anthropic 8-10× warning)
- Section 14: 3-layer hybrid RAG upgrade path (Phase 1-3 Anthropic Contextual Retrieval)

## Quyết định chốt — Cách A vs Cách B

### Chọn **Cách A** (defensive hybrid) ⭐

```
Blanket: GIỮ NGUYÊN ~120K em main (35% MD)
RAG: ADD as supplement (retrieve on-demand)
Multi-agent: 4 sub-agents share retrieve cache
Sub-agent spawn blanket: ~80-100K each (auto-inject + skills + spec)
Cumulative blanket 5 entities: ~520K
Heavy session billed: ~560K (saving 20% vs lazy)
```

**Why Cách A (priority bro: em main control flow strong):**
1. ✅ State ownership strong — em main biết direct project state
2. ✅ Decision quality 90% (vs Cách B 75-80% do fragmentation)
3. ✅ Wall-clock per task 12 phút (vs Cách B 16 phút)
4. ✅ UX smooth — em response fast direct cho state question
5. ✅ Risk-averse — graceful degradation nếu RAG fail (blanket fallback)
6. ✅ Multi-agent leverage cache hit 70-90% common queries
7. ✅ Quality recall +25-55pp (5-15 sources cross-validated vs lazy 1-3)

### Bỏ **Cách B** (aggressive cut)

```
Blanket: CẮT MẠNH 60-70% (40-50K còn lại)
RAG: PRIMARY access mechanism cho mọi thứ
```

**Why bỏ:**
1. ❌ Vi phạm priority "em main control flow strong"
2. ❌ State ownership weak — phải retrieve mỗi câu state question
3. ❌ UX latency +1-2s per state Q
4. ❌ Decision quality 75-80% do reasoning fragmentation
5. ❌ Risk severe nếu RAG fail (em main ngơ ngác)
6. ❌ Anthropic research warn: "context rot inevitable cutting aggressively"
7. ❌ Cascade retrieve problem (1 task → 2-3 retrieves)

## Industry validation via claude-code-guide research

Spawn 2 lần claude-code-guide agent research (NOT SOLUTION_ERP sub-agents):

### Round 1: Anthropic setup inventory (10 features)

- Memory tool beta (`content-management-2025-06-27`)
- Prompt caching extensions (5min/1h beta)
- Files API beta (`files-api-2025-04-14`)
- Citations stable
- MCP servers official + community (9,400+ in 2026)
- Voyage AI embedding partnership
- Context compaction tool
- Claude Agent SDK orchestration
- Batch API 50% discount
- RAG best practices Anthropic official

### Round 2: Industry practice validation

**5/5 dimensions Cách A fit Anthropic explicit recommend:**

| Dimension | Bro setup | Anthropic pattern |
|---|---|---|
| Context approach | Hybrid blanket+RAG | ✅ Recommended explicit |
| Sub-agent count | 4 | ✅ "3-5 optimal" |
| MD scale | 5 project > 1M | ✅ "Use RAG khi >200K" |
| Stack | Qdrant+Voyage+MCP | ✅ Production validated |
| Coordination | Em main + agents | ✅ "Coordinator+workers" |

**Source 4 Anthropic blog posts:**
- "Effective Context Engineering for AI Agents" (2025)
- "Contextual Retrieval" (Sept 2024 flagship)
- "Effective Harnesses for Long-Running Agents"
- "Multi-Agent Coordination Patterns"

**Community consensus (Tier 1 tools all Hybrid):**
- Cursor IDE `@codebase` indexing
- Continue.dev MCP transport
- Cline / Roo-Cline filesystem + AST + dynamic context
- Aider code-as-graph
- Sourcegraph Cody graph-aware

→ **ZERO** tools adopt aggressive Cách B pattern. **ALL** evolve toward Cách A hybrid.

## 3-layer hybrid pattern (Anthropic Contextual Retrieval Sept 2024)

```
Layer 1: Embeddings (Voyage-3-large)
  → Semantic + synonym + multilingual catch
  Performance: baseline ~50% recall

+ Contextual prefix (Haiku-generated context):
  → +35% improvement = ~67% recall

Layer 2: BM25 (bm25s Python lib free)
  → Exact identifier + technical terms catch
  + Layer 1 = ~75% recall

Layer 3: Reranking (Voyage rerank-2)
  → Cross-attention deep relevance
  + Layer 1+2 = ~85% recall
```

**Phase rollout incremental:**

| Phase | Layer | Recall | Cost/month |
|---|---|---|---|
| Phase 1 (Week 1-4) | Layer 1 vector only | ~70% | ~$1.50 |
| Phase 2 (Month 2) | + Layer 2 BM25 | ~78% | ~$1.50 (BM25 free local) |
| Phase 3 (Month 3) | + Layer 3 + Contextual | ~92% | ~$4-5 |

## Multi-agent cost reality (Anthropic warn 8-10× multiplier)

```
Per entity blanket:
  Em main: ~120K
  Sub-agent each spawn: ~80-100K (auto-inject baseline + skills + spec)

Cumulative blanket 5 entities = ~520K

Heavy session full 4-agent spawn:
  Lazy current:  ~700K effective billed
  Cách A:        ~560K (-20% saving from multi-agent shared cache)

Cost multiplier vs solo em main: ~8-10×
Anthropic acknowledged: "Expect 3-10× token multiplier"
```

**Saving Cách A breakdown (-140K):**
- Em main lazy Read → retrieve: -25K
- 4 agents lazy Read → cached retrieve: -160K (share cache 70-90%)
- Reasoning streamlined: -20K
- Plus +60K retrieve cost added
- Net: -145K ≈ -20% per heavy session

## Stack validated

| Component | Tool | Reason |
|---|---|---|
| **Vector DB** | Qdrant local | Rust binary 50MB, agent-native 2026 leader |
| **Embedding** | Voyage-3-large | Anthropic partner, multilingual 26 lang, $0.18/M |
| **MCP server** | FastMCP Python | Official Anthropic SDK |
| **Chunking** | Custom adaptive Python | §6.5 compliant, transparent |
| **Tracking** | SQLite local | Event log + audit + cost analytics |
| **Dashboard** | Streamlit custom | 7 pages multi-project |
| **Re-index** | Pre-commit hook | Native git, delta on commit |

**Total cost 5 projects:** ~$1.50-5/month depending Phase. ~$0.50 initial embed.

## Em main solo S21 turn 2 (no SOLUTION_ERP sub-agent spawn)

```
Spawn này session:
  ✅ claude-code-guide × 2 (generic agent for Anthropic research)
  ❌ Investigator / Implementer / Reviewer / CI/CD Monitor (vẫn seeds-only)

Em main solo qua context paste + Write file + research delegate.
```

## Skills check

6 skills hiện tại unchanged. Decision KHÔNG add skill mới cho RAG vì:
- RAG là decision/architectural pattern, không phải workflow project-specific
- Cross-project applicable → memory entry phù hợp hơn skill
- Per rule §9.5 anti-pattern "viết skill chỉ để có thêm"
- Defer skill creation sau Phase 1 trial validate

## Tests

Unit test 81 unchanged (0 test added — pure planning, không code change).

## Memory entry mới

**`feedback_rag_hybrid_pattern.md`** (NEW — cross-project pattern reusable):
- Decision Cách A rationale (control flow priority)
- Multi-agent cost reality (8-10× multiplier)
- 3-layer hybrid pattern Phase 1-3 incremental rollout
- Stack validated (Voyage + Qdrant + FastMCP)
- When to apply / when NOT apply triggers
- Anti-patterns documented
- Anthropic 4 blog cross-ref

## Verify chain

| Check | Status |
|---|---|
| dotnet build | Không chạy (no .cs change) |
| dotnet test | Không chạy (no test added — pure docs) |
| npm build | Không chạy (no FE change) |
| Push origin | Pending end of turn |
| CI Gitea Actions | Skip per path filter `.md` |
| IIS prod deploy | KHÔNG xảy ra (CI skip, expected) |

## Docs updates

- ✅ `docs/STATUS.md` — Last updated S21 turn 2 + Recently Done row top
- ✅ `docs/HANDOFF.md` — TL;DR Session 21 turn 2 section + Last updated
- ✅ `docs/rag-setup-plan.md` — extend +Section 13 (cost reality) +Section 14 (3-layer)
- ✅ `docs/changelog/sessions/2026-05-12-1800-s21-turn2-rag-planning.md` — file này
- ✅ Memory user-level new: `feedback_rag_hybrid_pattern.md`
- ✅ Memory user-level: `MEMORY.md` index + 1 entry pointer
- ⏭ KHÔNG đụng: rules.md / architecture.md / gotchas.md / database/* / flows/* / skills/* / CLAUDE.md (no real change cho 8 file này)
- ⏭ KHÔNG flush 4 sub-agent MEMORY.md (chưa spawn, per §6.5 KHÔNG add noise)

## Handoff Session 21 turn 3+

### Plan I NEW — RAG Setup Implementation

**Trigger:** Bro confirm 5 dự án path + stack + pilot choice + Voyage API key + disk cleanup 5-8GB free.

**Schedule:** Dedicated session 10-14h weekend (per memory `feedback_drastic_refactor_scope` rule).

**Phases:**
- Phase 1 (Week 1-4): Layer 1 vector embeddings only — ~70% recall — ~$1.50/mo
- Phase 2 (Month 2): + Layer 2 BM25 hybrid — ~78% recall — ~$1.50/mo
- Phase 3 (Month 3): + Layer 3 Reranking + Contextual — ~92% recall — ~$4-5/mo

**Pre-flight task:** Spawn 🔵 Investigator audit MD inventory 5 dự án parallel → tinh chỉnh blanket list per project.

### Plan B Contract V2 wire (vẫn pending S21 turn 1)

- Trial Week 1 multi-agent kick-off SOLUTION_ERP
- 6 tasks (Mig 28+29 + Service + Controller + FE × 2 + Pin V2)
- 4 sub-agents pipeline coordinate (lần đầu spawn 4 agents thật)

### Plan C Test gap fill (vẫn pending)

Bundle Chunk E Plan B — 5 test pending:
- B4 silent 403 regression (gotcha #44 vi phạm §7)
- V2 Service `ApproveV2Async` UPSERT opinion
- Section gộp Chunk C render
- Mig 25 PATCH `/user-selectable`
- Mig 27 PATCH `/api/menus/{key}`

### Plan D-F-G unchanged

- D: Hard blockers ops (UAT/SMTP/creds/backup) — BLOCKED chờ user
- F: Audit định kỳ 2026-06-01 (~3 tuần nữa, KHÔNG tự chạy)
- G: Multi-agent trial 4-week (post-S21 t1 + S21 t2 setup complete)

## Stats cumulative S21 turn 2

| Metric | Trước S21 t2 | Sau S21 t2 | Δ |
|---|---|---|---|
| DB tables | 59 | 59 | 0 |
| Migrations | 27 | 27 | 0 |
| Endpoints | ~142 | ~142 | 0 |
| FE pages | 34 | 34 | 0 |
| Unit tests | 81 | 81 | 0 |
| Gotchas | 44 | 44 | 0 |
| **Memory entries** | 16 | **17** | **+1** (RAG hybrid pattern) |
| Skills | 6 | 6 | 0 |
| Sub-agents | 4 seeds-only | 4 seeds-only | 0 (chưa spawn) |
| **Commits S21** | 2 (`f1c61c9` + `3a34831`) | **4** | **+2** (1f8e9af + this chốt) |
| **MD plan files** | 0 | **1** | **+1** (`rag-setup-plan.md` 1223 LOC + 2 section extend) |

## Cross-ref

- S21 turn 1 session log: `2026-05-12-0030-s21-cicd-monitor-add.md`
- Plan file: `docs/rag-setup-plan.md` (1223 + extend ~300 LOC = ~1500 LOC)
- Memory new: `feedback_rag_hybrid_pattern.md` (cross-project reusable)
- Industry research: claude-code-guide × 2 spawn agent reports
- 4 Anthropic blog cross-ref trong memory entry

## Bài học chốt S21 turn 2

1. **Em main control flow strong là priority bro** — quyết định Cách A defensive over Cách B aggressive
2. **Multi-agent cost realistic 8-10× solo** — KHÔNG tránh được spawn baseline ~400K cumulative 4 agents
3. **Anthropic recommend 3-layer hybrid pattern** — embeddings + BM25 + reranking compound effect
4. **Industry consensus = hybrid** — Cursor + Continue + Cline + Aider all evolve toward hybrid
5. **Voyage Vietnamese quality cần verify Week 1** — voyage-3-large multilingual nhưng explicit Vietnamese benchmark chưa publish
6. **RAG setup = dedicated session 10-14h** — per `feedback_drastic_refactor_scope` rule
7. **5 projects scale workable** — single Qdrant + per-project collection + ~$2-5/month cost