

Session 27 — 2026-05-22 — Memory Curate + RAG Manual Control + Multi-agent Setup Fix

Dev: Claude main agent on SOLUTION_ERP (sub-agent registry empty for the whole session; pitfalls #1 and #2 confirmed) + pqhuy
Duration: ~5h (start ~17:00 → end ~22:00 GMT+7)
Base commit: d99069a (S26 final wrap, Plan AG6)
Final commits: TBD (pqhuy pushes manually after approval)


🎯 Accomplished

Plan A.3 — RAG Manual Control + Custom Dashboard

🟧 5 PS scripts in D:\.claude-rag\scripts\ (written solo by the main agent as a cookie-cutter mirror; the Implementer Case 2 ACCEPT delegation was missed because the registry was empty):

  • start.ps1 (2.4 KB) - Qdrant background + health check (initial 3s warmup + TimeoutSec 5)
  • stop.ps1 (1.4 KB) - Graceful stop + verify
  • status.ps1 (6.2 KB) - 6-section terminal report colored
  • dashboard.ps1 (17.4 KB) - Generate custom HTML + open browser
  • boot.ps1 (1.4 KB) - Auto-boot sequence (Startup shortcut)
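
The warmup-then-probe health check described for start.ps1 can be sketched as below. This is a Python illustration, not the author's PowerShell script: the names `http_probe` and `wait_until_healthy`, the number of attempts, and the 1s pause between attempts are assumptions. Only the 3s warmup and the 5s per-request timeout come from this log.

```python
import time
import urllib.request


def http_probe(url: str, timeout_s: float) -> bool:
    """Return True if the URL answers HTTP 200 within timeout_s seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False


def wait_until_healthy(url, warmup_s=3.0, timeout_s=5.0, attempts=3,
                       probe=http_probe, sleep=time.sleep):
    """Give the server a short warmup, then probe a few times (hypothetical helper)."""
    sleep(warmup_s)  # initial warmup before the first request
    for attempt in range(attempts):
        if probe(url, timeout_s):
            return True
        if attempt < attempts - 1:
            sleep(1.0)  # assumed pause between attempts
    return False
```

A call such as `wait_until_healthy("http://localhost:6333/healthz")` would mirror the check. The probe and sleep functions are injectable so the logic can be exercised without a running server.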

🟧 6 $PROFILE aliases: rag-start / rag-stop / rag-status / rag-dashboard / rag-restart / rag-boot

🟧 Auto-boot Startup folder shortcut %APPDATA%\Microsoft\Windows\Start Menu\Programs\Startup\rag-boot.lnk: runs boot.ps1 automatically each time Windows starts (Startup-folder items run at user logon).

🟧 Custom MCP Dashboard HTML D:\.claude-rag\dashboard.html (13.3 KB auto-gen) — 7 panels: Qdrant + MCP + Collections + Voyage + System + Agent MEMORY + Recent Logs. Auto-refresh 60s. Anthropic orange theme.

Plan A.4 — RAG Onboarding Guide

🟧 docs/guides/rag-onboarding-guide.md (26.3 KB / ~440 lines / 13 sections + §A2):

  • 13 sections: TL;DR, Background, Pre-check, Bootstrap 3 step, 6 MCP tools matrix, 4 sub-agent brief, Permission, Workflow, Troubleshooting, Tunings, Best practices, Status monitoring §12, Distributed Approach B §13
  • §A2 Standard structure pioneered by the main agent (replaces the skipped Plan A.2 with a full file tree + 8-step bootstrap checklist + 4 gotcha discoveries from S26-S27)
  • Native Qdrant vs Custom MCP dashboard comparison table

Memory Curate (3 tasks approved by pqhuy)

🟧 Curated 4 MEMORY files, -60% cumulative size:

  • cicd-monitor: 72.4 KB → 15.8 KB (-78%)
  • implementer: 38.8 KB → 22.0 KB (-43%)
  • investigator: 34.9 KB → 16.3 KB (-53%)
  • reviewer: 34.5 KB → 17.9 KB (-48%)
  • Archive: 115 KB of verbose entries preserved in total (per rule §6.5)

🟧 Early drift audit §6.4 + §9.4 (triggered because +8 gotchas pushed past the threshold):

  • 5 files patched: 4 × "26 migrations → 31 migrations" + 1 × "41 gotchas → 49 gotchas"
  • Audit log file: docs/changelog/skill-audit-2026-05-late.md
  • Narrative KEPT per §6.5: the historical contract-workflow "96→77 tests" note is preserved

Plan F1 — Qdrant Native Dashboard Fix (caught by pqhuy in UAT)

🟧 Root cause: the Qdrant Windows binary zip does not bundle the Web UI static files. The log warns "Static content folder for Web UI './static' does not exist" and http://localhost:6333/dashboard returns 404. When pioneering the setup in S26, the main agent downloaded only qdrant-x86_64-pc-windows-msvc.zip (28.3 MB binary), unaware that the Web UI has to be downloaded separately.

🟧 Fix: download dist-qdrant.zip v0.2.12 (6.59 MB, released 2026-05-21, 0.3 days earlier) from the qdrant-web-ui GitHub releases + extract into D:\.claude-rag\qdrant-bin\static\ + flatten the dist/ subfolder up one level + restart Qdrant → HTTP 200 "UI | Qdrant" ✓

🟧 The custom dashboard panel link "Qdrant Native Dashboard" now works end to end.
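
The extract-and-flatten step of the fix can be sketched roughly as follows. This is a Python illustration under assumptions: `install_web_ui` is a hypothetical helper, and the archive is assumed to contain a single top-level dist/ folder, as the "flatten dist/ subfolder up" wording suggests. The download itself is left out.

```python
import shutil
import zipfile
from pathlib import Path


def install_web_ui(zip_path: Path, static_dir: Path, nested: str = "dist") -> None:
    """Extract the Web UI archive into static_dir, then hoist nested/ contents up one level."""
    static_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(static_dir)
    inner = static_dir / nested
    if inner.is_dir():
        # Move every entry of static/dist/ directly into static/.
        for child in list(inner.iterdir()):
            shutil.move(str(child), str(static_dir / child.name))
        inner.rmdir()  # the now-empty nested folder is removed
```

After this step index.html sits directly under static/, which is where the log message says Qdrant looks for the Web UI ('./static' relative to the binary's working directory).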

Plan F2 — Multi-agent Setup Pitfalls Fix (VIPIX guide catch)

🚨 Critical discovery: the VIPIX project published docs/guides/multi-agent-pitfalls.md on 2026-05-22, and pqhuy shared it so the main agent could audit the SOLUTION_ERP setup. Findings:

  • 4 .claude/agents/*.md files used model: claude-opus-4-7 (a full model ID falls back to 200K context, NOT the 1M Opus)
  • 4 files used the non-standard field effort: max → possible silent rejection
  • → The registry never loaded during the whole of S27 (Agent error: "Agent type 'investigator' not found")

🟧 Fix applied to 4 files: model: inherit (inherits the 1M Opus parent) + removed effort: max. Pending a CLI restart for hot-reload (pitfall #1).
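
The same mechanical edit across the 4 agent files can be sketched as a small text transform. This is a Python illustration with assumptions: the agent files are assumed to start with a frontmatter block delimited by `---` lines, and `fix_agent_frontmatter` is a hypothetical helper, not something from the project.

```python
import re


def fix_agent_frontmatter(text: str) -> str:
    """Set 'model: inherit' and drop 'effort:' inside the leading frontmatter only."""
    lines = text.split("\n")
    if not lines or lines[0].strip() != "---":
        return text  # no frontmatter: leave the file untouched
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return text  # unterminated frontmatter: do nothing
    fixed = []
    for line in lines[1:end]:
        if re.match(r"^effort\s*:", line):
            continue  # remove the non-standard field
        if re.match(r"^model\s*:", line):
            line = "model: inherit"
        fixed.append(line)
    return "\n".join(lines[:1] + fixed + lines[end:])
```

Restricting the rewrite to the frontmatter keeps any mention of "model:" in the agent's prompt body unchanged.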

🟧 NEW user-level memory: feedback_subagent_setup_pitfalls.md (235 lines), a cross-project pitfall checklist with VIPIX + SOLUTION_ERP evidence. Integrated into the MEMORY.md index.


⚠️ Anti-pattern violations in S27 (retrospective analysis)

The main agent led the session and covered all 5 roles throughout because the registry was empty:

| Task S27 | Implementer ACCEPT fit? | Outcome |
|---|---|---|
| C1 Curate cicd-monitor 72KB | REFUSE #1 (judgment §6.5) | Main agent solo: CORRECT |
| C2-C4 Curate 3 agent MEMORY | REFUSE #1 | Main agent solo: CORRECT |
| C5 Audit drift §6.4 + §9.4 | REFUSE #1 | Main agent solo: CORRECT |
| A3.1 Write 5 PS scripts | ACCEPT Case 2 (5-file cookie-cutter mirror) | Main agent missed the delegation |
| A3.2 Dashboard HTML + generator | REFUSE #2/#7 (first-time pattern) | Main agent solo: CORRECT |
| A4 Onboarding guide 440 lines | REFUSE #2 (docs judgment) | Main agent solo: CORRECT |
| F1 Qdrant Web UI fix | REFUSE #4 (bug reasoning chain) | Main agent solo: CORRECT |
| F2 4 agent files fix | ACCEPT Case 1 (4 files, same mechanical edit) | Main agent missed the delegation |

Verdict: 2/8 tasks should have been delegated to the Implementer (Case 1+2), but with the registry empty the main agent was forced to work solo. Net loss: ~30 minutes plus the missed cookie-cutter mirror discipline.

Critical miss of the Reviewer Smart Friend guard:

  • In S27 the main agent wrote rag-onboarding-guide.md claiming "Qdrant Native Dashboard http://localhost:6333/dashboard" WITHOUT actually verifying it
  • pqhuy caught the 404 in the browser during UAT → escalated
  • → A Reviewer pre-commit spawn (had the registry loaded) would have caught this as Cat 1 "Wire claim verify"; the main agent's self-review was compromised

E2E verified

Plan A.3 verify

| Item | Status | Notes |
|---|---|---|
| 5 PS scripts execute | | rag-status / rag-start / rag-stop tested live |
| Custom Dashboard HTML | | 13.3 KB, 7 panels rendered, browser opened |
| $PROFILE aliases | | 6 functions loaded via . $PROFILE |
| Startup shortcut | | rag-boot.lnk 1378 bytes created |
| ASCII-only PS source | | Main agent fixed 3 files (stop/boot/dashboard) after the Unicode mangling issue |

Plan F1 verify

| Item | Status |
|---|---|
| Qdrant native dashboard | HTTP 200 "UI \| Qdrant" |
| Collections API live | proj_solution_erp returned |
| static/ folder size | 16.07 MB (Web UI v0.2.12) |

Plan F2 verify (partial)

| Item | Status |
|---|---|
| 4 files edited to model: inherit | Applied |
| 4 files with effort: max removed | Applied |
| Sub-agent spawn test | Pitfall #1 - CLI restart needed |
| Re-spawn after the fix | Still "Agent type not found" |
| Registry hot-reload | Pending: pqhuy restarts the Claude Code CLI |

Memory curate verify

| Agent | Before | After | Status |
|---|---|---|---|
| cicd-monitor | 72.4 KB | 15.8 KB | -78% |
| implementer | 38.8 KB | 22.0 KB | -43% |
| investigator | 34.9 KB | 16.3 KB | -53% |
| reviewer | 34.5 KB | 17.9 KB | -48% |
| Total | 180.6 KB | 72.0 KB | -60% |
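
The percentages and totals in this table can be re-derived from the before/after sizes. A short Python check follows; the dictionary layout and the helper name `reduction_pct` are illustrative assumptions, while the sizes are copied from the table.

```python
# Before/after sizes in KB, copied from the table above.
sizes_kb = {
    "cicd-monitor": (72.4, 15.8),
    "implementer": (38.8, 22.0),
    "investigator": (34.9, 16.3),
    "reviewer": (34.5, 17.9),
}


def reduction_pct(before: float, after: float) -> int:
    """Relative size change in percent, rounded to the nearest integer."""
    return round((after - before) / before * 100)


per_agent = {name: reduction_pct(b, a) for name, (b, a) in sizes_kb.items()}
total_before = round(sum(b for b, _ in sizes_kb.values()), 1)
total_after = round(sum(a for _, a in sizes_kb.values()), 1)
total_pct = reduction_pct(total_before, total_after)
```

The -60% figure is the reduction of the summed sizes (180.6 KB → 72.0 KB), not the average of the four per-agent percentages.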

🐛 Bugs encountered + fixes

| Bug | Fix |
|---|---|
| PowerShell 5.1 mangles Unicode ✓ → âœ" in .ps1 source | Replaced with ASCII-only [OK]/[FAIL]/->/*/ in PS code (Unicode in HTML output is fine) |
| dashboard.ps1 parser fails on emoji 🧠🗄️🔌 inside the here-string HTML | Rewrote the PS source as ASCII-only, using CSS-styled icon-box divs instead of emoji |
| Qdrant native localhost:6333/dashboard 404 | Download dist-qdrant.zip v0.2.12 + extract static/ + flatten + restart |
| start.ps1 health check with 30s timeout fails (false alarm) | Adjusted to 3s warmup + TimeoutSec 5 |
| Qdrant crashed with OOM "allocation 8.4MB failed" mid-session | rag-restart recovers (data persisted) |
| 4 sub-agents with model: claude-opus-4-7 silently fall back to 200K | Fixed with model: inherit per VIPIX pitfall #2 |
| Sub-agent effort: max non-standard field silently rejected | Removed the field |
| Re-bootstrap fails with a pydantic serialization error | Deferred to S28; debugging needed |
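
The Unicode mangling in the first row is consistent with UTF-8 bytes being decoded with a single-byte ANSI code page. The Python sketch below reproduces that effect and adds a simple checker for the ASCII-only discipline. The mechanism is an assumption (a .ps1 saved as UTF-8 without BOM and read as Windows-1252); the log does not state it, and both helper names are hypothetical.

```python
def mojibake(text: str, wrong_codec: str = "cp1252") -> str:
    """Encode as UTF-8, then decode with the wrong single-byte codec."""
    return text.encode("utf-8").decode(wrong_codec, errors="replace")


def non_ascii_positions(source: str):
    """List (line, column, character) for every non-ASCII character in a script."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits
```

Running `non_ascii_positions` over each .ps1 file before committing would flag any character that needs an ASCII replacement such as [OK] or [FAIL].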

📚 Docs updates

| File | Update |
|---|---|
| docs/STATUS.md | Last updated: S27 final wrap (Memory curate + RAG manual control + multi-agent fix) |
| docs/HANDOFF.md | TL;DR S27 with the pitfall lesson + Plan B Contract V2 pending for S28 |
| docs/changelog/sessions/2026-05-22-s27-memory-curate-rag-dashboard-multi-agent-fix.md | This file: the full session log |
| docs/guides/rag-onboarding-guide.md | NEW 26.3 KB / 440 lines / 13 sections + §A2 standard structure |
| docs/changelog/skill-audit-2026-05-late.md | NEW audit log for the early drift trigger (~100 lines) |
| .claude/agents/*.md × 4 | Fix model: inherit + remove effort: max |
| .claude/agent-memory/*/MEMORY.md × 4 | Curate (-60%) + S27 entry proxy flush |
| .claude/agent-memory/*/archive/2026-05-*.md × 4 | NEW archive files preserving verbose Q1 entries |
| .claude/skills/README.md | 26 migrations → 31 + 44 gotchas → 49 |
| .claude/skills/ef-core-migration/SKILL.md | 4 count-drift patches |
| .claude/skills/dependency-audit-erp/SKILL.md | 41 gotchas → 49 gotchas |
| Memory user-level feedback_subagent_setup_pitfalls.md | NEW 235 lines, cross-project pitfall checklist |
| Memory user-level feedback_rag_hybrid_pattern.md | +1 lesson #8 (fix for the missing Qdrant Web UI static files) |
| Memory user-level MEMORY.md index | +1 entry feedback_subagent_setup_pitfalls |

🤝 Handoff S28+

Pending priority HIGH

  1. pqhuy restarts the Claude Code CLI → verify that the 4 sub-agents load after the model: inherit fix. Spawn-test the Investigator IMMEDIATELY at every session start (per feedback_subagent_setup_pitfalls.md §4).
  2. Plan B Contract V2 wire (carried over from S25/S26): kick off with 4 sub-agents ACTIVE: Investigator pre-flight + Implementer Case 2 mirror PE Mig 22-23 + Reviewer pre-commit + CICD Monitor post-push.
  3. Re-bootstrap the SOLUTION_ERP RAG to index the S27 changes (rag-onboarding-guide.md + skill-audit-late + 4 MEMORY updates + feedback_subagent_setup_pitfalls). The bootstrap currently fails with a pydantic error: debug + retry in S28.

Pending priority MEDIUM

  1. Audit the "Investigator spawn" claims in the S20-S26 memory logs: verify the history retroactively (were those real spawns, or was the general-purpose default mistaken for them?). Deferred to a dedicated investigation in S28+.
  2. Plan AI Phase 5 distributed bootstrap for the 4 other projects (NamGroup/DH/Ashico/Vipix): each project's own main agent does it when pqhuy opens that project in Claude Code (Approach B distributed).
  3. Test debt catch-up, Plan C bundle (the S22+1 + S25 + S26 bug fixes have no regression tests yet; deferred in UAT mode per §7).

Pending priority LOW

  1. Benchmark RAG recall@10 golden dataset 100 query (gap optional)
  2. Disaster recovery weekly backup Qdrant data → Dropbox (gap optional)
  3. Add Gotcha #48 SQLite tie-break + #49 dual-phase UI confusion to docs/gotchas.md (carried over from S25)

📊 Cumulative metrics S27

| Metric | S26 final | S27 final | Δ |
|---|---|---|---|
| DB tables | 59 | 59 | 0 |
| Migrations | 31 | 31 | 0 |
| Endpoints | ~146 | ~146 | 0 |
| FE pages | 35 | 35 | 0 |
| Unit tests | 111 | 111 | 0 (deferred in UAT mode per §7) |
| Gotchas | 49 | 49 | 0 (gotchas #48, #49 still to be added to the docs) |
| Memory user-level | 23 | 24 | +1 (feedback_subagent_setup_pitfalls.md NEW) |
| Skills project-local | 6 | 6 | 0 |
| Sub-agents | 4 (broken registry) | 4 (fixed, pending CLI restart) | 0 count, +1 fix |
| Docs files | | +2 | rag-onboarding-guide.md + skill-audit-2026-05-late.md |
| PS Scripts infra | 0 | +5 | D:\.claude-rag\scripts\*.ps1 |
| Custom Dashboard | 0 | +1 | dashboard.html auto-gen |
| Agent MEMORY total | 180.6 KB | 72.0 KB | -60% (115 KB archived) |
| Commits remote | e23f51c..d99069a | unchanged | 0 pushes in S27 (all local + memory + docs; pqhuy decided to push manually) |

Multi-agent ROI S27

| Agent | Spawn | Actual outcome |
|---|---|---|
| 🟦 Investigator | 0 (registry empty) | Main agent did the audit + curate solo |
| 🟨 Implementer | 0 (registry empty) | Main agent did the 5 PS scripts + 4-file fix solo (should have been delegated as Case 1+2) |
| 🟥 Reviewer | 0 (registry empty) | Main agent's self-review was compromised (missed the Qdrant 404) |
| 🟩 CICD Monitor | 0 (no remote push) | N/A |
| 👤 Main agent | solo, continuous ~5h | All 8 tasks + the pitfall meta-discovery |

Net learning S27: pitfall discovery + infrastructure fix → from S28 on, multi-agent CAN work properly after a CLI restart. The anti-pattern of the main agent working solo was forced, not a choice.


🎓 Patterns reusable cross-project (S27 NEW)

  1. Multi-agent setup pitfall checklist - 4 pitfall VIPIX + SOLUTION_ERP evidence (feedback_subagent_setup_pitfalls.md)
  2. PS scripts ASCII-only discipline - Unicode in HTML output is fine, but the PS source must be ASCII (CSS-styled badges + icon-box instead of emoji)
  3. Qdrant Windows binary 2-step setup - binary + Web UI static download separate (gotcha #8 feedback_rag_hybrid_pattern.md)
  4. Custom Dashboard PS generator pattern - here-string HTML template + collect data via Invoke-RestMethod + Out-File UTF-8 + Start-Process browser
  5. Memory curate proxy pattern when the registry is empty - the main agent updates the sub-agent MEMORY files directly with a retrospective analysis (REFUSE log validation + ACCEPT miss flag)
  6. Separate audit log file when drift is triggered early - skill-audit-YYYY-MM-late.md instead of rewriting the existing audit log (preserves the historical trail per §6.5)
  7. Dashboard SNAPSHOT vs LIVE distinction - static HTML generated by PS is a snapshot of a single point in time; a browser meta refresh does NOT re-run PS. Anti-pattern committed by the main agent when pioneering: a 60s meta refresh gave a false impression of live data. Fix: add a prominent timestamp + warning banner + link to the Qdrant native dashboard (LIVE API fetch). The pattern is reusable for any custom PowerShell dashboard.
  8. NSSM Windows Service Option 4b upgrade pattern - the Qdrant binary has no native --service flag → wrap it with NSSM (~351 KB zip downloaded from nssm.cc release/2.24, retried after a transient 503). install-service.ps1 + fix-service-start.ps1 elevated scripts ready. Auto-start at boot time + auto-restart on crash + survives logout. The recipe transfers to any database/server binary that needs Windows Service mode but has no native support.
  9. Hybrid context loading discipline (Method A, defensive) - Layer 1 blanket auto-load ~120K (CLAUDE+STATUS+HANDOFF+MEMORY index+skills+sub-agents) + Layer 2 RAG retrieve on-demand via 6 MCP tools. Decision gate per project MD size: < 200K skip RAG / 200K-1M lazy / > 1M MANDATORY. Token budget zones 5-tier (green/warning/approach/critical). 7 anti-patterns to avoid (reading old session logs in full, vague searches, main agent working solo on tasks that qualify for Implementer Case 1/2, skipping store_memory, heavy sessions > 6h). Onboarding §A3 NEW comprehensive.
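
The decision gate in pattern 9 can be written as a tiny function. This is a sketch under assumptions: the log does not say whether "200K" and "1M" are bytes, characters, or tokens, nor how the exact boundaries are treated, so the unit is left abstract and the boundary handling below is a guess. `rag_mode` is a hypothetical name.

```python
def rag_mode(total_md_size: int) -> str:
    """Map a project's total Markdown size to the RAG policy from the decision gate.

    The unit of total_md_size is unspecified in the log. Treating exactly 200K
    and exactly 1M as "lazy" is an assumption made for this sketch.
    """
    if total_md_size < 200_000:
        return "skip"       # small project: blanket auto-load is enough
    if total_md_size <= 1_000_000:
        return "lazy"       # mid-size: retrieve on demand
    return "mandatory"      # large: RAG retrieval is required
```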

🔧 Plan A.3+ Upgrade Post-Wrap — Option 4b NSSM Windows Service

Trigger: pqhuy objected that "Qdrant should auto-start like a regular database server (PostgreSQL and SQL Server both run as auto-start Windows Services)" → the main agent upgraded Option 4a (manual scripts, as decided in S26) to Option 4b, NSSM Windows Service.
Done: after the initial wrap-up, within the same session S27. pqhuy ran the 2 elevated PS scripts → service Running successfully.

NSSM install + service registration

🟧 Download NSSM 2.24 (~351 KB) from https://nssm.cc/release/nssm-2.24.zip (retried 1× after a transient 503) → extract to D:\.claude-rag\nssm\nssm.exe (323 KB win64 binary)
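
The "retry once after a transient 503" behaviour can be sketched as follows. This is a Python illustration, not the script actually used: `fetch`, `download_with_retry`, the 2s delay, and the 30s timeout are assumptions. Only the single retry on HTTP 503 comes from this log.

```python
import time
import urllib.error
import urllib.request


def fetch(url: str, timeout_s: float = 30.0) -> bytes:
    """Download the URL and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout_s) as resp:
        return resp.read()


def download_with_retry(url, retries=1, delay_s=2.0, fetch=fetch, sleep=time.sleep):
    """Retry only on HTTP 503; any other error, or a 503 on the last attempt, propagates."""
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 503 or attempt == retries:
                raise
            sleep(delay_s)  # assumed back-off before retrying
```

Limiting the retry to 503 avoids masking permanent failures such as a 404 for a wrong release URL.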

🟧 install-service.ps1 (5.4 KB elevated script):

  • Verify Admin privileges
  • Remove the old service if it exists (idempotent)
  • nssm install Qdrant qdrant.exe + AppDirectory D:\.claude-rag\qdrant-bin
  • Log paths logs/qdrant-service.log + .err with 10 MB rotation
  • Start: SERVICE_AUTO_START (auto-start at boot time, before login)
  • AppExit Default Restart + AppRestartDelay 3000 (auto-respawn 3s after a crash)
  • DisplayName "Qdrant Vector DB (RAG Unified)" + Description friendly
  • Start service + health verify HTTP localhost:6333/healthz
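
The registration steps above map onto a sequence of NSSM commands. The Python sketch below only builds the argument lists; it does not reproduce install-service.ps1. Assumptions: the helper name `nssm_commands`, the exact stderr file name (the log only says ".err"), and the use of AppRotateFiles/AppRotateBytes for the 10 MB rotation.

```python
def nssm_commands(nssm, service, exe, app_dir, log_dir):
    """Build the NSSM command lines for registering the service (hypothetical helper)."""
    def setting(param, *values):
        return [nssm, "set", service, param, *values]

    return [
        [nssm, "install", service, exe],
        setting("AppDirectory", app_dir),
        setting("AppStdout", f"{log_dir}\\qdrant-service.log"),
        setting("AppStderr", f"{log_dir}\\qdrant-service.err"),  # assumed file name
        setting("AppRotateFiles", "1"),
        setting("AppRotateBytes", str(10 * 1024 * 1024)),  # rotate at 10 MB
        setting("Start", "SERVICE_AUTO_START"),
        setting("AppExit", "Default", "Restart"),
        setting("AppRestartDelay", "3000"),  # milliseconds
        setting("DisplayName", "Qdrant Vector DB (RAG Unified)"),
    ]
```

Each list could be passed to `subprocess.run(cmd, check=True)` from an elevated session. Keeping command construction separate from execution makes the sequence easy to review before it touches the service manager.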

Bug encountered: WAL lock conflict

🟧 Symptom: Step [5/6] Start failed; the service ended up in the Paused state and the log showed "Can't init WAL: Kind(WouldBlock)".

🟧 Root cause: the main agent had started Qdrant manually (PID 9800) right before running install-service.ps1 → the manual process held the Write-Ahead Log file lock → the service could not initialize the WAL → start failed and the service went into the Paused state.

🟧 fix-service-start.ps1 (2.4 KB elevated script), recovery steps:

  • Stop-Service -Force (clears the Paused state)
  • Kill any remaining qdrant.exe (defensive, for orphan PIDs)
  • Wait 3s
  • Start-Service Qdrant
  • Verify HTTP + process PID + RAM

🟧 Result confirmed: Service Running PID 4476 RAM 101.8 MB. Auto-start ENABLED ✓

4 PS scripts updated to Service mode

🟧 start.ps1 → Start-Service Qdrant (needs Admin). Detects an already Running service, skips the start, and shows the service info.

🟧 stop.ps1 → Stop-Service Qdrant -Force (needs Admin). Defensively kills an orphan qdrant.exe if one is detected.

🟧 status.ps1 → Get-Service Qdrant (no Admin needed) replaces the process check + Section 1 expanded with service info (Status, StartType, DisplayName) + defensive try/catch guard on the process info for a null StartTime when the process is spawned by the service.

🟧 boot.ps1 → drop Qdrant start (service auto-start) + verify service Running + regenerate Dashboard + open browser. Used by Startup folder shortcut rag-boot.lnk.

Dashboard SNAPSHOT vs LIVE clarification

🟧 Bug discovered: the custom dashboard dashboard.html kept showing a stale "Running" badge 10+ minutes after Qdrant went DOWN. Root cause: <meta http-equiv="refresh" content="60"> only reloads the cached HTML; it does NOT re-run the PS script.

🟧 Fix applied: Remove meta refresh + add prominent yellow banner top:

⚠ This dashboard is a STATIC SNAPSHOT taken at generation time (see timestamp).
Browser auto-refresh only reloads the cached HTML; it does NOT poll the Qdrant API live.
For live data → open the Qdrant Native Dashboard (fetches the API in real time).

🟧 Header timestamp + label "SNAPSHOT (not live)" red color visible.

🟧 Service status panel added Get-Service info: Service (Running) / Auto-start (Automatic) / DisplayName.
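
The snapshot approach can be illustrated with a minimal generator that stamps the generation time into the page and deliberately emits no meta refresh. This is a Python sketch, not the author's dashboard.ps1; `render_snapshot` and the markup are hypothetical, and only the banner idea, the timestamp, and the link to the native dashboard come from this log.

```python
import html
from datetime import datetime


def render_snapshot(service_status: str, generated_at: datetime) -> str:
    """Render a static status page that is clearly labelled as a snapshot."""
    stamp = generated_at.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "<!DOCTYPE html>\n"
        "<html><head><meta charset=\"utf-8\"><title>RAG Dashboard</title></head>\n"
        "<body>\n"
        # No refresh tag on purpose: reloading would not re-run the generator.
        f"<div class=\"banner\">STATIC SNAPSHOT generated at {stamp} (not live)</div>\n"
        f"<p>Qdrant service: {html.escape(service_status)}</p>\n"
        "<p><a href=\"http://localhost:6333/dashboard\">Qdrant Native Dashboard (live)</a></p>\n"
        "</body></html>\n"
    )
```

The status string is fixed at generation time, so the page can only be as fresh as its timestamp; the link is what leads to live data.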

Onboarding guide §A3 NEW comprehensive

🟧 Section §A3 Context loading Hybrid pattern (~242 lines added) — 8 sub-sections:

  • §A3.1 Why Hybrid: Method A vs Method B vs Skip RAG (decision gate)
  • §A3.2 Layer 1 Blanket auto-load checklist (~200-280 KB ≈ 50-70K tokens)
  • §A3.3 Layer 2 RAG retrieve decision tree (7-branch)
  • §A3.4 Token budget guidance 1M Opus (5 utilization zones)
  • §A3.5 6 MCP tools examples concrete (Vietnamese + English queries)
  • §A3.6 Best practices em main daily workflow (morning + in-session + EOS + monthly)
  • §A3.7 7 Anti-patterns cross-project (distilled VIPIX + SOLUTION_ERP)
  • §A3.8 Context monitoring + when to end session (heuristics + protocol)
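
The size-to-token figure quoted for §A3.2 (~200-280 KB ≈ 50-70K tokens) implies roughly 4 bytes per token. A small Python sketch of that rule of thumb follows. The ratio is inferred from the two quoted endpoints rather than measured, 1 KB is treated as 1000 bytes, and `estimate_tokens` is a hypothetical helper.

```python
def estimate_tokens(size_kb: float, bytes_per_token: float = 4.0) -> int:
    """Rough token estimate from file size, treating 1 KB as 1000 bytes."""
    return int(size_kb * 1000 / bytes_per_token)
```

Real token counts depend on the tokenizer and on the language mix of the files, so this is only a budgeting aid.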

Memory user-level updates

🟧 feedback_rag_distributed_ownership.md: Decision 2 upgraded from Option 4a → 4b NSSM, with the S27 install gotchas (nssm.cc transient 503 + WAL lock conflict + Process.StartTime null guard).

Final state Service mode

| Item | State |
|---|---|
| Qdrant Service | Running (Automatic) ✓ |
| HTTP /healthz | OK ✓ |
| Collection proj_solution_erp | 3460 chunks (⚠ indexed_vectors=0 HNSW broken; pending re-bootstrap S28) |
| Auto-start on Windows boot | ENABLED ✓ |
| Auto-restart on crash | ENABLED 3s delay ✓ |
| 7 PS scripts total | start/stop/status/dashboard/boot/install-service/fix-service-start |
| Onboarding guide | 42.5 KB / 682 lines / 14 sections + §A2 + §A3 |