[CLAUDE] Docs: set up RAG Framework v1.3 governance + eval framework
All checks were successful
Deploy SOLUTION_ERP / build-deploy (push) Successful in 3m52s
- docs/governance/README.md: Path B delegation stub → AI_INFRA canonical Phase/BC vocabulary documented (9 phase + 10 BC SOLUTION_ERP-specific)
- .claude/rag.json: add _decision_log block (10 rationale entries) + add .claude/agents/**/*.md to corpus_paths (fix Case D harvest gap)
- eval/evaluator.md: inline executor spec v1.0 (Spec A strict)
- eval/golden-set-solution_erp.jsonl: 14-entry golden set v1.1 (5 gotcha + 3 pattern + 3 decision + 3 negative)
- eval/runs/2026-05-26-baseline-v1.0-failed.json: v1.0 attempt recall@5=0.455 FAIL; root cause diagnosis Case A/C/D
- eval/runs/2026-05-26-baseline-v1.1-pending.json: v1.1 attempt pending CLI restart for accurate numbers
- eval/trial-state-lock.json: 2-section split (quality_gate + drift_monitor) per v1.3 §6.2, 4-week milestones 2026-05-26 → 2026-06-23

CRITICAL lesson: bootstrap.py --project flag overrides collection name only. Use --config D:\...\SOLUTION_ERP\.claude\rag.json for correct project root. Old projects.json had root_path=AI_INFRA for solution_erp (Anti #24): FIXED.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
51
eval/trial-state-lock.json
Normal file
@@ -0,0 +1,51 @@
{
  "version": "v1.3",
  "project_id": "solution_erp",
  "framework_adopted": "2026-05-26",
  "governance_path": "docs/governance/README.md",
  "golden_set_version": "v1.1",
  "spec_chosen": "A",
  "baseline_note": "v1.0 attempted 2026-05-26 recall@5=0.455 FAIL. v1.1 attempted same day, pending CLI restart for accurate numbers. Official baseline = after CLI restart + re-run.",
  "quality_gate": {
    "baseline_recall_at_5": null,
    "baseline_recall_at_5_note": "PENDING: use v1.0=0.455 as conservative estimate until v1.1 re-run post CLI restart",
    "baseline_avg_top1_rerank": 0.870,
    "gate_threshold_recall": 0.7,
    "gate_threshold_avg_rerank": 0.65,
    "pass": false
  },
  "drift_monitor": {
    "chunk_count_baseline": 2949,
    "chunk_count_registry": 2949,
    "chunk_count_note": "Anti #24 resolved: projects.json root_path fixed from AI_INFRA → SOLUTION_ERP. Bootstrap re-run 2026-05-26 correct.",
    "drift_threshold_percent": 5,
    "last_indexed_at_baseline": "2026-05-26T13:09:21.816262"
  },
  "trial_milestones": [
    {"week": 0, "date": "2026-05-26", "status": "setup", "label": "Setup complete; pending CLI restart for v1.1 baseline"},
    {"week": 1, "date": "2026-06-02", "status": "pending", "label": "v1.1 re-run after CLI restart + triage 0-result queries"},
    {"week": 2, "date": "2026-06-09", "status": "pending", "label": "Triage Case C/D failures (q05 IIS 25 + q06 CQRS)"},
    {"week": 3, "date": "2026-06-16", "status": "pending", "label": "Empirical chunk 512 vs 1500 retest"},
    {"week": 4, "date": "2026-06-23", "status": "pending", "label": "Final trial evaluation + decide v1.3 stable OR v1.4"}
  ],
  "_decision_log": {
    "spec_a_vs_b_resolution_chosen": "Spec A (Strict). SOLUTION_ERP chunks are canonical + finite scope (51 gotchas, patterns, decisions) → strict retrieval test appropriate.",
    "spec_chosen_date": "2026-05-26",
    "anatomy_threshold_chosen": "6/6 STRICT per v1.3 §5.2 (corpus 2949 chunks mature)",
    "governance_path_b_reason": "Path B delegation stub: no local customization needed at Phase 9 UAT stable stage. AI_INFRA canonical sufficient.",
    "bootstrap_correct_command": "python D:\\Dropbox\\CONG_VIEC\\AI_INFRA\\claude-rag\\bootstrap.py --config D:\\Dropbox\\CONG_VIEC\\SOLUTION\\SOLUTION_ERP\\.claude\\rag.json",
    "bootstrap_wrong_command": "python D:\\Dropbox\\CONG_VIEC\\AI_INFRA\\claude-rag\\bootstrap.py --project solution_erp (DO NOT USE: resolves from CWD, not project config)"
  },
  "_anti_patterns_observed": {
    "anti_24_registry_drift": "projects.json had root_path=AI_INFRA for solution_erp entry. Fixed 2026-05-26. Caused 2 bad bootstraps (1351 AI_INFRA chunks written to proj_solution_erp collection).",
    "anti_23_source_path": "Absolute Windows path D:\\Dropbox\\... in chunk payload. Low priority fix-forward.",
    "mcp_reload_lesson": "bootstrap.py clears the Qdrant collection + BM25 → MCP server must be restarted to pick up new data. Similar to agents/*.md hot-reload requiring CLI restart."
  },
  "_lessons": [
    "CRITICAL: --project flag overrides only collection_name, NOT project root. Always use --config for cross-project bootstrap.",
    "projects.json root_path for solution_erp was wrong (AI_INFRA): check ALL projects in registry before first bootstrap.",
    "MCP server cache goes stale after the Qdrant collection is replaced → CLI restart needed for accurate baseline.",
    "v1.0 baseline (11,922-chunk auto-reindex corpus) may have been from MCP auto-reindex picking up ALL files including HANDOFF.md + STATUS.md not in explicit corpus_paths.",
    "SOLUTION_ERP failure mode: NOT Anti #9 keyword stacking (AI_INFRA lesson) but corpus gap (agents not indexed) + language mismatch (Vietnamese terms)."
  ]
}
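
The quality_gate section records two thresholds and a "pass" flag but not the rule that combines them. The sketch below shows one plausible reading, assuming the gate passes only when both recall@5 and the average top-1 rerank score meet their thresholds, and that a null recall falls back to the conservative v1.0 value named in baseline_recall_at_5_note. The function name `gate_passes` and the `fallback_recall` parameter are hypothetical and do not come from the framework.

```python
from typing import Optional


def gate_passes(
    recall_at_5: Optional[float],
    avg_top1_rerank: float,
    threshold_recall: float,
    threshold_avg_rerank: float,
    fallback_recall: Optional[float] = None,
) -> bool:
    """Assumed rule: BOTH metrics must meet their thresholds."""
    # Use the fallback only when no measured recall is available.
    recall = recall_at_5 if recall_at_5 is not None else fallback_recall
    if recall is None:
        # No measurement and no fallback: treat the gate as not passed.
        return False
    return recall >= threshold_recall and avg_top1_rerank >= threshold_avg_rerank


# Values copied from the quality_gate section of eval/trial-state-lock.json.
quality_gate = {
    "baseline_recall_at_5": None,
    "baseline_avg_top1_rerank": 0.870,
    "gate_threshold_recall": 0.7,
    "gate_threshold_avg_rerank": 0.65,
}

result = gate_passes(
    quality_gate["baseline_recall_at_5"],
    quality_gate["baseline_avg_top1_rerank"],
    quality_gate["gate_threshold_recall"],
    quality_gate["gate_threshold_avg_rerank"],
    fallback_recall=0.455,  # conservative v1.0 estimate from the note field
)
print(result)  # False: 0.455 is below the 0.7 recall threshold
```

Under this reading the rerank score alone (0.870 against a 0.65 threshold) is not enough to pass, which is consistent with "pass": false in the file.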
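
The drift_monitor section stores a chunk-count baseline and a drift_threshold_percent of 5, without spelling out the comparison. A minimal sketch, assuming drift is the absolute relative change in chunk count against the baseline and that exceeding the threshold counts as drift. The helper names `drift_percent` and `has_drifted` are hypothetical, and the count of 1351 is used only as an illustrative "current" value (it is the number of AI_INFRA chunks the file says were written by the bad bootstraps).

```python
def drift_percent(baseline: int, current: int) -> float:
    """Absolute relative change of the chunk count, in percent (assumed definition)."""
    if baseline <= 0:
        raise ValueError("baseline chunk count must be positive")
    return abs(current - baseline) / baseline * 100.0


def has_drifted(baseline: int, current: int, threshold_percent: float) -> bool:
    """True when the change exceeds the configured threshold."""
    return drift_percent(baseline, current) > threshold_percent


BASELINE = 2949   # chunk_count_baseline from eval/trial-state-lock.json
THRESHOLD = 5     # drift_threshold_percent from the same file

# Registry count equals the baseline, so no drift is reported.
print(has_drifted(BASELINE, 2949, THRESHOLD))  # False

# Illustrative only: a collection holding 1351 chunks would be far outside 5%.
print(has_drifted(BASELINE, 1351, THRESHOLD))  # True
```

A check of this kind would have flagged the wrong-root bootstrap immediately, since the chunk count differed from the expected corpus size by much more than the threshold.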