From 282cbd0c7b15f2401447f92c47e8044de96fe381 Mon Sep 17 00:00:00 2001
From: pqhuy1987
Date: Fri, 29 May 2026 22:18:17 +0700
Subject: [PATCH] =?UTF-8?q?[CLAUDE]=20Docs:=20S41=20RAG=20audit=20response?=
 =?UTF-8?q?=20=E2=80=94=20exclude=20**/-anchor=20fix=20+=20retire=20stale?=
 =?UTF-8?q?=20=5Fdecision=5Flog=20+=20AI=5FINFRA=20signal?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- rag.json exclude_paths root-anchored -> **/-anchored (defeats gotcha #10: node_modules/** + docs/_archive/** were not matching nested paths)
- _decision_log: retire stale "+321% / LIVE 11,922" -> real status (LIVE ~3080 ~= registry 3076, drift closed 2026-05-28)
- New docs/governance/RAG-AUDIT-RESPONSE-2026-05-29.md: SE-side prep done + corrections (store_memory at-risk = 3 disk-backed broadcasts, NOT ~27) + re-bootstrap ask for AI_INFRA + post-bootstrap verify checklist

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .claude/rag.json                              | 14 ++++----
 .../RAG-AUDIT-RESPONSE-2026-05-29.md          | 33 +++++++++++++++++++
 2 files changed, 40 insertions(+), 7 deletions(-)
 create mode 100644 docs/governance/RAG-AUDIT-RESPONSE-2026-05-29.md

diff --git a/.claude/rag.json b/.claude/rag.json
index 53c02ae..ee1e5cc 100644
--- a/.claude/rag.json
+++ b/.claude/rag.json
@@ -8,11 +8,11 @@
     ".claude/agents/**/*.md"
   ],
   "exclude_paths": [
-    "docs/_archive/**",
-    "node_modules/**",
-    "bin/**",
-    "obj/**",
-    ".git/**"
+    "**/_archive/**",
+    "**/node_modules/**",
+    "**/bin/**",
+    "**/obj/**",
+    "**/.git/**"
   ],
   "extra_corpus": [
     "C:\\Users\\pqhuy\\.claude\\projects\\D--Dropbox-CONG-VIEC-SOLUTION-SOLUTION-ERP\\memory\\*.md"
@@ -31,8 +31,8 @@
     "contextual_retrieval_rationale": "Flag true but per v1.3 §12.1 + §9.4: SOLUTION_ERP chunks self-contained (gotchas, patterns, decisions) → Contextual Retrieval prepend likely wasteful. Evaluate per eval recall@5 trial week 3.",
     "spec_a_vs_b_resolution_chosen": "Spec A — Strict. Rationale: SOLUTION_ERP chunks canonical + finite scope (51 gotchas, patterns, decisions) → strict retrieval test appropriate.",
     "spec_chosen_date": "2026-05-26",
-    "anatomy_threshold_chosen": "6/6 STRICT per v1.3 §5.2 default (corpus 11,922 chunks — mature)",
-    "registry_drift_note": "Anti #24 — projects.json registry 2830 vs Qdrant LIVE 11,922 (+321% drift). Intentional defer re-bootstrap until Phase 9 UAT stable. Document in trial-lock _baseline_note.",
+    "anatomy_threshold_chosen": "6/6 STRICT per v1.3 §5.2 default (SE collection ~3080 chunks live 2026-05-29 — mature; the old '11,922' referred to a stale all-projects total, corrected S41)",
+    "registry_drift_note": "RESOLVED S41 2026-05-29 — re-bootstrap 2026-05-28 closed the count drift (Qdrant LIVE ~3080 ≈ registry 3076). The old '+321% / 11,922' figure was STALE (pre-bootstrap) and is retired. REMAINING corpus-hygiene issue (per AI_INFRA RAG audit 2026-05-29): ~237 node_modules + ~22 _archive junk chunks hidden inside corpus because root-anchored excludes did not match nested paths (gotcha #10). Fixed S41: exclude_paths switched to **/-anchored globs. Takes effect on next re-bootstrap (AI_INFRA op).",
     "source_path_note": "Anti #23 — absolute Windows path D:\\Dropbox\\... in chunk payload. Fix in next re-bootstrap via bootstrap.py path normalization. Low priority.",
     "governance_doc": "docs/governance/README.md (Path B delegation stub — AI_INFRA canonical)"
   },
diff --git a/docs/governance/RAG-AUDIT-RESPONSE-2026-05-29.md b/docs/governance/RAG-AUDIT-RESPONSE-2026-05-29.md
new file mode 100644
index 0000000..135fe96
--- /dev/null
+++ b/docs/governance/RAG-AUDIT-RESPONSE-2026-05-29.md
@@ -0,0 +1,33 @@
+# 📤 SOLUTION_ERP → AI_INFRA — RAG Audit Response (2026-05-29, S41)
+
+Re: AI_INFRA RAG audit 2026-05-29 (Qdrant LIVE verify). SE-side prep DONE; re-bootstrap = AI_INFRA op (charter v2). Persistent + corpus-backed record.
+
+## ✅ SE-side DONE (this session)
+
+1. **Exclude fix (`.claude/rag.json`)** — root-anchored → `**/`-anchored, defeats gotcha #10:
+   - `node_modules/**` → `**/node_modules/**`
+   - `docs/_archive/**` → `**/_archive/**` (also `bin`/`obj`/`.git` → `**/`-anchored for consistency)
+   - JSON validated. Takes effect on next re-bootstrap.
+3. **`_decision_log` stale numbers retired** — `registry_drift_note` "+321% / LIVE 11,922" was pre-bootstrap STALE → rewritten to real status (LIVE ~3080 ≈ registry 3076, drift closed 2026-05-28). `anatomy_threshold_chosen` "11,922 mature" → "SE collection ~3080".
+
+## 🔶 SE-side findings (corrections to audit estimates — verified on disk + Qdrant)
+
+- **store_memory at-risk ≪ "~27".** True store_memory chunks (`heading_path="(manual)"`) = only the **3 S40 broadcasts**, ALL disk-backed (`docs/governance/BROADCAST-OUT-*.md` confirmed on disk). Replace-mode recreates them from corpus files → **NOT at-risk, no export-reinsert needed.** The "~27" appears to conflate with the 27 user-memory *feedback* entries — those are extra_corpus FILE-based; they need re-index to be **added**, not protected.
+- **node_modules junk confirmed:** `docs/_user-guide/node_modules/` = 30 `.md` files on disk (≈237 chunks plausible).
+- **_archive risk is now WORSE, not stable:** `docs/_archive/` now holds the 170KB+ pre-S40 STATUS/HANDOFF archives (created S40, after the 05-28 bootstrap). A re-bootstrap WITHOUT the exclude fix would index hundreds of archive chunks. Exclude fix must land before re-bootstrap.
+- **⚠️ Slug anomaly for AI_INFRA to confirm:** feedback chunks currently index under the OLD slug path `...projects\D--Dropbox-CONG-VIEC-SOLUTION\memory\` (missing `-SOLUTION-ERP`). Confirms the slug bug; replace-mode should wipe old-path chunks + re-add from the corrected `extra_corpus` path (rag.json:18, fixed S40).
+
+## 🟢 ASK — AI_INFRA re-bootstrap (1 run, gathers everything)
+
+`python AI_INFRA/claude-rag/bootstrap.py --project solution_erp` — picks up: (a) exclude fix → 0 node_modules + 0 _archive chunks; (b) corrected extra_corpus slug → 27 feedback entries indexed; (c) S38–S41 content (Proposal/WorkflowApps/consolidated docs).
+
+Repeat of prior standing items (broadcast 2026-05-29): bootstrap.py corpus-path validation (warn on glob→0 files), verify auto_reindex hook actually fires (last_indexed lagged), search_code corpus gap (src/*.cs + fe/*.tsx not in corpus), registry sync.
+
+## 🔍 SE post-bootstrap verify (after AI_INFRA confirms run)
+
+1. node_modules chunks = 0 · _archive chunks = 0 (search a known junk term → expect miss)
+2. 27 feedback entries discoverable under corrected slug
+3. 3 broadcasts still present · chunk_count sane (no bloat)
+
+## Stance (charter v2)
+SE = USER of infra. SE handled its own config declaration (corpus/exclude) + content; RAG mechanism (bootstrap/chunk/path-resolution) stays AI_INFRA. Conflict → pqhuy decides.
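
Reviewer note: the anchoring change in `exclude_paths` can be illustrated with a minimal sketch. It assumes that patterns are matched against the full repo-relative path and that only `**/` may span directory levels. The real matcher inside `bootstrap.py` is not shown in this patch and may behave differently; `glob_to_regex`, `is_excluded`, and the `pkg/README.md` file name are hypothetical, introduced only for illustration.

```python
import re

def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob into a fully anchored regex (assumed semantics, not bootstrap.py's)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")         # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")      # anything within a single path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("^" + "".join(out) + "$")

def is_excluded(path: str, patterns: list[str]) -> bool:
    """Return True if the repo-relative path matches any exclude pattern."""
    posix = path.replace("\\", "/")   # normalize Windows separators
    return any(glob_to_regex(p).match(posix) for p in patterns)

# Pattern lists copied from the rag.json hunk in this patch.
OLD = ["docs/_archive/**", "node_modules/**", "bin/**", "obj/**", ".git/**"]
NEW = ["**/_archive/**", "**/node_modules/**", "**/bin/**", "**/obj/**", "**/.git/**"]

# Directory taken from the audit response; the file name is hypothetical.
nested = "docs/_user-guide/node_modules/pkg/README.md"

print(is_excluded(nested, OLD))                 # root-anchored pattern misses the nested dir
print(is_excluded(nested, NEW))                 # **/-anchored pattern catches it
print(is_excluded("docs/guide/intro.md", NEW))  # ordinary docs stay in the corpus
```

Under these assumed semantics, `node_modules/**` only matches a `node_modules` directory at the repository root, which would be consistent with the nested `docs/_user-guide/node_modules/` files reaching the corpus.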