Files
solution-erp/.claude/agent-memory/cicd-monitor/MEMORY.md
pqhuy1987 f1c61c9af6 [CLAUDE] Skill: Add cicd-monitor (4th sub-agent — post-deploy verify Gitea + bundle hash)
Path A chốt sau pre-flight Plan G Trial Week 1 (Session 21 turn 1). Con thứ 4 cicd-monitor green READ tier — poll Gitea Actions API + curl bundle hash 2 app + sqlcmd migration verify prod + endpoint smoke. ~150K spawn cost extra, trade-off để catch deploy ship fail tự động không phụ thuộc em main nhớ verify thủ công (recurring blind spot S20).

Files added:
- .claude/agents/cicd-monitor.md (~7KB) — system prompt + 8-step workflow + 5-stage report + gotcha #25/#39/#40/#41/#44 cross-ref + skill iis-deploy-runbook/dependency-audit-erp/ef-core-migration preload
- .claude/agent-memory/cicd-monitor/MEMORY.md (~5KB seed) — recurring CI bug patterns + 5-stage checklist + baseline build/bundle metrics

Files updated:
- .claude/agents/README.md — 4-agent architecture diagram (green slot) + decision tree (after push + prod issue diagnose branches) + memory routine 4 SendMessage + skills preload 4 agents + cost reality ~750K spawn / ~1.35M heavy / ~700K optimized + trial workflow Week 1-3 CI/CD Monitor spawn integrated + pass criteria + catch ≥1 deploy ship fail

Trade-off rationale:
- 4× solo → 6.5× solo per heavy session (vs 3 agents 6× solo) — Max 20× plan absorbs
- Post-deploy ship verification = recurring blind spot (Em main solo quên verify ~30% push S20)
- Bundle hash unchanged + mig drift prod = silent fail signal (no exception, just user UAT confusion)

CI skip per path filter (all 3 files .md match `**/*.md` paths-ignore).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 00:47:09 +07:00

152 lines
7.3 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# CI/CD Monitor Agent — Persistent Memory
> **Persistent diary cross-session.** Auto-injected first 200 lines / 25KB at spawn.
> Update BEFORE every stop. Curate when > 25KB.
---
## 🎯 Role baseline
Read-only CI/CD pipeline + post-deploy verifier for SOLUTION_ERP. Polls Gitea Actions API, verifies test gate + deploy ship + prod health. Tools: Read, Grep, Glob, Bash, WebFetch. Output: PASS/FAIL verdict + evidence under 500 words. **Spawn cost ~150K tokens** — trade-off để catch fail tự động không phụ thuộc em main nhớ verify.
---
## 🚨 Recurring CI/CD bug patterns (catch with priority)
### Gotcha #39 — act_runner github.com TCP timeout
- **Symptom:** CI run hang ở "Set up job" → timeout 21s, run stays "queued" forever
- **Verify:** log line `Error: dial tcp ... github.com:443 ... i/o timeout`
- **Fix:** manual checkout bypass đã hardcode trong `.gitea/workflows/deploy.yml` (run #108/#109), pass at #110. KHÔNG revert. Nếu pattern returns → escalate em main check VPS network
### Gotcha #40 — npm cache `tsc not found`
- **Symptom:** `build_fe_admin` fail sau khi enable `cache: npm``actions/setup-node@v4`
- **Verify:** log line `sh: tsc: command not found` hoặc `npm error code ETIMEDOUT`
- **Fix:** DISABLED npm cache rolled back ở `a21790d`. KHÔNG re-enable. Build time chấp nhận ~3 min thay vì optimize
### Gotcha #41 — paths-ignore docs-only skip
- **Symptom:** Commit code thật mà CI không trigger (run list không có entry mới)
- **Verify:** `git diff --name-only HEAD~1 HEAD` vs `paths-ignore: ['docs/**', '**/*.md', '.claude/skills/**']`
- **Fix:** Nếu commit có code thật bị skip nhầm → check pattern conflict. Nếu commit chỉ docs → expected behavior (saving ~9 min deploy / commit MD-only)
### Gotcha #25 — IIS WebSocket / module exclusion
- **Symptom:** `notification-hub/negotiate` returns 401 hoặc 404 prod (FE SignalR connect fail)
- **Verify:** `curl -X POST https://api.solutions.com.vn/notification-hub/negotiate` → non-200
- **Fix:** IIS WebSocket module enable trong `web.config` của site api.solutions.com.vn (skill `iis-deploy-runbook`)
### Deploy ship verification — bundle hash unchanged
- **Symptom:** commit push success + Gitea action success + status PASS, **nhưng prod không có thay đổi visible** (user UAT báo "đã deploy mà không thấy")
- **Root cause candidates:**
- IIS app pool chưa recycle → giữ assembly cũ trong memory
- NSSM service script không copy file đúng folder
- Browser cache (rare nếu Vite hash chuẩn)
- **Verify:** `curl -s https://admin.solutions.com.vn/ | grep -oE '/assets/index-[a-z0-9]+\.js'` — hash giữ nguyên = ship fail
- **Fix:** SSH `vietreport-vps "Restart-WebAppPool admin.solutions.com.vn"` + recheck bundle hash
### Migration drift prod vs repo
- **Symptom:** Latest mig trong repo (vd Mig 27) nhưng prod chưa có (DbInitializer startup fail)
- **Verify:** Compare `ls Migrations/*.cs` vs `sqlcmd ... __EFMigrationsHistory`
- **Fix:** Check `Program.cs` startup hook `app.MigrateDatabase()` còn không + app pool recycle. Hoặc manual `dotnet ef database update --connection prod` qua SSH
---
## 📋 5-stage checklist (apply EVERY run)
### Stage 1: Push happened + filter check
- `git log -1 --format='%H %s'` — latest commit
- `git log origin/main..HEAD` — must be empty (synced)
- `git diff --name-only HEAD~1 HEAD` vs `paths-ignore` — nếu chỉ docs → SKIPPED-DOCS
### Stage 2: Gitea Actions poll (max 10 iter × 60s)
- API: `https://git.baocaogiaoduc.vn/api/v1/repos/vietreport-admin/solution-erp/actions/runs?limit=5`
- Match `head_sha == $commitSha` → get `runId`
- Status: queued / in_progress / completed
- Conclusion (when completed): success / failure / cancelled / timed_out
### Stage 3: Test gate verify (Domain 58 + Infra 23 baseline)
- Logs grep: `Passed:` line per stage
- Phase 9 UAT exception: test count may be lower nếu em main skip per chunk (memory `feedback_uat_skip_verify`) — NOT a failure
- Delta from baseline → report
### Stage 4: Post-deploy live verify (if SUCCESS)
- Auth login → bearer (admin + nv.test for non-admin gotcha #44 check)
- 3-5 endpoint smoke 2XX expected (include endpoint mới trong commit)
- FE bundle hash 2 app changed (compare pre vs post)
- SignalR negotiate (gotcha #25 — if commit relates notification)
- EF migration latest prod == latest repo
### Stage 5: Report PASS/FAIL with evidence + MEMORY.md update
---
## ⚠️ Anti-patterns observed (DO NOT)
1. ❌ Push fix code — READ only, escalate to em main
2. ❌ Speculate fail cause without log evidence
3. ❌ Skip post-deploy live verify khi SUCCESS — bundle hash là biggest catch
4. ❌ Skip MEMORY.md update
5. ❌ Poll forever (max 10 iter ~10 min timeout)
6. ❌ Auto-rollback — escalate với recommendation, KHÔNG tự chạy
7. ❌ Verify khi commit docs-only — SKIPPED-DOCS + return ngay
---
## 🧠 SOLUTION_ERP CI/CD essentials
- **Gitea:** https://git.baocaogiaoduc.vn/vietreport-admin/solution-erp
- **Workflow:** `.gitea/workflows/deploy.yml` (test gate 2 step + build BE + build FE × 2 + deploy)
- **Path filter:** `paths-ignore: ['docs/**', '**/*.md', '.claude/skills/**']` (gotcha #41)
- **Prod URLs:** api / admin / eoffice `.solutions.com.vn`
- **SSH VPS:** `ssh vietreport-vps` (user=Administrator, key=id_ed25519)
- **DB prod:** `.\SQLEXPRESS` / `SolutionErp` / vrapp user
- **Tests baseline:** 81/81 (58 Domain + 23 Infra)
- **Mig latest repo:** check `Glob src/Backend/SolutionErp.Infrastructure/Migrations/*.cs | tail -3`
- **Mig latest prod:** sqlcmd `__EFMigrationsHistory ORDER BY MigrationId DESC TOP 5`
- **Bearer test:**
- Admin: `admin@solutions.com.vn / Admin@123456` (full)
- UAT non-admin: `nv.test@solutions.com.vn / TestUser@123456` (Drafter CCM — verify gotcha #44 silent 403 patterns)
---
## 🔑 Critical config (gotcha cross-ref)
- Node CI pin: `20.x` (memory `feedback_node_cicd` — bài học NamGroup)
- MediatR pin: `12.4.1` (gotcha #1)
- Swashbuckle pin: `6.9.0` (gotcha #2)
- act_runner: manual checkout bypass github.com (gotcha #39)
- npm cache: DISABLED (gotcha #40 — KHÔNG re-enable)
Flag commit nếu thấy `<PackageReference Include="MediatR" Version="14...` hoặc `cache: npm` tái xuất hiện.
---
## 📊 Run stats baseline (cumulative)
- **Build time BE (test_domain + test_infra + build_be):** ~90s baseline
- **Build time FE × 2 app:** ~60s baseline mỗi app
- **Deploy NSSM + IIS recycle:** ~30s
- **Total CI run time:** ~3 min code commit / 0s docs-only commit
- **Trend trigger:** nếu run time > 5 min → escalate (cluster network slow hoặc dependency bloat)
- **Bundle size baseline:** fe-admin ~800KB gz / fe-user ~750KB gz (Vite production build)
---
## 📅 Recent runs (FIFO last 20)
- **2026-05-12 (setup):** CI/CD Monitor agent initialized. Baseline knowledge load complete (44 gotchas cross-ref + 5-stage checklist + 3 skills preload + bundle hash verify pattern). No runs monitored yet. Awaiting first SendMessage from em main after push (candidate: Plan B Contract V2 wire Session 21+).
---
## 🔄 Curate trigger
- Memory size > 25KB → archive recent runs to `archive/<period>.md`
- Duplicate failure patterns → merge into single entry (vd act_runner timeout x3 → 1 entry)
- Stale > 3 months → remove
Last curate: 2026-05-12 (initial seed)