# CI/CD Monitor Agent — Persistent Memory
> **Persistent cross-session diary.** The first ~200 lines are auto-injected at spawn (L1 HOT).
>
> Update BEFORE every stop. Tiered Memory v1: L1 HOT soft cap ~30KB · L2 `archive/` on demand · L3 RAG `search_memory` just in time. Keep each entry ≤ 1.5K chars (gotcha #53).
>
> Full verbatim run history pre-S40 → git `d2f52ba` + `archive/2026-05-{runs,q2,q3,q4}.md`.
---
## 🎯 Role baseline

Read-only CI/CD and post-deploy verifier for SOLUTION_ERP. It polls the Gitea Actions API and verifies the test gate, the deploy ship and prod health. Tools: Read, Grep, Glob, Bash, WebFetch + 5 RAG MCP tools. Output: PASS/FAIL + evidence in <500 words. Skills: `iis-deploy-runbook` + `dependency-audit-erp` + `ef-core-migration`. Spawn cost is ~150K; the trade-off buys automatic failure catching.

---
## 🚨 Recurring CI/CD bug patterns (catch priority)
- **#39 act_runner github.com TCP timeout** — the run hangs at "Set up job" (21s). Log: `dial tcp github.com:443 i/o timeout`. Fix: manual checkout bypass, hardcoded in `.gitea/workflows/deploy.yml` (pass #110). Do NOT revert.
- **#40 npm cache `tsc not found`** — `build_fe_admin` fails after `cache: npm` is enabled. Rolled back in `a21790d` (now DISABLED). Do NOT re-enable.
- **#41 paths-ignore docs-only skip** — a code commit did not trigger CI? Check the changed files against `paths-ignore: ['docs/**','**/*.md','.claude/skills/**']`. Use `git diff --name-only HEAD~1 HEAD` for a single-commit push and `git diff --name-only <before> <after>` for a multi-commit push. Discovery #3: Gitea evaluates all commits in the push *range*. If ≥1 commit touches a non-ignored file, the whole range builds (BENEFICIAL).
- **#25 IIS WebSocket** — `notification-hub/negotiate` returns 401/404 in prod. Fix: enable the WebSocket module in the api site's `web.config` (skill `iis-deploy-runbook`).
- **#48 SQLite tie-break** — `OrderByDescending(CreatedAt).First()` picks the wrong row when 2+ `.Add()` calls share the same frozen clock (identical `CreatedAt`). Fix: apply a discriminator filter `.Where(Summary.Contains("Chuyển phase"))` BEFORE `OrderBy`.
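- **Bundle hash unchanged = ship FAIL** — push + action succeed but prod is unchanged. Verify with `curl -s https://admin.solutions.com.vn/ | grep -oE '/assets/index-[A-Za-z0-9_-]+\.js'`. The hashes mix upper/lower case, digits, `_` and `-` (e.g. `DPPTx2Kw`, `Krjvg_3j`, `BLA09-qv`), so a lowercase-only `[a-z0-9]` class misses them. Fix: SSH `Restart-WebAppPool`. ⚠️ Verify the bundle hash ONLY after status=success (Run #242 false-positive lesson: checking while "running" → stale hash).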
- **Migration drift prod vs repo** — compare `ls .../Persistence/Migrations/*.cs` vs `sqlcmd __EFMigrationsHistory`. Fix: check `Program.cs` `app.MigrateDatabase()` + app pool recycle.
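
The range rule in #41 (Discovery #3) can be checked offline before polling. Below is a minimal Python sketch. It assumes GitHub-Actions-style glob semantics: `**/` spans zero or more directories, while `*` and `?` stay inside one path segment. Gitea's own matcher may differ on edge cases. `glob_to_regex`, `is_ignored` and `range_triggers_ci` are hypothetical helpers. Feed them the file list from `git diff --name-only <before> <after>` for the pushed range (e.g. `subprocess.check_output([...], text=True).split()`). For the S48 push, a single `.tsx` among the docs files is enough to make the whole range build.

```python
import re
from typing import Iterable, List

# paths-ignore as recorded for .gitea/workflows/deploy.yml
PATHS_IGNORE: List[str] = ["docs/**", "**/*.md", ".claude/skills/**"]


def glob_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a workflow glob into a regex (assumed GitHub-style semantics)."""
    out: List[str] = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")   # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")         # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")      # stays within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_ignored(path: str, patterns: Iterable[str] = PATHS_IGNORE) -> bool:
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


def range_triggers_ci(changed_files: Iterable[str]) -> bool:
    """Discovery #3: one non-ignored file anywhere in the pushed range builds the range."""
    return any(not is_ignored(f) for f in changed_files)
```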
---

## 📋 5-stage checklist (EVERY run)
- **Stage 0 RAG infra:** `Get-Service Qdrant` shows Running + `http://localhost:6333/healthz` responds. Collection `proj_solution_erp` (the `proj_*` prefix is shared by 7 projects, Discovery #8).
- **Stage 1 Push+filter:** `git log -1 --format='%H %s'` + `git log origin/main..HEAD` must be empty (nothing left unpushed) + diff vs paths-ignore (docs-only → return SKIPPED-DOCS).
- **Stage 2 Gitea poll** (max 10 iter × 60s): API `.../actions/tasks?limit=5` (NOT `/runs`, which returns 404). Match `head_sha`. ⚠️ The task table's `updated_at` lags ~2min (gotcha #46) → cross-check the file mtime on the VPS.
- **Stage 3 Test gate:** baseline **181 PASS** (58 Domain + 123 Infra, S45 Run #368). The Phase 9 UAT exception allows a lower count (`feedback_uat_skip_verify`).
- **Stage 4 Post-deploy** (if SUCCESS): get a bearer via auth login (admin + nv.test, gotcha #44; token field `accessToken`, route `/api/auth/login`) → smoke 3-5 endpoints for 2XX (incl. new ones) → FE bundle hash changed on both apps (FE-touching pushes only; BE-only pushes keep the old hashes, Run #368) → SignalR negotiate (gotcha #25, if relevant) → EF migrations prod == repo.
- **Stage 4.6 (S29 CRITICAL):** verify the seeded sample rows with sqlcmd post-deploy (NOT just the schema). `sqlcmd -Q "SELECT Code FROM ApprovalWorkflows WHERE Code LIKE 'QT-%-V2-%'"` → 0 rows = seed GATE BLOCKED → gotcha #51.
- Discovery #4: ASP.NET 10 record enums need numeric input unless `JsonStringEnumConverter` is registered (SOL has NO converter → FE sends numeric values). #5: sqlcmd over SSH with Windows auth needs `\\\\SQLEXPRESS` (4 backslashes). #6: INFRASTRUCTURE seed (Roles/Depts/Catalogs/MenuTree/AdminPerms/Templates/SampleWorkflowsV2) MUST run and must NOT sit inside `if(!demoSeedDisabled)`. DEMO seed (DemoUsers/Contracts/PE) may stay gated → gotcha #51.
- **Stage 5 Report** PASS/FAIL + evidence + MEMORY update.
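
Stage 2 as a bounded loop, in a minimal Python sketch. `fetch_tasks` is a hypothetical caller-supplied function. It wraps `GET .../actions/tasks?limit=5` and returns the task objects; how the JSON response is unwrapped is not recorded here, so that is left to the caller. Only `head_sha` and `status` are read. The terminal statuses other than `success` are an assumption. Returning `None` after 10 polls corresponds to anti-pattern #5: stop and escalate instead of polling forever. The `sleep` parameter is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Dict, List, Optional

# Assumed terminal values of a task's `status`; this file only confirms that a
# finished green task reports "success".
TERMINAL_STATUSES = {"success", "failure", "cancelled", "skipped"}


def poll_task(
    fetch_tasks: Callable[[], List[Dict]],
    head_sha: str,
    max_iter: int = 10,
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Dict]:
    """Return the terminal task for head_sha, or None after max_iter polls.

    Only `head_sha` and `status` are read; `updated_at` is ignored because it
    lags by ~2min (gotcha #46).
    """
    for attempt in range(max_iter):
        for task in fetch_tasks():
            # Prefix match, so a short SHA such as "350b2bf" also works.
            sha_ok = str(task.get("head_sha", "")).startswith(head_sha)
            if sha_ok and task.get("status") in TERMINAL_STATUSES:
                return task
        if attempt < max_iter - 1:
            sleep(interval_s)
    return None
```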
---

## ⚠️ Anti-patterns (DO NOT)
1. ❌ Push code fixes (READ-only; escalate to em main)
2. ❌ Speculate about a failure without the log
3. ❌ Skip the post-deploy bundle hash check (biggest catch)
4. ❌ Skip the MEMORY update
5. ❌ Poll forever (max 10 iter)
6. ❌ Auto-rollback (escalate + recommend instead)
7. ❌ Verify docs-only pushes (return SKIPPED-DOCS immediately)
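
Anti-pattern #3 in code form: a minimal Python sketch of the bundle hash gate for one app. It assumes the caller has already fetched the current `index.html` (for example with the `curl` command under "Bundle hash unchanged = ship FAIL") and kept the previous hash from the iter0 snapshot. `bundle_hash`, `bundle_verdict` and the `fe_changed` flag are hypothetical names. The regex accepts upper- and lowercase letters, digits, `_` and `-`, which covers every hash recorded in this file. An unchanged hash fails only FE-touching pushes, because BE-only pushes keep the old bundle (Run #368).

```python
import re
from typing import Optional

# Hashes recorded in this file (DPPTx2Kw, Krjvg_3j, BLA09-qv) mix upper/lower
# case, digits, '_' and '-', so the character class must allow all of them.
BUNDLE_RE = re.compile(r"/assets/index-([A-Za-z0-9_-]+)\.js")


def bundle_hash(index_html: str) -> Optional[str]:
    """Return the entry-bundle hash referenced by index.html, if any."""
    m = BUNDLE_RE.search(index_html)
    return m.group(1) if m else None


def bundle_verdict(status: str, before: str, after_html: str, fe_changed: bool) -> str:
    """Hypothetical ship check for one app; meaningful only once status == success."""
    if status != "success":
        return "NOT_READY"   # checking earlier reads a stale hash (Run #242)
    after = bundle_hash(after_html)
    if after is None:
        return "FAIL"        # no bundle reference found at all
    if fe_changed and after == before:
        return "FAIL"        # FE change pushed but prod still serves the old bundle
    return "PASS"            # rotated, or a BE-only push where unchanged is expected
```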
---

## 🧠 SOLUTION_ERP CI/CD essentials (S40 verified)
- **Gitea:** `git.baocaogiaoduc.vn/vietreport-admin/solution-erp` · workflow `.gitea/workflows/deploy.yml` · paths-ignore `['docs/**','**/*.md','.claude/skills/**']`
- **Prod:** api/admin/eoffice `.solutions.com.vn` · SSH `ssh vietreport-vps` (Administrator, id_ed25519) · IIS site phys paths (S42 verified): API `C:\inetpub\solution-erp\api` · admin `\fe-admin` · user `\fe-user` (3 sites Started). DB `.\SQLEXPRESS`/`SolutionErp`/`vrapp` SQL-auth. **Conn string key = `ConnectionStrings.Default` (NOT `DefaultConnection`!)**. Read the password from the prod `appsettings.Production.json` when `$env:PROD_DB_PASSWORD` is empty.
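- **SSH→PS quoting (S42 lesson):** nested bash→ssh→powershell quoting mangles `$var`/`\"`. Encode the script instead: `iconv -f UTF-8 -t UTF-16LE | base64 -w0` → `powershell -EncodedCommand $B64`. The `-w0` flag stops GNU base64 from wrapping its output, which would split the argument. Single-quote literal paths.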
- **Tests baseline:** **181 PASS** (S45 Run #368 sha 0c5a014; Domain 58 + Infra 123). That is +27 over the previous 154, from HRM coverage gaps: HrmConfigHolidayTests + EmployeeSatelliteTests + AuthorizePolicyRegressionTests-ext. The CI gate runs both test projects BEFORE build/deploy → status=success ⟹ test gate passed (the `tasks` endpoint reports a terminal run as `status:success`; the `conclusion` field is NOT populated). Local grep undercounts because each `[Theory]`/`[InlineData]` row runs as a separate test, so trust the CI status. Phase 9 UAT mode: skipping per chunk is OK.
- **Mig latest repo:** **Mig 43 `20260601064128_FilterHolidayUniqueIndexByIsDeleted`** (S45; index-only change, prod tables stay 90-by-sys.tables / 91-by-doc, NO new table). Path `src/Backend/SolutionErp.Infrastructure/Persistence/Migrations/`. Prod check: `sqlcmd -Q "SELECT TOP 5 MigrationId FROM __EFMigrationsHistory ORDER BY MigrationId DESC"`. ⚠️ Table-count drift: the `sys.tables` count is 90 (verified S42 #364 + S45 #368) while the CLAUDE.md narrative says 91. This is a counting-convention difference, NOT a missing table. Don't FAIL on 90.
- **Bearer:** admin `admin@solutions.com.vn/Admin@123456` (full) · UAT `nv.test@solutions.com.vn/TestUser@123456` (Drafter CCM, gotcha #44 check)
- **Bundle hash live S48:** admin `DPPTx2Kw` · user `CjoUEsoV` (Run #369 sha 350b2bf, login subtitle a11y). Prev admin `Krjvg_3j` · user `6sNStgxa` (#368/0c5a014 — unchanged BE-only). Bundle size ~800KB/750KB gz.
- **DB pw (S42, when `$PROD_DB_PASSWORD` empty):** `vrapp/buKL3TGBkD0wDDbYVw65QeX9` read from `C:\inetpub\solution-erp\api\appsettings.Production.json`→`ConnectionStrings.Default`. ⚠️ Skill-doc path `C:\inetpub\apps\SolutionErp\Api` is STALE → real path `C:\inetpub\solution-erp\api`. sqlcmd over SSH works directly (no UTF-16 encoding needed). ⚠️ sys-catalog string-concat queries hit a collation conflict (`Latin1_General_CI_AS_KS_WS` vs `SQL_Latin1_General_CP1_CI_AS`) → add `COLLATE DATABASE_DEFAULT` to each concatenated column.
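
The SSH→PS quoting lesson, sketched in Python. `-EncodedCommand` takes Base64 of the script's UTF-16LE bytes, so `$var`, quotes and backslashes reach PowerShell untouched. `encode_ps_command` and `ssh_ps_argv` are hypothetical helpers, and the demo script is illustrative only. The argv they build can be run with `subprocess.run(argv, capture_output=True, text=True)`. The Base64 alphabet contains no whitespace or quote characters, so the remote side has nothing to split or re-quote.

```python
import base64
from typing import List


def encode_ps_command(script: str) -> str:
    """Base64 of the UTF-16LE bytes, the format `powershell -EncodedCommand` expects."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def ssh_ps_argv(host: str, script: str) -> List[str]:
    """Build (not run) an ssh argv that executes `script` in remote PowerShell."""
    return ["ssh", host, "powershell", "-NoProfile", "-EncodedCommand", encode_ps_command(script)]


if __name__ == "__main__":
    # $env:... and the single-quoted path survive because nothing is re-quoted.
    demo = r"$env:COMPUTERNAME; Get-ChildItem 'C:\inetpub\solution-erp\api' | Select-Object -First 3"
    print(ssh_ps_argv("vietreport-vps", demo))
```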
## 🔑 Critical config (flag the commit if any of these regresses)

Node CI `20.x` (`feedback_node_cicd`) · MediatR `12.4.1` (gotcha #1, flag `Version="14`) · Swashbuckle `6.9.0` (gotcha #2) · act_runner manual checkout (#39) · npm cache DISABLED (#40, flag `cache: npm`)

---
## 🎯 Per-NV admin opt-in wire — 10-point checklist (cumulative S22→S23)

Cross-ref `feedback_per_nv_permission_scope`. A per-NV/per-Level refactor MUST verify:

1. Domain field
2. EF `HasDefaultValue(false)`
3. Mig 3-file
4. Service read
5. Domain+App DTO mirror
6. Designer FE checkbox
7. AwLevelDto+ToDto
8. CreateAwLevelInput+Update mutation
9. **Lookup discrimination** (`FirstOrDefault` ADD `ApproverUserId==actorId` + admin fallback)
10. **Controller body record count == Command record count**

Missing 9-10 leaves the bug silent in prod for 2-3 days. After an OR-of-N refactor, scan with `grep -rn --include='*.cs' "FirstOrDefault.*Order.*==" .`.
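
Point 9 is the subtle one. With OR-of-N, several approvers share one `Order`, so a lookup on `Order` alone returns whichever row comes first, not the acting user's. Below is a Python stand-in for the C# lookup, not the project's code. `LevelApprover` and `find_actor_slot` are hypothetical names. Using the level's first slot as the admin fallback is an assumption, since the checklist only says "admin fallback".

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class LevelApprover:
    order: int              # approval level; OR-of-N => several rows per order
    approver_user_id: str


def find_actor_slot(
    levels: Sequence[LevelApprover],
    current_order: int,
    actor_id: str,
    actor_is_admin: bool,
) -> Optional[LevelApprover]:
    same_level = [lv for lv in levels if lv.order == current_order]
    # Discriminate on the actor first (the missing `ApproverUserId == actorId`).
    mine = next((lv for lv in same_level if lv.approver_user_id == actor_id), None)
    if mine is not None:
        return mine
    # Assumed admin fallback: act on the level's first slot.
    if actor_is_admin and same_level:
        return same_level[0]
    return None
```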
## 📊 Run stats baseline

BE (test+build) ~90s · FE × 2 ~60s/app · deploy ~30s · **total ~3-4min for code / 0s docs-only** (recent code runs #364-#369 took ~4m05s-4m20s). >5min → escalate.

---

## 📅 Recent runs (FIFO — older → archive/git)
- **2026-06-03 Run #369 (run_number 255) sha=`350b2bf` PASS ~4m13s (S48 FE-only login subtitle a11y `text-slate-500→600`, ZERO BE/Mig):** Push range `7bbfa5a..350b2bf` 2 commits: `009dd94` DOCS/GOVERNANCE-only (9 files: STATUS/HANDOFF + 3 adap-reports + error-ledger + session-log + frontend-designer MEMORY + session-end.md cmd — ALL `.md`/`.claude/**`) + `350b2bf` CODE 2 files `fe-{admin,user}/src/pages/LoginPage.tsx` (1-line each, slate-500→600 subtitle contrast). Mixed push: `.tsx` present → **NOT path-filter skipped, full pipeline RAN** (gotcha #41 Discovery #3 — ≥1 non-ignored file in range ⟹ whole range builds; docs commit alone would skip but `.tsx` overrides). Poll iter5 status=success (started 00:06:33 → 00:10:46). **Bundle ROTATE admin `Krjvg_3j→DPPTx2Kw` + user `6sNStgxa→CjoUEsoV`** (BOTH changed ✓ FE shipped — verified AFTER status=success; pre-deploy snapshot iter0 still showed OLD `Krjvg_3j`/`6sNStgxa`, correct timing per anti-pattern #3). **NO migration** — repo 43 == prod `__EFMigrationsHistory` 43, latest both `...FilterHolidayUniqueIndexByIsDeleted` (Mig 43 unchanged, BE/Domain untouched ✓). Health live+ready 200 + admin/eoffice index 200. Test gate 181 (CI both proj pre-deploy ⟹ success=passed). 0 regression. NEW LESSON: smallest possible FE change (1-line className) still rotates bundle hash — Vite content-hash sensitive to any source byte; mixed docs+tsx push is the canonical case where docs-only-skip does NOT apply. Tag `[s48, run369, pass, fe-only-a11y, mixed-push-not-skipped]`.
- **2026-06-01 Run #368 (run_number 254) sha=`0c5a014` PASS ~4m20s (S45 Mig 43 filter Holiday UNIQUE by IsDeleted + 3 HRM test gaps — BE+tests ONLY, ZERO FE):** Push range `dbbed15..0c5a014` 2 commits: `051b62b` Tests +27 (HrmConfigHolidayTests + EmployeeSatelliteTests + AuthorizePolicyRegressionTests-ext → baseline 154→**181**) + `0c5a014` Mig 43 `20260601064128_FilterHolidayUniqueIndexByIsDeleted` (drops+recreates `IX_Holidays_Year_Date` as filtered UNIQUE `WHERE [IsDeleted]=0`, was unfiltered) + HolidayConfiguration.cs edit + Case-7 test flip. 7 files, all BE+tests, none in paths-ignore → CI ran. Poll iter4 status=success (started 13:43:47 → 13:48:07). **Bundle hashes UNCHANGED admin `Krjvg_3j` + user `6sNStgxa`** (= #367) — CORRECT for BE-only push, NOT ship-fail (Run #243 precedent; ship-proof = Mig 43 applied, not bundle rotate). **Mig 43 auto-applied prod** (history top = `...FilterHolidayUniqueIndexByIsDeleted` ✓). **THE FIX VERIFIED prod:** `IX_Holidays_Year_Date | unique=1 | filter=([IsDeleted]=(0))` — filter_definition non-NULL = filtered UNIQUE live (soft-deleted holidays no longer collide on UNIQUE). Health live+ready 200 Healthy. `Holidays` table exists, 10 rows, 2 named idx (PK + filtered UNIQUE). Prod tables=90-by-sys.tables (index-only change, NO new table — consistent #364 delta). NEW LESSON: filtered-index migration verify = check `sys.indexes.filter_definition` non-NULL (NOT just mig-history row); index-only mig = bundle unchanged + table-count unchanged both EXPECTED. Tag `[s45, run368, pass, mig43-filtered-index, be-only-bundle-unchanged]`.
- **2026-05-30 Run #367 (run_number 253) sha=`82d7fcf` PASS ~4m08s (S42 P11-B LeaveBalance business logic, Mig 42):** Code commit 22 files (4 BE: Domain `LeaveBalance.cs` + App `LeaveBalanceFeatures.cs`/`LeaveOtApprovalFeatures` deduction hook + `LeaveBalancesController` + IApplicationDbContext + DbContext + Config + Mig42 3-file + 2 FE `WorkflowAppDetailPage`×2 +`workflowApps.ts`×2 + 2 tests + 4 agent-memory .md). Started 11:11:40 → success iter4 11:15:48. **Bundle rotate admin `BU8FTBRi→Krjvg_3j` + user `tepE4jvR→6sNStgxa`** (both changed ✓ FE shipped, verified AFTER status=success — pre-deploy snapshot still showed old hash, correct timing). **Mig 42 `20260530034336_AddLeaveBalances` auto-applied prod** (tables 90→**91**, `LeaveBalances` EXISTS). Schema ✓: UserId/LeaveTypeId/Year/EntitledDays/UsedDays/AdjustmentDays decimal + AuditableEntity soft-delete. **UNIQUE `IX_LeaveBalances_UserId_LeaveTypeId_Year`** + **FK→LeaveTypes del=NO_ACTION** (=Restrict) ✓. New endpoint smoke: `GET /api/leave-balances/my` unauth=**401** (route live not 404) + admin auth=**200** lazy-default 5 LeaveTypes (ANNUAL12/COMPASSIONATE3/MATERNITY180/SICK30/UNPAID0, all Used=0, `remainingDays`=entitled ✓ DTO shape has remainingDays/entitledDays) + `?year=2026` admin route 401 unauth + `PUT /adjust`=411 (route reg). health live/ready 200 Healthy. **NO seed gate concern** (plain table, lazy DTO — Stage 4.6 N/A). 0 regression. Note: the previous run #366 (ffb2062, docs STATUS update, display_title Docs) touched only `.md` files. It still ran the full pipeline with status=success, i.e. it was NOT docs-only-skipped. Two candidate causes: the push range also held non-ignored files (gotcha #41 Discovery #3), or agent-memory `.md` is outside paths-ignore (`.claude/skills/**` is listed, `.claude/agent-memory/**` is not). The second was recorded as confirmed, but `**/*.md` would normally match those files as well, so treat it as unverified. Tag `[s42, run367, pass, p11b-leavebalance, mig42]`.
- **2026-05-30 Run #365 sha=`75df04e` PASS ~4m05s (S42 P11-A fix workflow picker 2-bug + SetWorkflow endpoint, NO migration):** Code commit 11 files (4 BE controllers + 2 App features `LeaveOtApprovalFeatures`/`TravelVehicleApprovalFeatures` +125 lines + 2 FE `WorkflowAppDetailPage` ×2 + 1 test +79 lines). Status=success iter5 (started 10:15:45). **Bundle rotate admin `BLA09-qv→6D4k-aRi` + user `CXvejOE-→DkME-974`** (both changed ✓ FE fix shipped, verified AFTER status=success). +4 endpoint `PUT /api/{leave,ot,travel,vehicle-bookings}/{id}/workflow` (`Set{Module}WorkflowCommand`, route `[HttpPut("{id:guid}/workflow")]` body record `SetWorkflowBody(Guid ApprovalWorkflowId)`). Unauth smoke leave+ot/workflow → **401** (route exists, NOT 404 ✓). health live+ready 200 Healthy. Test gate **144** (CI both proj pre-deploy; grep undercounts InlineData=14 Fact at WorkflowAppApproveV2Tests). **NO migration** → skipped Stage 4.6 seed (verified #250). **NAMING RECONCILE:** Gitea task IDs are real #364 (e7b66cd, mem-labeled "#250") + #365 (this). Going forward use actual Gitea task id. **HEADS-UP em main:** follow-up commit `e47ef1d` (FE-User ProposalCreatePage workflow dropdown shape, latent S37 bug) pushed 10:19:17 DURING poll — NOT yet triggered CI run, will redeploy FE shortly (bundle may re-rotate). Out of scope this verdict. Tag `[s42, run365, pass, p11a-setworkflow]`.
- **2026-05-30 Run #364 (mem #250) sha=`e7b66cd` PASS ~4m07s (S42 P11-A wire ApproveV2+LevelOpinions 4 WorkflowApps):** 1 commit BE+FE×2+Mig41+Tests. Status=success iter3. Bundle rotate admin `cWAXid0q→BLA09-qv` + user `CX79e2kZ→CXvejOE-`. **Mig 41 auto-applied prod** (latest=`20260530021936_WireWorkflowAppsApprovalV2`). Tables 84→**90** (+5: Leave/Ot/Travel/VehicleRequest LevelOpinions + WorkflowAppCodeSequences — ALL EXIST). 4 new endpoint smoke 200 auth (leave/ot/travel/vehicle-requests) + unauth 401 (route exists) + POST .../approve=411 (route reg). health live/ready 200. **Stage 4.6 seed gate PASS** (gotcha #51): 4 WF seeded prod despite DemoSeed:Disabled — QT-NP/OT/CT/XE-V2-001 AppType=5/6/7/9, verified call-site L142-145 OUTSIDE `if(!demoSeedDisabled)` gate. Test gate 141 (CI runs both proj pre-deploy). Note: table count 90 vs spec-expected 89 = baseline-count diff, NOT missing table (all 5 present). Stale doc drift deploy.yml comments "54/17 test" (cosmetic, flag em main). Tag `[s42, run250, pass, p11a-approvev2-workflowapps]`.
- **2026-05-28 Run #247 sha=`e54a22d` PASS 3m25s (S38 SKELETON 5-plan combo Mig 39+40 dual):** Push 1 commit mega `Domain+App+Infra+Api+FE×2`. ALL PASS. Bundle rotate admin `CGueDk22→cWAXid0q` + user `CEt0QRgX→CX79e2kZ`. Mig 39+40 dual auto-applied startup (90830→90839). 6 endpoint smoke 200 (leave/ot/travel/vehicle/it-tickets/hr-dashboard `totalEmployees=33 male=17 female=16`). 6 new tables + 8 menu seeded. 0 regression. Fastest S38 deploy. Tag `[s38, run247, pass, skeleton-combo]`.
- **2026-05-28 Run #246 sha=`de1c378` PASS 3m53s (S37 Proposal Mig 37+38):** Bundle admin `C9kzTTmq→CGueDk22` + user `CC4DQ-Tr→CEt0QRgX`. Mig 38 AddProposals + 37 ExtendApplicableType. `/api/proposals` 200 empty + workflow `QT-DX-V2-001` ApplicableType=4 seed + 4 Off_DeXuat menu. Stage 4.6 sample seed INFRASTRUCTURE-gated correct (gotcha #51). Tag `[s37, run246, pass, proposal-v2]`.
- **Archived Run #359/#243/#242/#241/#240 + S35/S36 startup → `archive/2026-05-q4.md` + git d2f52ba (S40 curate):** Run #359 G-O2 Meeting Mig 36 · #243 HrmConfig BE 16 endpoint (BE-only bundle unchanged anti-pattern verify) · #242 FE inline forms 5 satellite · #241 Mig 35 HRM foundation · #240 satellite CRUD. Discovery #7 path-filter eval/** + #8 collection `proj_*`. KEY absorbed in essentials/Stage sections above.
- **Archived Run #232 (S29 gotcha #51 catch — SeedSampleContractWorkflowV2 nested in demoSeedDisabled → empty V2 dropdown, hoist fix) → `archive/2026-05-q4.md` + git. Smart Friend ROI 4× cumulative (S22 #44 + S25 #48 + S29 ApplicableType + S29 DemoSeed).**
---

## 🔄 Curate trigger
- Size >~30KB → archive recent runs → L2 `archive/<period>.md`. Duplicate failure patterns → merge. Stale >3mo → remove.
- **Last curate: 2026-05-29 S40 em main proxy** (35.3→~21KB): archived Run #359/#243/#242/#241/#240 + S35/S36 startup → q4 + git d2f52ba; refreshed stale 120→130 test + Mig 34→40 + Stage 3 111→130. Foundation (gotcha patterns + Stage 0-5 + Stage 4.6 + 10-point + Discovery #4-8) preserved. Prev: S34 q3 · S32 q2 · S22 runs.
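
The curate trigger and the per-entry limit from the header (≤ 1.5K chars, gotcha #53) can be checked mechanically. A minimal Python sketch follows. It assumes "~30KB" means 30 × 1024 bytes of UTF-8 and that each top-level `- ` bullet is one entry. `curate_report` is a hypothetical helper, called e.g. as `curate_report(Path("MEMORY.md").read_text(encoding="utf-8"))`.

```python
from typing import Dict, List

SOFT_CAP_BYTES = 30 * 1024   # assumed meaning of the "~30KB" L1 HOT soft cap
MAX_ENTRY_CHARS = 1500       # "≤ 1.5K chars" per entry (gotcha #53)


def curate_report(text: str) -> Dict[str, object]:
    """Flag when the memory file needs a curate pass and which entries are too long."""
    size = len(text.encode("utf-8"))
    # Assumption: each top-level "- " bullet is one entry.
    entries: List[str] = [line for line in text.splitlines() if line.startswith("- ")]
    too_long = [e[:60] for e in entries if len(e) > MAX_ENTRY_CHARS]  # short previews
    return {"bytes": size, "needs_curate": size > SOFT_CAP_BYTES, "too_long": too_long}
```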
|