# CI/CD Monitor Agent — Persistent Memory > **Persistent diary cross-session.** Auto-injected first ~200 lines at spawn (L1 HOT). > Update BEFORE every stop. Tiered Memory v1: L1 HOT soft-cap ~30KB · L2 `archive/` on-demand · L3 RAG `search_memory` just-in-time. Keep entry ≤ 1.5K chars (gotcha #53). > Full verbatim run history pre-S40 → git `d2f52ba` + `archive/2026-05-{runs,q2,q3,q4}.md`. --- ## 🎯 Role baseline Read-only CI/CD + post-deploy verifier SOLUTION_ERP. Polls Gitea Actions API, verifies test gate + deploy ship + prod health. Tools: Read, Grep, Glob, Bash, WebFetch + 5 RAG MCP. Output: PASS/FAIL + evidence <500 words. Skills: `iis-deploy-runbook` + `dependency-audit-erp` + `ef-core-migration`. Spawn ~150K — trade-off catch fail tự động. --- ## 🚨 Recurring CI/CD bug patterns (catch priority) - **#39 act_runner github.com TCP timeout** — run hang "Set up job" 21s. Log `dial tcp github.com:443 i/o timeout`. Fix: manual checkout bypass hardcoded `.gitea/workflows/deploy.yml` (pass #110). KHÔNG revert. - **#40 npm cache `tsc not found`** — `build_fe_admin` fail post `cache: npm`. DISABLED rolled back `a21790d`. KHÔNG re-enable. - **#41 paths-ignore docs-only skip** — code commit không trigger CI? Check `git diff --name-only HEAD~1 HEAD` vs `paths-ignore: ['docs/**','**/*.md','.claude/skills/**']`. Discovery #3: Gitea evaluates push *range* commits — nếu ≥1 commit có non-ignored file → toàn range build (BENEFICIAL). - **#25 IIS WebSocket** — `notification-hub/negotiate` 401/404 prod. Fix: WebSocket module enable `web.config` site api (skill `iis-deploy-runbook`). - **#48 SQLite tie-break** — `OrderByDescending(CreatedAt).First()` pick wrong khi 2+ `.Add()` cùng frozen-clock. Fix: discriminator filter `.Where(Summary.Contains("Chuyển phase"))` BEFORE OrderBy. - **Bundle hash unchanged = ship FAIL** — push+action success nhưng prod không đổi. Verify `curl -s https://admin.solutions.com.vn/ | grep -oE '/assets/index-[a-z0-9]+\.js'`. Fix: SSH `Restart-WebAppPool`. ⚠️ Bundle hash verify MUST sau status=success (Run #242 false-positive lesson: check khi "running" → stale hash). - **Migration drift prod vs repo** — compare `ls .../Persistence/Migrations/*.cs` vs `sqlcmd __EFMigrationsHistory`. Fix: check `Program.cs` `app.MigrateDatabase()` + app pool recycle. --- ## 📋 5-stage checklist (EVERY run) - **Stage 0 RAG infra:** `Get-Service Qdrant` Running + `http://localhost:6333/healthz`. Collection `proj_solution_erp` (prefix `proj_*` 7 project — Discovery #8). - **Stage 1 Push+filter:** `git log -1 --format='%H %s'` + `git log origin/main..HEAD` empty + diff vs paths-ignore (docs-only → SKIPPED-DOCS return). - **Stage 2 Gitea poll** (max 10 iter × 60s): API `.../actions/tasks?limit=5` (NOT `/runs` 404). Match `head_sha`. ⚠️ task table `updated_at` stale ~2min (gotcha #46) → cross-check VPS mtime. - **Stage 3 Test gate:** baseline **130 PASS** (58 Domain + 72 Infra). Phase 9 UAT exception lower OK (`feedback_uat_skip_verify`). - **Stage 4 Post-deploy** (if SUCCESS): auth login bearer (admin + nv.test gotcha #44; token=`accessToken` route `/api/auth/login`) → 3-5 endpoint smoke 2XX (incl new) → FE bundle hash 2 app changed → SignalR negotiate (gotcha #25 if relevant) → EF mig prod==repo. - **Stage 4.6 (S29 CRITICAL):** sqlcmd seed sample verify post-deploy (NOT chỉ schema). `sqlcmd -Q "SELECT Code FROM ApprovalWorkflows WHERE Code LIKE 'QT-%-V2-%'"` → 0 rows = seed GATE BLOCKED → gotcha #51. - Discovery #4: ASP.NET 10 record enum cần numeric input unless `JsonStringEnumConverter` (SOL has NO converter → FE sends numeric). #5: sqlcmd ssh Windows-auth cần `\\\\SQLEXPRESS` 4-backslash. #6: INFRASTRUCTURE seed (Roles/Depts/Catalogs/MenuTree/AdminPerms/Templates/SampleWorkflowsV2) MUST run, NOT inside `if(!demoSeedDisabled)`; DEMO seed (DemoUsers/Contracts/PE) OK gated → gotcha #51. - **Stage 5 Report** PASS/FAIL + evidence + MEMORY update. --- ## ⚠️ Anti-patterns (DO NOT) 1. ❌ Push fix code — READ only, escalate em main · 2. ❌ Speculate fail without log · 3. ❌ Skip post-deploy bundle hash (biggest catch) · 4. ❌ Skip MEMORY · 5. ❌ Poll forever (max 10 iter) · 6. ❌ Auto-rollback (escalate + recommend) · 7. ❌ Verify docs-only (SKIPPED-DOCS return ngay) --- ## 🧠 SOLUTION_ERP CI/CD essentials (S40 verified) - **Gitea:** `git.baocaogiaoduc.vn/vietreport-admin/solution-erp` · workflow `.gitea/workflows/deploy.yml` · paths-ignore `['docs/**','**/*.md','.claude/skills/**']` - **Prod:** api/admin/eoffice `.solutions.com.vn` · SSH `ssh vietreport-vps` (Administrator, id_ed25519) · IIS site phys paths (S42 verified): API `C:\inetpub\solution-erp\api` · admin `\fe-admin` · user `\fe-user` (3 sites Started). DB `.\SQLEXPRESS`/`SolutionErp`/`vrapp` SQL-auth. **Conn string key = `ConnectionStrings.Default` (NOT `DefaultConnection`!)** — read pw from prod appsettings.Production.json when `$env:PROD_DB_PASSWORD` empty. - **SSH→PS quoting (S42 lesson):** nested bash→ssh→powershell mangles `$var`/`\"`. Use `iconv UTF-16LE | base64` → `powershell -EncodedCommand $B64`. Single-quote literal paths. - **Tests baseline:** **228 PASS** (S56 Run #379 sha a20cde8; Domain 58 + Infra 170 = +12 golive-harden `ItTicketReassignAuthzTests`/`LeaveBalanceTests`/`WorkflowAppApproveV2Tests`/`DocxRendererTests` vs prev 216). CI gate runs both test projects BEFORE build/deploy → status=success ⟹ test gate passed (`tasks` endpoint reports terminal as `status:success`, `conclusion` field NOT populated). Local grep undercounts (Theory/InlineData) — trust CI conclusion. Phase 9 UAT mode skip per chunk OK. - **Mig latest repo:** **Mig 48 `20260609020759_AddProjectMasterFields`** (S55; AddColumn-only, Project +Year/Investor/Location/Package nullable, NO new table; kèm `SeedRealMasterDataAsync` ungated). Prev Mig 47 FilterMasterCatalog... + 46 AddSlaFieldsToItTicket. Path `src/Backend/SolutionErp.Infrastructure/Persistence/Migrations/`. Prod check `sqlcmd __EFMigrationsHistory ORDER BY MigrationId DESC TOP 5`. ⚠️ Table-count: `sys.tables` (is_ms_shipped=0) count = **93** (S56 Run #379 verified, Mig 48 col-only no delta); narrative also 93 now — reconciled. Don't FAIL on 92↔93 convention diff. - **Bearer:** admin `admin@solutions.com.vn/Admin@123456` (full) · UAT `nv.test@solutions.com.vn/TestUser@123456` (Drafter CCM, gotcha #44 check) - **Bundle hash live S56:** admin `4SUwDLD8` · user `XdKzt9LL` (Run #378/#379, FROZEN since S55 #378 FE redesign — BE-only #379 kept both unchanged ✓). Prev admin `B-d6893W` · user `XdKzt9LL` (#377). Bundle size ~800KB/750KB gz. ⚠️ S50 mid-deploy transient lesson: pre-success snapshot can show intermediate FE copy in-flight — re-confirm hash AFTER status=success ALWAYS (anti-pattern #3). - **DB pw (S42, when `$PROD_DB_PASSWORD` empty):** `vrapp/buKL3TGBkD0wDDbYVw65QeX9` read from `C:\inetpub\solution-erp\api\appsettings.Production.json`→`ConnectionStrings.Default`. ⚠️ Skill-doc path `C:\inetpub\apps\SolutionErp\Api` is STALE → real path `C:\inetpub\solution-erp\api`. sqlcmd over SSH works direct (no UTF-16 encode needed). ⚠️ sys-catalog string-concat queries hit collation conflict (`Latin1_General_CI_AS_KS_WS` vs `SQL_Latin1_General_CP1_CI_AS`) → add `COLLATE DATABASE_DEFAULT` per concatenated column. ## 🔑 Critical config (flag commit nếu tái xuất) Node CI `20.x` (`feedback_node_cicd`) · MediatR `12.4.1` (gotcha #1, flag `Version="14`) · Swashbuckle `6.9.0` (gotcha #2) · act_runner manual checkout (#39) · npm cache DISABLED (#40, flag `cache: npm`) --- ## 🎯 Per-NV admin opt-in wire — 10-point checklist (cumulative S22→S23) Cross-ref `feedback_per_nv_permission_scope`. Per-NV/per-Level refactor MUST verify: 1 Domain field · 2 EF `HasDefaultValue(false)` · 3 Mig 3-file · 4 Service read · 5 Domain+App DTO mirror · 6 Designer FE checkbox · 7 AwLevelDto+ToDto · 8 CreateAwLevelInput+Update mutation · 9 **Lookup discrimination** (`FirstOrDefault` ADD `ApproverUserId==actorId` + admin fallback) · 10 **Controller body record count == Command record count**. Bug latency 2-3 days prod silent khi miss 9-10. Scan `grep -n "FirstOrDefault.*Order.*==" *.cs` after OR-of-N refactor. ## 📊 Run stats baseline BE (test+build) ~90s · FE × 2 ~60s/app · deploy ~30s · **total ~3min code / 0s docs-only**. >5min → escalate. --- ## 📅 Recent runs (FIFO — older → archive/git) - **2026-06-11 Run #382 (run_number 268) sha=`5998163` PASS ~3m31s (S58 FIX the Run #381 lock NO-OP — DbInitializer.cs ONLY, BE-only, NO Mig/FE):** Push `dd117b7..5998163` 1 commit 1 file `DbInitializer.cs` (+28/-5). Fix: (1) `LockDemoSampleUsersAsync` union +20 UAT-matrix prod email (`{act,equ,fin,hra,pm,qs}.{nv,pp,tp}@`+`bod.{1,2}@`) into prior 14 named-person = 34-email list; (2) `DemoUserPassword` 11→12 chars (`User@123456`→`User@1234567`) fixing silent CreateAsync-fail vs prod `RequiredLength=12` (S56 helpdesk-inert root cause). `.cs` present → full pipeline RAN. Poll iter5 status=success (started 12:58:06 → 13:01:37). **Bundle FROZEN admin `CP4CB1ym` + user `BmZ3VHnm`** (= #381 UNCHANGED ✓ CORRECT for BE-only, verified AFTER status=success — NOT ship-fail). **NO migration** — prod `__EFMigrationsHistory` top = Mig 49 `AddWorkItemToPurchaseEvaluation` == repo, GIỮ NGUYÊN ✓. sys.tables=**93** unchanged. Health live/ready 200 + admin/eoffice root 200. **THE FIX VERIFIED prod (Users table — note: custom Identity table name `Users` NOT `AspNetUsers`):** total **55** users · **21 active** · **34 inactive==34 locked-future** (== lock-list size exactly). 12-sample UAT-matrix all `active=0 locked=1` (#381 NO-OP now RESOLVED — these exist in prod + got locked ✓). Named-person 14/14 found+locked (CREATED this startup via 12-char pw fix + locked same run). **Must-stay-active 6/6** admin·catalog.manager·nv.test·chuong.phan@solution.com.vn(typo-domain)·**nv.cao+nv.truong** ALL `active=1` (IT helpdesk pool ALIVE — S56 ops-pending RESOLVED by pw fix, created this startup not in lock-list). **5 new real staff** (thanh.lethanh/anh.nguyen/tring.le/truong.le/long.nguyen) all CREATED+`active=1` ✓ (12-char pw passes RequiredLength=12). Smoke nv.test login OK (token 477) + GET /api/menus 200 + /purchase-evaluations 200. 0 regression. **LESSON: lock-by-email NO-OP (#381) was a DATA-mismatch not code-bug → S58 reconciled email-list to actual prod population (UAT-matrix created via admin UI, never in seed) + the 11-vs-12 pw bug was a SECOND latent cause silently blocking ALL non-existing-user CREATE on prod (RequiredLength=12) — same fix resurrected 16 named + 5 staff + helpdesk pool. Verify lock-fix = dump Users cohorts (active/inactive split + named exact-IN), NOT just total count.** Tag `[s58, run382, pass, fix-lock-noop, pw-11to12, be-only-bundle-frozen]`. - **2026-06-11 Run #381 (run_number 267) sha=`dd117b7` PASS+1PARTIAL ~4m25s (S57bis PE gắn WorkItem Mig 49 + all-role Pe perm + menu Cá nhân regroup + lock-14-demo-user — cross-stack BE+FE×2+Mig+test, +12 PeWorkItemGuardTests→240):** 2-commit push: prev `17b23a4` (governance+hmw.js → Run #380 **cancelled**, superseded — correct, no FE/BE contract change) then `dd117b7` (PRODUCT, Run 381 = the deciding run). 26 files: Mig 49 `20260611044424_AddWorkItemToPurchaseEvaluation` (3-file, PE.WorkItemId Guid? loose-Guid NO physical FK + `IX_PurchaseEvaluations_WorkItemId`) + Domain `PurchaseEvaluation.cs` + Config + Features + DbInitializer (perm + `LockDemoSampleUsersAsync` + menu regroup) + MenuKeys + 3 master controllers (write-lock Admin/CatMgr) + FE×2 (PeDetailTabs/PeHeaderForm/PeWorkspaceCreateView/menuKeys/types). **Run IN-PROGRESS at first check (status=running 12:14) — polled to terminal** (12:14:16→12:18:41 ≈4m25s success). ⚠️ poll-grep gotcha: `"status"` field sits AFTER `"display_title"` in tasks JSON → `[^}]*"display_title"` regex cut before status (showed blank all 10 iters); final FULL-object parse `\{"id":381,...deploy.yml[^}]*\}` confirmed status=success. **Bundle ROTATE BOTH** admin `4SUwDLD8→CP4CB1ym` + user `XdKzt9LL→BmZ3VHnm` (PE in both apps ✓ shipped, verified AFTER status=success). **Mig 49 applied prod** (`__EFMigrationsHistory` top = AddWorkItem... ✓ + WorkItemId col=1 + IX=1). sys.tables=**93** (col-only, no delta). Health live/ready 200 + admin/eoffice 200. **Perm seed STRONG: Pe_* CanCreate=1 = 130 rows across 13 roles** (was 3-role → all-role open landed); PeWf%=0 + AwV2%=2 (designer stays admin-only ✓ no leak). Menu regroup ✓: Personal root@30 · Off_ChamCong→Personal@1 · **Hrm_Config**→Master@25 (spec said key `HrmConfig`, real key has underscore `Hrm_Config` — verify by ParentKey/Order NOT literal Key) · Contracts@31 · Hrm_Dashboard→Hrm@1. Smoke PE unauth 401 (/purchase-evaluations + /catalogs/work-items) vs control 404 (auth real). WorkItems VT/TP/MEP/TB=71. **⚠️ PARTIAL item 7 — lock-14-users is a prod NO-OP:** `LockDemoSampleUsersAsync` SHIPPED+RAN but its 14 hardcoded emails (`bod.huynh@`,`pm.nguyen@`,`fin.do@`,`qs.hoang@`...) **DON'T EXIST in prod** — real demo set uses dept.position scheme `bod.1@`/`bod.2@`/`pm.{nv,pp,tp}@`/`fin.{nv,pp,tp}@`/`qs.{nv,pp,tp}@` (34 users ALL active, INACTIVE_TOTAL=0). Each FindByEmail→null→locked=0. Guard `nv.cao`/`nv.truong` also absent (-1, vacuously safe); catalog.manager+admin confirmed active. NOT a deploy fail (code correct) — email list stale vs this DB seed. Escalated em main: reconcile lock-list to actual `*.{nv,pp,tp}@` scheme OR confirm named-person legacy users were ever seeded. **LESSON: lock/deactivate-by-email assertion returning 0/`-1` ⟹ ALWAYS dump actual `Users` set before scoring FAIL — code may have run as no-op against mismatched data, NOT broken.** Tag `[s57bis, run381, pass-partial, mig49-pe-workitem, allrole-perm-130, lock-noop-email-mismatch]`. - **2026-06-09 Run #379 (run_number 265) sha=`a20cde8` PASS ~4m20s (S56 GOLIVE-HARDEN BE fixes — LeaveBalance concurrency + ItTicket authz-order + DocxRenderer null-guard + tests, ZERO FE/Mig):** Push `bef5825..a20cde8` 1 commit 13 files: 3 BE `LeaveOtApprovalFeatures.cs` (atomic ExecuteUpdate + Serializable tx vs lost-update) + `WorkflowAppsFeatures.cs` (authz reorder Forbidden-before-NotFound) + `DocxRenderer.cs` (null-guard) + 4 test files (+12 → 216→**228**) + 6 agent-memory `.md`. `.cs`+test present → NOT docs-skip, full pipeline RAN. **Run IN-PROGRESS at first check (status=running 17:51) — correctly did NOT FAIL, polled to terminal** (started 17:51:45 → updated 17:56:05 ≈4m20s status=success iter5). **Bundle FROZEN admin `4SUwDLD8` + user `XdKzt9LL`** (= #378, UNCHANGED ✓ CORRECT for BE-only — NOT ship-fail, mirror Run #243/#368; verified pre-deploy + post-success + +5s re-confirm, NO transient, NO unexpected rotation). **NO migration** — prod `__EFMigrationsHistory` top = `20260609020759_AddProjectMasterFields` (Mig 48) == repo latest, GIỮ NGUYÊN ✓ (BE-logic-only, schema untouched). sys.tables=**93** unchanged. Health live+ready **200/200** + admin/eoffice root 200. **Smoke changed-area endpoints (all gated, none crash):** `GET /it-tickets/assignable-staff` unauth=**401** · `PUT /it-tickets/{guid}/assign` unauth+body=**401** (authz-reorder fix live, route wired) · `GET /leave-balances/my` unauth=**401** (concurrency fix dll deployed) · control fake `/it-tickets/zzz-not-a-route`=**404** (proves 401s are real auth gates not catch-all). 0 regression. **Ship-proof for BE-only no-contract-change = run success + test 228 + Mig 48 unchanged + bundle frozen + health 200** (no observable API delta — fixes are internal handler logic: atomic tx / exception order / null-guard; cannot curl-assert lost-update fix, rely on +12 tests passing in CI gate). Tag `[s56, run379, pass, golive-harden, be-only-bundle-frozen, no-mig]`. - **2026-06-09 (S56 pre-golive verify — NO deploy, read-only audit):** Re-verified prod truth at golive gate (HEAD `bef5825` docs-only → prod correctly = Run #378). build SolutionErp.slnx 0-err + fe-admin/fe-user npm build 0-TS each + test **216** (58D+158I) exact. Prod health live+ready 200; admin root serves `4SUwDLD8` / eoffice `XdKzt9LL` (== baseline, NO drift). `__EFMigrationsHistory` top = Mig 48 == repo; 92 tables. Master-data prod spot: Projects=70 (62 real+8 demo), CAL01.Investor=N'Công ty TNHH Calofic' exact, WorkItems real=71 (VT16/TP30/MEP9/TB16) of 86, Suppliers 3/3. **LESSON — local-vs-prod FE hash divergence is EXPECTED, not ship-fail:** fresh local `npm build` produces a DIFFERENT content-hash than CI-built prod artifact (node_modules/timestamp inputs not byte-reproducible) → load-bearing check is `prod-hash == documented-baseline`, NOT `== my-local-rebuild`. Don't false-alarm on local≠prod when HEAD unchanged. Tag [s56, pre-golive-verify, prod-truth-pass, local-vs-prod-hash-lesson]. - **2026-06-09 Run #378 (run_number 264) sha=`7feb53e` PASS ~4m24s (S55 Phase-1 FE-Admin VISUAL redesign density-first design-system NAMGROUP-ref keep brand #1F7DC1 — FE-ADMIN-ONLY, ZERO BE/Mig/fe-user):** Push `84fa638..7feb53e` 1 commit 15 files: 13 fe-admin (`index.css` design tokens + 6 ui primitives Button/Dialog/Input/Label/Select/Textarea + 6 shell DataTable/EmptyState/Layout/PageHeader/PhaseBadge/TopBar + DashboardPage) + 2 agent-memory `.md` (frontend-designer/reviewer). NO fe-user, NO `.cs`, NO Mig. `.tsx`/`.css` present → NOT docs-skip, pipeline RAN. **Run IN-PROGRESS at first check (status=running 11:51) — correctly did NOT FAIL, polled to terminal** (started 11:51:06 → updated 11:55:30 ≈4m24s status=success; updated_at froze 11:55:30 across 3 poll iters = terminal signal before status field parsed). **THE KEY PROOF — admin bundle ROTATE `B-d6893W→4SUwDLD8`** (✓ redesign shipped, verified AFTER status=success; pre-success snapshot 11:51 still showed OLD `B-d6893W` = anti-pattern #3 timing confirmed AGAIN; re-confirm +3s post-success = stable `4SUwDLD8`, NO transient this run). **fe-user bundle UNCHANGED `XdKzt9LL`** (= #377; untouched ✓ NOT ship-fail — correct, no fe-user file in commit). Admin root **200 text/html** + serves `