solution-erp/.claude/agent-memory/cicd-monitor/MEMORY.md
pqhuy1987 197c72f352 [CLAUDE] Docs: S42-S43 close-out — Phase 11 P11-A+P11-B session log + HANDOFF tier + gotcha #56
- Session log S42-S43 (P11-A WorkflowApps ApproveV2 + P11-B LeaveBalance, 8 commit chain)
- HANDOFF tiering: +S43 +S42, trim S40-S38 → session logs
- gotcha #56 CWD-drift stray memory (cd before spawn → agent writes to the wrong fe-user/.claude, 3× S42-S43)
- STATUS gotchas 55→56
- cicd-monitor MEMORY (Run #367 P11-B verdict)

User memory: +feedback_high_to_max_multiagent_quality (High let 2 bugs slip through, Max 0 bugs; WIRE FE
reads proven reference + FK-invariant-at-write-doors + Max re-review cross-stack).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-01 12:04:34 +07:00

14 KiB

CI/CD Monitor Agent — Persistent Memory

Persistent diary cross-session. Auto-injected first 200 lines / 25KB at spawn. Update BEFORE every stop. Tiered Memory v1: L1 HOT soft-cap ~30KB · L2 archive/ on-demand · L3 RAG search_memory just-in-time. Keep entry ≤ 1.5K chars (gotcha #53). Full verbatim run history pre-S40 → git d2f52ba + archive/2026-05-{runs,q2,q3,q4}.md.


🎯 Role baseline

Read-only CI/CD + post-deploy verifier for SOLUTION_ERP. Polls Gitea Actions API, verifies test gate + deploy ship + prod health. Tools: Read, Grep, Glob, Bash, WebFetch + 5 RAG MCP. Output: PASS/FAIL + evidence <500 words. Skills: iis-deploy-runbook + dependency-audit-erp + ef-core-migration. Spawn ~150K: the trade-off for catching failures automatically.


🚨 Recurring CI/CD bug patterns (catch priority)

  • #39 act_runner github.com TCP timeout — run hangs at "Set up job" for 21s. Log: dial tcp github.com:443 i/o timeout. Fix: manual checkout bypass hardcoded in .gitea/workflows/deploy.yml (pass #110). Do NOT revert.
  • #40 npm cache tsc not found: build_fe_admin fails after enabling cache: npm. DISABLED, rolled back in a21790d. Do NOT re-enable.
  • #41 paths-ignore docs-only skip — code commit did not trigger CI? Check git diff --name-only HEAD~1 HEAD against paths-ignore: ['docs/**','**/*.md','.claude/skills/**']. Discovery #3: Gitea evaluates every commit in the push range; if ≥1 commit has a non-ignored file → the whole range builds (BENEFICIAL).
  • #25 IIS WebSocket: notification-hub/negotiate returns 401/404 on prod. Fix: enable the WebSocket module in web.config for the api site (skill iis-deploy-runbook).
  • #48 SQLite tie-break: OrderByDescending(CreatedAt).First() picks the wrong row when 2+ .Add() calls share the same frozen-clock timestamp. Fix: discriminator filter .Where(Summary.Contains("Chuyển phase")) BEFORE OrderBy.
  • Bundle hash unchanged = ship FAIL — push + action succeed but prod does not change. Verify curl -s https://admin.solutions.com.vn/ | grep -oE '/assets/index-[A-Za-z0-9_-]+\.js' (recorded hashes such as Krjvg_3j and BLA09-qv contain uppercase letters, _ and -, which [a-z0-9]+ would miss). Fix: SSH Restart-WebAppPool. ⚠️ Bundle hash verify MUST run after status=success (Run #242 false-positive lesson: checking while "running" → stale hash).
  • Migration drift prod vs repo — compare ls .../Persistence/Migrations/*.cs vs sqlcmd __EFMigrationsHistory. Fix: check Program.cs app.MigrateDatabase() + app pool recycle.
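Pattern #41 and Discovery #3 can be sketched as a small range check. This is a minimal illustration, not Gitea's actual matcher: the glob translation below assumes `**/` may match zero directories and `*` stays within one path segment, and the helper names (`glob_to_regex`, `is_ignored`, `range_triggers_build`) are hypothetical.

```python
import re

# paths-ignore list as recorded in this memory file.
PATHS_IGNORE = ["docs/**", "**/*.md", ".claude/skills/**"]


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified glob into a regex (assumed semantics, not Gitea's code)."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern.startswith("**", i):
            out.append(r".*")        # anything, including '/'
            i += 2
        elif pattern[i] == "*":
            out.append(r"[^/]*")     # anything inside one path segment
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_ignored(path: str, patterns=PATHS_IGNORE) -> bool:
    """True when the path matches at least one paths-ignore pattern."""
    return any(glob_to_regex(p).fullmatch(path) for p in patterns)


def range_triggers_build(commits) -> bool:
    """commits: one list of changed files per commit in the pushed range.

    The whole range builds as soon as any commit touches a non-ignored file.
    """
    return any(not is_ignored(f) for files in commits for f in files)
```

For example, `range_triggers_build([["docs/a.md"], ["src/Backend/X.cs"]])` is true because the second commit touches a non-ignored file, while a range that only touches `docs/` and `.md` files is expected to be skipped.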

📋 5-stage checklist (EVERY run)

  • Stage 0 RAG infra: Get-Service Qdrant Running + http://localhost:6333/healthz. Collection proj_solution_erp (prefix proj_* 7 project — Discovery #8).
  • Stage 1 Push+filter: git log -1 --format='%H %s' + git log origin/main..HEAD empty + diff vs paths-ignore (docs-only → SKIPPED-DOCS return).
  • Stage 2 Gitea poll (max 10 iter × 60s): API .../actions/tasks?limit=5 (NOT /runs 404). Match head_sha. ⚠️ task table updated_at stale ~2min (gotcha #46) → cross-check VPS mtime.
  • Stage 3 Test gate: baseline 130 PASS (58 Domain + 72 Infra) as of S40; S42 Run #367 raised it to 154 PASS (see essentials). Phase 9 UAT exception lower OK (feedback_uat_skip_verify).
  • Stage 4 Post-deploy (if SUCCESS): auth login bearer (admin + nv.test gotcha #44; token=accessToken route /api/auth/login) → 3-5 endpoint smoke 2XX (incl new) → FE bundle hash 2 app changed → SignalR negotiate (gotcha #25 if relevant) → EF mig prod==repo.
    • Stage 4.6 (S29 CRITICAL): sqlcmd seed sample verify post-deploy (NOT schema only). sqlcmd -Q "SELECT Code FROM ApprovalWorkflows WHERE Code LIKE 'QT-%-V2-%'" → 0 rows = seed GATE BLOCKED → gotcha #51.
    • Discovery #4: ASP.NET 10 record enum needs numeric input unless JsonStringEnumConverter is registered (SOL has NO converter → FE sends numeric). #5: sqlcmd over ssh with Windows-auth needs \\\\SQLEXPRESS (4 backslashes). #6: INFRASTRUCTURE seed (Roles/Depts/Catalogs/MenuTree/AdminPerms/Templates/SampleWorkflowsV2) MUST run, NOT inside if(!demoSeedDisabled); DEMO seed (DemoUsers/Contracts/PE) OK gated → gotcha #51.
  • Stage 5 Report PASS/FAIL + evidence + MEMORY update.
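The Stage 2 poll (match head_sha, stop after 10 iterations) could look roughly like the sketch below. It is an illustration under assumptions: `fetch_tasks` is a hypothetical callable that returns the task dicts from the `.../actions/tasks?limit=5` endpoint, and the set of terminal statuses other than `success` is a guess, since this file only confirms `status:success`.

```python
import time
from typing import Callable, Iterable, Optional

# Assumption: only "success" is confirmed by this memory file; the others are guesses.
TERMINAL_STATUSES = {"success", "failure", "cancelled", "skipped"}


def poll_for_verdict(
    head_sha: str,
    fetch_tasks: Callable[[], Iterable[dict]],
    max_iter: int = 10,
    interval_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return the terminal status of the task matching head_sha.

    Returns None when the iteration cap is reached, so the caller escalates
    instead of polling forever.
    """
    for attempt in range(1, max_iter + 1):
        for task in fetch_tasks():
            sha = task.get("head_sha", "")
            # startswith lets the caller pass a short sha such as 82d7fcf.
            if sha.startswith(head_sha) and task.get("status") in TERMINAL_STATUSES:
                return task["status"]
        if attempt < max_iter:
            sleep(interval_s)
    return None
```

Injecting `fetch_tasks` and `sleep` keeps the loop testable without network access. Because the task table's updated_at can be stale (gotcha #46), a None result is a signal to cross-check the VPS rather than proof of failure.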

⚠️ Anti-patterns (DO NOT)

  1. Push fix code — READ only, escalate to em main · 2. Speculate about a failure without a log · 3. Skip post-deploy bundle hash (biggest catch) · 4. Skip MEMORY · 5. Poll forever (max 10 iter) · 6. Auto-rollback (escalate + recommend) · 7. Verify docs-only (return SKIPPED-DOCS immediately)
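Anti-pattern 3 is about the bundle hash check, which is only meaningful after the run reports success. A minimal sketch follows, assuming the caller already has the HTML of the FE index page before and after the deploy. The function names are hypothetical, and the character class is deliberately wide enough to cover hashes recorded in this file such as Krjvg_3j and BLA09-qv.

```python
import re
from typing import Optional

# Matches /assets/index-<hash>.js; hash may contain letters, digits, '_' and '-'.
BUNDLE_RE = re.compile(r"/assets/index-([A-Za-z0-9_-]+)\.js")


def extract_bundle_hash(html: str) -> Optional[str]:
    """Return the hash part of the first index bundle reference, or None."""
    match = BUNDLE_RE.search(html)
    return match.group(1) if match else None


def fe_shipped(status: str, before_html: str, after_html: str) -> bool:
    """True only when the bundle hash changed between the two snapshots.

    Refuses to judge while the run is not yet successful, because a check
    made during "running" can see a stale hash (Run #242 lesson).
    """
    if status != "success":
        raise ValueError("check the bundle hash only after status=success")
    before = extract_bundle_hash(before_html)
    after = extract_bundle_hash(after_html)
    return before is not None and after is not None and before != after
```

A BE-only commit legitimately leaves the hash unchanged, so a False result needs to be read against what the commit actually touched.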

🧠 SOLUTION_ERP CI/CD essentials (S40 verified)

  • Gitea: git.baocaogiaoduc.vn/vietreport-admin/solution-erp · workflow .gitea/workflows/deploy.yml · paths-ignore ['docs/**','**/*.md','.claude/skills/**']
  • Prod: api/admin/eoffice .solutions.com.vn · SSH ssh vietreport-vps (Administrator, id_ed25519) · IIS site phys paths (S42 verified): API C:\inetpub\solution-erp\api · admin \fe-admin · user \fe-user (3 sites Started). DB .\SQLEXPRESS/SolutionErp/vrapp SQL-auth. Conn string key = ConnectionStrings.Default (NOT DefaultConnection!) — read pw from prod appsettings.Production.json when $env:PROD_DB_PASSWORD empty.
    • SSH→PS quoting (S42 lesson): nested bash→ssh→powershell mangles $var/\". Use iconv UTF-16LE | base64 → powershell -EncodedCommand $B64. Single-quote literal paths.
  • Tests baseline: 154 PASS (S42 P11-B Run #367 sha 82d7fcf; +11 LeaveBalance +13 P11-A vs prev 130). CI gate runs both test projects BEFORE build/deploy → status=success ⟹ test gate passed (tasks endpoint reports terminal as status:success, conclusion field NOT populated). Local grep undercounts (Theory/InlineData) — trust CI conclusion. Phase 9 UAT mode skip per chunk OK.
  • Mig latest repo: Mig 42 20260530034336_AddLeaveBalances (S42 P11-B; prod tables 91). Path src/Backend/SolutionErp.Infrastructure/Persistence/Migrations/. Prod check sqlcmd __EFMigrationsHistory ORDER BY MigrationId DESC TOP 5.
  • Bearer: admin admin@solutions.com.vn/Admin@123456 (full) · UAT nv.test@solutions.com.vn/TestUser@123456 (Drafter CCM, gotcha #44 check)
  • Bundle hash live S42: admin Krjvg_3j · user 6sNStgxa (Run #367 sha 82d7fcf). Prev admin BU8FTBRi · user tepE4jvR (#366/ffb2062). Bundle size ~800KB/750KB gz.
  • DB pw (S42, when $PROD_DB_PASSWORD empty): vrapp/buKL3TGBkD0wDDbYVw65QeX9 read from C:\inetpub\solution-erp\api\appsettings.Production.jsonConnectionStrings.Default. ⚠️ Skill-doc path C:\inetpub\apps\SolutionErp\Api is STALE → real path C:\inetpub\solution-erp\api. sqlcmd over SSH works direct (no UTF-16 encode needed). ⚠️ sys-catalog string-concat queries hit collation conflict (Latin1_General_CI_AS_KS_WS vs SQL_Latin1_General_CP1_CI_AS) → add COLLATE DATABASE_DEFAULT per concatenated column.
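The SSH→PS quoting workaround relies on PowerShell's -EncodedCommand taking a Base64 string of the UTF-16LE script text. The sketch below shows the encoding step only; `encode_powershell` and `build_ssh_argv` are hypothetical helper names, and the sample script is illustrative.

```python
import base64


def encode_powershell(script: str) -> str:
    """Base64 of the UTF-16LE bytes, the form expected by -EncodedCommand."""
    return base64.b64encode(script.encode("utf-16-le")).decode("ascii")


def build_ssh_argv(host: str, script: str) -> list:
    """argv for running a PowerShell script remotely with no nested quoting.

    The Base64 payload contains only [A-Za-z0-9+/=], so neither bash nor ssh
    can mangle $variables or quotes inside the script.
    """
    return ["ssh", host, "powershell", "-NoProfile",
            "-EncodedCommand", encode_powershell(script)]


if __name__ == "__main__":
    # Illustrative script: single-quoted literal path, as the note recommends.
    ps = "$p = 'C:\\inetpub\\solution-erp\\api'; Test-Path $p"
    print(build_ssh_argv("vietreport-vps", ps))
```

Passing the argv list to `subprocess.run` avoids a local shell entirely, which removes one of the three quoting layers.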

🔑 Critical config (flag the commit if any of these reappear)

Node CI 20.x (feedback_node_cicd) · MediatR 12.4.1 (gotcha #1, flag Version="14) · Swashbuckle 6.9.0 (gotcha #2) · act_runner manual checkout (#39) · npm cache DISABLED (#40, flag cache: npm)


🎯 Per-NV admin opt-in wire — 10-point checklist (cumulative S22→S23)

Cross-ref feedback_per_nv_permission_scope. Per-NV/per-Level refactor MUST verify: 1 Domain field · 2 EF HasDefaultValue(false) · 3 Mig 3-file · 4 Service read · 5 Domain+App DTO mirror · 6 Designer FE checkbox · 7 AwLevelDto+ToDto · 8 CreateAwLevelInput+Update mutation · 9 Lookup discrimination (FirstOrDefault ADD ApproverUserId==actorId + admin fallback) · 10 Controller body record count == Command record count. Bug latency: 2-3 days silent in prod when points 9-10 are missed. Scan grep -n "FirstOrDefault.*Order.*==" *.cs after an OR-of-N refactor.
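The grep scan for point 9 can also be run from Python when grep is not at hand. This sketch reuses the same regular expression on text passed in by the caller; `scan_source` is a hypothetical name and the C# lines in the usage example are illustrative, not taken from the repo.

```python
import re

# Same pattern as the grep in the checklist.
SUSPECT = re.compile(r"FirstOrDefault.*Order.*==")


def scan_source(text: str):
    """Return (line_number, stripped_line) for every line matching the pattern."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if SUSPECT.search(line)
    ]


if __name__ == "__main__":
    sample = "\n".join([
        "var lvl = levels.FirstOrDefault(l => l.Order == step);",
        "var total = levels.Count;",
    ])
    for number, line in scan_source(sample):
        print(f"{number}: {line}")
```

The pattern also matches lookups that already include the ApproverUserId discriminator, so each hit is a candidate for manual review rather than a confirmed bug.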

📊 Run stats baseline

BE (test+build) ~90s · FE × 2 ~60s/app · deploy ~30s · total ~3min code / 0s docs-only. >5min → escalate.


📅 Recent runs (FIFO — older → archive/git)

  • 2026-05-30 Run #367 (run_number 253) sha=82d7fcf PASS ~4m08s (S42 P11-B LeaveBalance business logic, Mig 42): Code commit 22 files (4 BE: Domain LeaveBalance.cs + App LeaveBalanceFeatures.cs/LeaveOtApprovalFeatures deduction hook + LeaveBalancesController + IApplicationDbContext + DbContext + Config + Mig42 3-file + 2 FE WorkflowAppDetailPage×2 +workflowApps.ts×2 + 2 tests + 4 agent-memory .md). Started 11:11:40 → success iter4 11:15:48. Bundle rotate admin BU8FTBRi→Krjvg_3j + user tepE4jvR→6sNStgxa (both changed ✓ FE shipped, verified AFTER status=success — pre-deploy snapshot still showed old hash, correct timing). Mig 42 20260530034336_AddLeaveBalances auto-applied prod (tables 90→91, LeaveBalances EXISTS). Schema ✓: UserId/LeaveTypeId/Year/EntitledDays/UsedDays/AdjustmentDays decimal + AuditableEntity soft-delete. UNIQUE IX_LeaveBalances_UserId_LeaveTypeId_Year + FK→LeaveTypes del=NO_ACTION (=Restrict) ✓. New endpoint smoke: GET /api/leave-balances/my unauth=401 (route live not 404) + admin auth=200 lazy-default 5 LeaveTypes (ANNUAL12/COMPASSIONATE3/MATERNITY180/SICK30/UNPAID0, all Used=0, remainingDays=entitled ✓ DTO shape has remainingDays/entitledDays) + ?year=2026 admin route 401 unauth + PUT /adjust=411 (route reg). health live/ready 200 Healthy. NO seed gate concern (plain table, lazy DTO — Stage 4.6 N/A). 0 regression. Note: prev run #366 (ffb2062 docs STATUS update) was a CODE-path push w/ status=success — NOT docs-only-skipped (commit touched only .md but Gitea still ran since prior range?); actually #366 display_title is Docs but ran full → confirms agent-memory .md NOT in paths-ignore (.claude/skills/** ignored, .claude/agent-memory/** NOT). Tag [s42, run367, pass, p11b-leavebalance, mig42].
  • 2026-05-30 Run #365 sha=75df04e PASS ~4m05s (S42 P11-A fix workflow picker 2-bug + SetWorkflow endpoint, NO migration): Code commit 11 files (4 BE controllers + 2 App features LeaveOtApprovalFeatures/TravelVehicleApprovalFeatures +125 lines + 2 FE WorkflowAppDetailPage ×2 + 1 test +79 lines). Status=success iter5 (started 10:15:45). Bundle rotate admin BLA09-qv→6D4k-aRi + user CXvejOE-→DkME-974 (both changed ✓ FE fix shipped, verified AFTER status=success). +4 endpoint PUT /api/{leave,ot,travel,vehicle-bookings}/{id}/workflow (Set{Module}WorkflowCommand, route [HttpPut("{id:guid}/workflow")] body record SetWorkflowBody(Guid ApprovalWorkflowId)). Unauth smoke leave+ot/workflow → 401 (route exists, NOT 404 ✓). health live+ready 200 Healthy. Test gate 144 (CI both proj pre-deploy; grep undercounts InlineData=14 Fact at WorkflowAppApproveV2Tests). NO migration → skipped Stage 4.6 seed (verified #250). NAMING RECONCILE: Gitea task IDs are real #364 (e7b66cd, mem-labeled "#250") + #365 (this). Going forward use actual Gitea task id. HEADS-UP em main: follow-up commit e47ef1d (FE-User ProposalCreatePage workflow dropdown shape, latent S37 bug) pushed 10:19:17 DURING poll — NOT yet triggered CI run, will redeploy FE shortly (bundle may re-rotate). Out of scope this verdict. Tag [s42, run365, pass, p11a-setworkflow].
  • 2026-05-30 Run #364 (mem #250) sha=e7b66cd PASS ~4m07s (S42 P11-A wire ApproveV2+LevelOpinions 4 WorkflowApps): 1 commit BE+FE×2+Mig41+Tests. Status=success iter3. Bundle rotate admin cWAXid0q→BLA09-qv + user CX79e2kZ→CXvejOE-. Mig 41 auto-applied prod (latest=20260530021936_WireWorkflowAppsApprovalV2). Tables 84→90 (+5: Leave/Ot/Travel/VehicleRequest LevelOpinions + WorkflowAppCodeSequences — ALL EXIST). 4 new endpoint smoke 200 auth (leave/ot/travel/vehicle-requests) + unauth 401 (route exists) + POST .../approve=411 (route reg). health live/ready 200. Stage 4.6 seed gate PASS (gotcha #51): 4 WF seeded prod despite DemoSeed:Disabled — QT-NP/OT/CT/XE-V2-001 AppType=5/6/7/9, verified call-site L142-145 OUTSIDE if(!demoSeedDisabled) gate. Test gate 141 (CI runs both proj pre-deploy). Note: table count 90 vs spec-expected 89 = baseline-count diff, NOT missing table (all 5 present). Stale doc drift deploy.yml comments "54/17 test" (cosmetic, flag em main). Tag [s42, run250, pass, p11a-approvev2-workflowapps].
  • 2026-05-28 Run #247 sha=e54a22d PASS 3m25s (S38 SKELETON 5-plan combo Mig 39+40 dual): Push 1 commit mega Domain+App+Infra+Api+FE×2. ALL PASS. Bundle rotate admin CGueDk22→cWAXid0q + user CEt0QRgX→CX79e2kZ. Mig 39+40 dual auto-applied startup (90830→90839). 6 endpoint smoke 200 (leave/ot/travel/vehicle/it-tickets/hr-dashboard totalEmployees=33 male=17 female=16). 6 new tables + 8 menu seeded. 0 regression. Fastest S38 deploy. Tag [s38, run247, pass, skeleton-combo].
  • 2026-05-28 Run #246 sha=de1c378 PASS 3m53s (S37 Proposal Mig 37+38): Bundle admin C9kzTTmq→CGueDk22 + user CC4DQ-Tr→CEt0QRgX. Mig 38 AddProposals + 37 ExtendApplicableType. /api/proposals 200 empty + workflow QT-DX-V2-001 ApplicableType=4 seed + 4 Off_DeXuat menu. Stage 4.6 sample seed INFRASTRUCTURE-gated correct (gotcha #51). Tag [s37, run246, pass, proposal-v2].
  • Archived Run #359/#243/#242/#241/#240 + S35/S36 startup → archive/2026-05-q4.md + git d2f52ba (S40 curate): Run #359 G-O2 Meeting Mig 36 · #243 HrmConfig BE 16 endpoint (BE-only bundle unchanged anti-pattern verify) · #242 FE inline forms 5 satellite · #241 Mig 35 HRM foundation · #240 satellite CRUD. Discovery #7 path-filter eval/** + #8 collection proj_*. KEY absorbed in essentials/Stage sections above.
  • Archived Run #232 (S29 gotcha #51 catch — SeedSampleContractWorkflowV2 nested in demoSeedDisabled → empty V2 dropdown, hoist fix) → archive/2026-05-q4.md + git. Smart Friend ROI 4× cumulative (S22 #44 + S25 #48 + S29 ApplicableType + S29 DemoSeed).

🔄 Curate trigger

  • 25KB → archive recent runs → archive/<period>.md. Dup failure patterns → merge. Stale >3mo → remove.

  • Last curate: 2026-05-29 S40 em main proxy (35.3→~21KB): archived Run #359/#243/#242/#241/#240 + S35/S36 startup → q4 + git d2f52ba; refreshed stale 120→130 test + Mig 34→40 + Stage 3 111→130. Foundation (gotcha patterns + Stage 0-5 + Stage 4.6 + 10-point + Discovery #4-8) preserved. Prev: S34 q3 · S32 q2 · S22 runs.