solution-erp/docs/guides/runbook.md
pqhuy1987 f3fb3fd565 [CLAUDE] Phase5 prep: production infra + deploy scripts + 4 guides + FE refresh token
Backend production infra:
- Packages: Serilog.Sinks.File, HealthChecks.EntityFrameworkCore (RateLimiting built-in .NET 10)
- appsettings.Production.json (NEW): placeholder __SET_VIA_SECRETS__, AllowedOrigins, Serilog File sink rolling daily with 30d retention, RateLimit config
- appsettings.json + Development.json: add Serilog WriteTo Console
- Program.cs REWRITE:
  - Serilog ReadFrom.Configuration (prod file / dev console)
  - Rate limiter: policy auth-login 5/min/IP (AuthController.Login) + GlobalLimiter 300/min/IP
  - Health checks: /health/live liveness (empty predicate) + /health/ready DB probe (AddDbContextCheck)
  - HSTS production 1 year
  - CORS origins from config AllowedOrigins (default dev 2 localhost)
- AuthController.Login is decorated with [EnableRateLimiting("auth-login")]

Deploy scripts:
- scripts/deploy-iis.ps1: stop pool → backup current → clean+extract artifact → start pool → health check loop 30s timeout → rollback instruction if fail
- scripts/backup-sql.ps1: BACKUP DATABASE with INIT+COMPRESSION+CHECKSUM + 30d retention with automatic cleanup
- .gitea/workflows/deploy.yml (NEW): 4 jobs: build BE (Windows) + build 2 FE (Ubuntu, pinned to .nvmrc 20) + deploy-iis via WinRM PSSession (secrets IIS_HOST/USER/PASSWORD/JWT_SECRET/DB_CONNECTION)

New docs guides (4 files):
- deployment-iis.md: prereqs (IIS features, Hosting Bundle, SQL, WinRM) + first-time setup (app pool, 3 sites, HTTPS win-acme, user-secrets) + day-to-day deploys (CI/CD + manual) + rollback + monitoring + troubleshooting + SPA web.config sample
- cicd.md: pipeline overview 4 job, secrets setup, runner Windows+Ubuntu, branch strategy, build optimizations, common CI/CD issues
- security-checklist.md: OWASP Top 10 2021 mapping with status + pre go-live checklist + incident response
- runbook.md: daily ops (health/logs), restart/rollback, DB backup/restore/migration revert, user management (reset password, unlock, disable), monitoring (CPU/disk/connection pool), deployment checklist, common gotcha

Frontend refresh token (both apps, fe-admin + fe-user):
- lib/api.ts REWRITE: add REFRESH_KEY and an axios response interceptor: 401 → POST /auth/refresh → retry the original request. Queue pattern so that many parallel requests trigger only one refresh call. Skip retry for /auth/login + /auth/refresh to avoid an infinite loop. _retry flag on the original config.
- contexts/AuthContext.tsx: store + clear REFRESH_KEY in login/logout

E2E verified:
- GET /health/live → 200 Healthy
- GET /health/ready → 200 Healthy (DB probe)
- Rate limit flood, 7 POST /auth/login → #1-5 HTTP 400 (wrong credentials) + #6-7 HTTP 429 Too Many Requests
- TS check fe-admin + fe-user → pass
- dotnet build → 0 errors

Docs updates:
- docs/STATUS.md: Phase 5 prep done, next Phase 5 deploy production + Phase 5.1 security hardening, cumulative stats 8 commits
- docs/HANDOFF.md: phase table adds a Phase 5 prep row, file tree updated with guides + scripts + workflows, git state at commit 8
- docs/changelog/migration-todos.md: tick Phase 5 prep items (12 items done) + Phase 5 deploy items remaining + Phase 5.1 security hardening list
- docs/changelog/sessions/2026-04-21-1530-phase5-prep.md: detailed session log

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 12:57:12 +07:00


Runbook — Operations

Common operational tasks. Copy and paste the commands as needed.

1. Daily operations

1.1 Check health

Invoke-WebRequest https://api.solutionerp.local/health/ready -SkipCertificateCheck
# → Status 200 "Healthy"
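Right after a restart or deploy the endpoint may need a few seconds before it answers, so a single request can give a false alarm. The following is a minimal sketch of a polling check, assuming PowerShell 7+ (needed for -SkipCertificateCheck, as in the command above). `Wait-ApiHealthy` is a hypothetical helper name, not something shipped in the repo; the 30-second default mirrors the timeout the deploy script is described as using.

```powershell
# Sketch: poll a health endpoint until it returns HTTP 200 or the timeout expires.
# Wait-ApiHealthy is a hypothetical helper, not part of scripts/deploy-iis.ps1.
function Wait-ApiHealthy {
    param(
        [string]$Url = 'https://api.solutionerp.local/health/ready',
        [int]$TimeoutSeconds = 30,
        [int]$IntervalSeconds = 2
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        try {
            $resp = Invoke-WebRequest -Uri $Url -SkipCertificateCheck -TimeoutSec 5
            if ($resp.StatusCode -eq 200) { return $true }
        } catch {
            # Connection refused or a non-2xx response (which throws): keep waiting
        }
        Start-Sleep -Seconds $IntervalSeconds
    } while ((Get-Date) -lt $deadline)
    return $false
}

# Usage:
#   if (-not (Wait-ApiHealthy)) { Write-Warning 'API not ready, check logs (section 1.2)' }
```

The function returns a boolean rather than throwing, so it can gate a later step in a script without extra error handling.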

1.2 Check logs

# Tail today's log
Get-Content "C:\inetpub\solution-erp\api\logs\solution-erp-$(Get-Date -Format 'yyyyMMdd').log" -Tail 50 -Wait

# Search for error (ERR) and fatal (FTL) entries, with 2 lines of context
Select-String -Path "C:\inetpub\solution-erp\api\logs\*.log" -Pattern "ERR|FTL" -Context 2
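When the search returns many matches, a per-level count gives a quicker picture than scrolling through context lines. The sketch below assumes the Serilog output template writes the level as a bracketed three-letter code such as `[ERR]`; adjust the regex if the configured template differs. `Get-LogLevelSummary` is a hypothetical helper name.

```powershell
# Sketch: count log lines per Serilog level code.
# Assumes levels appear as [VRB]/[DBG]/[INF]/[WRN]/[ERR]/[FTL] in each line.
function Get-LogLevelSummary {
    param([string[]]$Lines = @())
    $Lines |
        ForEach-Object {
            # -cmatch is case-sensitive, so ordinary words like "error" are not counted
            if ($_ -cmatch '\[(VRB|DBG|INF|WRN|ERR|FTL)\]') { $Matches[1] }
        } |
        Group-Object -NoElement |
        Sort-Object Count -Descending |
        Select-Object Name, Count
}

# Usage on the server:
#   $today = "C:\inetpub\solution-erp\api\logs\solution-erp-$(Get-Date -Format 'yyyyMMdd').log"
#   Get-LogLevelSummary -Lines (Get-Content $today)
```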

1.3 Check recent failed logins

-- Requires the audit log planned for Phase 5.1. For now only ContractApprovals is recorded, so check the Serilog file instead.

2. Restart / rollback

2.1 Restart Api app pool

Restart-WebAppPool -Name SolutionErpApi

2.2 Restart all of IIS (heavy, only when necessary)

iisreset /noforce

2.3 Rollback deploy

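# The deploy script auto-backs up to C:\inetpub\solution-erp\backups\api-{timestamp}
Stop-WebAppPool SolutionErpApi
# Pick the newest api-* backup folder (assumes the timestamp suffix sorts chronologically by name)
$latest = Get-ChildItem "C:\inetpub\solution-erp\backups" -Directory -Filter "api-*" | Sort-Object Name -Descending | Select-Object -First 1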
Copy-Item "$($latest.FullName)\*" -Destination "C:\inetpub\solution-erp\api\" -Recurse -Force
Start-WebAppPool SolutionErpApi
Invoke-WebRequest https://api.solutionerp.local/health/ready -SkipCertificateCheck  # verify

3. Database

3.1 Manual backup (outside the daily job)

.\scripts\backup-sql.ps1 -Server "." -Database "SolutionErp" -BackupDir "D:\Backups\SolutionErp-manual"

3.2 Restore from backup

-- WARNING: Destructive. Stop the app first.
USE master;
ALTER DATABASE SolutionErp SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
RESTORE DATABASE SolutionErp
FROM DISK = N'D:\Backups\SolutionErp\SolutionErp_20260421-020000.bak'
WITH REPLACE, RECOVERY;
ALTER DATABASE SolutionErp SET MULTI_USER;
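Because the restore replaces the live database, it can be worth confirming that the backup file is readable before switching to SINGLE_USER. The sketch below builds a `RESTORE VERIFYONLY ... WITH CHECKSUM` statement, which relies on the backup having been taken WITH CHECKSUM (as backup-sql.ps1 is described as doing). `New-VerifyBackupQuery` is a hypothetical helper name, and running it through `Invoke-Sqlcmd` assumes the SqlServer PowerShell module is installed on the host.

```powershell
# Sketch: build a RESTORE VERIFYONLY statement for a backup file.
# New-VerifyBackupQuery is a hypothetical helper, not part of the repo scripts.
function New-VerifyBackupQuery {
    param([Parameter(Mandatory)][string]$BackupFile)
    # Double any single quotes so the path is a valid T-SQL string literal
    $escaped = $BackupFile.Replace("'", "''")
    "RESTORE VERIFYONLY FROM DISK = N'$escaped' WITH CHECKSUM;"
}

# Usage on the SQL Server host (assumes the SqlServer module provides Invoke-Sqlcmd):
#   $q = New-VerifyBackupQuery 'D:\Backups\SolutionErp\SolutionErp_20260421-020000.bak'
#   Invoke-Sqlcmd -ServerInstance '.' -Query $q -QueryTimeout 600 -ErrorAction Stop
```

If the verification raises an error, pick an older backup file instead of proceeding with the restore. Depending on the module version, `Invoke-Sqlcmd` may also need `-TrustServerCertificate` for a local instance.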

3.3 Rollback migration

# List migrations (unapplied ones are shown as Pending)
cd C:\Deploy\staging  # a location where the .NET SDK is available
dotnet ef migrations list --project src\Backend\SolutionErp.Infrastructure --startup-project src\Backend\SolutionErp.Api

# Roll back to a specific migration
dotnet ef database update <PreviousMigrationName> --project ... --startup-project ...

3.4 Clear test data (dev)

-- Clear all Contracts and related rows (master data is kept)
DELETE FROM ContractApprovals;
DELETE FROM ContractComments;
DELETE FROM ContractAttachments;
DELETE FROM Contracts;
DELETE FROM ContractCodeSequences;

4. User management

4.1 Create a new user

-- Phase 5.1 will add a UI for this. For now it is manual via SQL (not recommended: the password hash must be in the correct format)
-- Recommended: create the user through UserManager in a one-off .NET script, or through the API `POST /api/users` (not implemented yet)

4.2 Reset password admin (emergency)

# Run a one-off script on the server
cd C:\inetpub\solution-erp\api
dotnet SolutionErp.Api.dll --reset-password admin@solutionerp.local NewPassword@2026
# (Not implemented yet; planned for Phase 5.1)

Temporary workaround: update PasswordHash through Identity UserManager in code, then redeploy.

4.3 Unlock a locked account

UPDATE Users SET LockoutEnd = NULL, AccessFailedCount = 0 WHERE Email = 'user@example.com';

4.4 Disable user

UPDATE Users SET IsActive = 0 WHERE Email = 'user@example.com';
-- Note: an already-issued JWT stays valid until it expires (1h); Phase 5.1 needs to check IsActive in middleware

5. Monitoring + incident

5.1 High CPU app pool

# Identify worker process
Get-Process w3wp | Select-Object Id, CPU, WorkingSet64, StartTime
# Kill it if it is stuck (IIS restarts the worker automatically)
Stop-Process -Id <pid> -Force

5.2 Out of disk

# Check the logs folder (oldest files first)
Get-ChildItem "C:\inetpub\solution-erp\api\logs" | Sort-Object LastWriteTime | Select -First 20
# Delete logs older than 30 days (retention is already configured, but verify)
Get-ChildItem "C:\inetpub\solution-erp\api\logs" -Filter "*.log" |
    Where-Object LastWriteTime -lt (Get-Date).AddDays(-30) | Remove-Item
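Logs are not the only candidate when the disk fills up: the deploy script keeps a backup folder per deploy, and this runbook does not mention a retention policy for those. The sketch below measures a folder so the largest consumer can be identified before deleting anything. `Get-FolderSizeMB` is a hypothetical helper name.

```powershell
# Sketch: total size of all files under a folder, in MB.
# Get-FolderSizeMB is a hypothetical helper, not part of the repo scripts.
function Get-FolderSizeMB {
    param([Parameter(Mandatory)][string]$Path)
    $bytes = (Get-ChildItem -LiteralPath $Path -Recurse -File -ErrorAction SilentlyContinue |
        Measure-Object -Property Length -Sum).Sum
    # Sum is $null for an empty folder; the [double] cast turns that into 0
    [math]::Round(([double]$bytes) / 1MB, 2)
}

# Usage on the server (paths taken from this runbook):
#   Get-FolderSizeMB 'C:\inetpub\solution-erp\api\logs'
#   Get-FolderSizeMB 'C:\inetpub\solution-erp\backups'
```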

5.3 Suspected brute-force attack

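# Search the IIS log for 401 responses, grouped by client IP
# (index 8 is c-ip in the default W3C field order; check the #Fields header if logging was customized)
Get-Content C:\inetpub\logs\LogFiles\W3SVC1\*.log -Tail 5000 |
    Select-String " 401 " | Group-Object { ($_.Line -split ' ')[8] } |
    Sort-Object Count -Descending | Select -First 10
# If an IP looks suspicious, block it with IIS IP Restrictions or a firewall rule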
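The one-liner above depends on a fixed column position and on the status 401. The E2E notes in the commit message report failed logins returning HTTP 400 and rate-limited ones returning 429, so those codes may be more telling for /auth/login. The sketch below reads the column positions from the log's own `#Fields:` header and takes the status as a parameter. `Get-TopClientsByStatus` is a hypothetical helper name.

```powershell
# Sketch: top client IPs for a given HTTP status, using the W3C #Fields header
# to locate the c-ip and sc-status columns instead of hard-coding an index.
function Get-TopClientsByStatus {
    param(
        [string[]]$Lines = @(),
        [int]$Status = 401,
        [int]$Top = 10
    )
    $ipIdx = -1
    $statusIdx = -1
    $hits = foreach ($line in $Lines) {
        if ($line -like '#Fields:*') {
            $fields = ($line -replace '^#Fields:\s*', '') -split ' '
            $ipIdx = [array]::IndexOf($fields, 'c-ip')
            $statusIdx = [array]::IndexOf($fields, 'sc-status')
            continue
        }
        # Skip other directives, and any data seen before a usable header
        if ($line.StartsWith('#') -or $ipIdx -lt 0 -or $statusIdx -lt 0) { continue }
        $cols = $line -split ' '
        if ($cols.Count -gt [math]::Max($ipIdx, $statusIdx) -and $cols[$statusIdx] -eq "$Status") {
            $cols[$ipIdx]
        }
    }
    $hits | Group-Object -NoElement | Sort-Object Count -Descending |
        Select-Object Name, Count -First $Top
}

# Usage on the server (reads the whole newest file so the #Fields header is included):
#   $log = Get-ChildItem 'C:\inetpub\logs\LogFiles\W3SVC1\*.log' | Sort-Object LastWriteTime | Select-Object -Last 1
#   Get-TopClientsByStatus -Lines (Get-Content $log.FullName) -Status 429
```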

5.4 DB connection pool exhausted

-- Check active connections
SELECT DB_NAME(dbid) AS DB, COUNT(*) AS Connections, loginame AS Login
FROM sys.sysprocesses
WHERE dbid > 0
GROUP BY dbid, loginame
ORDER BY 2 DESC;

-- Kill a specific connection if it is stuck
KILL <spid>;

6. Deployment checklist

Before deploying:

  • Back up the DB (manually if the automatic job has not run yet)
  • Note the commit SHA that is currently live
  • Check that CI/CD passed all checks
  • Notify the team in Slack/Teams (if there will be downtime)

After deploying:

  • Health check /health/ready → 200
  • Smoke test: login + list contracts + export Excel
  • Check that the log shows no ERR entries in the first 5 minutes
  • Monitor CPU/RAM for 15 minutes

7. Common operational gotchas

| Symptom | Fix |
| --- | --- |
| App pool crashes (rapid-fail) after deploy | Temporarily disable rapid-fail protection: Set-ItemProperty IIS:\AppPools\SolutionErpApi -Name failure.rapidFailProtection -Value false, debug from the logs, then enable it again |
| Users are logged out en masse after deploy | Check whether Jwt:Secret changed. Rotating the secret forces every user to log in again (expected if intentional) |
| Migration fails with a "connection string" error | Check whether the user secrets / env var are missing from the app pool advanced settings |
| FE shows a blank page | Check paths in the F12 console; usually the base in vite.config.ts does not match the environment, or the web.config SPA rewrite is missing |
| Excel export returns 500 | Check that wwwroot/templates contains all 5 .docx/.xlsx files; ClosedXML fails when a template is missing |

8. Escalation contacts

| Role | Name | Contact |
| --- | --- | --- |
| Dev lead | pqhuy@solutions.local | pqhuy1987@gmail.com |
| DBA | TBD | |
| On-call 24/7 | TBD | |

9. Related