solution-erp/docs/guides/runbook.md
pqhuy1987 f3fb3fd565 [CLAUDE] Phase5 prep: production infra + deploy scripts + 4 guides + FE refresh token
Backend production infra:
- Packages: Serilog.Sinks.File, HealthChecks.EntityFrameworkCore (RateLimiting built-in .NET 10)
- appsettings.Production.json (NEW): placeholder __SET_VIA_SECRETS__, AllowedOrigins, Serilog File sink rolling daily with 30d retention, RateLimit config
- appsettings.json + Development.json: add Serilog WriteTo Console
- Program.cs REWRITE:
  - Serilog ReadFrom.Configuration (prod file / dev console)
  - Rate limiter: policy auth-login 5/min/IP (AuthController.Login) + GlobalLimiter 300/min/IP
  - Health checks: /health/live liveness (empty predicate) + /health/ready DB probe (AddDbContextCheck)
  - HSTS production 1 year
  - CORS origins from config AllowedOrigins (default dev 2 localhost)
- AuthController.Login decorated with [EnableRateLimiting("auth-login")]

Deploy scripts:
- scripts/deploy-iis.ps1: stop pool → backup current → clean+extract artifact → start pool → health check loop 30s timeout → rollback instruction if fail
- scripts/backup-sql.ps1: BACKUP DATABASE with INIT+COMPRESSION+CHECKSUM + retention 30d auto cleanup
- .gitea/workflows/deploy.yml (NEW): 4 jobs: build BE (Windows) + build 2 FE (Ubuntu, pin .nvmrc 20) + deploy-iis via WinRM PSSession (secrets IIS_HOST/USER/PASSWORD/JWT_SECRET/DB_CONNECTION)

Docs guides (NEW, 4 files):
- deployment-iis.md: prereqs (IIS features, Hosting Bundle, SQL, WinRM) + first-time setup (app pool, 3 sites, HTTPS win-acme, user-secrets) + day-to-day deploys (CI/CD + manual) + rollback + monitoring + troubleshooting + SPA web.config sample
- cicd.md: pipeline overview 4 jobs, secrets setup, runner Windows+Ubuntu, branch strategy, build optimizations, common CI/CD issues
- security-checklist.md: OWASP Top 10 2021 mapping with status + pre go-live checklist + incident response
- runbook.md: daily ops (health/logs), restart/rollback, DB backup/restore/migration revert, user management (reset password, unlock, disable), monitoring (CPU/disk/connection pool), deployment checklist, common gotcha

Frontend refresh token (both apps, fe-admin + fe-user):
- lib/api.ts REWRITE: add REFRESH_KEY, axios response interceptor 401 → POST /auth/refresh → retry the original request. Queue pattern so that multiple parallel requests trigger only one refresh call. Skip retry for /auth/login + /auth/refresh to avoid an infinite loop. _retry flag on the original config.
- contexts/AuthContext.tsx: save+remove REFRESH_KEY in login/logout

E2E verified:
- GET /health/live → 200 Healthy
- GET /health/ready → 200 Healthy (DB probe)
- Rate limit flood 7 POST /auth/login → #1-5 HTTP 400 (wrong credentials) + #6-7 HTTP 429 Too Many Requests
- TS check fe-admin + fe-user → pass
- dotnet build → 0 errors

Docs updates:
- docs/STATUS.md: Phase 5 prep done, next Phase 5 deploy production + Phase 5.1 security hardening, cumulative stats 8 commits
- docs/HANDOFF.md: phase table adds a Phase 5 prep row, file tree updated with guides + scripts + workflows, git state commit 8
- docs/changelog/migration-todos.md: tick Phase 5 prep items (12 items done) + Phase 5 deploy items remaining + Phase 5.1 security hardening list
- docs/changelog/sessions/2026-04-21-1530-phase5-prep.md: detailed session log

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 12:57:12 +07:00


# Runbook — Operations
> Common operational tasks. Copy and paste the commands as needed.
## 1. Daily operations
### 1.1 Check health
```powershell
# -SkipCertificateCheck requires PowerShell 6 or later (not available in Windows PowerShell 5.1)
Invoke-WebRequest https://api.solutionerp.local/health/ready -SkipCertificateCheck
# → Status 200 "Healthy"
```
### 1.2 Check logs
```powershell
# Tail today's log
Get-Content "C:\inetpub\solution-erp\api\logs\solution-erp-$(Get-Date -Format 'yyyyMMdd').log" -Tail 50 -Wait
# Search all logs for error and fatal entries
Select-String -Path "C:\inetpub\solution-erp\api\logs\*.log" -Pattern "ERR|FTL" -Context 2
```
### 1.3 Check recent failed logins
```sql
-- An audit log table is planned for Phase 5.1. For now only ContractApprovals exists, so check the Serilog log file instead.
```
## 2. Restart / rollback
### 2.1 Restart Api app pool
```powershell
Restart-WebAppPool -Name SolutionErpApi
```
### 2.2 Restart all of IIS (heavy, only when necessary)
```powershell
iisreset /noforce
```
### 2.3 Rollback deploy
```powershell
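# The deploy script automatically backs up to C:\inetpub\solution-erp\backups\api-{timestamp}
Stop-WebAppPool SolutionErpApi
# Pick the newest API backup folder (names sort chronologically by timestamp)
$latest = Get-ChildItem "C:\inetpub\solution-erp\backups" -Directory -Filter "api-*" | Sort-Object Name -Descending | Select-Object -First 1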
Copy-Item "$($latest.FullName)\*" -Destination "C:\inetpub\solution-erp\api\" -Recurse -Force
Start-WebAppPool SolutionErpApi
Invoke-WebRequest https://api.solutionerp.local/health/ready -SkipCertificateCheck # verify
```
## 3. Database
### 3.1 Manual backup (outside the daily job)
```powershell
.\scripts\backup-sql.ps1 -Server "." -Database "SolutionErp" -BackupDir "D:\Backups\SolutionErp-manual"
```
### 3.2 Restore from backup
```sql
-- WARNING: Destructive. Stop the app first.
USE master;
ALTER DATABASE SolutionErp SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
RESTORE DATABASE SolutionErp
FROM DISK = N'D:\Backups\SolutionErp\SolutionErp_20260421-020000.bak'
WITH REPLACE, RECOVERY;
ALTER DATABASE SolutionErp SET MULTI_USER;
```
### 3.3 Rollback migration
```powershell
# List applied migrations
cd C:\Deploy\staging # where the .NET SDK is available
dotnet ef migrations list --project src\Backend\SolutionErp.Infrastructure --startup-project src\Backend\SolutionErp.Api
# Roll back to a specific migration
dotnet ef database update <PreviousMigrationName> --project ... --startup-project ...
```
### 3.4 Clear test data (dev)
```sql
-- Clear all Contracts and related rows (keeps master data)
DELETE FROM ContractApprovals;
DELETE FROM ContractComments;
DELETE FROM ContractAttachments;
DELETE FROM Contracts;
DELETE FROM ContractCodeSequences;
```
## 4. User management
### 4.1 Create a new user
```sql
-- Phase 5.1 will add a front-end screen. For now this is manual via SQL (not recommended: the password hash must be in the correct format).
-- Recommended: create the user via UserManager in a one-off .NET script, or via the API `POST /api/users` (not yet implemented)
```
### 4.2 Reset the admin password (emergency)
```powershell
# NOTE: the --reset-password switch is not implemented yet (planned for Phase 5.1)
# Intended usage: run a one-off command on the server
cd C:\inetpub\solution-erp\api
dotnet SolutionErp.Api.dll --reset-password admin@solutionerp.local NewPassword@2026
```
Temporary workaround: update `PasswordHash` through the Identity `UserManager` in code, then redeploy.
### 4.3 Unlock a locked account
```sql
UPDATE Users SET LockoutEnd = NULL, AccessFailedCount = 0 WHERE Email = 'user@example.com';
```
### 4.4 Disable user
```sql
UPDATE Users SET IsActive = 0 WHERE Email = 'user@example.com';
-- Note: JWTs already issued remain valid until they expire (1h). Phase 5.1 needs to check IsActive in middleware.
```
## 5. Monitoring + incident
### 5.1 High CPU app pool
```powershell
# Identify worker process
Get-Process w3wp | Select-Object Id, CPU, WorkingSet64, StartTime
# Kill it if stuck (IIS restarts it automatically)
Stop-Process -Id <pid> -Force
```
### 5.2 Out of disk
```powershell
# List the 20 oldest files in the logs folder
Get-ChildItem "C:\inetpub\solution-erp\api\logs" | Sort-Object LastWriteTime | Select -First 20
# Delete logs older than 30 days (retention is already configured, but verify)
Get-ChildItem "C:\inetpub\solution-erp\api\logs" -Filter "*.log" |
    Where-Object LastWriteTime -lt (Get-Date).AddDays(-30) | Remove-Item
```
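
Before deleting anything, it can help to confirm that the logs folder is really what is filling the disk. A minimal sketch, assuming the same logs path as above; `Get-FolderSizeMB` is a hypothetical helper name and is not part of the deploy scripts:

```powershell
# Hypothetical helper: total size of all files under a folder, in MB.
function Get-FolderSizeMB {
    param([Parameter(Mandatory)][string]$Path)

    # Sum the Length of every file, recursing into subfolders.
    $bytes = (Get-ChildItem -LiteralPath $Path -Recurse -File |
        Measure-Object -Property Length -Sum).Sum

    # The result is $null when the folder contains no files.
    if ($null -eq $bytes) { $bytes = 0 }

    [math]::Round($bytes / 1MB, 2)
}

# Example (path taken from the section above):
# Get-FolderSizeMB -Path "C:\inetpub\solution-erp\api\logs"
```

Running it against the backups folder as well gives a quick comparison of where the space went.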
### 5.3 Suspected brute-force attack
```powershell
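# Count 401 responses per client IP in the IIS logs.
# Index 8 is c-ip in the default W3C field order; check the #Fields header if logging was customized.
# Per the Phase 5 prep E2E test, failed logins returned 400 and rate-limited ones 429, so consider matching those too.
Get-Content C:\inetpub\logs\LogFiles\W3SVC1\*.log -Tail 5000 |
    Select-String " 401 " | Group-Object { ($_.Line -split ' ')[8] } |
    Sort-Object Count -Descending | Select -First 10
# If an IP looks suspicious, block it with IIS IP Restrictions or a firewall rule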
```
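
The `[8]` index above depends on the default W3C field order. A slightly more defensive sketch reads the positions of `c-ip` and `sc-status` from the `#Fields:` header that IIS writes at the top of each log file. `Get-StatusCountByIp` is a hypothetical helper, and it needs the whole file (not just a tail) so that the header is included:

```powershell
# Hypothetical helper: count responses with a given status per client IP,
# locating the columns from the "#Fields:" header instead of a fixed index.
function Get-StatusCountByIp {
    param(
        [string[]]$Lines,
        [int]$Status = 401
    )

    $ipIndex = -1
    $statusIndex = -1
    $counts = @{}

    foreach ($line in $Lines) {
        if ($line.StartsWith('#Fields:')) {
            # Header looks like: "#Fields: date time s-ip ... c-ip ... sc-status ..."
            $fields = $line.Substring('#Fields:'.Length).Trim() -split ' '
            $ipIndex = [array]::IndexOf($fields, 'c-ip')
            $statusIndex = [array]::IndexOf($fields, 'sc-status')
            continue
        }
        # Skip other directives and any data seen before a usable header.
        if ($line.StartsWith('#') -or $ipIndex -lt 0 -or $statusIndex -lt 0) { continue }

        $cols = $line -split ' '
        # Skip blank or truncated lines.
        if ($cols.Count -le [math]::Max($ipIndex, $statusIndex)) { continue }

        if ($cols[$statusIndex] -eq "$Status") {
            $counts[$cols[$ipIndex]] = 1 + [int]$counts[$cols[$ipIndex]]
        }
    }

    # Highest count first; each entry exposes .Key (IP) and .Value (count).
    $counts.GetEnumerator() | Sort-Object Value -Descending
}

# Example against the newest log file:
# $newest = Get-ChildItem C:\inetpub\logs\LogFiles\W3SVC1\*.log | Sort-Object LastWriteTime | Select-Object -Last 1
# Get-StatusCountByIp -Lines (Get-Content $newest.FullName) | Select-Object -First 10
```

Passing `-Status 429` instead shows which clients are hitting the rate limiter.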
### 5.4 DB connection pool exhausted
```sql
-- Check active connections
SELECT DB_NAME(dbid) AS DB, COUNT(*) AS Connections, loginame AS Login
FROM sys.sysprocesses
WHERE dbid > 0
GROUP BY dbid, loginame
ORDER BY 2 DESC;
-- Kill a specific connection if it is stuck
KILL <spid>;
```
## 6. Deployment checklist
Before deploying:
- [ ] Back up the DB (manually if the automatic job has not run yet)
- [ ] Note the commit SHA that is currently live
- [ ] Confirm CI/CD passed all checks
- [ ] Notify the team in Slack/Teams (if there will be downtime)
After deploying:
- [ ] Health check `/health/ready` → 200
- [ ] Smoke test: login + list contracts + export Excel
- [ ] Check that the log shows no ERR entries in the first 5 minutes
- [ ] Monitor CPU/RAM for 15 minutes
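
The first post-deploy check can be scripted as a short polling loop, since the API may need a few seconds to warm up after the app pool starts. This is a minimal sketch and not the logic in `scripts/deploy-iis.ps1`: `Wait-HealthReady`, its parameters, and the injectable `-Probe` script block are hypothetical names chosen for illustration.

```powershell
# Hypothetical helper: poll a health probe until it reports 200 or time runs out.
function Wait-HealthReady {
    param(
        [string]$Url = 'https://api.solutionerp.local/health/ready',
        [int]$TimeoutSeconds = 30,
        [int]$IntervalSeconds = 2,
        # Returns an HTTP status code. Replaceable so the loop can be tried offline.
        # -SkipCertificateCheck needs PowerShell 6 or later.
        [scriptblock]$Probe = {
            param($u)
            (Invoke-WebRequest -Uri $u -SkipCertificateCheck -UseBasicParsing).StatusCode
        }
    )

    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    do {
        try {
            if ((& $Probe $Url) -eq 200) { return $true }
        } catch {
            # Connection refused, 503, and similar errors: treat as "not ready yet".
        }
        Start-Sleep -Seconds $IntervalSeconds
    } while ((Get-Date) -lt $deadline)

    return $false
}

# Example:
# if (-not (Wait-HealthReady)) { Write-Warning "API not ready, consider rollback (section 2.3)" }
```

Returning a boolean rather than throwing keeps the decision to roll back with the operator.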
## 7. Common operational gotchas
| Symptom | Fix |
|---|---|
| App pool crashes and trips rapid-fail protection after deploy | Temporarily disable it: `Set-ItemProperty IIS:\AppPools\SolutionErpApi -Name failure.rapidFailProtection -Value false`, debug via the logs, then re-enable it |
| Users are logged out en masse after deploy | Check whether `Jwt:Secret` changed. Rotating the secret forces every user to log in again (expected if intentional) |
| Migration fails with a "connection string" error | Check whether the user secrets / environment variables are missing from the app pool advanced settings |
| FE shows a blank page | Check paths in the F12 console. Usually `base` in vite.config.ts differs between environments, or the web.config SPA rewrite is missing |
| Excel export returns 500 | Check that `wwwroot/templates` contains all 5 .docx/.xlsx files. ClosedXML fails when a template is missing |
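
For the last row, the template folder can be checked with a few lines of PowerShell. A sketch only: `Test-TemplateFolder` is a hypothetical helper, the expected count of 5 comes from the table above, and the physical path in the usage comment assumes `wwwroot` sits under the API folder used elsewhere in this runbook.

```powershell
# Hypothetical helper: $true when the folder holds at least the expected
# number of .docx/.xlsx templates.
function Test-TemplateFolder {
    param(
        [Parameter(Mandatory)][string]$Path,
        [int]$Expected = 5
    )

    if (-not (Test-Path -LiteralPath $Path -PathType Container)) { return $false }

    # Count only .docx and .xlsx files directly inside the folder.
    $found = @(Get-ChildItem -LiteralPath $Path -File |
        Where-Object { $_.Extension -in '.docx', '.xlsx' })

    $found.Count -ge $Expected
}

# Example (assumed path):
# Test-TemplateFolder -Path "C:\inetpub\solution-erp\api\wwwroot\templates"
```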
## 8. Escalation contacts
| Role | Name | Contact |
|---|---|---|
| Dev lead | pqhuy@solutions.local | pqhuy1987@gmail.com |
| DBA | TBD | — |
| On-call 24/7 | TBD | — |
## 9. Related
- [`deployment-iis.md`](deployment-iis.md): detailed setup
- [`cicd.md`](cicd.md): CI/CD pipeline
- [`security-checklist.md`](security-checklist.md): incident response
- [`../gotchas.md`](../gotchas.md): dev + ops pitfalls
- [`../database/database-guide.md`](../database/database-guide.md): backup/restore details