solution-erp/.claude/skills/iis-deploy-runbook/SKILL.md
pqhuy1987 66c1a5c170
[CLAUDE] Rebrand: 3 domain huypham.vn → solutions.com.vn + migrate script
User request: point 3 new subdomains at VPS IP 103.124.94.38:
  - api.huypham.vn        → api.solutions.com.vn
  - admin.huypham.vn      → admin.solutions.com.vn
  - user.huypham.vn       → eoffice.solutions.com.vn

Verified DNS: all 3 resolve to 103.124.94.38 ✓

Updated 17 repo files:
FE (4): fe-admin/.env.production + fe-user/.env.production
        (VITE_API_BASE_URL → https://api.solutions.com.vn)
        fe-admin/src/lib/{api,realtime}.ts + fe-user equivalents (comment)
BE (1): appsettings.Production.json.example — CORS AllowedOrigins
CI/CD (1): .gitea/workflows/deploy.yml — smoke test URL
Scripts (3): setup-iis-sites (DomainApi/Admin/User), setup-ssl (3 host),
             deploy-all (verify curls)
Docs (5): STATUS, HANDOFF, PROJECT-MAP, vps-setup, gotchas
Skill (1): iis-deploy-runbook — 3 site table + description
The admin@huypham.vn email stays unchanged (Let's Encrypt contact, not a
served domain).

Added scripts/migrate-domains.ps1, a one-shot VPS migration:
  1. Pre-flight: resolve DNS for the 3 domains → verify they match the VPS IP
  2. Add new HTTP bindings to the 3 IIS sites (keep the old bindings as fallback)
  3. Run win-acme to request 3 Let's Encrypt certs via the HTTP-01 challenge
     (auto-adds the HTTPS binding + http→https redirect)
  4. Verify /health/live + /health/ready + 2 FE endpoints
  5. (Optional -RemoveOld) remove the huypham.vn bindings once verification passes
Rollback: on failure the old bindings stay active → sites keep serving via huypham.vn.

Run on the VPS:
  cd C:\solution-erp\scripts  ;  .\migrate-domains.ps1
  # After 1-2 days of verified stability:
  .\migrate-domains.ps1 -RemoveOld -SkipCert
2026-04-24 09:43:05 +07:00

---
name: iis-deploy-runbook
description: Ops runbook for deploying SOLUTION_ERP on Windows Server IIS. Covers 3 sites (api/admin/eoffice.solutions.com.vn), win-acme Let's Encrypt, the NSSM gitea-runner shared with VIETREPORT, and headless LibreOffice soffice. Use when debugging prod 500/502, restarting a site, rotating a cert, fixing the CI/CD runner, troubleshooting WebSocket, or adding a new site.
when-to-use:
- "prod 500 error"
- "IIS site fail"
- "cert hết hạn"
- "win-acme"
- "gitea runner"
- "deploy IIS"
- "restart app pool"
- "webSocket 500"
- "reverse proxy FE"
- "LibreOffice prod"
---
# IIS Deploy Runbook — SOLUTION_ERP
> **Context:** Windows Server VPS shared with the VIETREPORT project. IIS + URL Rewrite + ARR + WebSockets module + win-acme. Deployed via a self-hosted Gitea Actions runner.
## Production topology
```
Internet
  │ 443 (HTTPS)
┌──────────────────────────────────────────────────────────────────────────────────────────────┐
│ IIS (Windows Server VPS)                                                                     │
│                                                                                              │
│ ┌─ api.solutions.com.vn ─────┐ ┌─ admin.solutions.com.vn ───┐ ┌─ eoffice.solutions.com.vn ─┐ │
│ │ SolutionErp-Api            │ │ SolutionErp-Admin          │ │ SolutionErp-User           │ │
│ │ → out-of-process           │ │ (static SPA, URL           │ │ (static SPA, URL           │ │
│ │ Kestrel :5443              │ │ Rewrite /api → 5443)       │ │ Rewrite...)                │ │
│ │ ASP.NET Core 10            │ │ React build/               │ │ React build/               │ │
│ └────────────────────────────┘ └────────────────────────────┘ └────────────────────────────┘ │
│                                                                                              │
│ Let's Encrypt (win-acme): 3 certs, auto-renew 60d                                            │
│ Shared gitea-runner NSSM service (with VIETREPORT)                                           │
│ LibreOffice 25.8.6 headless                                                                  │
│ SQL Server 2019 Express (\\.\SQLEXPRESS)                                                     │
└──────────────────────────────────────────────────────────────────────────────────────────────┘
```
## 3 IIS sites
| Site | Binding | Physical path | App pool | Purpose |
|---|---|---|---|---|
| `SolutionErp-Api` | `*:443:api.solutions.com.vn` HTTPS | `C:\inetpub\apps\SolutionErp\Api\` | out-of-process Kestrel | ASP.NET Core 10 API (port 5443 internal) |
| `SolutionErp-Admin` | `*:443:admin.solutions.com.vn` HTTPS + `*:80` redirect | `C:\inetpub\apps\SolutionErp\Admin\` | static (no .NET app pool) | React build fe-admin |
| `SolutionErp-User` | `*:443:eoffice.solutions.com.vn` HTTPS + `*:80` redirect | `C:\inetpub\apps\SolutionErp\User\` | static | React build fe-user |
**SPA web.config:** both FE sites carry these `URL Rewrite` rules:
1. HTTP → HTTPS redirect (required, the CORS whitelist only allows https)
2. `/api/* → http://127.0.0.1:5443/api/*` (ARR reverse proxy)
3. `/hubs/* → http://127.0.0.1:5443/hubs/*` (SignalR)
4. React Router fallback: `/*` → `/index.html`
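To confirm which of the rules listed above a deployed SPA actually carries, the web.config can be parsed as XML. This is a minimal sketch: `Get-RewriteRuleSummary` is a hypothetical helper name (not part of the repo), and the path in the usage comment assumes the Admin site's physical path from the table above.

```powershell
# Hypothetical helper: summarize the URL Rewrite rules found in a web.config document.
function Get-RewriteRuleSummary {
    param([Parameter(Mandatory)][xml]$WebConfig)

    # web.config declares no XML namespace, so a plain XPath query is enough.
    foreach ($rule in $WebConfig.SelectNodes('//rewrite/rules/rule')) {
        $match  = $rule.SelectSingleNode('match')
        $action = $rule.SelectSingleNode('action')
        [pscustomobject]@{
            Name   = $rule.GetAttribute('name')
            Match  = if ($match)  { $match.GetAttribute('url') }
            Action = if ($action) { $action.GetAttribute('type') }
            Target = if ($action) { $action.GetAttribute('url') }
        }
    }
}

# Usage on the server (path assumed from the site table):
# Get-RewriteRuleSummary ([xml](Get-Content 'C:\inetpub\apps\SolutionErp\Admin\web.config' -Raw)) | Format-Table
```

Four rows (redirect, `/api`, `/hubs`, SPA fallback) would match the list above; a missing row points at the rule to restore.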
## Quick commands
### Restart 1 site
```powershell
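# PowerShell as Admin
Import-Module WebAdministration
Stop-WebSite -Name "SolutionErp-Api"
Start-WebSite -Name "SolutionErp-Api"
# Or recycle the app pool (API runs out-of-process):
Restart-WebAppPool -Name "SolutionErp-Api"
# Check site status (filter with Where-Object; expand bindings so they print as text):
Get-Website | Where-Object Name -like "SolutionErp-*" |
    Format-Table Name, State, @{ n = 'Bindings'; e = { $_.bindings.Collection.bindingInformation -join ', ' } }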
```
### View API logs
```powershell
# Serilog file, rolling daily
Get-Content "C:\inetpub\apps\SolutionErp\Api\Logs\log-$(Get-Date -Format 'yyyyMMdd').txt" -Tail 50
# IIS log (replace <ID> with the site ID)
Get-Content "C:\inetpub\logs\LogFiles\W3SVC<ID>\u_ex$(Get-Date -Format 'yyMMdd').log" -Tail 30
# Stdout log when the app crashes at startup (newest file only)
Get-ChildItem "C:\inetpub\apps\SolutionErp\Api\Logs\stdout_*.log" | Sort-Object LastWriteTime | Select-Object -Last 1 | Get-Content -Tail 30
```
### Health check
```powershell
# From the server
curl http://127.0.0.1:5443/health/live
curl http://127.0.0.1:5443/health/ready
# From outside
curl https://api.solutions.com.vn/health/ready
```
## Let's Encrypt cert — win-acme
### Check status
```powershell
# Open win-acme in interactive mode
& "C:\tools\win-acme\wacs.exe"
# Menu > Manage renewals > list: shows the 3 certs + next renew date
# Or read the renewal files:
Get-Content "C:\ProgramData\win-acme\Production\$(hostname)\Renewals\*.renewal.json"
```
### Expired cert (emergency)
```powershell
# Force renew 1 cert (quote the placeholder; replace it with the real renewal id)
& "C:\tools\win-acme\wacs.exe" --renew --force --id "<renewal-id>"
# Full re-issue if the renewal fails:
& "C:\tools\win-acme\wacs.exe" # interactive → 'N' create new
# Choose: HTTP validation, web root = site physical path, auto install IIS
```
**Gotcha:** Shared runner with VIETREPORT → the win-acme HTTP challenge needs `.well-known/acme-challenge/` reachable over HTTP (port 80). The HTTP→HTTPS redirect rule in web.config MUST **exclude** this path:
```xml
<rule name="Redirect to HTTPS" stopProcessing="true">
  <match url="(.*)" />
  <conditions>
    <add input="{HTTPS}" pattern="off" />
    <add input="{REQUEST_URI}" pattern="^/\.well-known/" negate="true" />
  </conditions>
  <action type="Redirect" url="https://{HTTP_HOST}/{R:1}" />
</rule>
```
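The two conditions of that rule can be sanity-checked offline before touching the server. The sketch below only mirrors the rule's logic with PowerShell's `-match` (case-insensitive, like URL Rewrite's default); `Test-ShouldRedirectToHttps` is a hypothetical helper name, not something IIS or the repo provides.

```powershell
# Hypothetical helper: mirrors the rule's two conditions.
# Redirect only when {HTTPS} is "off" AND the URI is not under /.well-known/.
function Test-ShouldRedirectToHttps {
    param(
        [Parameter(Mandatory)][string]$RequestUri,
        [string]$Https = 'off'
    )
    ($Https -match 'off') -and ($RequestUri -notmatch '^/\.well-known/')
}

Test-ShouldRedirectToHttps '/login'                              # True
Test-ShouldRedirectToHttps '/.well-known/acme-challenge/abc123'  # False
Test-ShouldRedirectToHttps '/login' -Https 'on'                  # False
```

The second call is the case that matters for win-acme: the challenge file must be served over plain HTTP without a redirect.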
## Gitea Actions runner (NSSM service)
### Status
```powershell
# NSSM service name: gitea-runner (shared with VIETREPORT)
Get-Service gitea-runner
nssm status gitea-runner
# Restart
Restart-Service gitea-runner
# Log
Get-Content "C:\tools\gitea-runner\logs\act_runner.log" -Tail 50
```
### Token rotate (if the runner is disconnected)
```powershell
# Stop service
Stop-Service gitea-runner
# Re-register: Gitea admin UI → Actions → Runners → get a new registration token
& "C:\tools\gitea-runner\act_runner.exe" register `
    --instance https://git.baocaogiaoduc.vn `
    --token "<new-token>" `
    --no-interactive
# Start again
Start-Service gitea-runner
```
## LibreOffice headless (PDF / docx converter)
### Check install
```powershell
& "C:\Program Files\LibreOffice\program\soffice.exe" --version
# → LibreOffice 25.8.6.x
```
### Test convert manual
```powershell
# Create an isolated temp dir (mimics the per-request pattern of LibreOfficeDocumentConverter)
$work = New-Item -ItemType Directory -Path "$env:TEMP\lo-test-$(Get-Random)"
$userInst = "$work\userinst"
& "C:\Program Files\LibreOffice\program\soffice.exe" `
--headless `
"-env:UserInstallation=file:///$($userInst.Replace('\', '/'))" `
--convert-to pdf `
--outdir $work `
"C:\path\to\test.docx"
# Output: $work\test.pdf
ls $work
Remove-Item -Recurse -Force $work
```
### Prod fail patterns
- **60s timeout** → large PDFs (>100 pages) can exceed it. See `LibreOfficeDocumentConverter` and raise the timeout if needed
- **Locked font fallback** → Be Vietnam Pro missing → text renders incorrectly. Install the font on the server
- **Concurrent request lock** → give each request its own `UserInstallation` dir to avoid lock contention
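To reproduce the timeout and lock patterns above by hand, the manual test can be wrapped so that every call gets its own profile directory and is killed after a deadline. This is only a sketch for manual reproduction, not the code of `LibreOfficeDocumentConverter`; `ConvertTo-FileUri` and `Convert-DocxToPdf` are hypothetical names, the 60 s default is taken from the list above, and the temp path is assumed to contain no spaces.

```powershell
# Hypothetical helper: build the file:/// URI that -env:UserInstallation expects.
function ConvertTo-FileUri {
    param([Parameter(Mandatory)][string]$Path)
    'file:///' + $Path.Replace('\', '/')
}

# Hypothetical wrapper: one isolated profile per call, process killed after $TimeoutSec.
function Convert-DocxToPdf {
    param(
        [Parameter(Mandatory)][string]$InputPath,
        [int]$TimeoutSec = 60,
        [string]$Soffice = 'C:\Program Files\LibreOffice\program\soffice.exe'
    )
    # Fresh working dir per call, so concurrent runs never share a profile.
    $work = New-Item -ItemType Directory -Path (Join-Path $env:TEMP "lo-$([guid]::NewGuid())")
    $argList = @(
        '--headless',
        "-env:UserInstallation=$(ConvertTo-FileUri (Join-Path $work.FullName 'userinst'))",
        '--convert-to', 'pdf',
        '--outdir', "`"$($work.FullName)`"",
        "`"$InputPath`""
    )
    $proc = Start-Process -FilePath $Soffice -ArgumentList $argList -PassThru -NoNewWindow
    if (-not $proc.WaitForExit($TimeoutSec * 1000)) {
        $proc.Kill()
        throw "soffice did not finish within $TimeoutSec s"
    }
    # Return the expected output path; the caller checks that it exists and cleans up $work.
    Join-Path $work.FullName ([IO.Path]::GetFileNameWithoutExtension($InputPath) + '.pdf')
}
```

Running two `Convert-DocxToPdf` calls in parallel jobs is a quick way to check that separate `UserInstallation` directories avoid the lock.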
## Debug playbook — prod error
### HTTP 500 on all sites
See gotcha #25 (docs/gotchas.md):
```powershell
# Likely config lock:
& "$env:SystemRoot\system32\inetsrv\appcmd.exe" list config -section:system.webServer/webSocket
# → overrideMode="Deny" → fix:
& "$env:SystemRoot\system32\inetsrv\appcmd.exe" unlock config -section:system.webServer/webSocket
```
### HTTP 502 Bad Gateway (Admin/User → API)
```
1. Check API up: curl http://127.0.0.1:5443/health/live
   - Down → restart API site + check stdout log
2. Check ARR enabled: IIS Manager > server level > Application Request Routing
   - "Enable proxy" must be ticked
3. Check the URL Rewrite rule in the FE web.config
   - action type="Rewrite" url="http://127.0.0.1:5443/{R:0}"
```
### SignalR 401 (WebSocket connect fail)
See gotcha #26:
```
1. FE console: check whether the negotiate URL carries the ?access_token= query
2. BE log: check whether JwtBearer OnMessageReceived fires for /hubs/*
3. IIS WebSocket module: Install-WindowsFeature Web-WebSockets (already installed)
4. Section unlock: appcmd unlock config -section:system.webServer/webSocket
```
### Login "Network Error"
See `docs/gotchas.md`, CORS + HTTPS redirect:
```
1. User types http://admin.solutions.com.vn → no redirect → CORS blocks the request
2. Fix: the SPA web.config MUST have the HTTP→HTTPS rule (already in place)
3. Test: curl -I http://admin.solutions.com.vn → expect 301 Location: https://...
```
### DB connection fail
```powershell
# 1. SQL service up?
Get-Service MSSQL*
# 2. TCP enabled?
Import-Module SqlServer
# Or check the registry:
Get-ItemProperty "HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL*.SQLEXPRESS\MSSQLServer\SuperSocketNetLib\Tcp"
# 3. vrapp login OK? (quote the placeholder; replace it with the real password)
sqlcmd -S .\SQLEXPRESS -U vrapp -P "<pw>" -Q "SELECT DB_NAME()"
# Expect: SolutionErp
# 4. appsettings connection string (rendered from Gitea secrets)
# Check that C:\inetpub\apps\SolutionErp\Api\appsettings.Production.json contains ConnectionStrings:DefaultConnection
```
## Deploy steps (green CI/CD)
Gitea Actions workflow: `.gitea/workflows/deploy.yml`. Flow:
```
Push to main
→ Runner pick up job
→ checkout repo
→ setup .NET 10 + Node 20
→ npm ci (fe-admin + fe-user, rolldown native binding is fine with a fresh node_modules)
→ dotnet restore + publish
→ npm run build (fe-admin + fe-user)
→ render appsettings.Production.json from secrets (JWT_SECRET, DB_CONNECTION)
→ stop app pool SolutionErp-Api
→ xcopy publish → C:\inetpub\apps\SolutionErp\{Api,Admin,User}
→ start app pool
→ curl /health/ready → must be 200 within 30s
→ report status
```
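The "must be 200 within 30s" step in the flow above can be sketched as a polling loop. This is an illustration under assumptions, not the content of `deploy.yml`: `Wait-HealthReady` is a hypothetical name, the 2 s interval is arbitrary, and the `-Probe` parameter exists only so the loop can be exercised without a live API.

```powershell
# Hypothetical helper: poll a health endpoint until it returns 200 or the deadline passes.
function Wait-HealthReady {
    param(
        [string]$Url = 'http://127.0.0.1:5443/health/ready',
        [int]$TimeoutSec = 30,
        [int]$IntervalSec = 2,
        # Default probe returns the HTTP status code; it throws while the API is still starting.
        [scriptblock]$Probe = { param($u) (Invoke-WebRequest -Uri $u -UseBasicParsing -TimeoutSec 5).StatusCode }
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSec)
    do {
        try {
            if ((& $Probe $Url) -eq 200) { return $true }
        } catch {
            # Connection refused or non-2xx response: not ready yet, keep polling.
        }
        Start-Sleep -Seconds $IntervalSec
    } while ((Get-Date) -lt $deadline)
    return $false
}

# Usage after starting the app pool:
# if (-not (Wait-HealthReady)) { throw 'API did not become ready within 30s' }
```

Polling the loopback address rather than the public hostname keeps the check independent of DNS and certificates.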
### Manual deploy (emergency)
```powershell
# Local build
dotnet publish src/Backend/SolutionErp.Api -c Release -o .\publish\api
cd fe-admin; npm ci; npm run build; cd ..
cd fe-user; npm ci; npm run build; cd ..
# Copy to the server with scp (needs plink/pscp or rsync)
scp -r .\publish\api\* user@server:C:/inetpub/apps/SolutionErp/Api/
scp -r .\fe-admin\dist\* user@server:C:/inetpub/apps/SolutionErp/Admin/
scp -r .\fe-user\dist\* user@server:C:/inetpub/apps/SolutionErp/User/
# On the server:
Restart-WebAppPool -Name "SolutionErp-Api"
curl http://127.0.0.1:5443/health/ready
```
## Backup + recovery
```powershell
# DB backup (script exists, not yet scheduled):
& "C:\inetpub\apps\SolutionErp\scripts\backup-sql.ps1"
# Output: backup/SolutionErp_<ts>.bak (compressed + retention 30d)
# Schedule daily 03:00:
schtasks /create /tn "SolutionErp Backup" `
/tr "powershell -ExecutionPolicy Bypass -File C:\inetpub\apps\SolutionErp\scripts\backup-sql.ps1" `
/sc DAILY /st 03:00 /ru SYSTEM
```
Restore: see `docs/guides/runbook.md`.
## Rotate credentials (Phase 5.1 backlog)
- [ ] SQL `sa` password (rotate)
- [ ] SQL `vrapp` password (update Gitea secret `DB_CONNECTION` + appsettings.Production.json)
- [ ] JWT secret (update Gitea secret `JWT_SECRET`; the next deploy propagates it. All existing tokens become invalid)
- [ ] Gitea runner registration token (re-register service)
- [ ] Admin default `Admin@123456` (change it via the `/system/users` admin UI right after deploy)
## Hardening — IPv4/IPv6 port hijack (G-084 VietReport incident)
**Lesson from the VPS shared with VIETREPORT (2026-04-23):** the VietReport team
deployed a Next.js app that took port 3000 (bound to 0.0.0.0), which pushed Gitea
onto IPv6-only `[::]:3000` → IIS ARR `localhost:3000` resolved IPv4
first → hit Next.js instead of Gitea → `git.baocaogiaoduc.vn` served the
VietReport homepage.
**3 rules for every service on the shared VPS:**
1. **Reverse proxies always use the IP literal `127.0.0.1`**, never `localhost`
   - IIS ARR rewrite rule: `http://127.0.0.1:5443/{R:0}`
   - Health check curl: `curl http://127.0.0.1:5443/health/live`
   - The Windows DNS resolver may return a cached IPv6 address first → fails if the service binds IPv4-only
2. **Backend services bind IPv4 loopback explicitly**, not `0.0.0.0`
   - ASP.NET Core Kestrel (standalone): `UseUrls("http://127.0.0.1:5443")` or env `ASPNETCORE_URLS=http://127.0.0.1:5443`
   - IIS ASP.NET Core Module out-of-process: ANCM injects an ephemeral port itself → no manual config needed (OK)
   - If Kestrel is ever deployed standalone via NSSM (future): hardcode 127.0.0.1 in appsettings.Production.json
3. **Service dependencies for boot order** when several services share a port family
   - NSSM: `nssm set <svc> DependOnService <other>`
   - Not needed for SOLUTION_ERP today (API runs in an IIS app pool, not an NSSM service)
**Current SOLUTION_ERP status (LOW risk):**
- The API is hosted out-of-process in an IIS app pool → ANCM manages the ephemeral Kestrel port
- FE calls `https://api.solutions.com.vn` directly via CORS (no ARR proxy)
- No standalone Kestrel service on a fixed port
- **But** if a reverse proxy is added in future (fe-admin/user → `/api` → api.solutions.com.vn, or /hubs for SignalR) → it MUST use 127.0.0.1, not localhost
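Rule 1 can be checked mechanically by scanning a web.config for rewrite actions that still point at `localhost`. A minimal sketch, assuming the standard `rewrite/rules/rule/action` layout of URL Rewrite; `Find-LocalhostProxyTarget` is a hypothetical helper name and the path in the usage comment is taken from the site table.

```powershell
# Hypothetical helper: list rewrite rules whose action URL targets "localhost" instead of an IP literal.
function Find-LocalhostProxyTarget {
    param([Parameter(Mandatory)][xml]$WebConfig)

    foreach ($action in $WebConfig.SelectNodes('//rewrite/rules/rule/action')) {
        $url = $action.GetAttribute('url')
        # Matches http(s)://localhost followed by a port, a path, or end of string.
        if ($url -match '^https?://localhost([:/]|$)') {
            [pscustomobject]@{
                Rule = $action.ParentNode.GetAttribute('name')
                Url  = $url
            }
        }
    }
}

# Usage on the server:
# Find-LocalhostProxyTarget ([xml](Get-Content 'C:\inetpub\apps\SolutionErp\Admin\web.config' -Raw))
```

An empty result means every proxy target already uses an IP literal or an external host name.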
## Related
- `docs/guides/deployment-iis.md` — first-time setup
- `docs/guides/runbook.md` — operations guide chi tiết
- `docs/guides/cicd.md` — CI/CD pipeline
- `docs/gotchas.md`: #25 webSocket lock, #26 SignalR, #28 LibreOffice 404, #29 PS 5.1 UTF-16, **#33 IPv4/IPv6 port hijack (G-084)**
- `scripts/deploy-iis.ps1` · `scripts/backup-sql.ps1` · `scripts/install-libreoffice.ps1`
- `.gitea/workflows/deploy.yml` — CI/CD definition