---
name: soroush-cicd
description: Soroush's standard CI/CD method - self-hosted Gitea (git.soroushasadi.com) + self-hosted Nexus mirror (mirror.soroushasadi.com) for any project. Use whenever the user says "soroush ci cd method", "my ci cd method", "soroush pipeline", "set up gitea pipeline", "use my nexus mirror", or asks to add CI/CD to a new project. Covers Gitea Actions workflow design, NuGet/npm/Docker/MCR/PyPI proxies through Nexus, runner labels (container vs host), self-hosted deploy job patterns, ENV_FILE secret, dual remote setup (github + gitea), and domain/Caddy cutover.
---
|
|
# Soroush CI/CD Method

The canonical recipe Soroush uses to ship any project. Two pieces of infrastructure are always reused across projects; only the per-project workflow YAML and compose files change.

| Component | URL                             | Role                                                       |
| --------- | ------------------------------- | ---------------------------------------------------------- |
| Gitea     | https://git.soroushasadi.com    | Git host + Actions runner (CI/CD trigger)                  |
| Nexus     | https://mirror.soroushasadi.com | Pull-through mirror for NuGet, npm, Docker, MCR, PyPI, APT |

GitHub is kept as a backup remote (`origin`); Gitea is the CI remote (`gitea`). Only pushes to Gitea trigger pipelines.
|
|
## When to invoke this skill

Trigger automatically when the user says any of:

- "soroush ci cd method" / "my ci cd method" / "soroush pipeline"
- "set up gitea pipeline" / "wire this repo to gitea"
- "use my nexus mirror" / "route through mirror.soroushasadi.com"
- "add CI/CD to this project"
- "configure deploy job for the server"

Do not invoke for unrelated CI questions about GitHub Actions on github.com, CircleCI, GitLab.com, etc.
|
|
## How this skill works - read this first

This is a **method**, not a copy-paste script. Every project Soroush ships is different - different stacks, different services, different deploy targets, different secrets. The two pieces that NEVER change are:

1. Push to `gitea` -> CI runs.
2. All packages and base images come from `mirror.soroushasadi.com`.

Everything else (which jobs, which services, which compose files, which health checks, whether there's even a deploy step) is **derived per project**. Don't paste the templates blindly. Run the intake first, then generate a tailored workflow + checklist.
|
|
### Step 0 - Per-project intake

Before writing any YAML, gather these answers. Ask the user only the ones you can't infer from reading the repo:

| Question | Why it matters |
| -------- | -------------- |
| What's the project name (for `concurrency.group:` and container names)? | Used in every `<project>-cicd-${{ github.ref }}`-style identifier |
| What stacks are in this repo? (.NET / Node / Python / Flutter / static) | Decides which CI job templates apply |
| For each service, what's the build command and what does "passing" mean? (build only? tests? tsc? lint?) | Defines each CI job's `steps:` |
| Does the API need a real Postgres/Redis/Mongo during tests, or are mocks fine? | Decides whether to add a `services:` block with healthchecks |
| Is there a deploy target at all, or is this CI-only? (e.g. library, mobile app) | Decides whether to include the `deploy` job |
| If deploy: same server as Gitea, or remote? Docker compose, k8s, plain systemd, or a static upload? | Picks the deploy job pattern (host runner vs SSH vs rsync) |
| Which services need `NEXT_PUBLIC_*` (or other build-time) env vars? | Those must be in `ENV_FILE` BEFORE the first build |
| What external services need secrets? (payment, SMS, email, S3, etc.) | Defines the `ENV_FILE` template |
| Is there already a domain, or IP-only for now? | Decides whether to wire Caddy and HTTPS now or later |
| Which Node / .NET / Python version per service? | Sets the exact mirror image tag |
| Are there migrations or one-shot init steps? | Decides the `RUN_MIGRATIONS` flag + first-deploy ordering |

Once these are answered, build a **project-specific checklist** (see Step 0.5 below) and only then generate files.
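Part of the stack question can usually be answered from the repo before you ask. Below is a minimal sketch. It assumes the usual marker files identify each stack: `*.csproj`, `package.json`, `requirements.txt` / `pyproject.toml`, and `pubspec.yaml`. The helper names `has_file` and `detect_stacks` are hypothetical. Treat the result as a hint to confirm with the user, not as an answer.

```bash
# Hypothetical helper: succeed if any file matching the glob $2 exists under
# directory $1, skipping .git and node_modules trees.
has_file() {
  find "$1" \( -name .git -o -name node_modules \) -prune -o -name "$2" -print \
    | grep -q .
}

# Print one stack name per line for the marker files found under $1 (default: .).
# Heuristic only: a package.json may be tooling rather than a Node service.
detect_stacks() {
  local dir=${1:-.}
  has_file "$dir" '*.csproj' && echo dotnet
  has_file "$dir" package.json && echo node
  if has_file "$dir" requirements.txt || has_file "$dir" pyproject.toml; then
    echo python
  fi
  has_file "$dir" pubspec.yaml && echo flutter
  return 0
}
```

Running `detect_stacks .` at the repo root of a full-stack project would print `dotnet` and `node`. Each detected stack still needs the per-service questions from the table.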
|
|
### Step 0.5 - Build the per-project checklist

The "Default bring-up checklist" below is a **superset** - it covers the maximum case. For each project, prune items that don't apply and add items that are unique to that project. Examples:

- Static Next.js site with no backend? Drop Postgres/Redis, `RUN_MIGRATIONS`, and the API health-wait loop; keep tsc + deploy.
- .NET-only API with no frontend? Drop all Node jobs; keep dotnet + the postgres service.
- Library / SDK project? Drop the entire deploy job; keep only build + test on PR.
- Project where Gitea Actions deploys to a DIFFERENT server than Gitea itself? Replace the `self-hosted:host` deploy job with an SSH-based deploy job.
- Mobile Flutter project? Replace the dotnet/node images with `mirror.soroushasadi.com/cirrusci/flutter:<ver>`; the deploy job uploads artifacts instead of running `docker compose up`.

Present the tailored checklist to the user BEFORE writing files, so they can confirm or adjust it. Format it as a numbered todo list so it's actionable.
|
|
## Mental model

```
Developer machine
   |  git push origin <branch>  -> GitHub backup, no CI
   |  git push gitea <branch>   -> https://git.soroushasadi.com (TRIGGERS CI)
   v
Gitea Actions (act_runner registered with two labels)
   |
   +-- "CI" jobs      runs-on: ubuntu-latest
   |                  container: mirror.soroushasadi.com/<image>
   |                  all package managers point at Nexus groups
   |
   +-- "deploy" job   runs-on: self-hosted (label "host")
                      shells into docker compose on the server
                      reads ENV_FILE secret -> writes .env
```

Two runner labels are required in the act_runner config:

| Label                                   | Runs where                   | Used for                  |
| --------------------------------------- | ---------------------------- | ------------------------- |
| `ubuntu-latest:docker://node:20-alpine` | Inside a Docker container    | build / test / type-check |
| `self-hosted:host`                      | Directly on the server shell | the `deploy` job          |

The `:host` suffix on the second label makes act_runner execute those jobs directly in the server's shell instead of inside a container. That is what lets the deploy job call `docker compose` against the host's Docker daemon.
|
|
## Nexus repositories - the four that every project uses

Provisioned once on the server via `mirrors/nexus/provision.sh` (idempotent):

| Nexus repo         | Type         | Upstream                          | Consumed by                                            |
| ------------------ | ------------ | --------------------------------- | ------------------------------------------------------ |
| `nuget-group`      | NuGet group  | `nuget-proxy` -> api.nuget.org    | `dotnet restore` in CI + Docker                        |
| `npm-group`        | npm group    | `npm-proxy` -> registry.npmjs.org | `npm install` in CI + Docker                           |
| `docker-hub-proxy` | Docker proxy | Docker Hub (or Liara mirror)      | `mirror.soroushasadi.com/node:...`, postgres, redis... |
| `mcr-proxy`        | Docker proxy | mcr.microsoft.com                 | `mirror.soroushasadi.com/dotnet/sdk:...`, aspnet       |

Optional but useful when projects need them:

| Nexus repo              | Type       | Upstream               | Use                        |
| ----------------------- | ---------- | ---------------------- | -------------------------- |
| `pypi-proxy`            | PyPI proxy | Liara                  | `pip install`              |
| `ubuntu-proxy`          | APT proxy  | Liara (jammy)          | `apt-get` in Dockerfiles   |
| `ubuntu-security-proxy` | APT proxy  | Liara (jammy-security) | `apt-get` security updates |

For NuGet and npm, clients talk to the `*-group` repos, which hide the upstream fallback logic (Liara primary, Runflare/direct fallback). Never point NuGet or npm clients at the underlying `*-proxy` directly; always use the group. The other ecosystems are consumed as listed above: Docker Hub and MCR images through the `mirror.soroushasadi.com/<image>` host, and PyPI through `pypi-proxy`.
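A small lookup helper can keep the client-facing endpoints in one place, so each tool gets the right URL. This is a sketch. `nexus_url` and the `NEXUS` override variable are hypothetical names. The paths are the ones used elsewhere in this skill.

```bash
# Base URL of the Nexus instance; override with NEXUS=... if needed.
NEXUS=${NEXUS:-https://mirror.soroushasadi.com}

# Print the endpoint a client should be pointed at for the given ecosystem.
nexus_url() {
  case "$1" in
    nuget)  echo "$NEXUS/repository/nuget-group/index.json" ;;
    npm)    echo "$NEXUS/repository/npm-group/" ;;
    pypi)   echo "$NEXUS/repository/pypi-proxy/simple/" ;;
    status) echo "$NEXUS/service/rest/v1/status" ;;
    *)      echo "nexus_url: unknown ecosystem '$1'" >&2; return 1 ;;
  esac
}
```

Typical uses:

- `npm install --registry "$(nexus_url npm)"`
- `pip install --index-url "$(nexus_url pypi)" -r requirements.txt`
- `curl -fsS "$(nexus_url status)"` as a quick reachability check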
|
|
### Host docker daemon mirror entry

So that any `docker pull` on the server (including outside CI) goes through Nexus, drop this into `/etc/docker/daemon.json` and restart docker:

```json
{ "registry-mirrors": ["https://mirror.soroushasadi.com"] }
```

Docker only consults `registry-mirrors` for Docker Hub images. For that reason, CI jobs still reference `mirror.soroushasadi.com/<image>` explicitly. This works regardless of the runner's docker config, and it also covers MCR images such as `dotnet/sdk`.
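Overwriting `/etc/docker/daemon.json` wholesale drops any other settings already in it. The cautious sketch below writes the file only when it is missing or empty, and otherwise asks for a manual merge. The function name `write_daemon_mirror` is hypothetical.

```bash
# Write the Nexus registry mirror into Docker's daemon config.
#   $1 = config path (default /etc/docker/daemon.json)
#   $2 = mirror URL  (default https://mirror.soroushasadi.com)
# Returns 0 if the file was written or already mentions the mirror,
# 1 if an existing config needs a manual merge.
write_daemon_mirror() {
  local path=${1:-/etc/docker/daemon.json}
  local mirror=${2:-https://mirror.soroushasadi.com}
  if [ -s "$path" ]; then
    if grep -qF "$mirror" "$path"; then
      echo "$path already references $mirror"
      return 0
    fi
    echo "$path has other settings; add \"registry-mirrors\": [\"$mirror\"] by hand" >&2
    return 1
  fi
  mkdir -p "$(dirname "$path")" || return 1
  printf '{ "registry-mirrors": ["%s"] }\n' "$mirror" > "$path"
}
```

When it writes a fresh file, restart the daemon (`sudo systemctl restart docker`). Restarting dockerd stops running containers unless `live-restore` is enabled. On a server that already hosts projects, do it in a maintenance window.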
|
|
## The workflow file - `.gitea/workflows/ci-cd.yml`

### Skeleton

```yaml
name: CI/CD

on:
  push: { branches: [main] }
  pull_request: { branches: [main] }

concurrency:
  group: <project>-cicd-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # one or more CI jobs (build / test / typecheck) per service
  # one deploy job at the bottom that needs: all CI jobs
```

`concurrency.cancel-in-progress: true` cancels a still-running pipeline on the same ref when a newer push arrives, so rapid pushes don't stack deploys.
|
|
### CI job - .NET (template)

```yaml
  api-build:
    name: "CI - API (dotnet build + test)"
    runs-on: ubuntu-latest
    container:
      image: mirror.soroushasadi.com/dotnet/sdk:<version>
      options: --add-host=gitea:host-gateway
    services:                       # optional: integration DB/redis
      postgres:
        image: mirror.soroushasadi.com/postgres:16-alpine
        env: { POSTGRES_DB: app_test, POSTGRES_USER: app, POSTGRES_PASSWORD: test_pass }
        options: --health-cmd pg_isready --health-interval 5s --health-timeout 5s --health-retries 10
      redis:
        image: mirror.soroushasadi.com/redis:7-alpine
        options: --health-cmd "redis-cli ping" --health-interval 5s --health-timeout 3s --health-retries 10
    steps:
      - name: Checkout
        env:
          TOKEN: ${{ github.token }}
          REF: ${{ github.ref }}
        run: |
          git init
          git remote add origin "${{ github.server_url }}/${{ github.repository }}.git"
          git config http.extraheader "Authorization: Bearer ${TOKEN}"
          git fetch --depth=1 origin "${REF}"
          git checkout FETCH_HEAD

      - name: Write NuGet config
        run: |
          cat > /tmp/nuget.ci.config << 'EOF'
          <?xml version="1.0" encoding="utf-8"?>
          <configuration>
            <packageSources>
              <clear />
              <add key="nexus"
                   value="https://mirror.soroushasadi.com/repository/nuget-group/index.json"
                   protocolVersion="3" />
            </packageSources>
          </configuration>
          EOF

      # Restore and build the whole solution, not just the API project: the
      # Test step uses --no-build, so the test projects must already be compiled.
      - name: Restore
        run: dotnet restore <Solution>.sln --configfile /tmp/nuget.ci.config
        env: { DOTNET_CLI_TELEMETRY_OPTOUT: 1 }

      - name: Build
        run: dotnet build <Solution>.sln --no-restore -c Release

      - name: Test
        run: dotnet test <Solution>.sln --no-build -c Release --logger "console;verbosity=minimal"
        env:
          ConnectionStrings__DefaultConnection: "Host=postgres;Port=5432;Database=app_test;Username=app;Password=test_pass"
          ConnectionStrings__Redis: "redis:6379"
```
|
|
### CI job - Node / Next.js (template)

```yaml
  web-check:
    name: "CI - Web (tsc)"
    runs-on: ubuntu-latest
    container:
      image: mirror.soroushasadi.com/node:20-alpine
      options: --add-host=gitea:host-gateway
    steps:
      - name: Checkout
        env:
          TOKEN: ${{ github.token }}
          SHA: ${{ github.sha }}
        run: |
          wget -q --header "Authorization: Bearer ${TOKEN}" \
            "${{ github.server_url }}/api/v1/repos/${{ github.repository }}/archive/${SHA}.tar.gz" \
            -O /tmp/repo.tar.gz
          tar -xzf /tmp/repo.tar.gz --strip-components=1

      - name: Install
        working-directory: web/<app>
        # `run: |` keeps the backslash continuation intact; a plain multi-line
        # scalar would fold both lines into one and leave a stray "\ " in the command
        run: |
          npm install --legacy-peer-deps --ignore-scripts \
            --registry https://mirror.soroushasadi.com/repository/npm-group/

      - name: TypeScript check
        working-directory: web/<app>
        run: npx tsc --noEmit
```

Both checkout variants work. The tarball variant is faster when git history isn't needed, and it is the one to use in `node:*-alpine` images, which don't ship `git`. Either way, DON'T rely on `actions/checkout@v4`: on self-hosted Gitea it's not guaranteed to be available.
|
|
### Deploy job (always self-hosted)

```yaml
  deploy:
    name: "Deploy - all services"
    runs-on: self-hosted
    env:
      # act runner host mode starts with a minimal PATH - extend it so docker/snap are found
      PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
    needs:
      - api-build
      - web-check
      # ... every CI job
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    timeout-minutes: 40

    steps:
      - name: Checkout
        env:
          TOKEN: ${{ github.token }}
          REF: ${{ github.ref }}
        run: |
          git init
          git remote add origin "${{ github.server_url }}/${{ github.repository }}.git"
          git config http.extraheader "Authorization: Bearer ${TOKEN}"
          git fetch --depth=1 origin "${REF}"
          git checkout FETCH_HEAD

      - name: Write .env
        run: printf '%s' "$ENV_FILE" > .env
        env:
          # block style on purpose: `${{ ... }}` inside a `{ ... }` flow mapping is invalid YAML
          ENV_FILE: ${{ secrets.ENV_FILE }}

      - name: Build images
        run: docker compose build --parallel <svc1> <svc2> ...
        env: { DOCKER_BUILDKIT: 1, COMPOSE_DOCKER_CLI_BUILD: 1 }

      - name: Start services
        # if this fails with "container name already in use", stop + rm the
        # container first (see Production Safety Rules)
        run: docker compose up -d --no-deps <svc1> <svc2> ...

      - name: Wait for API healthy
        run: |
          for i in $(seq 1 24); do
            STATUS=$(docker inspect --format='{{.State.Health.Status}}' <api-container> 2>/dev/null || echo "missing")
            echo "  [$i/24] $STATUS"
            [ "$STATUS" = "healthy" ] && echo "OK <api-container> healthy" && break
            if [ "$i" = "24" ]; then
              echo "TIMEOUT <api-container>"
              docker compose logs --tail=40 <api>
              exit 1   # fail the step even if printing the logs failed
            fi
            sleep 5
          done

      - name: Prune old images
        if: success()
        run: docker image prune -f
```
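When several services need a health wait (step 5 of adding a new service), the polling loop from the "Wait for API healthy" step can be factored into a function. This sketch uses the same defaults: 24 tries, 5 seconds apart. The names `container_health` and `wait_for_healthy` are hypothetical.

```bash
# Print a container's health status, or "missing" when the container does not
# exist or defines no health check (docker inspect then errors out).
container_health() {
  docker inspect --format='{{.State.Health.Status}}' "$1" 2>/dev/null || echo missing
}

# Poll until container $1 is "healthy".
#   $2 = number of tries (default 24), $3 = seconds between tries (default 5)
# Returns 0 when healthy, 1 on timeout.
wait_for_healthy() {
  local name=$1 tries=${2:-24} interval=${3:-5} i=1 status
  while [ "$i" -le "$tries" ]; do
    status=$(container_health "$name")
    echo "  [$i/$tries] $name: $status"
    [ "$status" = healthy ] && return 0
    [ "$i" -lt "$tries" ] && sleep "$interval"
    i=$((i + 1))
  done
  echo "TIMEOUT waiting for $name" >&2
  return 1
}
```

In a deploy step it would be called once per service, for example `wait_for_healthy <api-container> || { docker compose logs --tail=40 <api>; exit 1; }`.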
|
|
Things that bite if you forget them:

- `runs-on: self-hosted` (NOT `ubuntu-latest`) for deploy.
- The explicit `PATH:` env var - act runners strip PATH and won't find docker.
- `--no-deps` keeps a one-service redeploy from cascading into database restarts.
- Every compose service that the health-wait loop checks MUST define a health check (`healthcheck:` in compose or `HEALTHCHECK` in its Dockerfile). Otherwise the loop only ever sees `missing` and times out.
- `NEXT_PUBLIC_*` env vars are baked in at Next.js build time - changes to them in `ENV_FILE` only take effect after the next CI rebuild.
|
|
## Dockerfile patterns

### .NET image - NuGet through Nexus

Commit a `nuget.docker.config` at the repo root with:

```xml
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="nexus"
         value="https://mirror.soroushasadi.com/repository/nuget-group/index.json"
         protocolVersion="3" />
  </packageSources>
  <config>
    <add key="http_retry_count" value="8" />
    <add key="http_retry_delay_milliseconds" value="1000" />
  </config>
</configuration>
```

Then in the Dockerfile (build context = repo root):

```dockerfile
FROM mirror.soroushasadi.com/dotnet/sdk:<version> AS build
WORKDIR /src
COPY nuget.docker.config /tmp/nuget.config
COPY src/ .
# src/<Project>/<Project>.csproj in the repo ends up at /src/<Project>/<Project>.csproj
RUN dotnet restore <Project>/<Project>.csproj --configfile /tmp/nuget.config
RUN dotnet publish <Project>/<Project>.csproj -c Release -o /out --no-restore

FROM mirror.soroushasadi.com/dotnet/aspnet:<version>
WORKDIR /app
COPY --from=build /out ./
ENTRYPOINT ["dotnet", "<Project>.dll"]
```
|
|
### Node image - npm through Nexus

```dockerfile
FROM mirror.soroushasadi.com/node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm install --legacy-peer-deps --ignore-scripts \
    --registry https://mirror.soroushasadi.com/repository/npm-group/
COPY . .
RUN npm run build

FROM mirror.soroushasadi.com/node:20-alpine
WORKDIR /app
COPY --from=build /app/.next ./.next
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/package.json ./
# If the app has a public/ folder or a next.config.* file, copy those too:
# `next start` serves public/ and reads next.config at runtime.
EXPOSE 3000
CMD ["npm", "start"]
```
|
|
### Python image - pip through Nexus

```dockerfile
FROM mirror.soroushasadi.com/python:3.12-slim
RUN pip config set global.index-url https://mirror.soroushasadi.com/repository/pypi-proxy/simple/ \
 && pip config set global.trusted-host mirror.soroushasadi.com
COPY requirements.txt .
RUN pip install -r requirements.txt
```
|
|
## Git remote setup on developer machine

```bash
# In a fresh repo, after `git init`:
git remote add origin https://github.com/<user>/<repo>.git
git remote add gitea https://git.soroushasadi.com/<user>/<repo>.git

git remote -v
# origin  https://github.com/...            (fetch+push, GitHub backup)
# gitea   https://git.soroushasadi.com/...  (fetch+push, CI/CD)
```

Daily flow:

```bash
git push origin main   # GitHub backup, no CI runs
git push gitea main    # Gitea triggers CI + deploy on main
```

For a new project: create the repo in the Gitea UI first (or via the Gitea API, `POST /api/v1/user/repos`), then add the `gitea` remote locally.
|
|
## Secrets

One secret rules them all: `ENV_FILE`. Set it at:

```
https://git.soroushasadi.com/<user>/<repo>/settings/secrets
```

The deploy job writes it verbatim to `.env`, which `docker compose` reads. Contents are project-specific but typically include:

- `ASPNETCORE_ENVIRONMENT=Production` (for .NET projects)
- `RUN_MIGRATIONS=true` on first deploy, `false` after
- Connection strings (`DB_CONNECTION_STRING`, etc.)
- `JWT_KEY` - generate with `openssl rand -hex 32`
- `NEXT_PUBLIC_*` URLs (baked at build time - require a CI rerun if changed)
- `CORS_ORIGIN_*` for every front-end origin
- Provider keys (payment gateway, SMS, etc.)
- Host port mappings (`*_PORT` while pre-domain)
|
|
To rotate any secret: edit `ENV_FILE` in Gitea, then push any commit to trigger a redeploy.
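The deploy job writes `ENV_FILE` verbatim, so a missing key only shows up later as a broken container. A small pre-flight check after the "Write .env" step can fail the job early instead. This is a sketch. `check_env_keys` is a hypothetical helper, and the list of keys you pass is per project.

```bash
# Check that every key named after the file exists in the env file with a
# non-empty value (KEY=value at the start of a line). Prints each missing key
# to stderr and returns 1 if any is missing.
check_env_keys() {
  local file=$1 key missing=0
  shift
  for key in "$@"; do
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing or empty: $key" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Example: `check_env_keys .env ASPNETCORE_ENVIRONMENT JWT_KEY RUN_MIGRATIONS || exit 1`. `docker compose config` is a complementary check: it renders the final configuration and warns about variables that are referenced but not set.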
|
|
## Default bring-up checklist (superset - PRUNE per project)

This is the maximum case (full-stack app with .NET + Node + Postgres + Redis + deploy to the same server). For every actual project, drop items that don't apply, rephrase items that need different commands, and add items unique to that project. The result is the per-project checklist you should hand the user.

**Server one-time setup** (skip entirely if Soroush already runs Gitea + Nexus on the target server):

1. Docker + docker compose v2 + Gitea + Nexus + act_runner installed.
2. `/etc/docker/daemon.json` has the Nexus mirror entry.
3. Nexus `provision.sh` has been run; the four standard repositories exist with anonymous read.
4. act_runner registered with both labels (`ubuntu-latest:docker://...` and `self-hosted:host`).

**Per-project setup** (adjust per intake answers):

5. Repo created on Gitea, mirrored from GitHub (or vice versa).
6. `.gitea/workflows/ci-cd.yml` committed - jobs match the services identified in intake.
7. Dockerfiles committed for each service that needs an image. (Drop if pure static / library.)
8. `docker-compose.yml` committed; every service that the deploy job health-waits on has a `healthcheck:`. (Drop if no Docker deploy.)
9. `nuget.docker.config` committed (only if .NET is in the stack).
10. `.npmrc` or inline `--registry` flag in CI (only if Node is in the stack).
11. `ENV_FILE` secret set on the Gitea repo with every key the deploy job and `docker-compose.yml` reference. (Drop if CI-only project.)
12. Developer machine has both `origin` and `gitea` remotes.
13. First push: `git push gitea <branch>`. Watch `https://git.soroushasadi.com/<user>/<repo>/actions`.

**Project-specific items to consider adding:**

- Migration runner step / `RUN_MIGRATIONS=true` flag (databases only)
- Caddy + Let's Encrypt overlay (when the domain is ready)
- One-time seed/import step on first deploy
- Cron job or scheduled job hooks
- External webhook URLs that need to be registered after deploy
- Object storage bucket creation
- DNS A records list

Expected first-run time depends on the stack: ~3 min (static site) to ~15 min (full stack with a cold Nexus cache). Subsequent runs are fast.
|
|
## Adding a new service to an existing pipeline

1. Add a new CI job using the Node/.NET/Python template above.
2. Add the new job name to `deploy.needs:`.
3. Add the service to `docker-compose.yml` with a `healthcheck:`.
4. Add the service to the `docker compose build` and `docker compose up -d` steps in the deploy job (or give it its own steps).
5. Add a health-wait loop if it's an API.
6. Push to gitea.
|
|
## Domain / HTTPS cutover (when DNS is ready)

When the subdomains resolve to the server:

1. Update `ENV_FILE`: swap the IP-based `NEXT_PUBLIC_*` / `CORS_*` values for `https://*.<domain>`, and drop the `*_PORT` host-port vars.
2. Add `-f docker-compose.yml -f docker-compose.caddy.yml` to every `docker compose` invocation in the deploy job. List both files: once `-f` is given, compose no longer loads `docker-compose.yml` automatically.
3. `ufw allow 80 && ufw allow 443` on the server.
4. Push to gitea. Caddy issues Let's Encrypt certs on first run; no certbot/manual renewal needed.

A typical `docker-compose.caddy.yml` overlay defines a `caddy` service publishing 80/443 with a Caddyfile that reverse-proxies each subdomain to the corresponding internal service.
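The Caddyfile for that overlay is usually one site block per subdomain. The sketch below generates it from `host=upstream` pairs. The function name and the example hosts and upstreams are hypothetical. Each upstream must be the compose service name plus its internal port.

```bash
# Emit one Caddyfile site block per "host=upstream" argument, e.g.
#   api.example.com=api:8080
# Caddy obtains and renews a Let's Encrypt certificate for each site address
# automatically once DNS points at the server and ports 80/443 are open.
caddyfile_for() {
  local pair host upstream
  for pair in "$@"; do
    host=${pair%%=*}
    upstream=${pair#*=}
    printf '%s {\n\treverse_proxy %s\n}\n\n' "$host" "$upstream"
  done
}
```

Example: `caddyfile_for api.example.com=api:8080 app.example.com=web:3000 > Caddyfile`. Then mount that file into the `caddy` service at `/etc/caddy/Caddyfile`, which is the path the official Caddy image reads by default.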
|
|
## Troubleshooting

| Symptom | Cause | Fix |
| ------- | ----- | --- |
| Job hangs at "Pulling image" | Runner can't reach mirror.soroushasadi.com | DNS / Nexus down. Check with `curl -s https://mirror.soroushasadi.com/service/rest/v1/status` |
| `dotnet restore` returns 401/403 | Nexus anonymous read disabled | Re-run `provision.sh` (enables anon access + realms) |
| `npm install` extremely slow on first run | First fetch through `npm-proxy` is cold | Wait; subsequent runs hit cached blobs |
| Deploy: `docker: command not found` | Runner PATH stripped | Confirm the `env: PATH:` line in the deploy job |
| Deploy: `permission denied ... /var/run/docker.sock` | Runner user not in `docker` group | `usermod -aG docker <runner-user>` and restart act_runner |
| Health-wait times out | Service has no `healthcheck:` defined | Add `HEALTHCHECK` in the Dockerfile or `healthcheck:` in compose |
| `NEXT_PUBLIC_*` URL didn't change in browser | Vars are baked at Next.js build time | Push a commit to trigger an image rebuild |
| Deploy ran but old code still serving | Container not recreated | `docker stop` + `docker rm` the container, then `docker compose up -d --no-deps <svc>` (avoid `--force-recreate`, see Production Safety Rules); or rebuild the image |
| Two pushes only deployed once | `concurrency.cancel-in-progress: true` cancelled the earlier run | Expected. Sequence pushes if both must deploy. |
| Gitea checkout returns 401 | Token scope changed or runner re-registered | The workflow uses `${{ github.token }}`; re-register the runner if it was compromised |
| `act_runner` won't start | Token expired | Generate a fresh runner registration token in Gitea admin |
|
|
## Files / commands cheat sheet

| Thing | Where |
| ----- | ----- |
| Workflow | `.gitea/workflows/ci-cd.yml` |
| NuGet config for CI | `/tmp/nuget.ci.config`, written by the workflow's "Write NuGet config" step (or a committed `nuget.mirror.config` at the root) |
| NuGet config for Docker builds | `nuget.docker.config` (root) |
| Host docker mirror entry | `/etc/docker/daemon.json` |
| Nexus provisioning | `mirrors/nexus/provision.sh` |
| Liara upstream swap | `mirrors/nexus/add-liara-mirrors.sh`, `update-docker-upstream.sh` |
| Nexus compose | `docker-compose.mirror.yml` |
| Gitea Actions config | Gitea `app.ini` -> `[actions] ENABLED = true` |
| act_runner config | `/etc/act_runner/config.yaml` (labels live here) |
| Nexus health check | `curl -s https://mirror.soroushasadi.com/service/rest/v1/status` |
| View pipeline | `https://git.soroushasadi.com/<user>/<repo>/actions` |
| Set/rotate `ENV_FILE` | `https://git.soroushasadi.com/<user>/<repo>/settings/secrets` |
|
|
## Review checklist (apply only the items relevant to this project's stack)

**Always check (regardless of stack):**

- [ ] Every `container.image` and `services.<x>.image` uses `mirror.soroushasadi.com/...`?
- [ ] Every container job has `options: --add-host=gitea:host-gateway`?
- [ ] Every checkout step is manual (`git init` + bearer token, or tarball API)?
- [ ] `concurrency.cancel-in-progress: true` is set?

**If .NET is in the stack:**

- [ ] `dotnet restore` uses `--configfile` pointing at the Nexus NuGet group?
- [ ] `nuget.docker.config` is present and copied in the Dockerfile?

**If Node is in the stack:**

- [ ] `npm install` uses `--registry https://mirror.soroushasadi.com/repository/npm-group/`?
- [ ] `NEXT_PUBLIC_*` envs that affect the build are in `ENV_FILE` before the first build?

**If Python is in the stack:**

- [ ] pip uses `--index-url https://mirror.soroushasadi.com/repository/pypi-proxy/simple/` (or the equivalent `pip config set global.index-url`)?

**If a deploy job exists:**

- [ ] Deploy job is `runs-on: self-hosted`?
- [ ] Deploy job has the explicit `PATH:` env line?
- [ ] Deploy job is gated by `if: github.event_name == 'push' && github.ref == 'refs/heads/<deploy-branch>'`?
- [ ] Deploy `needs:` lists every CI job?
- [ ] `ENV_FILE` secret exists on the Gitea repo?

**If docker compose is used at deploy time:**

- [ ] Every compose service that the deploy job health-waits on has a `healthcheck:`?
- [ ] `--no-deps` used so a single-service redeploy doesn't cascade?

**If migrations / first-run setup exist:**

- [ ] `RUN_MIGRATIONS=true` (or equivalent) is in `ENV_FILE`, with a plan to flip it later?

**If a domain is wired:**

- [ ] Caddy overlay included in the deploy job?
- [ ] Ports 80/443 open in the server firewall?
- [ ] All `CORS_ORIGIN_*` and `NEXT_PUBLIC_*` URLs use the domain, not the IP?

Items that don't apply to this project should be removed from the per-project checklist, not just left unchecked.
|
|
---

## 🚨 Production Safety Rules (learned from real incidents)

These rules exist because each one caused data loss or downtime on a real project.
Run this checklist **before** any deploy, compose change, or port change.
|
|
### Before every deploy - data safety

```bash
# 1. Back up the DB before touching the container
mkdir -p /opt/<project>-backups   # docker cp needs the destination directory to exist
docker cp <container_name>:/data/<app>.db \
  /opt/<project>-backups/<app>-$(date +%Y%m%d-%H%M%S).db

# 2. Scan for orphaned volumes before first deploy on a new server
docker volume ls | grep db_data
docker volume ls | grep uploads_data
```
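In a workflow, the backup step should fail loudly so the deploy never starts without a fresh copy. The sketch below wraps the command above. `backup_db` and the `BACKUP_ROOT` override are hypothetical. It assumes the same single database file at `/data/<app>.db` inside the container.

```bash
# Copy /data/<app>.db out of a container into a timestamped backup file.
#   $1 = container name, $2 = app name, $3 = project name
# BACKUP_ROOT defaults to /opt, giving /opt/<project>-backups/<app>-<timestamp>.db.
# Prints the backup path; returns 1 if the copy fails or produces an empty file.
backup_db() {
  local container=$1 app=$2 project=$3 dir dest
  dir="${BACKUP_ROOT:-/opt}/${project}-backups"
  dest="$dir/${app}-$(date +%Y%m%d-%H%M%S).db"
  mkdir -p "$dir" || return 1
  docker cp "${container}:/data/${app}.db" "$dest" || return 1
  [ -s "$dest" ] || { echo "backup is empty: $dest" >&2; return 1; }
  echo "$dest"
}
```

In the deploy job, `backup_db <container_name> <app> <project> || exit 1` would run before any build or stop step.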
|
|
**Why:** When `docker compose up` runs with a new project `name:` (or from a different working directory), Docker creates **fresh empty volumes** and leaves the old data orphaned in volumes with a different prefix (e.g. `hostexecutor_db_data` instead of `drsousan_db_data`). This caused total data loss on draletaha.ir. The only recovery was finding the orphaned volume and copying it back.
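The mechanism behind this is how Compose names volumes. Each named volume becomes `<project>_<volume>`, where the project name comes from `-p`, `COMPOSE_PROJECT_NAME`, or the top-level `name:`, and otherwise falls back to the directory name. Changing any of those silently switches the project to a new set of volumes.

The sketch below is a pre-deploy scan that lists data volumes belonging to some *other* project name. `find_orphan_volumes` is a hypothetical helper. Its default suffixes are the two grepped for above.

```bash
# List volumes whose names end in one of the data suffixes but do not carry
# the expected project prefix, e.g. hostexecutor_db_data when the project is
# now "drsousan". $1 = expected project, $2 = space-separated suffixes.
find_orphan_volumes() {
  local project=$1 suffixes name s
  suffixes=${2:-db_data uploads_data}
  docker volume ls --format '{{.Name}}' | while read -r name; do
    for s in $suffixes; do
      case "$name" in
        "${project}_${s}") ;;      # the volume this deploy will use
        *_"$s") echo "$name" ;;    # same kind of data, different project prefix
      esac
    done
  done
}
```

Before the first deploy on a server, a non-empty result is the signal to stop and inspect those volumes (for example with `docker volume inspect`) before Compose creates empty ones. Pinning a top-level `name:` in `docker-compose.yml`, and never changing it, keeps the prefix stable.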
|
|
Restore both volumes together, since the DB and uploads must stay in sync. First stop the containers that use the new volumes, so nothing writes to them during the copy:

```bash
# cp -a (unlike cp -r) keeps file ownership and permissions
docker run --rm -v OLD_db_data:/old -v NEW_db_data:/new alpine sh -c "cp -a /old/. /new/"
docker run --rm -v OLD_uploads_data:/old -v NEW_uploads_data:/new alpine sh -c "cp -a /old/. /new/"
```

**Never use** `docker compose down -v`: it deletes named volumes permanently.
|
|
---

### Before every deploy - container conflict

`docker compose up --force-recreate` is unreliable on some Docker versions and will fail with "container name already in use." Always use explicit stop + rm:

```bash
# ✅ Reliable
docker stop <container_name> 2>/dev/null || true
docker rm <container_name> 2>/dev/null || true
docker compose up -d --no-deps <service>

# ❌ Unreliable: do not use
docker compose up -d --force-recreate <service>
```
|
|
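---

### Before every deploy - rollback tag

Tag the running image before replacing it so you can roll back instantly. Use the container's image ID (`{{.Image}}`) rather than its configured image name (`{{.Config.Image}}`). If `docker compose build` has already run, the name points at the new build, while the ID still identifies the image the container is actually running.

```bash
CURRENT=$(docker inspect <container_name> --format='{{.Image}}')
docker tag "$CURRENT" <registry>/<project>:rollback
```

If the new container fails its health check, roll back:

```bash
docker stop <container_name> && docker rm <container_name>
docker run -d --name <container_name> <registry>/<project>:rollback
```

A bare `docker run` does not reuse the compose service's ports, env file, volumes, or networks. Either pass those explicitly, or retag the rollback image to the service's image name and recreate the service with `docker compose up -d --no-deps <service>`.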
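Here is a sketch of a compose-aware version of the tag-and-roll-back pattern. The helper names are hypothetical. `<service_image>` must be the image name compose uses for the service: its `image:` value, or the name compose generated for a `build:`-only service, which `docker compose config --images` prints.

```bash
# Remember the image the container is running now. {{.Image}} is the image
# ID, so this stays correct even after `docker compose build` has moved the
# service's image name to a new build.
tag_rollback() {
  local container=$1 rollback_ref=$2 image_id
  image_id=$(docker inspect --format='{{.Image}}' "$container") || return 1
  docker tag "$image_id" "$rollback_ref"
}

# Point the service's image name back at the rollback image, then recreate
# only that service so ports, env, volumes and networks still come from
# docker-compose.yml. Uses the explicit stop + rm pattern from above.
rollback_service() {
  local container=$1 service=$2 rollback_ref=$3 service_image=$4
  docker tag "$rollback_ref" "$service_image" || return 1
  docker stop "$container" >/dev/null 2>&1 || true
  docker rm "$container" >/dev/null 2>&1 || true
  docker compose up -d --no-deps "$service"
}
```

A typical deploy order:

1. `tag_rollback <container_name> <registry>/<project>:rollback`
2. Build.
3. Stop + rm + up.
4. Health wait.
5. On timeout, `rollback_service <container_name> <service> <registry>/<project>:rollback <service_image>`.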
|
|
---

### Before changing ports in docker-compose.yml

```bash
# Check what's already listening on the host
ss -tlnp | grep LISTEN
docker ps --format "table {{.Names}}\t{{.Ports}}"
```

Never assume a port is free. A port conflict silently breaks other services running on the same host.
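To turn that audit into a gate, a deploy can refuse to map a host port that is already taken. This is a sketch. `port_in_use` is a hypothetical helper, and it only looks at TCP listeners reported by `ss`.

```bash
# Succeed if some process on the host listens on TCP port $1.
# `ss -Htln` prints listening TCP sockets without a header and with numeric
# ports; column 4 is the local address, e.g. 0.0.0.0:8080 or [::]:443.
port_in_use() {
  ss -Htln | awk -v p="$1" '$4 ~ (":" p "$") { found = 1 } END { exit (found ? 0 : 1) }'
}
```

Example: `if port_in_use "$HOST_PORT"; then echo "port $HOST_PORT is taken"; exit 1; fi`. A redeploy of the service that already owns the port trips this check too, so run it only when adding or changing a port mapping.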
|
|
---

### Scope every compose command to a single service

```bash
# ✅ Touches only this service
docker compose up -d --no-deps <service>

# ❌ Acts on ALL services in the compose file, including unrelated containers:
#    `down` stops and removes them, `restart` restarts them
docker compose down
docker compose restart
```

Never use bare `docker compose down` in a CI/CD workflow or on a production server that runs multiple projects.

---
|
|
### CI/CD workflow safety checklist

When editing `.gitea/workflows/ci-cd.yml`:

- [ ] **Backup step** runs before Deploy (`docker cp .../app.db /opt/backups/...`)
- [ ] **Deploy step** uses `docker stop || true && docker rm || true` - no `--force-recreate`
- [ ] **No** `docker compose down` anywhere in the workflow
- [ ] **Rollback tag** applied before deploy
- [ ] **Health check** loop after deploy, exits non-zero on timeout
- [ ] **Port** in deploy matches `HOST_PORT` env var - verified not already taken
- [ ] **Prune** step only removes dangling images for THIS project
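Two items on this list can be checked mechanically before pushing. The sketch below uses a hypothetical `lint_workflow` helper that fails if the workflow contains any `docker compose ... down` or `--force-recreate`. It is a plain text match, so commented-out lines are flagged too.

```bash
# Fail if the workflow file contains commands the safety rules forbid.
#   $1 = workflow path (default .gitea/workflows/ci-cd.yml)
# Prints each offending line with its line number.
lint_workflow() {
  local file=${1:-.gitea/workflows/ci-cd.yml} status=0
  # any `docker compose ... down` / `docker-compose ... down` (including down -v)
  if grep -nE 'docker[ -]compose.*[[:space:]]down([[:space:]]|$)' "$file"; then
    echo "forbidden: docker compose down" >&2
    status=1
  fi
  if grep -nF -- '--force-recreate' "$file"; then
    echo "forbidden: --force-recreate" >&2
    status=1
  fi
  return "$status"
}
```

Run it locally, or as the first step of a CI job. Because the deploy job `needs:` every CI job, a forbidden command then fails the pipeline before anything is deployed.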
|
|
---

### Incident log

| When | What broke | Root cause | Fix |
|---|---|---|---|
| 2026-06 | All DB + uploads lost on draletaha.ir | New `name: drsousan` in compose created fresh volumes; data sat in `hostexecutor_db_data` | Restored from orphaned volume; added pre-deploy backup step to CI |
| 2026-06 | Deploy failed twice ("container name in use") | `--no-deps` then `--force-recreate` both fail when container already exists | Replaced with explicit `stop + rm + up` |
| Earlier | Port conflict broke another service on same host | Port assumed free without checking | Added port audit (`ss -tlnp`) before mapping new ports |
| Earlier | Unrelated containers stopped on redeploy | Bare `docker compose down` used in workflow | Rule: always scope to `--no-deps <service>` |
| Earlier | No rollback possible after bad deploy | Old container removed before new one was verified healthy | Added rollback tag step + health check gate |