| name | description |
|---|---|
| soroush-cicd | Soroush's standard CI/CD method - self-hosted Gitea (git.soroushasadi.com) + self-hosted Nexus mirror (mirror.soroushasadi.com) for any project. Use whenever the user says "soroush ci cd method", "my ci cd method", "soroush pipeline", "set up gitea pipeline", "use my nexus mirror", or asks to add CI/CD to a new project. Covers Gitea Actions workflow design, NuGet/npm/Docker/MCR/PyPI proxies through Nexus, runner labels (container vs host), self-hosted deploy job patterns, ENV_FILE secret, dual remote setup (github + gitea), and domain/Caddy cutover. |
Soroush CI/CD Method
The canonical recipe Soroush uses to ship any project. Two pieces of infrastructure are always reused across projects; only the per-project workflow YAML and compose files change.
| Component | URL | Role |
|---|---|---|
| Gitea | https://git.soroushasadi.com | Git host + Actions runner (CI/CD trigger) |
| Nexus | https://mirror.soroushasadi.com | Pull-through mirror for NuGet, npm, Docker, MCR, PyPI, APT |
GitHub is kept as a backup remote (origin), Gitea is the CI remote (gitea). Only pushes to Gitea trigger pipelines.
When to invoke this skill
Trigger automatically when the user says any of:
- "soroush ci cd method" / "my ci cd method" / "soroush pipeline"
- "set up gitea pipeline" / "wire this repo to gitea"
- "use my nexus mirror" / "route through mirror.soroushasadi.com"
- "add CI/CD to this project"
- "configure deploy job for the server"
Do not invoke for unrelated CI questions about GitHub Actions on github.com, CircleCI, GitLab.com, etc.
How this skill works - read this first
This is a method, not a copy-paste script. Every project Soroush ships is different - different stacks, different services, different deploy targets, different secrets. The two pieces that NEVER change are:
- Push to `gitea` -> CI runs.
- All packages and base images come from `mirror.soroushasadi.com`.
Everything else (which jobs, which services, which compose files, which health checks, whether there's even a deploy step) is derived per project. Don't paste the templates blindly. Run the intake first, then generate a tailored workflow + checklist.
Step 0 - Per-project intake
Before writing any YAML, gather these answers. Ask the user only the ones you can't infer from reading the repo:
| Question | Why it matters |
|---|---|
| What's the project name (for `concurrency.group:` and container names)? | Affects every `meezi-cicd-${{ github.ref }}` style identifier |
| What stacks are in this repo? (.NET / Node / Python / Flutter / static) | Decides which CI job templates apply |
| For each service, what's the build command and what does "passing" mean? (build only? tests? tsc? lint?) | Defines each CI job's steps: |
| Does the API need a real Postgres/Redis/Mongo during tests, or are mocks fine? | Adds services: block + healthchecks vs not |
| Is there a deploy target at all, or is this CI-only? (e.g. library, mobile app) | Decide whether to include the deploy job |
| If deploy: same server as Gitea, or remote? Docker compose, k8s, plain systemd, or a static upload? | Picks the deploy job pattern (host runner vs SSH vs rsync) |
| Which services need `NEXT_PUBLIC_*` (or other build-time) env vars? | Those must be in `ENV_FILE` BEFORE first build |
| What external services need secrets? (payment, SMS, email, S3, etc.) | Defines the ENV_FILE template |
| Is there already a domain, or IP-only for now? | Decides whether to wire Caddy and HTTPS now or later |
| Which Node / .NET / Python version per service? | Sets the exact mirror image tag |
| Are there migrations or one-shot init steps? | Decides RUN_MIGRATIONS flag + first-deploy ordering |
Once these are answered, build a project-specific checklist (see Step 0.5 below) and only then generate files.
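Part of the intake can be inferred from the repo itself. The sketch below is one possible way to answer the "what stacks are in this repo?" question before asking the user. The function name `detect_stacks` and the marker files it looks for are assumptions made for illustration, not part of the method.

```shell
# detect_stacks DIR: print one line per stack found under DIR.
# Hypothetical helper; the marker files below are assumptions, extend per project.
detect_stacks() (
  dir="${1:-.}"
  # has LABEL <find -name expression>: print LABEL if any matching file exists.
  has() {
    label="$1"; shift
    if [ -n "$(find "$dir" -path '*/node_modules' -prune -o -type f \( "$@" \) -print 2>/dev/null | head -n 1)" ]; then
      echo "$label"
    fi
  }
  has dotnet  -name '*.csproj' -o -name '*.sln'
  has node    -name 'package.json'
  has python  -name 'requirements.txt' -o -name 'pyproject.toml'
  has flutter -name 'pubspec.yaml'
)
```

Running `detect_stacks .` at the repo root gives a starting list of CI job templates. Build commands, deploy target and secrets still need to be confirmed with the user.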
Step 0.5 - Build the per-project checklist
The "Default bring-up checklist" later in this document is a superset - it covers the maximum case. For each project, prune items that don't apply and add items that are unique to that project. Example:
- Static Next.js site with no backend? Drop the Postgres/Redis, drop `RUN_MIGRATIONS`, drop the API health-wait loop, keep tsc + deploy.
- .NET-only API with no frontend? Drop all Node jobs, keep dotnet + postgres service.
- Library / SDK project? Drop the entire deploy job; keep only build + test on PR.
- Project where Gitea Actions deploys to a DIFFERENT server than Gitea itself? Replace the `self-hosted:host` deploy job with an SSH-based deploy job.
- Mobile Flutter project? Replace dotnet/node images with `mirror.soroushasadi.com/cirrusci/flutter:<ver>`; deploy job uploads artifacts instead of `docker compose up`.
Present the tailored checklist to the user BEFORE writing files, so they can confirm or adjust. Format it as a numbered todo so it's actionable.
Mental model
Developer machine
  |  git push origin <branch>   -> GitHub backup, no CI
  |  git push gitea  <branch>   -> https://git.soroushasadi.com (TRIGGERS CI)
  v
Gitea Actions (act_runner registered with two labels)
  |
  +-- "CI" jobs       runs-on: ubuntu-latest
  |                   container: mirror.soroushasadi.com/<image>
  |                   all package managers point at Nexus groups
  |
  +-- "deploy" job    runs-on: self-hosted (label "host")
                      shells into docker compose on the server
                      reads ENV_FILE secret -> writes .env
Two runner labels are required on the act_runner config:
| Label | Runs where | Used for |
|---|---|---|
| `ubuntu-latest:docker://node:20-alpine` | Inside a Docker container | build / test / type-check |
| `self-hosted:host` | Directly on the server shell | the deploy job |
The :host suffix on the second label is what lets the deploy job call docker compose against the host's docker daemon.
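A quick way to confirm that both labels are present is to grep the runner config. This is a minimal sketch: the helper name `check_runner_labels` is hypothetical, and it assumes the labels appear as plain strings in the act_runner config file (for example under `runner.labels` in `/etc/act_runner/config.yaml`).

```shell
# check_runner_labels CONFIG: succeed only if both label kinds are present.
# Hypothetical helper; assumes labels are written literally in the config file.
check_runner_labels() {
  _cfg="$1"
  _missing=0
  # Container label: maps ubuntu-latest to a docker:// image.
  grep -q 'ubuntu-latest:docker://' "$_cfg" || { echo "missing: ubuntu-latest:docker://<image>"; _missing=1; }
  # Host label: lets the deploy job reach the host's docker daemon.
  grep -q 'self-hosted:host' "$_cfg" || { echo "missing: self-hosted:host"; _missing=1; }
  return "$_missing"
}
```

For example, `check_runner_labels /etc/act_runner/config.yaml` prints whichever label is absent and returns non-zero.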
Nexus repositories - the four that every project uses
Provisioned once on the server via mirrors/nexus/provision.sh (idempotent):
| Nexus repo | Type | Upstream | Consumed by |
|---|---|---|---|
| `nuget-group` | NuGet group | `nuget-proxy` -> api.nuget.org | `dotnet restore` in CI + Docker |
| `npm-group` | npm group | `npm-proxy` -> registry.npmjs.org | `npm install` in CI + Docker |
| `docker-hub-proxy` | Docker proxy | Docker Hub (or Liara mirror) | `mirror.soroushasadi.com/node:...`, postgres, redis... |
| `mcr-proxy` | Docker proxy | mcr.microsoft.com | `mirror.soroushasadi.com/dotnet/sdk:...`, aspnet |
Optional but useful when projects need them:
| Nexus repo | Type | Upstream | Use |
|---|---|---|---|
| `pypi-proxy` | PyPI proxy | Liara | `pip install` |
| `ubuntu-proxy` | APT proxy | Liara (jammy) | `apt-get` in Dockerfiles |
| `ubuntu-security-proxy` | APT proxy | Liara (jammy-security) | `apt-get` security updates |
The *-group repos are what clients talk to; they hide upstream fallback logic (Liara primary, Runflare/direct fallback). Never point clients at a *-proxy directly - always at the group.
Host docker daemon mirror entry
So that any docker pull on the server (including outside CI) goes through Nexus, drop this into /etc/docker/daemon.json and restart docker:
{ "registry-mirrors": ["https://mirror.soroushasadi.com"] }
CI jobs still reference mirror.soroushasadi.com/<image> explicitly so they work regardless of the runner's docker config.
The workflow file - .gitea/workflows/ci-cd.yml
Skeleton
name: CI/CD
on:
  push: { branches: [main] }
  pull_request: { branches: [main] }
concurrency:
  group: <project>-cicd-${{ github.ref }}
  cancel-in-progress: true
jobs:
  # one or more CI jobs (build / test / typecheck) per service
  # one deploy job at the bottom that needs: all CI jobs
`concurrency.cancel-in-progress: true` cancels the in-flight run for the same ref, so rapid pushes don't stack deploys.
CI job - .NET (template)
  api-build:
    name: "CI - API (dotnet build + test)"
    runs-on: ubuntu-latest
    container:
      image: mirror.soroushasadi.com/dotnet/sdk:<version>
      options: --add-host=gitea:host-gateway
    services:                 # optional: integration DB/redis
      postgres:
        image: mirror.soroushasadi.com/postgres:16-alpine
        env: { POSTGRES_DB: app_test, POSTGRES_USER: app, POSTGRES_PASSWORD: test_pass }
        options: --health-cmd pg_isready --health-interval 5s --health-timeout 5s --health-retries 10
      redis:
        image: mirror.soroushasadi.com/redis:7-alpine
        options: --health-cmd "redis-cli ping" --health-interval 5s --health-timeout 3s --health-retries 10
    steps:
      - name: Checkout
        env:
          TOKEN: ${{ github.token }}
          REF: ${{ github.ref }}
        run: |
          git init
          git remote add origin "${{ github.server_url }}/${{ github.repository }}.git"
          git config http.extraheader "Authorization: Bearer ${TOKEN}"
          git fetch --depth=1 origin "${REF}"
          git checkout FETCH_HEAD
      - name: Write NuGet config
        run: |
          cat > /tmp/nuget.ci.config << 'EOF'
          <?xml version="1.0" encoding="utf-8"?>
          <configuration>
            <packageSources>
              <clear />
              <add key="nexus"
                   value="https://mirror.soroushasadi.com/repository/nuget-group/index.json"
                   protocolVersion="3" />
            </packageSources>
          </configuration>
          EOF
      - name: Restore
        run: dotnet restore src/<Project>/<Project>.csproj --configfile /tmp/nuget.ci.config
        env: { DOTNET_CLI_TELEMETRY_OPTOUT: 1 }
      - name: Build
        run: dotnet build src/<Project>/<Project>.csproj --no-restore -c Release
      - name: Test
        run: dotnet test --no-build -c Release --logger "console;verbosity=minimal"
        env:
          ConnectionStrings__DefaultConnection: "Host=postgres;Port=5432;Database=app_test;Username=app;Password=test_pass"
          ConnectionStrings__Redis: "redis:6379"
CI job - Node / Next.js (template)
  web-check:
    name: "CI - Web (tsc)"
    runs-on: ubuntu-latest
    container:
      image: mirror.soroushasadi.com/node:20-alpine
      options: --add-host=gitea:host-gateway
    steps:
      - name: Checkout
        env:
          TOKEN: ${{ github.token }}
          SHA: ${{ github.sha }}
        run: |
          wget -q --header "Authorization: Bearer ${TOKEN}" \
            "${{ github.server_url }}/api/v1/repos/${{ github.repository }}/archive/${SHA}.tar.gz" \
            -O /tmp/repo.tar.gz
          tar -xzf /tmp/repo.tar.gz --strip-components=1
      - name: Install
        working-directory: web/<app>
        # block scalar so the trailing backslash is a real shell line continuation
        run: |
          npm install --legacy-peer-deps --ignore-scripts \
            --registry https://mirror.soroushasadi.com/repository/npm-group/
      - name: TypeScript check
        working-directory: web/<app>
        run: npx tsc --noEmit
The two checkout variants both work; the tarball one is faster when git history isn't needed. Either way DON'T rely on actions/checkout@v4 - on self-hosted Gitea it's not guaranteed to be available.
Deploy job (always self-hosted)
  deploy:
    name: "Deploy - all services"
    runs-on: self-hosted
    env:
      # act runner host mode starts with minimal PATH - extend so docker/snap are found
      PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
    needs:
      - api-build
      - web-check
      # ... every CI job
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    timeout-minutes: 40
    steps:
      - name: Checkout
        env:
          TOKEN: ${{ github.token }}
          REF: ${{ github.ref }}
        run: |
          git init
          git remote add origin "${{ github.server_url }}/${{ github.repository }}.git"
          git config http.extraheader "Authorization: Bearer ${TOKEN}"
          git fetch --depth=1 origin "${REF}"
          git checkout FETCH_HEAD
      - name: Write .env
        # block-style env: "${{ ... }}" is not valid inside a YAML flow mapping
        env:
          ENV_FILE: ${{ secrets.ENV_FILE }}
        run: printf '%s' "$ENV_FILE" > .env
      - name: Build images
        run: docker compose build --parallel <svc1> <svc2> ...
        env: { DOCKER_BUILDKIT: 1, COMPOSE_DOCKER_CLI_BUILD: 1 }
      - name: Start services
        run: docker compose up -d --no-deps <svc1> <svc2> ...
      - name: Wait for API healthy
        run: |
          for i in $(seq 1 24); do
            STATUS=$(docker inspect --format='{{.State.Health.Status}}' <api-container> 2>/dev/null || echo "missing")
            echo "  [$i/24] $STATUS"
            [ "$STATUS" = "healthy" ] && echo "OK <api-container> healthy" && break
            [ "$i" = "24" ] && echo "TIMEOUT <api-container>" && docker compose logs --tail=40 <api> && exit 1
            sleep 5
          done
      - name: Prune old images
        if: success()
        run: docker image prune -f
Things that bite if you forget them:
- `runs-on: self-hosted` (NOT `ubuntu-latest`) for deploy.
- The explicit `PATH:` env var - act runners strip PATH and won't find docker.
- `--no-deps` keeps a one-service redeploy from cascading restart on databases.
- Every compose service that the health-wait loop checks MUST define a `healthcheck:`.
- `NEXT_PUBLIC_*` env vars are baked at Next.js build time - changes to them in `ENV_FILE` only take effect after the next CI rebuild.
Dockerfile patterns
.NET image - NuGet through Nexus
Copy nuget.docker.config into the repo with:
<?xml version="1.0" encoding="utf-8"?>
<configuration>
<packageSources>
<clear />
<add key="nexus"
value="https://mirror.soroushasadi.com/repository/nuget-group/index.json"
protocolVersion="3" />
</packageSources>
<config>
<add key="http_retry_count" value="8" />
<add key="http_retry_delay_milliseconds" value="1000" />
</config>
</configuration>
Then in Dockerfile:
FROM mirror.soroushasadi.com/dotnet/sdk:<version> AS build
WORKDIR /src
COPY nuget.docker.config /tmp/nuget.config
COPY src/ .
RUN dotnet restore <Project>.csproj --configfile /tmp/nuget.config
RUN dotnet publish <Project>.csproj -c Release -o /out --no-restore
FROM mirror.soroushasadi.com/dotnet/aspnet:<version>
WORKDIR /app
COPY --from=build /out ./
ENTRYPOINT ["dotnet", "<Project>.dll"]
Node image - npm through Nexus
FROM mirror.soroushasadi.com/node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm install --legacy-peer-deps --ignore-scripts \
--registry https://mirror.soroushasadi.com/repository/npm-group/
COPY . .
RUN npm run build
FROM mirror.soroushasadi.com/node:20-alpine
WORKDIR /app
COPY --from=build /app/.next ./.next
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/package.json ./
EXPOSE 3000
CMD ["npm", "start"]
Python image - pip through Nexus
FROM mirror.soroushasadi.com/python:3.12-slim
RUN pip config set global.index-url https://mirror.soroushasadi.com/repository/pypi-proxy/simple/ \
&& pip config set global.trusted-host mirror.soroushasadi.com
COPY requirements.txt .
RUN pip install -r requirements.txt
Git remote setup on developer machine
# In a fresh repo, after `git init`:
git remote add origin https://github.com/<user>/<repo>.git
git remote add gitea https://git.soroushasadi.com/<user>/<repo>.git
git remote -v
# origin https://github.com/... (fetch+push, GitHub backup)
# gitea https://git.soroushasadi.com/... (fetch+push, CI/CD)
Daily flow:
git push origin main # GitHub backup, no CI runs
git push gitea main # Gitea triggers CI + deploy on main
For a new project: create the repo in the Gitea UI first (or via the Gitea REST API), then add the `gitea` remote locally.
Secrets
One secret rules them all: ENV_FILE. Set at:
https://git.soroushasadi.com/<user>/<repo>/settings/secrets
The deploy job writes it verbatim to .env, which docker compose reads. Contents are project-specific but always include:
- `ASPNETCORE_ENVIRONMENT=Production` (for .NET projects)
- `RUN_MIGRATIONS=true` on first deploy, `false` after
- Connection strings (`DB_CONNECTION_STRING`, etc.)
- `JWT_KEY` - generate with `openssl rand -hex 32`
- `NEXT_PUBLIC_*` URLs (baked at build time - require CI rerun if changed)
- `CORS_ORIGIN_*` for every front-end origin
- Provider keys (payment gateway, SMS, etc.)
- Host port mappings (`*_PORT` while pre-domain)
To rotate any secret: edit ENV_FILE in Gitea, then push any commit to trigger a redeploy.
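Because a missing key in `ENV_FILE` usually only shows up as a runtime failure, it can help to check the written `.env` before building. The following is a sketch only: `missing_env_keys` is a hypothetical helper, and the list of required keys is whatever the project's intake produced.

```shell
# missing_env_keys FILE KEY...: print each KEY that has no non-empty
# KEY=value line in FILE; returns non-zero if any key is missing.
# Hypothetical helper, not part of any tool named in this document.
missing_env_keys() {
  _file="$1"; shift
  _rc=0
  for _key in "$@"; do
    # Require "KEY=" at line start followed by at least one character.
    if ! grep -Eq "^${_key}=.+" "$_file"; then
      echo "$_key"
      _rc=1
    fi
  done
  return "$_rc"
}
```

In a deploy job this could run right after the "Write .env" step, for example `missing_env_keys .env JWT_KEY DB_CONNECTION_STRING || exit 1`, so a bad secret fails the run before any container is replaced.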
Default bring-up checklist (superset - PRUNE per project)
This is the maximum case (full-stack app with .NET + Node + Postgres + Redis + deploy to the same server). For every actual project, drop items that don't apply, rephrase items that need different commands, and add items unique to that project. The result is the per-project checklist you should hand the user.
Server one-time setup (skip entirely if Soroush already runs Gitea + Nexus on the target server):
- Docker + docker compose v2 + Gitea + Nexus + act_runner installed.
- `/etc/docker/daemon.json` has the Nexus mirror entry.
- Nexus `provision.sh` has been run; the four standard repositories exist with anonymous read.
- act_runner registered with both labels (`ubuntu-latest:docker://...` and `self-hosted:host`).
Per-project setup (adjust per intake answers):
- Repo created on Gitea, mirrored from GitHub (or vice versa).
- `.gitea/workflows/ci-cd.yml` committed - jobs match the services we identified in intake.
- Dockerfiles committed for each service that needs an image. (Drop if pure static / library.)
- `docker-compose.yml` committed; every service that the deploy job health-waits on has a `healthcheck:`. (Drop if no Docker deploy.)
- `nuget.docker.config` committed (only if .NET is in the stack).
- `.npmrc` or inline `--registry` flag in CI (only if Node is in the stack).
- `ENV_FILE` secret set on the Gitea repo with every key the deploy job and `docker-compose.yml` reference. (Drop if CI-only project.)
- Developer machine has both `origin` and `gitea` remotes.
- First push: `git push gitea <branch>`. Watch `https://git.soroushasadi.com/<user>/<repo>/actions`.
Project-specific items to consider adding:
- Migration runner step / `RUN_MIGRATIONS=true` flag (databases only)
- Caddy + Let's Encrypt overlay (when domain is ready)
- One-time seed/import step on first deploy
- Cron job or scheduled job hooks
- External webhook URLs that need to be registered after deploy
- Object storage bucket creation
- DNS A records list
Expected first-run time depends on stack: ~3 min (static site) to ~15 min (full-stack with cold Nexus cache). Subsequent runs are fast.
Adding a new service to an existing pipeline
- Add a new CI job using the Node/.NET/Python template above.
- Add the new job name to `deploy.needs:`.
- Add the service to `docker-compose.yml` with a `healthcheck:`.
- Add a `docker compose build` step and `docker compose up -d` step in the deploy job.
- Add a health-wait loop if it's an API.
- Push to gitea.
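When several services each need a health-wait loop, a small shared function avoids copy-pasting the loop from the deploy template. This is a sketch under the same assumptions as that template (the container defines a Docker health check); the name `wait_healthy` and its defaults of 24 polls, 5 seconds apart, are illustrative.

```shell
# wait_healthy CONTAINER [ATTEMPTS] [DELAY_SECONDS]
# Polls the container's Docker health status. Returns 0 once it reports
# "healthy", 1 if it never does within ATTEMPTS polls.
wait_healthy() {
  _name="$1"; _max="${2:-24}"; _delay="${3:-5}"
  _i=1
  while [ "$_i" -le "$_max" ]; do
    # A missing container or missing health check is reported as "missing".
    _status=$(docker inspect --format='{{.State.Health.Status}}' "$_name" 2>/dev/null || echo "missing")
    echo "  [$_i/$_max] $_name: $_status"
    if [ "$_status" = "healthy" ]; then return 0; fi
    if [ "$_i" -lt "$_max" ]; then sleep "$_delay"; fi
    _i=$((_i + 1))
  done
  echo "TIMEOUT $_name after $_max checks"
  return 1
}
```

A deploy step could then call `wait_healthy <api-container> || { docker compose logs --tail=40 <api>; exit 1; }` once per service.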
Domain / HTTPS cutover (when DNS is ready)
When subdomains resolve to the server:
- Update `ENV_FILE`: swap IP-based `NEXT_PUBLIC_*` / `CORS_*` for `https://*.<domain>`, drop the `*_PORT` host-port vars.
- Add `-f docker-compose.caddy.yml` to every `docker compose` invocation in the deploy job.
- `ufw allow 80 && ufw allow 443` on the server.
- Push to gitea. Caddy issues Let's Encrypt certs on first run; no certbot/manual renewal needed.
A typical docker-compose.caddy.yml overlay defines a caddy service publishing 80/443 with a Caddyfile that reverse-proxies each subdomain to the corresponding internal service.
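To make the shape of that Caddyfile concrete, the sketch below prints one site block per subdomain. Everything in it is illustrative: `render_caddyfile` is a hypothetical helper, and the `SUB:SERVICE:PORT` mapping format, service names and ports are assumptions rather than values from a real project.

```shell
# render_caddyfile DOMAIN SUB:SERVICE:PORT...
# Prints one Caddy site block per mapping, e.g. "api:api:8080" becomes
# api.<DOMAIN> reverse-proxied to compose service "api" on port 8080.
render_caddyfile() {
  _domain="$1"; shift
  for _map in "$@"; do
    _sub=${_map%%:*}     # text before the first colon
    _upstream=${_map#*:} # remaining "SERVICE:PORT"
    printf '%s.%s {\n\treverse_proxy %s\n}\n' "$_sub" "$_domain" "$_upstream"
  done
}
```

For instance, `render_caddyfile example.com api:api:8080 app:web:3000 > Caddyfile` would produce two site blocks, which Caddy serves over HTTPS once the DNS records resolve.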
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Job hangs at "Pulling image" | Runner can't reach `mirror.soroushasadi.com` (DNS or Nexus down) | Check DNS, then Nexus status: `curl -s https://mirror.soroushasadi.com/service/rest/v1/status` |
| `dotnet restore` returns 401/403 | Nexus anonymous read disabled | Re-run `provision.sh` (enables anon access + realms) |
| `npm install` extremely slow first run | First fetch through `npm-proxy` is cold | Wait; subsequent runs hit cached blobs |
| Deploy: `docker: command not found` | Runner PATH stripped | Confirm `env: PATH:` line in deploy job |
| Deploy: `permission denied ... /var/run/docker.sock` | Runner user not in `docker` group | `usermod -aG docker <runner-user>` and restart act_runner |
| Health-wait times out | Service has no `healthcheck:` defined | Add `HEALTHCHECK` in Dockerfile or `healthcheck:` in compose |
| `NEXT_PUBLIC_*` URL didn't change in browser | Vars baked at Next.js build time | Push a commit to trigger image rebuild |
| Deploy ran but old code still serving | Container not recreated | `docker stop` + `docker rm` the container, then `docker compose up -d --no-deps <svc>` (see Production Safety Rules); rebuild the image if it is stale |
| Two pushes only deployed once | `concurrency.cancel-in-progress: true` cancelled the earlier run | Expected. Sequence pushes if both must deploy. |
| Gitea checkout returns 401 | Token scope changed or runner re-registered | Workflow uses `${{ github.token }}`; re-register runner if compromised |
| `act_runner` won't start | Token expired | Generate a fresh runner registration token in Gitea admin |
Files / commands cheat sheet
| Thing | Where |
|---|---|
| Workflow | .gitea/workflows/ci-cd.yml |
| NuGet config for CI | nuget.mirror.config (root) |
| NuGet config for Docker builds | nuget.docker.config (root) |
| Host docker mirror entry | /etc/docker/daemon.json |
| Nexus provisioning | mirrors/nexus/provision.sh |
| Liara upstream swap | mirrors/nexus/add-liara-mirrors.sh, update-docker-upstream.sh |
| Nexus compose | docker-compose.mirror.yml |
| Gitea Actions config | Gitea app.ini -> [actions] ENABLED = true |
| act_runner config | /etc/act_runner/config.yaml (labels live here) |
| Nexus health check | curl -s https://mirror.soroushasadi.com/service/rest/v1/status |
| View pipeline | https://git.soroushasadi.com/<user>/<repo>/actions |
| Set/rotate `ENV_FILE` | https://git.soroushasadi.com/<user>/<repo>/settings/secrets |
Review checklist (apply only the items relevant to this project's stack)
Always check (regardless of stack):
- Every `container.image` and `services.<x>.image` uses `mirror.soroushasadi.com/...`?
- Every container job has `options: --add-host=gitea:host-gateway`?
- Every checkout step is manual (`git init` + bearer token, or tarball API)?
- `concurrency.cancel-in-progress: true` is set?
If .NET is in the stack:
- dotnet restore uses `--configfile` pointing at the Nexus nuget group?
- `nuget.docker.config` is present and copied into the Dockerfile?
If Node is in the stack:
- npm install uses `--registry https://mirror.soroushasadi.com/repository/npm-group/`?
- `NEXT_PUBLIC_*` envs that affect the build are in `ENV_FILE` before first build?
If Python is in the stack:
- pip uses `--index-url https://mirror.soroushasadi.com/repository/pypi-proxy/simple/`?
If a deploy job exists:
- Deploy job is `runs-on: self-hosted`?
- Deploy job has the explicit `PATH:` env line?
- Deploy job is gated by `if: github.event_name == 'push' && github.ref == 'refs/heads/<deploy-branch>'`?
- Deploy `needs:` lists every CI job?
- `ENV_FILE` secret exists on the Gitea repo?
If docker compose is used at deploy time:
- Every compose service that the deploy job health-waits on has a `healthcheck:`?
- `--no-deps` used so single-service redeploy doesn't cascade?
If migrations / first-run setup exist:
- `RUN_MIGRATIONS=true` (or equivalent) is in `ENV_FILE`, with a plan to flip it later?
If a domain is wired:
- Caddy overlay included in deploy job?
- Ports 80/443 open in server firewall?
- All `CORS_ORIGIN_*` and `NEXT_PUBLIC_*` URLs use the domain, not the IP?
Items that don't apply to this project should be removed from the per-project checklist, not just left unchecked.
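The first "always check" item (every image comes from the mirror) is easy to automate. The sketch below is one hedged way to do it with grep. `find_unmirrored_images` is a hypothetical helper, and it assumes unquoted `image:` values written on a single line, as in the templates in this document.

```shell
# find_unmirrored_images FILE: print every "image:" line (with its line
# number) whose value does not start with the mirror host.
# Returns 1 if any such line is found, 0 otherwise.
find_unmirrored_images() {
  _hits=$(grep -nE '^[[:space:]]*image:' "$1" \
    | grep -v 'image:[[:space:]]*mirror\.soroushasadi\.com/' || true)
  if [ -z "$_hits" ]; then return 0; fi
  printf '%s\n' "$_hits"
  return 1
}
```

It can be pointed at `.gitea/workflows/ci-cd.yml` or at `docker-compose.yml`. Dockerfile `FROM` lines would need a separate pattern.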
🚨 Production Safety Rules (learned from real incidents)
These rules exist because each one caused data loss or downtime on a real project. Run this checklist before any deploy, compose change, or port change.
Before every deploy — data safety
# 1. Back up the DB before touching the container
docker cp <container_name>:/data/<app>.db \
/opt/<project>-backups/<app>-$(date +%Y%m%d-%H%M%S).db
# 2. Scan for orphaned volumes before first deploy on a new server
docker volume ls | grep db_data
docker volume ls | grep uploads_data
Why: When docker compose up runs with a new project name: (or from a
different working directory), Docker creates fresh empty volumes and leaves
old data orphaned in volumes with a different prefix (e.g. hostexecutor_db_data
instead of drsousan_db_data). This caused total data loss on draletaha.ir.
The only recovery was finding the orphaned volume and copying it back.
Restore both volumes together (DB and uploads must be in sync):
docker run --rm -v OLD_db_data:/old -v NEW_db_data:/new alpine sh -c "cp -r /old/. /new/"
docker run --rm -v OLD_uploads:/old -v NEW_uploads:/new alpine sh -c "cp -r /old/. /new/"
Never use docker compose down -v — it deletes named volumes permanently.
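The orphaned-volume scan can be made slightly more targeted than a plain `grep`. The sketch below lists volumes that carry an expected suffix but a different project prefix, which is the pattern described in the incident above. `find_orphan_volumes` is a hypothetical helper; the project name and suffixes are whatever the compose file actually defines.

```shell
# find_orphan_volumes PROJECT SUFFIX...
# Lists volumes named "<other>_<SUFFIX>" where <other> is not PROJECT,
# i.e. candidates left behind by an earlier compose project name.
find_orphan_volumes() {
  _project="$1"; shift
  for _suffix in "$@"; do
    docker volume ls --format '{{.Name}}' \
      | grep -E "_${_suffix}\$" \
      | grep -v -x "${_project}_${_suffix}" || true
  done
}
```

For example, `find_orphan_volumes drsousan db_data uploads_data` would print `hostexecutor_db_data` if that volume still exists on the host. Any hit is worth inspecting before the first deploy writes to fresh, empty volumes.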
Before every deploy — container conflict
docker compose up --force-recreate is unreliable on some Docker versions and
will fail with "container name already in use." Always use explicit stop + rm:
# ✅ Reliable
docker stop <container_name> 2>/dev/null || true
docker rm <container_name> 2>/dev/null || true
docker compose up -d --no-deps <service>
# ❌ Unreliable — do not use
docker compose up -d --force-recreate <service>
Before every deploy — rollback tag
Tag the running image before replacing it so you can roll back instantly:
CURRENT=$(docker inspect <container_name> --format='{{.Config.Image}}')
docker tag "$CURRENT" <registry>/<project>:rollback
If the new container fails its health check, roll back:
docker stop <container_name> && docker rm <container_name>
docker run -d --name <container_name> <registry>/<project>:rollback
Before changing ports in docker-compose.yml
# Check what's already listening on the host
ss -tlnp | grep LISTEN
docker ps --format "table {{.Names}}\t{{.Ports}}"
Never assume a port is free. A port conflict silently breaks other services running on the same host.
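The manual check above can be turned into a guard that a script or deploy step calls before mapping a port. This is a minimal sketch: `port_in_use` is a hypothetical helper, and it assumes the usual `ss -tln` layout in which the fourth column is `Local Address:Port`.

```shell
# port_in_use PORT: succeed if some TCP socket is listening on PORT.
# Assumes `ss -tln` prints a header row and "Local Address:Port" in column 4.
port_in_use() {
  ss -tln | awk -v p="$1" '
    NR > 1 {
      n = split($4, parts, ":")   # the port is the last ":"-separated field
      if (parts[n] == p) found = 1
    }
    END { exit (found ? 0 : 1) }'
}
```

A typical use would be `if port_in_use 8080; then echo "port 8080 is taken"; exit 1; fi` before editing the `ports:` mapping. Ports published by stopped containers are not detected this way, so the `docker ps` check is still worth running.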
Scope every compose command to a single service
# ✅ Touches only this service
docker compose up -d --no-deps <service>
# ❌ Stops ALL services in the compose file — kills unrelated containers
docker compose down
docker compose restart
Never use bare docker compose down in a CI/CD workflow or on a production
server that runs multiple projects.
CI/CD workflow safety checklist
When editing .gitea/workflows/ci-cd.yml:
- Backup step runs before Deploy (`docker cp .../app.db /opt/backups/...`)
- Deploy step uses `docker stop || true && docker rm || true`, with no `--force-recreate`
- No `docker compose down` anywhere in the workflow
- Rollback tag applied before deploy
- Health check loop after deploy, exits non-zero on timeout
- Port in deploy matches `HOST_PORT` env var and is verified not already taken
- Prune step only removes dangling images for THIS project
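Two of these items (no `docker compose down`, no `--force-recreate`) can be checked mechanically. The sketch below is a simple grep-based lint, not a YAML parser. `lint_workflow_safety` is a hypothetical helper, and it will also flag comments that mention the forbidden commands.

```shell
# lint_workflow_safety FILE: print lines that use commands the safety
# rules forbid (`docker compose ... down`, `--force-recreate`).
# Returns 1 if any are found, 0 otherwise.
lint_workflow_safety() {
  _hits=$(grep -nE 'docker compose( -f [^ ]+)* down|--force-recreate' "$1" || true)
  if [ -z "$_hits" ]; then return 0; fi
  printf '%s\n' "$_hits"
  return 1
}
```

Running `lint_workflow_safety .gitea/workflows/ci-cd.yml` before pushing gives a quick first pass. The remaining checklist items (backup, rollback tag, health gate, port audit) still need a human read of the workflow.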
Incident log
| When | What broke | Root cause | Fix |
|---|---|---|---|
| 2026-06 | All DB + uploads lost on draletaha.ir | New `name: drsousan` in compose created fresh volumes; data sat in `hostexecutor_db_data` | Restored from orphaned volume; added pre-deploy backup step to CI |
| 2026-06 | Deploy failed twice ("container name in use") | `--no-deps` then `--force-recreate` both fail when container already exists | Replaced with explicit stop + rm + up |
| Earlier | Port conflict broke another service on same host | Port assumed free without checking | Added port audit (`ss -tlnp`) before mapping new ports |
| Earlier | Unrelated containers stopped on redeploy | Bare `docker compose down` used in workflow | Rule: always scope to `--no-deps <service>` |
| Earlier | No rollback possible after bad deploy | Old container removed before new one was verified healthy | Added rollback tag step + health check gate |