Soroush Asadi 136ed700dd init: add soroush-cicd skill + full skills catalog README
- soroush-cicd/SKILL.md: CI/CD method for Gitea + Nexus, production safety rules
- README.md: catalog of 70+ skills organized by category with trigger phrases
2026-06-02 08:57:17 +03:30


name: soroush-cicd
description: Soroush's standard CI/CD method - self-hosted Gitea (git.soroushasadi.com) + self-hosted Nexus mirror (mirror.soroushasadi.com) for any project. Use whenever the user says "soroush ci cd method", "my ci cd method", "soroush pipeline", "set up gitea pipeline", "use my nexus mirror", or asks to add CI/CD to a new project. Covers Gitea Actions workflow design, NuGet/npm/Docker/MCR/PyPI proxies through Nexus, runner labels (container vs host), self-hosted deploy job patterns, ENV_FILE secret, dual remote setup (github + gitea), and domain/Caddy cutover.

Soroush CI/CD Method

The canonical recipe Soroush uses to ship any project. Two pieces of infrastructure are always reused across projects; only the per-project workflow YAML and compose files change.

| Component | URL | Role |
|---|---|---|
| Gitea | https://git.soroushasadi.com | Git host + Actions runner (CI/CD trigger) |
| Nexus | https://mirror.soroushasadi.com | Pull-through mirror for NuGet, npm, Docker, MCR, PyPI, APT |

GitHub is kept as a backup remote (origin); Gitea is the CI remote (gitea). Only pushes to Gitea trigger pipelines.

When to invoke this skill

Trigger automatically when the user says any of:

  • "soroush ci cd method" / "my ci cd method" / "soroush pipeline"
  • "set up gitea pipeline" / "wire this repo to gitea"
  • "use my nexus mirror" / "route through mirror.soroushasadi.com"
  • "add CI/CD to this project"
  • "configure deploy job for the server"

Do not invoke for unrelated CI questions about GitHub Actions on github.com, CircleCI, GitLab.com, etc.

How this skill works - read this first

This is a method, not a copy-paste script. Every project Soroush ships is different - different stacks, different services, different deploy targets, different secrets. The two pieces that NEVER change are:

  1. Push to gitea -> CI runs.
  2. All packages and base images come from mirror.soroushasadi.com.

Everything else (which jobs, which services, which compose files, which health checks, whether there's even a deploy step) is derived per project. Don't paste the templates blindly. Run the intake first, then generate a tailored workflow + checklist.

Step 0 - Per-project intake

Before writing any YAML, gather these answers. Ask the user only the ones you can't infer from reading the repo:

| Question | Why it matters |
|---|---|
| What's the project name (for concurrency.group: and container names)? | Affects every <project>-cicd-${{ github.ref }} style identifier |
| What stacks are in this repo? (.NET / Node / Python / Flutter / static) | Decides which CI job templates apply |
| For each service, what's the build command and what does "passing" mean? (build only? tests? tsc? lint?) | Defines each CI job's steps: |
| Does the API need a real Postgres/Redis/Mongo during tests, or are mocks fine? | Adds services: block + healthchecks vs not |
| Is there a deploy target at all, or is this CI-only? (e.g. library, mobile app) | Decides whether to include the deploy job |
| If deploy: same server as Gitea, or remote? Docker compose, k8s, plain systemd, or a static upload? | Picks the deploy job pattern (host runner vs SSH vs rsync) |
| Which services need NEXT_PUBLIC_* (or other build-time) env vars? | Those must be in ENV_FILE BEFORE first build |
| What external services need secrets? (payment, SMS, email, S3, etc.) | Defines the ENV_FILE template |
| Is there already a domain, or IP-only for now? | Decides whether to wire Caddy and HTTPS now or later |
| Which Node / .NET / Python version per service? | Sets the exact mirror image tag |
| Are there migrations or one-shot init steps? | Decides RUN_MIGRATIONS flag + first-deploy ordering |

Once these are answered, build a project-specific checklist (see Step 0.5 below) and only then generate files.
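
Part of the intake can be inferred from the repo before asking anything. The sketch below guesses the stacks from marker files; the mapping (*.csproj, package.json, requirements.txt / pyproject.toml, pubspec.yaml) and the function names are assumptions for illustration, not part of the method itself.

```shell
# _has_file DIR PATTERN: succeed if any file under DIR matches PATTERN.
# node_modules is skipped so vendored package.json files don't count.
_has_file() {
  [ -n "$(find "$1" -name "$2" ! -path '*/node_modules/*' -print 2>/dev/null | head -n 1)" ]
}

# detect_stacks DIR: print one stack name per line (hypothetical helper).
detect_stacks() {
  dir="${1:-.}"
  if _has_file "$dir" '*.csproj'; then echo dotnet; fi
  if _has_file "$dir" 'package.json'; then echo node; fi
  if _has_file "$dir" 'requirements.txt' || _has_file "$dir" 'pyproject.toml'; then
    echo python
  fi
  if _has_file "$dir" 'pubspec.yaml'; then echo flutter; fi
}
```

Running detect_stacks . at the repo root gives a starting answer for the "What stacks are in this repo?" question; anything it cannot see (deploy target, secrets, domain) still has to be asked.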

Step 0.5 - Build the per-project checklist

The "Default bring-up checklist" later in this document is a superset - it covers the maximum case. For each project, prune items that don't apply and add items that are unique to that project. Example:

  • Static Next.js site with no backend? Drop the Postgres/Redis, drop RUN_MIGRATIONS, drop the API health-wait loop, keep tsc + deploy.
  • .NET-only API with no frontend? Drop all Node jobs, keep dotnet + postgres service.
  • Library / SDK project? Drop the entire deploy job; keep only build + test on PR.
  • Project where Gitea Actions deploys to a DIFFERENT server than Gitea itself? Replace the self-hosted:host deploy job with an SSH-based deploy job.
  • Mobile Flutter project? Replace dotnet/node images with mirror.soroushasadi.com/cirrusci/flutter:<ver>; deploy job uploads artifacts instead of docker compose up.

Present the tailored checklist to the user BEFORE writing files, so they can confirm or adjust. Format it as a numbered todo so it's actionable.

Mental model

Developer machine
    |  git push origin <branch>   -> GitHub backup, no CI
    |  git push gitea <branch>    -> https://git.soroushasadi.com (TRIGGERS CI)
    v
Gitea Actions (act_runner registered with two labels)
    |
    +-- "CI" jobs    runs-on: ubuntu-latest
    |                  container: mirror.soroushasadi.com/<image>
    |                  all package managers point at Nexus groups
    |
    +-- "deploy" job runs-on: self-hosted   (label "host")
                       shells into docker compose on the server
                       reads ENV_FILE secret -> writes .env

Two runner labels are required on the act_runner config:

| Label | Runs where | Used for |
|---|---|---|
| ubuntu-latest:docker://node:20-alpine | Inside a Docker container | build / test / type-check |
| self-hosted:host | Directly on the server shell | the deploy job |

The :host suffix on the second label is what lets the deploy job call docker compose against the host's docker daemon.
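
A quick way to confirm both labels are registered is to grep the runner config. This is a minimal sketch: check_runner_labels is a hypothetical helper, and the runner.labels layout shown in the comment is an assumption about the config file, so adjust it to whatever the installed act_runner actually generates.

```shell
# Assumed layout of the act_runner config (verify against your install):
#   runner:
#     labels:
#       - "ubuntu-latest:docker://node:20-alpine"
#       - "self-hosted:host"

# check_runner_labels FILE: succeed only if both required labels appear in FILE.
check_runner_labels() {
  cfg="$1"
  missing=0
  for prefix in 'ubuntu-latest:docker://' 'self-hosted:host'; do
    if ! grep -qF -- "$prefix" "$cfg"; then
      echo "missing runner label: $prefix" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

On the server this would be called as check_runner_labels /etc/act_runner/config.yaml before the first pipeline run.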

Nexus repositories - the four that every project uses

Provisioned once on the server via mirrors/nexus/provision.sh (idempotent):

| Nexus repo | Type | Upstream | Consumed by |
|---|---|---|---|
| nuget-group | NuGet group | nuget-proxy -> api.nuget.org | dotnet restore in CI + Docker |
| npm-group | npm group | npm-proxy -> registry.npmjs.org | npm install in CI + Docker |
| docker-hub-proxy | Docker proxy | Docker Hub (or Liara mirror) | mirror.soroushasadi.com/node:..., postgres, redis... |
| mcr-proxy | Docker proxy | mcr.microsoft.com | mirror.soroushasadi.com/dotnet/sdk:..., aspnet |

Optional but useful when projects need them:

| Nexus repo | Type | Upstream | Use |
|---|---|---|---|
| pypi-proxy | PyPI proxy | Liara | pip install |
| ubuntu-proxy | APT proxy | Liara (jammy) | apt-get in Dockerfiles |
| ubuntu-security-proxy | APT proxy | Liara (jammy-security) | apt-get security updates |

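The *-group repos are what NuGet and npm clients talk to; they hide upstream fallback logic (Liara primary, Runflare/direct fallback). Where a group exists, never point clients at the underlying *-proxy directly - always at the group. The formats listed above only as proxies (Docker, MCR, PyPI, APT) have no group in this setup, so those are addressed by their proxy name.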

Host docker daemon mirror entry

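So that Docker Hub pulls on the server (including outside CI) go through Nexus, drop this into /etc/docker/daemon.json and restart docker. The registry-mirrors setting only applies to Docker Hub images; pulls from other registries such as mcr.microsoft.com are not redirected by it: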

{ "registry-mirrors": ["https://mirror.soroushasadi.com"] }

CI jobs still reference mirror.soroushasadi.com/<image> explicitly so they work regardless of the runner's docker config.

The workflow file - .gitea/workflows/ci-cd.yml

Skeleton

name: CI/CD

on:
  push:         { branches: [main] }
  pull_request: { branches: [main] }

concurrency:
  group: <project>-cicd-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # one or more CI jobs (build / test / typecheck) per service
  # one deploy job at the bottom that needs: all CI jobs

concurrency.cancel-in-progress: true cancels a run that is still in progress when a newer push arrives on the same ref, so rapid pushes don't stack deploys.

CI job - .NET (template)

api-build:
  name: "CI - API (dotnet build + test)"
  runs-on: ubuntu-latest
  container:
    image: mirror.soroushasadi.com/dotnet/sdk:<version>
    options: --add-host=gitea:host-gateway
  services:                      # optional: integration DB/redis
    postgres:
      image: mirror.soroushasadi.com/postgres:16-alpine
      env: { POSTGRES_DB: app_test, POSTGRES_USER: app, POSTGRES_PASSWORD: test_pass }
      options: --health-cmd pg_isready --health-interval 5s --health-timeout 5s --health-retries 10
    redis:
      image: mirror.soroushasadi.com/redis:7-alpine
      options: --health-cmd "redis-cli ping" --health-interval 5s --health-timeout 3s --health-retries 10
  steps:
    - name: Checkout
      env:
        TOKEN: ${{ github.token }}
        REF:   ${{ github.ref }}
      run: |
        git init
        git remote add origin "${{ github.server_url }}/${{ github.repository }}.git"
        git config http.extraheader "Authorization: Bearer ${TOKEN}"
        git fetch --depth=1 origin "${REF}"
        git checkout FETCH_HEAD

    - name: Write NuGet config
      run: |
        cat > /tmp/nuget.ci.config << 'EOF'
        <?xml version="1.0" encoding="utf-8"?>
        <configuration>
          <packageSources>
            <clear />
            <add key="nexus"
                 value="https://mirror.soroushasadi.com/repository/nuget-group/index.json"
                 protocolVersion="3" />
          </packageSources>
        </configuration>
        EOF

    - name: Restore
      run: dotnet restore src/<Project>/<Project>.csproj --configfile /tmp/nuget.ci.config
      env: { DOTNET_CLI_TELEMETRY_OPTOUT: 1 }

    - name: Build
      run: dotnet build src/<Project>/<Project>.csproj --no-restore -c Release

    - name: Test
      run: dotnet test --no-build -c Release --logger "console;verbosity=minimal"
      env:
        ConnectionStrings__DefaultConnection: "Host=postgres;Port=5432;Database=app_test;Username=app;Password=test_pass"
        ConnectionStrings__Redis: "redis:6379"

CI job - Node / Next.js (template)

web-check:
  name: "CI - Web (tsc)"
  runs-on: ubuntu-latest
  container:
    image: mirror.soroushasadi.com/node:20-alpine
    options: --add-host=gitea:host-gateway
  steps:
    - name: Checkout
      env:
        TOKEN: ${{ github.token }}
        SHA:   ${{ github.sha }}
      run: |
        wget -q --header "Authorization: Bearer ${TOKEN}" \
          "${{ github.server_url }}/api/v1/repos/${{ github.repository }}/archive/${SHA}.tar.gz" \
          -O /tmp/repo.tar.gz
        tar -xzf /tmp/repo.tar.gz --strip-components=1

    - name: Install
      working-directory: web/<app>
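      # Block scalar: in a plain one-line "run:" value the trailing backslash
      # is folded into the command and breaks the --registry flag.
      run: |
        npm install --legacy-peer-deps --ignore-scripts \
          --registry https://mirror.soroushasadi.com/repository/npm-group/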

    - name: TypeScript check
      working-directory: web/<app>
      run: npx tsc --noEmit

The two checkout variants both work; the tarball one is faster when git history isn't needed. Either way DON'T rely on actions/checkout@v4 - on self-hosted Gitea it's not guaranteed to be available.

Deploy job (always self-hosted)

deploy:
  name: "Deploy - all services"
  runs-on: self-hosted
  env:
    # act runner host mode starts with minimal PATH - extend so docker/snap are found
    PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
  needs:
    - api-build
    - web-check
    # ... every CI job
  if: github.event_name == 'push' && github.ref == 'refs/heads/main'
  timeout-minutes: 40

  steps:
    - name: Checkout
      env:
        TOKEN: ${{ github.token }}
        REF:   ${{ github.ref }}
      run: |
        git init
        git remote add origin "${{ github.server_url }}/${{ github.repository }}.git"
        git config http.extraheader "Authorization: Bearer ${TOKEN}"
        git fetch --depth=1 origin "${REF}"
        git checkout FETCH_HEAD

    - name: Write .env
      run: printf '%s' "$ENV_FILE" > .env
      env:
        # Block form: ${{ ... }} is not valid inside a { } flow mapping.
        ENV_FILE: ${{ secrets.ENV_FILE }}

    - name: Build images
      run: docker compose build --parallel <svc1> <svc2> ...
      env: { DOCKER_BUILDKIT: 1, COMPOSE_DOCKER_CLI_BUILD: 1 }

    - name: Start services
      run: docker compose up -d --no-deps <svc1> <svc2> ...

    - name: Wait for API healthy
      run: |
        for i in $(seq 1 24); do
          STATUS=$(docker inspect --format='{{.State.Health.Status}}' <api-container> 2>/dev/null || echo "missing")
          echo "  [$i/24] $STATUS"
          [ "$STATUS" = "healthy" ] && echo "OK <api-container> healthy" && break
          [ "$i" = "24" ] && echo "TIMEOUT <api-container>" && docker compose logs --tail=40 <api> && exit 1
          sleep 5
        done

    - name: Prune old images
      if: success()
      run: docker image prune -f

Things that bite if you forget them:

  • runs-on: self-hosted (NOT ubuntu-latest) for deploy.
  • The explicit PATH: env var - act runners strip PATH and won't find docker.
  • --no-deps keeps a one-service redeploy from also restarting the services it depends on, such as databases.
  • Every compose service that the health-wait loop checks MUST define a healthcheck:.
  • NEXT_PUBLIC_* env vars are baked at Next.js build time - changes to them in ENV_FILE only take effect after the next CI rebuild.
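
When several services need a health wait, the loop from the deploy job can be factored into one function. The sketch below assumes hypothetical names (wait_healthy, docker_health) and takes the status command as an argument so the loop can be exercised without Docker; it is one possible shape, not the workflow's required form.

```shell
# wait_healthy NAME TRIES DELAY STATUS_CMD...
# Calls "STATUS_CMD NAME" up to TRIES times, sleeping DELAY seconds between
# attempts, and succeeds as soon as it prints "healthy".
wait_healthy() {
  name="$1"; tries="$2"; delay="$3"; shift 3
  i=1
  while [ "$i" -le "$tries" ]; do
    status=$("$@" "$name" 2>/dev/null || echo missing)
    echo "  [$i/$tries] $status"
    if [ "$status" = "healthy" ]; then
      echo "OK $name healthy"
      return 0
    fi
    i=$((i + 1))
    if [ "$i" -le "$tries" ]; then sleep "$delay"; fi
  done
  echo "TIMEOUT $name" >&2
  return 1
}

# Status command matching the deploy template: the container's health status.
docker_health() {
  docker inspect --format='{{.State.Health.Status}}' "$1"
}
```

In the deploy step this would be called as wait_healthy <api-container> 24 5 docker_health, followed by docker compose logs --tail=40 on failure. A container without a healthcheck: never reports "healthy", so the function times out, which is the same failure mode the bullet above warns about.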

Dockerfile patterns

.NET image - NuGet through Nexus

Copy nuget.docker.config into the repo with:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="nexus"
         value="https://mirror.soroushasadi.com/repository/nuget-group/index.json"
         protocolVersion="3" />
  </packageSources>
  <config>
    <add key="http_retry_count" value="8" />
    <add key="http_retry_delay_milliseconds" value="1000" />
  </config>
</configuration>

Then in Dockerfile:

FROM mirror.soroushasadi.com/dotnet/sdk:<version> AS build
WORKDIR /src
COPY nuget.docker.config /tmp/nuget.config
COPY src/ .
RUN dotnet restore <Project>.csproj --configfile /tmp/nuget.config
RUN dotnet publish <Project>.csproj -c Release -o /out --no-restore

FROM mirror.soroushasadi.com/dotnet/aspnet:<version>
WORKDIR /app
COPY --from=build /out ./
ENTRYPOINT ["dotnet", "<Project>.dll"]

Node image - npm through Nexus

FROM mirror.soroushasadi.com/node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm install --legacy-peer-deps --ignore-scripts \
      --registry https://mirror.soroushasadi.com/repository/npm-group/
COPY . .
RUN npm run build

FROM mirror.soroushasadi.com/node:20-alpine
WORKDIR /app
COPY --from=build /app/.next ./.next
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/package.json ./
EXPOSE 3000
CMD ["npm", "start"]

Python image - pip through Nexus

FROM mirror.soroushasadi.com/python:3.12-slim
RUN pip config set global.index-url https://mirror.soroushasadi.com/repository/pypi-proxy/simple/ \
 && pip config set global.trusted-host mirror.soroushasadi.com
COPY requirements.txt .
RUN pip install -r requirements.txt

Git remote setup on developer machine

# In a fresh repo, after `git init`:
git remote add origin https://github.com/<user>/<repo>.git
git remote add gitea  https://git.soroushasadi.com/<user>/<repo>.git

git remote -v
# origin   https://github.com/...     (fetch+push, GitHub backup)
# gitea    https://git.soroushasadi.com/...  (fetch+push, CI/CD)

Daily flow:

git push origin main      # GitHub backup, no CI runs
git push gitea  main      # Gitea triggers CI + deploy on main

For a new project: create the repo in the Gitea UI first (or via the Gitea REST API), then add the gitea remote locally.

Secrets

One secret rules them all: ENV_FILE. Set at:

https://git.soroushasadi.com/<user>/<repo>/settings/secrets

The deploy job writes it verbatim to .env, which docker compose reads. Contents are project-specific but always include:

  • ASPNETCORE_ENVIRONMENT=Production (for .NET projects)
  • RUN_MIGRATIONS=true on first deploy, false after
  • Connection strings (DB_CONNECTION_STRING, etc.)
  • JWT_KEY - generate with openssl rand -hex 32
  • NEXT_PUBLIC_* URLs (baked at build time - require CI rerun if changed)
  • CORS_ORIGIN_* for every front-end origin
  • Provider keys (payment gateway, SMS, etc.)
  • Host port mappings (*_PORT while pre-domain)

To rotate any secret: edit ENV_FILE in Gitea, then push any commit to trigger a redeploy.
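
Before pasting a new ENV_FILE into Gitea, it can help to check a local copy for the keys the project expects. A minimal sketch, assuming a hypothetical helper check_env_keys and a plain KEY=value file; the key list passed to it is per project.

```shell
# check_env_keys FILE KEY...: print every KEY that has no non-empty
# "KEY=value" line in FILE; return 1 if any key is missing or empty.
check_env_keys() {
  file="$1"; shift
  rc=0
  for key in "$@"; do
    if ! grep -q "^${key}=." "$file"; then
      echo "missing or empty: $key"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, check_env_keys .env.production ASPNETCORE_ENVIRONMENT RUN_MIGRATIONS JWT_KEY, where .env.production is an assumed local file name that is never committed.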

Default bring-up checklist (superset - PRUNE per project)

This is the maximum case (full-stack app with .NET + Node + Postgres + Redis + deploy to the same server). For every actual project, drop items that don't apply, rephrase items that need different commands, and add items unique to that project. The result is the per-project checklist you should hand the user.

Server one-time setup (skip entirely if Soroush already runs Gitea + Nexus on the target server):

  1. Docker + docker compose v2 + Gitea + Nexus + act_runner installed.
  2. /etc/docker/daemon.json has the Nexus mirror entry.
  3. Nexus provision.sh has been run; the four standard repositories exist with anonymous read.
  4. act_runner registered with both labels (ubuntu-latest:docker://... and self-hosted:host).

Per-project setup (adjust per intake answers):

  1. Repo created on Gitea, mirrored from GitHub (or vice versa).
  2. .gitea/workflows/ci-cd.yml committed - jobs match the services we identified in intake.
  3. Dockerfiles committed for each service that needs an image. (Drop if pure static / library.)
  4. docker-compose.yml committed; every service that the deploy job health-waits on has a healthcheck:. (Drop if no Docker deploy.)
  5. nuget.docker.config committed (only if .NET is in the stack).
  6. .npmrc or inline --registry flag in CI (only if Node is in the stack).
  7. ENV_FILE secret set on the Gitea repo with every key the deploy job and docker-compose.yml reference. (Drop if CI-only project.)
  8. Developer machine has both origin and gitea remotes.
  9. First push: git push gitea <branch>. Watch https://git.soroushasadi.com/<user>/<repo>/actions.

Project-specific items to consider adding:

  • Migration runner step / RUN_MIGRATIONS=true flag (databases only)
  • Caddy + Let's Encrypt overlay (when domain is ready)
  • One-time seed/import step on first deploy
  • Cron job or scheduled job hooks
  • External webhook URLs that need to be registered after deploy
  • Object storage bucket creation
  • DNS A records list

Expected first-run time depends on stack: ~3 min (static site) to ~15 min (full-stack with cold Nexus cache). Subsequent runs are faster because the Nexus cache is warm.

Adding a new service to an existing pipeline

  1. Add a new CI job using the Node/.NET/Python template above.
  2. Add the new job name to deploy.needs:.
  3. Add the service to docker-compose.yml with a healthcheck:.
  4. Add a docker compose build step and docker compose up -d step in the deploy job.
  5. Add a health-wait loop if it's an API.
  6. Push to gitea.

Domain / HTTPS cutover (when DNS is ready)

When subdomains resolve to the server:

  1. Update ENV_FILE: swap IP-based NEXT_PUBLIC_* / CORS_* for https://*.<domain>, drop the *_PORT host-port vars.
  2. Add -f docker-compose.caddy.yml to every docker compose invocation in the deploy job.
  3. ufw allow 80 && ufw allow 443 on the server.
  4. Push to gitea. Caddy issues Let's Encrypt certs on first run; no certbot/manual renewal needed.

A typical docker-compose.caddy.yml overlay defines a caddy service publishing 80/443 with a Caddyfile that reverse-proxies each subdomain to the corresponding internal service.
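
As an illustration of that Caddyfile, the sketch below generates one reverse_proxy site block per subdomain. The helper name caddyfile_for and the SUB:SERVICE:PORT argument format are assumptions; the service names and ports must match the compose file of the actual project.

```shell
# caddyfile_for DOMAIN SUB:SERVICE:PORT...
# Prints one Caddy site block per spec, proxying SUB.DOMAIN to SERVICE:PORT
# on the compose network.
caddyfile_for() {
  domain="$1"; shift
  for spec in "$@"; do
    sub=${spec%%:*}      # text before the first colon
    rest=${spec#*:}      # text after the first colon
    svc=${rest%%:*}
    port=${rest#*:}
    printf '%s.%s {\n    reverse_proxy %s:%s\n}\n\n' "$sub" "$domain" "$svc" "$port"
  done
}
```

For example, caddyfile_for example.com api:api:8080 app:web:3000 > Caddyfile writes two site blocks; Caddy requests certificates for both hostnames when it starts.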

Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Job hangs at "Pulling image" | Runner can't reach mirror.soroushasadi.com | Check DNS and whether Nexus is down: curl -s https://mirror.soroushasadi.com/service/rest/v1/status |
| dotnet restore returns 401/403 | Nexus anonymous read disabled | Re-run provision.sh (enables anon access + realms) |
| npm install extremely slow first run | First fetch through npm-proxy is cold | Wait; subsequent runs hit cached blobs |
| Deploy: docker: command not found | Runner PATH stripped | Confirm env: PATH: line in deploy job |
| Deploy: permission denied ... /var/run/docker.sock | Runner user not in docker group | usermod -aG docker <runner-user> and restart act_runner |
| Health-wait times out | Service has no healthcheck: defined | Add HEALTHCHECK in Dockerfile or healthcheck: in compose |
| NEXT_PUBLIC_* URL didn't change in browser | Vars baked at Next.js build time | Push a commit to trigger image rebuild |
| Deploy ran but old code still serving | Container not recreated | docker stop + docker rm the container, then docker compose up -d --no-deps <svc>, or rebuild the image (--force-recreate is unreliable; see Production Safety Rules) |
| Two pushes only deployed once | concurrency.cancel-in-progress: true cancelled the earlier run | Expected. Sequence pushes if both must deploy. |
| Gitea checkout returns 401 | Token scope changed or runner re-registered | Workflow uses ${{ github.token }}; re-register runner if compromised |
| act_runner won't start | Token expired | Generate a fresh runner registration token in Gitea admin |

Files / commands cheat sheet

| Thing | Where |
|---|---|
| Workflow | .gitea/workflows/ci-cd.yml |
| NuGet config for CI | nuget.mirror.config (root) |
| NuGet config for Docker builds | nuget.docker.config (root) |
| Host docker mirror entry | /etc/docker/daemon.json |
| Nexus provisioning | mirrors/nexus/provision.sh |
| Liara upstream swap | mirrors/nexus/add-liara-mirrors.sh, update-docker-upstream.sh |
| Nexus compose | docker-compose.mirror.yml |
| Gitea Actions config | Gitea app.ini -> [actions] ENABLED = true |
| act_runner config | /etc/act_runner/config.yaml (labels live here) |
| Nexus health check | curl -s https://mirror.soroushasadi.com/service/rest/v1/status |
| View pipeline | https://git.soroushasadi.com/<user>/<repo>/actions |
| Set/rotate ENV_FILE | https://git.soroushasadi.com/<user>/<repo>/settings/secrets |

Review checklist (apply only the items relevant to this project's stack)

Always check (regardless of stack):

  • Every container.image and services.<x>.image uses mirror.soroushasadi.com/...?
  • Every container job has options: --add-host=gitea:host-gateway?
  • Every checkout step is manual (git init + bearer token, or tarball API)?
  • concurrency.cancel-in-progress: true is set?

If .NET is in the stack:

  • dotnet restore uses --configfile pointing at the Nexus nuget group?
  • nuget.docker.config is present and copied into the Dockerfile?

If Node is in the stack:

  • npm install uses --registry https://mirror.soroushasadi.com/repository/npm-group/?
  • NEXT_PUBLIC_* envs that affect the build are in ENV_FILE before first build?

If Python is in the stack:

  • pip uses --index-url https://mirror.soroushasadi.com/repository/pypi-proxy/simple/?

If a deploy job exists:

  • Deploy job is runs-on: self-hosted?
  • Deploy job has the explicit PATH: env line?
  • Deploy job is gated by if: github.event_name == 'push' && github.ref == 'refs/heads/<deploy-branch>'?
  • Deploy needs: lists every CI job?
  • ENV_FILE secret exists on the Gitea repo?

If docker compose is used at deploy time:

  • Every compose service that the deploy job health-waits on has a healthcheck:?
  • --no-deps used so single-service redeploy doesn't cascade?

If migrations / first-run setup exist:

  • RUN_MIGRATIONS=true (or equivalent) is in ENV_FILE, with a plan to flip it later?

If a domain is wired:

  • Caddy overlay included in deploy job?
  • Ports 80/443 open in server firewall?
  • All CORS_ORIGIN_* and NEXT_PUBLIC_* URLs use the domain, not the IP?

Items that don't apply to this project should be removed from the per-project checklist, not just left unchecked.
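
The first "always check" item can be automated with a small lint. This is a sketch under the assumption that images are declared on their own image: lines, as in the templates above; unmirrored_images is a hypothetical helper name.

```shell
# unmirrored_images FILE: print every "image:" value in FILE that does not
# start with the Nexus mirror host. Empty output means the file passes.
unmirrored_images() {
  grep -E '^[[:space:]]*image:' "$1" \
    | sed -E 's/^[[:space:]]*image:[[:space:]]*//; s/[[:space:]]+#.*$//; s/^"//; s/"$//' \
    | grep -v '^mirror\.soroushasadi\.com/' || true
}
```

Running unmirrored_images .gitea/workflows/ci-cd.yml (and the same against docker-compose.yml) should print nothing; any line it prints is an image that would bypass the mirror.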


🚨 Production Safety Rules (learned from real incidents)

These rules exist because each one caused data loss or downtime on a real project. Run this checklist before any deploy, compose change, or port change.

Before every deploy — data safety

# 1. Back up the DB before touching the container
docker cp <container_name>:/data/<app>.db \
  /opt/<project>-backups/<app>-$(date +%Y%m%d-%H%M%S).db

# 2. Scan for orphaned volumes before first deploy on a new server
docker volume ls | grep db_data
docker volume ls | grep uploads_data

Why: When docker compose up runs with a new project name: (or from a different working directory), Docker creates fresh empty volumes and leaves old data orphaned in volumes with a different prefix (e.g. hostexecutor_db_data instead of drsousan_db_data). This caused total data loss on draletaha.ir. The only recovery was finding the orphaned volume and copying it back.

Restore both volumes together (DB and uploads must be in sync):

# cp -a (unlike cp -r) preserves ownership, permissions, and timestamps
docker run --rm -v OLD_db_data:/old   -v NEW_db_data:/new   alpine sh -c "cp -a /old/. /new/"
docker run --rm -v OLD_uploads:/old   -v NEW_uploads:/new   alpine sh -c "cp -a /old/. /new/"

Never use docker compose down -v — it deletes named volumes permanently.
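
The volume scan above can be made a little stricter by flagging any suffix that exists under more than one project prefix, which is the pattern behind the orphaned-volume incident. A minimal sketch; duplicate_volumes is a hypothetical helper that reads volume names on stdin, so it works on any list of names.

```shell
# duplicate_volumes SUFFIX...: read volume names on stdin; for each SUFFIX,
# list the names ending in "_SUFFIX" when more than one such volume exists.
duplicate_volumes() {
  names=$(cat)
  for suffix in "$@"; do
    matches=$(printf '%s\n' "$names" | grep -- "_${suffix}\$" || true)
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    if [ "$count" -gt 1 ]; then
      echo "multiple volumes end in _${suffix}:"
      printf '%s\n' "$matches" | sed 's/^/  /'
    fi
  done
}
```

On the server it would be fed from Docker: docker volume ls --format '{{.Name}}' | duplicate_volumes db_data uploads_data. Any output means two projects (or two project names) own a volume with the same role, and the data should be located before deploying.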


Before every deploy — container conflict

docker compose up --force-recreate is unreliable on some Docker versions and can fail with "container name already in use." Always use explicit stop + rm:

# ✅ Reliable
docker stop <container_name> 2>/dev/null || true
docker rm   <container_name> 2>/dev/null || true
docker compose up -d --no-deps <service>

# ❌ Unreliable — do not use
docker compose up -d --force-recreate <service>

Before every deploy — rollback tag

Tag the running image before replacing it so you can roll back instantly:

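# .Image is the ID of the image the container is running; unlike .Config.Image
# (a name), it still points at the old image after the tag has been rebuilt.
CURRENT=$(docker inspect <container_name> --format='{{.Image}}')
docker tag "$CURRENT" <registry>/<project>:rollback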

If the new container fails its health check, roll back:

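docker stop <container_name> && docker rm <container_name>
# Plain docker run does not apply the ports, volumes, networks, or env from
# docker-compose.yml; add the flags the service needs.
docker run -d --name <container_name> <registry>/<project>:rollback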

Before changing ports in docker-compose.yml

# Check what's already listening on the host
ss -tlnp | grep LISTEN
docker ps --format "table {{.Names}}\t{{.Ports}}"

Never assume a port is free. A port conflict silently breaks other services running on the same host.
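
The port audit can be turned into a yes/no check by parsing the ss output. The sketch below assumes the usual ss -tln column layout (State, Recv-Q, Send-Q, Local Address:Port, ...) and a hypothetical helper name port_listening; it reads from stdin, so it can be tried on captured output.

```shell
# port_listening PORT: read `ss -tln` output on stdin and succeed if some
# socket is listening on PORT (IPv4 or IPv6).
port_listening() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")    # the port is the last colon-separated field
      if (parts[n] == port) found = 1
    }
    END { exit found ? 0 : 1 }'
}
```

For example: if ss -tln | port_listening 8080; then echo "8080 is already taken"; fi. Pair it with the docker ps listing above, since a stopped container can still hold a mapping that returns on restart.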


Scope every compose command to a single service

# ✅ Touches only this service
docker compose up -d --no-deps <service>

# ❌ Stops ALL services in the compose file — kills unrelated containers
docker compose down
docker compose restart

Never use bare docker compose down in a CI/CD workflow or on a production server that runs multiple projects.


CI/CD workflow safety checklist

When editing .gitea/workflows/ci-cd.yml:

  • Backup step runs before Deploy (docker cp .../app.db /opt/backups/...)
  • Deploy step uses docker stop || true && docker rm || true — no --force-recreate
  • No docker compose down anywhere in the workflow
  • Rollback tag applied before deploy
  • Health check loop after deploy, exits non-zero on timeout
  • Port in deploy matches HOST_PORT env var — verified not already taken
  • Prune step only removes dangling images for THIS project
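
Two of these items are simple enough to lint mechanically. A minimal sketch, assuming a hypothetical helper lint_workflow; it does a plain text match, so a forbidden command mentioned in a YAML comment is reported too.

```shell
# lint_workflow FILE: print (with line numbers) every line that contains a
# forbidden command and return 1 if any was found.
lint_workflow() {
  wf="$1"
  bad=0
  for pattern in 'docker compose down' '--force-recreate'; do
    if grep -nF -- "$pattern" "$wf"; then
      bad=1
    fi
  done
  return "$bad"
}
```

Running lint_workflow .gitea/workflows/ci-cd.yml before pushing catches a bare docker compose down or a --force-recreate; the backup, rollback-tag, and port items still need a manual read.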

Incident log

| When | What broke | Root cause | Fix |
|---|---|---|---|
| 2026-06 | All DB + uploads lost on draletaha.ir | New name: drsousan in compose created fresh volumes; data sat in hostexecutor_db_data | Restored from orphaned volume; added pre-deploy backup step to CI |
| 2026-06 | Deploy failed twice ("container name in use") | --no-deps then --force-recreate both fail when container already exists | Replaced with explicit stop + rm + up |
| Earlier | Port conflict broke another service on same host | Port assumed free without checking | Added port audit (ss -tlnp) before mapping new ports |
| Earlier | Unrelated containers stopped on redeploy | Bare docker compose down used in workflow | Rule: always scope to --no-deps <service> |
| Earlier | No rollback possible after bad deploy | Old container removed before new one was verified healthy | Added rollback tag step + health check gate |