Soroush Asadi 136ed700dd init: add soroush-cicd skill + full skills catalog README

- soroush-cicd/SKILL.md: CI/CD method for Gitea + Nexus, production safety rules
- README.md: catalog of 70+ skills organized by category with trigger phrases

2026-06-02 08:57:17 +03:30

name	description
soroush-cicd	Soroush's standard CI/CD method - self-hosted Gitea (git.soroushasadi.com) + self-hosted Nexus mirror (mirror.soroushasadi.com) for any project. Use whenever the user says "soroush ci cd method", "my ci cd method", "soroush pipeline", "set up gitea pipeline", "use my nexus mirror", or asks to add CI/CD to a new project. Covers Gitea Actions workflow design, NuGet/npm/Docker/MCR/PyPI proxies through Nexus, runner labels (container vs host), self-hosted deploy job patterns, ENV_FILE secret, dual remote setup (github + gitea), and domain/Caddy cutover.

Soroush CI/CD Method

The canonical recipe Soroush uses to ship any project. Two pieces of infrastructure are always reused across projects; only the per-project workflow YAML and compose files change.

Component	URL	Role
Gitea	https://git.soroushasadi.com	Git host + Actions runner (CI/CD trigger)
Nexus	https://mirror.soroushasadi.com	Pull-through mirror for NuGet, npm, Docker, MCR, PyPI, APT

GitHub is kept as a backup remote (origin); Gitea is the CI remote (gitea). Only pushes to Gitea trigger pipelines.

When to invoke this skill

Trigger automatically when the user says any of:

"soroush ci cd method" / "my ci cd method" / "soroush pipeline"
"set up gitea pipeline" / "wire this repo to gitea"
"use my nexus mirror" / "route through mirror.soroushasadi.com"
"add CI/CD to this project"
"configure deploy job for the server"

Do not invoke for unrelated CI questions about GitHub Actions on github.com, CircleCI, GitLab.com, etc.

How this skill works - read this first

This is a method, not a copy-paste script. Every project Soroush ships is different - different stacks, different services, different deploy targets, different secrets. The two pieces that NEVER change are:

Push to gitea -> CI runs.
All packages and base images come from mirror.soroushasadi.com.

Everything else (which jobs, which services, which compose files, which health checks, whether there's even a deploy step) is derived per project. Don't paste the templates blindly. Run the intake first, then generate a tailored workflow + checklist.

Step 0 - Per-project intake

Before writing any YAML, gather these answers. Ask the user only the ones you can't infer from reading the repo:

Question	Why it matters
What's the project name (for `concurrency.group:` and container names)?	Affects every `<project>-cicd-${{ github.ref }}` style identifier
What stacks are in this repo? (.NET / Node / Python / Flutter / static)	Decides which CI job templates apply
For each service, what's the build command and what does "passing" mean? (build only? tests? tsc? lint?)	Defines each CI job's `steps:`
Does the API need a real Postgres/Redis/Mongo during tests, or are mocks fine?	Adds `services:` block + healthchecks vs not
Is there a deploy target at all, or is this CI-only? (e.g. library, mobile app)	Decide whether to include the `deploy` job
If deploy: same server as Gitea, or remote? Docker compose, k8s, plain systemd, or a static upload?	Picks the deploy job pattern (host runner vs SSH vs rsync)
Which services need `NEXT_PUBLIC_*` (or other build-time) env vars?	Those must be in `ENV_FILE` BEFORE first build
What external services need secrets? (payment, SMS, email, S3, etc.)	Defines the `ENV_FILE` template
Is there already a domain, or IP-only for now?	Decides whether to wire Caddy and HTTPS now or later
Which Node / .NET / Python version per service?	Sets the exact mirror image tag
Are there migrations or one-shot init steps?	Decides `RUN_MIGRATIONS` flag + first-deploy ordering

Once these are answered, build a project-specific checklist (see Step 0.5 below) and only then generate files.
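
Part of the intake can be inferred from the repo before asking anything. The sketch below is one possible way to answer the "what stacks are in this repo" question; the function name detect_stacks and the choice of marker files (*.csproj, package.json, requirements.txt, pubspec.yaml) are assumptions, not part of the method itself. It cannot detect a purely static site, so that case still needs a question.

```shell
# Hypothetical helper: print the stacks found under a directory, one per line.
# Marker files are an assumption; extend the list to match the project.
detect_stacks() {
  local dir="${1:-.}"
  # Skip node_modules so vendored package.json files are not counted.
  find "$dir" -name node_modules -prune -o -type f \
    \( -name '*.csproj' -o -name 'package.json' \
       -o -name 'requirements.txt' -o -name 'pubspec.yaml' \) -print |
  while IFS= read -r path; do
    case "${path##*/}" in
      *.csproj)         echo dotnet ;;
      package.json)     echo node ;;
      requirements.txt) echo python ;;
      pubspec.yaml)     echo flutter ;;
    esac
  done | sort -u
}
```

Running detect_stacks . at the repo root gives a starting list; each stack it prints maps to one of the CI job templates further down.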

Step 0.5 - Build the per-project checklist

The checklist below ("Default bring-up checklist") is a superset: it covers the maximum case. For each project, prune items that don't apply and add items that are unique to that project. Examples:

Static Next.js site with no backend? Drop the Postgres/Redis, drop RUN_MIGRATIONS, drop the API health-wait loop, keep tsc + deploy.
.NET-only API with no frontend? Drop all Node jobs, keep dotnet + postgres service.
Library / SDK project? Drop the entire deploy job; keep only build + test on PR.
Project where Gitea Actions deploys to a DIFFERENT server than Gitea itself? Replace the self-hosted:host deploy job with an SSH-based deploy job.
Mobile Flutter project? Replace dotnet/node images with mirror.soroushasadi.com/cirrusci/flutter:<ver>; deploy job uploads artifacts instead of docker compose up.

Present the tailored checklist to the user BEFORE writing files, so they can confirm or adjust. Format it as a numbered todo so it's actionable.

Mental model

Developer machine
    |  git push origin <branch>   -> GitHub backup, no CI
    |  git push gitea <branch>    -> https://git.soroushasadi.com (TRIGGERS CI)
    v
Gitea Actions (act_runner registered with two labels)
    |
    +-- "CI" jobs    runs-on: ubuntu-latest
    |                  container: mirror.soroushasadi.com/<image>
    |                  all package managers point at Nexus groups
    |
    +-- "deploy" job runs-on: self-hosted   (label "self-hosted:host")
                       shells into docker compose on the server
                       reads ENV_FILE secret -> writes .env

Two runner labels are required on the act_runner config:

Label	Runs where	Used for
`ubuntu-latest:docker://node:20-alpine`	Inside a Docker container	build / test / type-check
`self-hosted:host`	Directly on the server shell	the `deploy` job

The :host suffix on the second label is what lets the deploy job call docker compose against the host's docker daemon.
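
A quick way to confirm both labels are registered is a plain text search of the runner config. This is a minimal sketch: missing_runner_labels is a hypothetical name, and it assumes the labels appear literally in the file (it would also match a commented-out line).

```shell
# Hypothetical check: print each required label that is absent from the
# given act_runner config file. Prints nothing when both are present.
missing_runner_labels() {
  local config="$1" label
  for label in 'ubuntu-latest:docker://' 'self-hosted:host'; do
    # -F: fixed string, -q: quiet; echo the label only when not found.
    grep -Fq -- "$label" "$config" || echo "$label"
  done
}
```

For example, missing_runner_labels /etc/act_runner/config.yaml should print nothing on a correctly configured server.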

Nexus repositories - the four that every project uses

Provisioned once on the server via mirrors/nexus/provision.sh (idempotent):

Nexus repo	Type	Upstream	Consumed by
`nuget-group`	NuGet group	`nuget-proxy` -> api.nuget.org	`dotnet restore` in CI + Docker
`npm-group`	npm group	`npm-proxy` -> registry.npmjs.org	`npm install` in CI + Docker
`docker-hub-proxy`	Docker proxy	Docker Hub (or Liara mirror)	`mirror.soroushasadi.com/node:...`, postgres, redis...
`mcr-proxy`	Docker proxy	mcr.microsoft.com	`mirror.soroushasadi.com/dotnet/sdk:...`, aspnet

Optional but useful when projects need them:

Nexus repo	Type	Upstream	Use
`pypi-proxy`	PyPI proxy	Liara	`pip install`
`ubuntu-proxy`	APT proxy	Liara (jammy)	`apt-get` in Dockerfiles
`ubuntu-security-proxy`	APT proxy	Liara (jammy-security)	`apt-get` security updates

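For NuGet and npm, the *-group repos are what clients talk to; they hide upstream fallback logic (Liara primary, Runflare/direct fallback). Never point NuGet or npm clients at a *-proxy directly; always use the group. Docker images and the optional PyPI/APT repos have no group in the tables above, so clients use those proxies directly.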
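
To keep these endpoints in one place, a small lookup can map each ecosystem to the URL its client should use. The URLs are the ones used in the templates in this document; the function name nexus_url and the MIRROR variable are illustrative assumptions.

```shell
MIRROR=https://mirror.soroushasadi.com

# Hypothetical lookup: print the Nexus endpoint for one package ecosystem.
nexus_url() {
  case "$1" in
    nuget) echo "$MIRROR/repository/nuget-group/index.json" ;;
    npm)   echo "$MIRROR/repository/npm-group/" ;;
    pypi)  echo "$MIRROR/repository/pypi-proxy/simple/" ;;
    *)     echo "unknown ecosystem: $1" >&2; return 1 ;;
  esac
}
```

A script could then run npm install --registry "$(nexus_url npm)" instead of repeating the URL.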

Host docker daemon mirror entry

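So that Docker Hub pulls on the server (including those outside CI) go through Nexus, drop this into /etc/docker/daemon.json and restart docker. Docker applies registry-mirrors only to Docker Hub images, so images from other registries such as MCR still need the explicit mirror.soroushasadi.com/ prefix: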

{ "registry-mirrors": ["https://mirror.soroushasadi.com"] }

CI jobs still reference mirror.soroushasadi.com/<image> explicitly so they work regardless of the runner's docker config.

The workflow file - `.gitea/workflows/ci-cd.yml`

Skeleton

name: CI/CD

on:
  push:         { branches: [main] }
  pull_request: { branches: [main] }

concurrency:
  group: <project>-cicd-${{ github.ref }}
  cancel-in-progress: true

jobs:
  # one or more CI jobs (build / test / typecheck) per service
  # one deploy job at the bottom that needs: all CI jobs

Set concurrency.cancel-in-progress: true so that a newer push cancels the run still in progress and rapid pushes don't stack deploys.

CI job - .NET (template)

api-build:
  name: "CI - API (dotnet build + test)"
  runs-on: ubuntu-latest
  container:
    image: mirror.soroushasadi.com/dotnet/sdk:<version>
    options: --add-host=gitea:host-gateway
  services:                      # optional: integration DB/redis
    postgres:
      image: mirror.soroushasadi.com/postgres:16-alpine
      env: { POSTGRES_DB: app_test, POSTGRES_USER: app, POSTGRES_PASSWORD: test_pass }
      options: --health-cmd pg_isready --health-interval 5s --health-timeout 5s --health-retries 10
    redis:
      image: mirror.soroushasadi.com/redis:7-alpine
      options: --health-cmd "redis-cli ping" --health-interval 5s --health-timeout 3s --health-retries 10
  steps:
    - name: Checkout
      env:
        TOKEN: ${{ github.token }}
        REF:   ${{ github.ref }}
      run: |
        git init
        git remote add origin "${{ github.server_url }}/${{ github.repository }}.git"
        git config http.extraheader "Authorization: Bearer ${TOKEN}"
        git fetch --depth=1 origin "${REF}"
        git checkout FETCH_HEAD

    - name: Write NuGet config
      run: |
        cat > /tmp/nuget.ci.config << 'EOF'
        <?xml version="1.0" encoding="utf-8"?>
        <configuration>
          <packageSources>
            <clear />
            <add key="nexus"
                 value="https://mirror.soroushasadi.com/repository/nuget-group/index.json"
                 protocolVersion="3" />
          </packageSources>
        </configuration>
        EOF

    - name: Restore
      run: dotnet restore src/<Project>/<Project>.csproj --configfile /tmp/nuget.ci.config
      env: { DOTNET_CLI_TELEMETRY_OPTOUT: 1 }

    - name: Build
      run: dotnet build src/<Project>/<Project>.csproj --no-restore -c Release

    - name: Test
      run: dotnet test --no-build -c Release --logger "console;verbosity=minimal"
      env:
        ConnectionStrings__DefaultConnection: "Host=postgres;Port=5432;Database=app_test;Username=app;Password=test_pass"
        ConnectionStrings__Redis: "redis:6379"

CI job - Node / Next.js (template)

web-check:
  name: "CI - Web (tsc)"
  runs-on: ubuntu-latest
  container:
    image: mirror.soroushasadi.com/node:20-alpine
    options: --add-host=gitea:host-gateway
  steps:
    - name: Checkout
      env:
        TOKEN: ${{ github.token }}
        SHA:   ${{ github.sha }}
      run: |
        wget -q --header "Authorization: Bearer ${TOKEN}" \
          "${{ github.server_url }}/api/v1/repos/${{ github.repository }}/archive/${SHA}.tar.gz" \
          -O /tmp/repo.tar.gz
        tar -xzf /tmp/repo.tar.gz --strip-components=1

    - name: Install
      working-directory: web/<app>
      run: |
        npm install --legacy-peer-deps --ignore-scripts \
          --registry https://mirror.soroushasadi.com/repository/npm-group/

    - name: TypeScript check
      working-directory: web/<app>
      run: npx tsc --noEmit

The two checkout variants both work; the tarball one is faster when git history isn't needed. Either way DON'T rely on actions/checkout@v4 - on self-hosted Gitea it's not guaranteed to be available.

Deploy job (always self-hosted)

deploy:
  name: "Deploy - all services"
  runs-on: self-hosted
  env:
    # act runner host mode starts with minimal PATH - extend so docker/snap are found
    PATH: /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin
  needs:
    - api-build
    - web-check
    # ... every CI job
  if: github.event_name == 'push' && github.ref == 'refs/heads/main'
  timeout-minutes: 40

  steps:
    - name: Checkout
      env:
        TOKEN: ${{ github.token }}
        REF:   ${{ github.ref }}
      run: |
        git init
        git remote add origin "${{ github.server_url }}/${{ github.repository }}.git"
        git config http.extraheader "Authorization: Bearer ${TOKEN}"
        git fetch --depth=1 origin "${REF}"
        git checkout FETCH_HEAD

    - name: Write .env
      run: printf '%s' "$ENV_FILE" > .env
      env:
        ENV_FILE: ${{ secrets.ENV_FILE }}

    - name: Build images
      run: docker compose build --parallel <svc1> <svc2> ...
      env: { DOCKER_BUILDKIT: 1, COMPOSE_DOCKER_CLI_BUILD: 1 }

    - name: Start services
      run: docker compose up -d --no-deps <svc1> <svc2> ...

    - name: Wait for API healthy
      run: |
        for i in $(seq 1 24); do
          STATUS=$(docker inspect --format='{{.State.Health.Status}}' <api-container> 2>/dev/null || echo "missing")
          echo "  [$i/24] $STATUS"
          [ "$STATUS" = "healthy" ] && echo "OK <api-container> healthy" && break
          [ "$i" = "24" ] && echo "TIMEOUT <api-container>" && docker compose logs --tail=40 <api> && exit 1
          sleep 5
        done

    - name: Prune old images
      if: success()
      run: docker image prune -f

Things that bite if you forget them:

runs-on: self-hosted (NOT ubuntu-latest) for deploy.
The explicit PATH: env var - act runners strip PATH and won't find docker.
--no-deps keeps a single-service redeploy from also restarting the services it depends on, such as databases.
Every compose service that the health-wait loop checks MUST define a healthcheck:.
NEXT_PUBLIC_* env vars are baked at Next.js build time - changes to them in ENV_FILE only take effect after the next CI rebuild.
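
The health-wait loop from the deploy job can also be written as a reusable function, which helps once several services need the same wait. This is a sketch under assumptions: wait_healthy is a hypothetical name, and the probe is passed in as a command so the loop itself does not depend on docker.

```shell
# wait_healthy ATTEMPTS DELAY PROBE...
# Runs PROBE until it prints "healthy". Returns 1 after ATTEMPTS tries.
wait_healthy() {
  local attempts="$1" delay="$2" i=1 status
  shift 2
  while [ "$i" -le "$attempts" ]; do
    # A failing probe (for example, container not created yet) counts as "missing".
    status=$("$@" 2>/dev/null || echo missing)
    echo "  [$i/$attempts] $status"
    [ "$status" = healthy ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  echo "TIMEOUT after $attempts attempts" >&2
  return 1
}
```

In the deploy job this would be called as wait_healthy 24 5 docker inspect --format='{{.State.Health.Status}}' <api-container>, which matches the 24 x 5 s budget of the inline loop.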

Dockerfile patterns

.NET image - NuGet through Nexus

Copy nuget.docker.config into the repo with:

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <packageSources>
    <clear />
    <add key="nexus"
         value="https://mirror.soroushasadi.com/repository/nuget-group/index.json"
         protocolVersion="3" />
  </packageSources>
  <config>
    <add key="http_retry_count" value="8" />
    <add key="http_retry_delay_milliseconds" value="1000" />
  </config>
</configuration>

Then in Dockerfile:

FROM mirror.soroushasadi.com/dotnet/sdk:<version> AS build
WORKDIR /src
COPY nuget.docker.config /tmp/nuget.config
COPY src/ .
RUN dotnet restore <Project>.csproj --configfile /tmp/nuget.config
RUN dotnet publish <Project>.csproj -c Release -o /out --no-restore

FROM mirror.soroushasadi.com/dotnet/aspnet:<version>
WORKDIR /app
COPY --from=build /out ./
ENTRYPOINT ["dotnet", "<Project>.dll"]

Node image - npm through Nexus

FROM mirror.soroushasadi.com/node:20-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm install --legacy-peer-deps --ignore-scripts \
      --registry https://mirror.soroushasadi.com/repository/npm-group/
COPY . .
RUN npm run build

FROM mirror.soroushasadi.com/node:20-alpine
WORKDIR /app
COPY --from=build /app/.next ./.next
COPY --from=build /app/node_modules ./node_modules
COPY --from=build /app/package.json ./
EXPOSE 3000
CMD ["npm", "start"]

Python image - pip through Nexus

FROM mirror.soroushasadi.com/python:3.12-slim
RUN pip config set global.index-url https://mirror.soroushasadi.com/repository/pypi-proxy/simple/ \
 && pip config set global.trusted-host mirror.soroushasadi.com
COPY requirements.txt .
RUN pip install -r requirements.txt

Git remote setup on developer machine

# In a fresh repo, after `git init`:
git remote add origin https://github.com/<user>/<repo>.git
git remote add gitea  https://git.soroushasadi.com/<user>/<repo>.git

git remote -v
# origin   https://github.com/...     (fetch+push, GitHub backup)
# gitea    https://git.soroushasadi.com/...  (fetch+push, CI/CD)

Daily flow:

git push origin main      # GitHub backup, no CI runs
git push gitea  main      # Gitea triggers CI + deploy on main

For a new project: create the repo in the Gitea UI first (or via the Gitea API), then add the gitea remote locally.

Secrets

One secret rules them all: ENV_FILE. Set at:

https://git.soroushasadi.com/<user>/<repo>/settings/secrets

The deploy job writes it verbatim to .env, which docker compose reads. Contents are project-specific but always include:

ASPNETCORE_ENVIRONMENT=Production (for .NET projects)
RUN_MIGRATIONS=true on first deploy, false after
Connection strings (DB_CONNECTION_STRING, etc.)
JWT_KEY - generate with openssl rand -hex 32
NEXT_PUBLIC_* URLs (baked at build time - require CI rerun if changed)
CORS_ORIGIN_* for every front-end origin
Provider keys (payment gateway, SMS, etc.)
Host port mappings (*_PORT while pre-domain)

To rotate any secret: edit ENV_FILE in Gitea, then push any commit to trigger a redeploy.
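
Because a missing key in ENV_FILE usually surfaces only after the deploy has started, it can help to check the written .env for required keys first. The sketch below is one way to do that; missing_env_keys is a hypothetical helper, and the list of required keys is whatever the project's intake produced.

```shell
# missing_env_keys FILE KEY...
# Prints every KEY that has no "KEY=" line in FILE; returns 1 if any is missing.
missing_env_keys() {
  local file="$1" key rc=0
  shift
  for key in "$@"; do
    if ! grep -q "^${key}=" "$file"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}
```

A deploy step could run missing_env_keys .env JWT_KEY RUN_MIGRATIONS right after writing .env and fail the job before any container is touched.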

Default bring-up checklist (superset - PRUNE per project)

This is the maximum case (full-stack app with .NET + Node + Postgres + Redis + deploy to the same server). For every actual project, drop items that don't apply, rephrase items that need different commands, and add items unique to that project. The result is the per-project checklist you should hand the user.

Server one-time setup (skip entirely if Soroush already runs Gitea + Nexus on the target server):

Docker + docker compose v2 + Gitea + Nexus + act_runner installed.
/etc/docker/daemon.json has the Nexus mirror entry.
Nexus provision.sh has been run; the four standard repositories (nuget-group, npm-group, docker-hub-proxy, mcr-proxy) exist with anonymous read.
act_runner registered with both labels (ubuntu-latest:docker://... and self-hosted:host).

Per-project setup (adjust per intake answers):

Repo created on Gitea, mirrored from GitHub (or vice versa).
.gitea/workflows/ci-cd.yml committed - jobs match the services we identified in intake.
Dockerfiles committed for each service that needs an image. (Drop if pure static / library.)
docker-compose.yml committed; every service that the deploy job health-waits on has a healthcheck:. (Drop if no Docker deploy.)
nuget.docker.config committed (only if .NET is in the stack).
.npmrc or inline --registry flag in CI (only if Node is in the stack).
ENV_FILE secret set on the Gitea repo with every key the deploy job and docker-compose.yml reference. (Drop if CI-only project.)
Developer machine has both origin and gitea remotes.
First push: git push gitea <branch>. Watch https://git.soroushasadi.com/<user>/<repo>/actions.

Project-specific items to consider adding:

Migration runner step / RUN_MIGRATIONS=true flag (databases only)
Caddy + Let's Encrypt overlay (when domain is ready)
One-time seed/import step on first deploy
Cron job or scheduled job hooks
External webhook URLs that need to be registered after deploy
Object storage bucket creation
DNS A records list

Expected first-run time depends on the stack: roughly 3 min (static site) to 15 min (full-stack with a cold Nexus cache). Subsequent runs are faster because Nexus serves cached packages and images.

Adding a new service to an existing pipeline

Add a new CI job using the Node/.NET/Python template above.
Add the new job name to deploy.needs:.
Add the service to docker-compose.yml with a healthcheck:.
Add a docker compose build step and docker compose up -d step in the deploy job.
Add a health-wait loop if it's an API.
Push to gitea.

Domain / HTTPS cutover (when DNS is ready)

When subdomains resolve to the server:

Update ENV_FILE: swap IP-based NEXT_PUBLIC_* / CORS_* for https://*.<domain>, drop the *_PORT host-port vars.
Add -f docker-compose.caddy.yml to every docker compose invocation in the deploy job.
ufw allow 80 && ufw allow 443 on the server.
Push to gitea. Caddy issues Let's Encrypt certs on first run; no certbot/manual renewal needed.

A typical docker-compose.caddy.yml overlay defines a caddy service publishing 80/443 with a Caddyfile that reverse-proxies each subdomain to the corresponding internal service.
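
As an illustration of what that Caddyfile contains, the sketch below generates one site block per subdomain. The helper names and the example hostnames and upstreams (api.example.com, api:8080) are hypothetical; the real values come from the project's compose services.

```shell
# caddy_site HOST UPSTREAM: print one Caddyfile site block.
caddy_site() {
  printf '%s {\n    reverse_proxy %s\n}\n' "$1" "$2"
}

# Reads "HOST UPSTREAM" pairs on stdin and prints a block for each.
caddyfile_from_pairs() {
  local host upstream
  while read -r host upstream; do
    if [ -n "$host" ]; then
      caddy_site "$host" "$upstream"
    fi
  done
}
```

For example, printf 'api.example.com api:8080\napp.example.com web:3000\n' | caddyfile_from_pairs > Caddyfile writes two site blocks, and Caddy requests a certificate for each hostname.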

Troubleshooting

Symptom	Cause	Fix
Job hangs at "Pulling image"	Runner can't reach mirror.soroushasadi.com	DNS / Nexus down. `curl -s https://mirror.soroushasadi.com/service/rest/v1/status`
`dotnet restore` returns 401/403	Nexus anonymous read disabled	Re-run `provision.sh` (enables anon access + realms)
`npm install` extremely slow first run	First fetch through `npm-proxy` is cold	Wait; subsequent runs hit cached blobs
Deploy: `docker: command not found`	Runner PATH stripped	Confirm `env: PATH:` line in deploy job
Deploy: `permission denied ... /var/run/docker.sock`	Runner user not in `docker` group	`usermod -aG docker <runner-user>` and restart act_runner
Health-wait times out	Service has no `healthcheck:` defined	Add `HEALTHCHECK` in Dockerfile or `healthcheck:` in compose
`NEXT_PUBLIC_*` URL didn't change in browser	Vars baked at Next.js build time	Push a commit to trigger image rebuild
Deploy ran but old code still serving	Container not recreated	Stop and remove the container (`docker stop <container>` then `docker rm <container>`), then `docker compose up -d --no-deps <svc>`; rebuild the image if it is stale
Two pushes only deployed once	`concurrency.cancel-in-progress: true` cancelled the earlier run	Expected. Sequence pushes if both must deploy.
Gitea checkout returns 401	Token scope changed or runner re-registered	Workflow uses `${{ github.token }}`; re-register runner if compromised
`act_runner` won't start	Token expired	Generate a fresh runner registration token in Gitea admin

Files / commands cheat sheet

Thing	Where
Workflow	`.gitea/workflows/ci-cd.yml`
NuGet config for CI	`nuget.mirror.config` (root), or `/tmp/nuget.ci.config` when the CI job writes it inline
NuGet config for Docker builds	`nuget.docker.config` (root)
Host docker mirror entry	`/etc/docker/daemon.json`
Nexus provisioning	`mirrors/nexus/provision.sh`
Liara upstream swap	`mirrors/nexus/add-liara-mirrors.sh`, `update-docker-upstream.sh`
Nexus compose	`docker-compose.mirror.yml`
Gitea Actions config	Gitea `app.ini` -> `[actions] ENABLED = true`
act_runner config	`/etc/act_runner/config.yaml` (labels live here)
Nexus health check	`curl -s https://mirror.soroushasadi.com/service/rest/v1/status`
View pipeline	`https://git.soroushasadi.com/<user>/<repo>/actions`
Set/rotate `ENV_FILE`	`https://git.soroushasadi.com/<user>/<repo>/settings/secrets`

Review checklist (apply only the items relevant to this project's stack)

Always check (regardless of stack):

Every container.image and services.<x>.image uses mirror.soroushasadi.com/...?
Every container job has options: --add-host=gitea:host-gateway?
Every checkout step is manual (git init + bearer token, or tarball API)?
concurrency.cancel-in-progress: true is set?

If .NET is in the stack:

dotnet restore uses --configfile pointing at the Nexus nuget group?
nuget.docker.config is present and copied into the Dockerfile?

If Node is in the stack:

npm install uses --registry https://mirror.soroushasadi.com/repository/npm-group/?
NEXT_PUBLIC_* envs that affect the build are in ENV_FILE before first build?

If Python is in the stack:

pip uses --index-url https://mirror.soroushasadi.com/repository/pypi-proxy/simple/?

If a deploy job exists:

Deploy job is runs-on: self-hosted?
Deploy job has the explicit PATH: env line?
Deploy job is gated by if: github.event_name == 'push' && github.ref == 'refs/heads/<deploy-branch>'?
Deploy needs: lists every CI job?
ENV_FILE secret exists on the Gitea repo?

If docker compose is used at deploy time:

Every compose service that the deploy job health-waits on has a healthcheck:?
--no-deps used so single-service redeploy doesn't cascade?

If migrations / first-run setup exist:

RUN_MIGRATIONS=true (or equivalent) is in ENV_FILE, with a plan to flip it later?

If a domain is wired:

Caddy overlay included in deploy job?
Ports 80/443 open in server firewall?
All CORS_ORIGIN_* and NEXT_PUBLIC_* URLs use the domain, not the IP?

Items that don't apply to this project should be removed from the per-project checklist, not just left unchecked.
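
The first items of the "always check" group lend themselves to a rough automated scan. The sketch below is a text-level check only, under stated assumptions: lint_workflow is a hypothetical name, it only sees block-style image: lines (a flow-style container: { image: ... } would be missed), and it flags any use of actions/checkout.

```shell
# lint_workflow FILE
# Prints image: lines that do not pull from the mirror and any
# "uses: actions/checkout" line. Returns 1 if anything was printed.
lint_workflow() {
  local file="$1" bad
  bad=$(
    grep -nE '^[[:space:]]*image:' "$file" | grep -v 'mirror\.soroushasadi\.com/' || true
    grep -nE 'uses:[[:space:]]*actions/checkout' "$file" || true
  )
  [ -z "$bad" ] && return 0
  printf '%s\n' "$bad"
  return 1
}
```

Running lint_workflow .gitea/workflows/ci-cd.yml before pushing gives a quick pass/fail on those two review items; the rest of the checklist still needs a human read.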

🚨 Production Safety Rules (learned from real incidents)

These rules exist because each one caused data loss or downtime on a real project. Run this checklist before any deploy, compose change, or port change.

Before every deploy — data safety

# 1. Back up the DB before touching the container
docker cp <container_name>:/data/<app>.db \
  /opt/<project>-backups/<app>-$(date +%Y%m%d-%H%M%S).db

# 2. Scan for orphaned volumes before first deploy on a new server
docker volume ls | grep db_data
docker volume ls | grep uploads_data

Why: When docker compose up runs with a new project name: (or from a different working directory), Docker creates fresh empty volumes and leaves old data orphaned in volumes with a different prefix (e.g. hostexecutor_db_data instead of drsousan_db_data). This caused total data loss on draletaha.ir. The only recovery was finding the orphaned volume and copying it back.

Restore both volumes together (DB and uploads must be in sync):

docker run --rm -v OLD_db_data:/old   -v NEW_db_data:/new   alpine sh -c "cp -r /old/. /new/"
docker run --rm -v OLD_uploads:/old   -v NEW_uploads:/new   alpine sh -c "cp -r /old/. /new/"

Never use docker compose down -v — it deletes named volumes permanently.
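
The orphaned-volume scan can be made a little more targeted than two grep calls. The sketch below reads volume names on stdin and prints those that carry the expected suffix under a different project prefix; orphan_volumes is a hypothetical helper, and it assumes the usual <project>_<volume> naming that compose produces.

```shell
# orphan_volumes PREFIX SUFFIX...
# Reads volume names on stdin (one per line) and prints those that end in
# "_SUFFIX" but do not start with "PREFIX_".
orphan_volumes() {
  local prefix="$1" name suffix
  shift
  while IFS= read -r name; do
    for suffix in "$@"; do
      case "$name" in
        "${prefix}_"*) ;;                # belongs to the current project
        *"_${suffix}") echo "$name" ;;   # same volume name, different project
      esac
    done
  done
}
```

On the server this would be fed from docker, for example docker volume ls --format '{{.Name}}' | orphan_volumes drsousan db_data uploads_data. Any output means existing data may be sitting in a volume the new project name will not mount.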

Before every deploy — container conflict

docker compose up --force-recreate is unreliable on some Docker versions and will fail with "container name already in use." Always use explicit stop + rm:

# ✅ Reliable
docker stop <container_name> 2>/dev/null || true
docker rm   <container_name> 2>/dev/null || true
docker compose up -d --no-deps <service>

# ❌ Unreliable — do not use
docker compose up -d --force-recreate <service>

Before every deploy — rollback tag

Tag the running image before replacing it so you can roll back instantly:

CURRENT=$(docker inspect <container_name> --format='{{.Config.Image}}')
docker tag "$CURRENT" <registry>/<project>:rollback

If the new container fails its health check, roll back:

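docker stop <container_name> && docker rm <container_name>
# Plain docker run ignores docker-compose.yml: add the -p, -v, and --env-file flags the service needs
docker run -d --name <container_name> <registry>/<project>:rollback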

Before changing ports in docker-compose.yml

# Check what's already listening on the host
ss -tlnp | grep LISTEN
docker ps --format "table {{.Names}}\t{{.Ports}}"

Never assume a port is free. A port conflict silently breaks other services running on the same host.
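
The port audit can be turned into a yes/no check for one specific port. This is a minimal sketch: port_in_use is a hypothetical helper that parses ss -tln output from stdin, assuming the local address is the fourth column as in current iproute2 output.

```shell
# port_in_use PORT
# Reads `ss -tln` output on stdin; succeeds if a listener's local
# address ends in :PORT.
port_in_use() {
  awk -v port="$1" '
    $1 == "LISTEN" {
      n = split($4, parts, ":")      # handles 0.0.0.0:22 and [::]:22
      if (parts[n] == port) found = 1
    }
    END { exit found ? 0 : 1 }
  '
}
```

Before mapping a new host port, something like ss -tln | port_in_use 3000 && echo "port 3000 is taken" makes the check explicit instead of relying on reading the full listing.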

Scope every compose command to a single service

# ✅ Touches only this service
docker compose up -d --no-deps <service>

# ❌ Acts on ALL services in the compose file, including unrelated containers
docker compose down
docker compose restart

Never use bare docker compose down in a CI/CD workflow or on a production server that runs multiple projects.

CI/CD workflow safety checklist

When editing .gitea/workflows/ci-cd.yml:

Backup step runs before Deploy (docker cp .../app.db /opt/backups/...)
Deploy step uses docker stop || true && docker rm || true — no --force-recreate
No docker compose down anywhere in the workflow
Rollback tag applied before deploy
Health check loop after deploy, exits non-zero on timeout
Port in deploy matches HOST_PORT env var — verified not already taken
Prune step only removes dangling images for THIS project
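
Two of these items (no --force-recreate, no docker compose down) can be caught with a rough text scan before the workflow is pushed. The sketch below assumes a hypothetical helper name, unsafe_deploy_lines, and a simple pattern that will also flag matching text inside comments.

```shell
# unsafe_deploy_lines FILE
# Prints lines that use --force-recreate or docker compose ... down.
# Returns 1 when any are found.
unsafe_deploy_lines() {
  local file="$1" hits
  hits=$(grep -nE -e '--force-recreate' \
              -e 'docker compose( .*)? down( |$)' "$file" || true)
  [ -z "$hits" ] && return 0
  printf '%s\n' "$hits"
  return 1
}
```

Running unsafe_deploy_lines .gitea/workflows/ci-cd.yml in a pre-push hook or as the first CI step would surface either pattern with its line number; the backup, rollback-tag, and port items still need manual review.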

Incident log

When	What broke	Root cause	Fix
2026-06	All DB + uploads lost on draletaha.ir	New `name: drsousan` in compose created fresh volumes; data sat in `hostexecutor_db_data`	Restored from orphaned volume; added pre-deploy backup step to CI
2026-06	Deploy failed twice ("container name in use")	`--no-deps` then `--force-recreate` both fail when container already exists	Replaced with explicit `stop + rm + up`
Earlier	Port conflict broke another service on same host	Port assumed free without checking	Added port audit (`ss -tlnp`) before mapping new ports
Earlier	Unrelated containers stopped on redeploy	Bare `docker compose down` used in workflow	Rule: always scope to `--no-deps <service>`
Earlier	No rollback possible after bad deploy	Old container removed before new one was verified healthy	Added rollback tag step + health check gate
