Reprocess deletes and rebuilds aggregated listings, which changes their IDs.
Shift/Job detail pages are indexed and in the sitemap, so churning them would
404 ranked URLs. «آماده به کار» ("ready to work") pages are NoIndex + Disallow,
so rebuilding them has zero SEO impact, and that is where all the
duplicate/sprawl problems were.
ReprocessAsync(talentOnly: true) now deletes and rebuilds only TalentListings
and skips non-talent RawListing rows (leaving shift/job listings and their
RawListing links untouched). Admin button relabelled
«پردازش مجددِ آماده به کارها (امن برای SEO)»
("Reprocess ready-to-work listings (SEO-safe)").
Shifts/jobs self-clean via normal ingestion turnover.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Qualified live applicants and found three problems, all fixed:
- Duplicate cards: one ad fanned out into «پرستار» ("nurse") + «پرستار کودک»
  ("child nurse") for the same person. Applicants now publish ONE listing (no
  role fan-out); secondary roles become tags.
- Role sprawl: modifiers became roles. The prompt now returns the BASE
  profession and pushes age-group/ward/seniority to tags; a new role is created
  only for a genuinely new base profession (تکنسین داروخانه "pharmacy
  technician" ✓, پرستار کودک "child nurse" ✗).
- Tag/category noise: categories are pinned to the 5 fixed groups (+ سایر
  "other", never invented); BuildTags drops pay/contact/location/fragment words.
Reprocess action: IngestionService.ReprocessAsync re-runs the current pipeline
over every stored RawListing WITHOUT re-fetching (it keeps the raw text, so
nothing is lost when a source only exposes recent posts), deleting the old
aggregated posts and republishing cleanly. Admin dashboard button
«پردازش مجددِ آیتم‌های ذخیره‌شده» ("Reprocess stored items") runs it on a
background scope; the result lands in the run-log.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
New /Admin/Ingested page lists every crawled item with its outcome, filterable by status (همه/در صف/پرچم‌خورده/منتشرشده/رد‌شده: all/queued/flagged/published/rejected) with per-status counts and a link to the published shift or the review page. It is linked from the run-history header and the admin panel nav. Each queue/flagged row also gets an inline «✕ رد» ("reject", quick-discard) button so admins can audit without opening the review page; full accept/reject stays on /Admin/Review.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Each ingestion run now records an IngestionRun row (found/queued/published/flagged/spam/duplicates + a per-source detail string). Admin → «صف آگهی‌ها» ("listings queue") shows a «تاریخچه جمع‌آوری» ("collection history") table of the last 15 runs (hover a row for the per-source breakdown), so admins can see how much each source found vs. added over time. IngestionSummary gains TotalFetched. Migration: IngestionRuns table.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- AppSetting gains source config: AutoIngestEnabled, IngestIntervalMinutes, Telegram/Bale/Divar enabled+channels/token/queries
- IListingSource.FetchAsync(AppSetting) — sources read config from DB, not IOptions/appsettings; sample source dev-only
- IngestionWorker reads AutoIngest+interval from DB each cycle (toggle at runtime, no redeploy)
- /Admin/Settings gets a «منابع جمع‌آوری» ("collection sources") section; removed Ingestion env/appsettings + compose env vars
- ENV_FILE shrinks to HOST_PORT + POSTGRES_* + ADMIN_PHONE (AI + sources are all in-admin); includes a migration
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>