Commit Graph

18 Commits

Author SHA1 Message Date
soroush.asadi e6a796ab27 Match crawled listings to existing facilities (fuzzy) before creating new
CI/CD / CI · dotnet build (push) Successful in 1m28s
CI/CD / Deploy · hamkadr (push) Successful in 2m24s
When publishing a scraped listing we now look for a facility we already
have that is exactly or closely the same, and only create a new one when
there is no match — avoiding duplicates like «بیمارستان میلاد» vs «میلاد».

- ListingParser: extract a facility name (keyword + distinctive words) from
  the post and surface it in the parser notes.
- FacilityMatcher: Persian-aware normalization (ي/ك, ZWNJ, punctuation),
  type-word stripping for a "core" name, contains + Levenshtein similarity,
  and FindBest (same-city exact → any-city exact → same-city fuzzy → fuzzy).
- Review (manual publish): auto-select a matching facility or prefill the
  new-facility name; resolve-or-create uses fuzzy match; dropdown preselects.
- IngestionService (auto-publish): reuse FacilityMatcher against a run-wide
  facility list (grows as new ones are created) instead of exact-name only.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 07:14:48 +03:30
soroush.asadi 487c7ca82f [Ingest] Persistent crawl run-log + per-source breakdown on admin queue
CI/CD / CI · dotnet build (push) Has been cancelled
CI/CD / Deploy · hamkadr (push) Has been cancelled
Each ingestion run now records an IngestionRun row (found/queued/published/flagged/spam/duplicates + a per-source detail string). Admin → صف آگهی‌ها shows a «تاریخچه جمع‌آوری» table of the last 15 runs (hover a row for the per-source breakdown), so admins can see how much each source found vs added over time. IngestionSummary gains TotalFetched. Migration: IngestionRuns table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 06:23:58 +03:30
soroush.asadi 524c66e25e [Admin] VPN/proxy + AI test buttons; fix AI JSON parse crash on null fields
CI/CD / CI · dotnet build (push) Successful in 2m41s
CI/CD / Deploy · hamkadr (push) Failing after 2m56s
Add «تست اتصال VPN/پروکسی» (reaches a filtered site through the proxy and reports connected/latency) and «تست هوش مصنوعی» (sends a sample post through the configured model and shows the verdict + extracted fields) to admin Settings. Fix OpenAiCompatibleAuditor.ParseVerdict: TryGetInt32/64 threw on null/string JSON values (the model commonly returns payAmount/sharePercent as null), which silently failed every audit — now guarded on ValueKind==Number. Verified the real OpenAI key extracts perfectly (approve / role=پرستار / city=تهران / shift=night).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 23:23:02 +03:30
soroush.asadi 0c49b89891 [AI] Route AI calls through the Xray/V2Ray proxy (reach OpenAI from Iran)
CI/CD / CI · dotnet build (push) Successful in 1m46s
CI/CD / Deploy · hamkadr (push) Failing after 1m58s
Add AiUseProxy setting + a toggle in the AI settings section. ScrapeHttpClients.ForAi(settings) returns a proxied HttpClient (reusing IngestProxyUrl, 100s timeout) when AiUseProxy is on, otherwise direct; AI-cache keys are protected from the scrape-client cleanup. OpenAiCompatibleAuditor now uses it, so the AI auditor (e.g. api.openai.com) is reachable through the same Xray sidecar that serves Telegram. Migration adds the column.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 22:55:07 +03:30
soroush.asadi 018c0f0286 [Ingest] Tune parser/validator for real Divar+Medjobs data
CI/CD / CI · dotnet build (push) Successful in 2m53s
CI/CD / Deploy · hamkadr (push) Failing after 2m39s
Analyzed live Divar (POST search) and Medjobs (ad_listing sitemaps) data — both are free Persian text. Tighten the medical-relevance gate (drop generic «استخدام»/«شیفت» that match retail/restaurant ads; add clinical terms: بهیار/اتاق عمل/بیهوشی/رادیولوژی/آزمایشگاه/دیالیز/فوریت/تریاژ/… ) so off-topic Divar jobs get flagged, not treated as medical. Add clinical role synonyms in the heuristic parser (بهیار/کمک‌پرستار/سالمند→پرستار, اتاق عمل→تکنسین اتاق عمل, فوریت→فوریت‌های پزشکی, آزمایشگاه→کارشناس آزمایشگاه, فوق‌تخصص→پزشک متخصص…). Result on live data: Medjobs now yields ~9/30 queue-ready healthcare listings; Divar correctly flags ~72/75 noise for manual review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 22:34:05 +03:30
soroush.asadi 2485173aad [Ingest] Fix Divar: use POST search API (GET was anti-bot blocked)
CI/CD / CI · dotnet build (push) Successful in 1m51s
CI/CD / Deploy · hamkadr (push) Successful in 2m8s
Divar's /v8/web-search GET returns a BLOCKING_VIEW (anti-bot), so the old source pulled nothing useful and could scrape the block message. Switch to the working POST /v8/postlist/w/search with a browser User-Agent and a city-id map (numeric id passthrough; tehran=1 default). Skip responses that are non-2xx or contain BLOCKING_VIEW so the block page is never ingested. Verified locally: fetched 25 real Tehran job posts into the review queue, 0 block messages.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 21:23:36 +03:30
soroush.asadi b1e474ba33 [Ingest] Per-source proxy toggle instead of one global switch
CI/CD / CI · dotnet build (push) Successful in 56s
CI/CD / Deploy · hamkadr (push) Successful in 1m6s
Each ingestion source now decides independently whether to route through the proxy: added TelegramUseProxy/BaleUseProxy/DivarUseProxy/MedjobsUseProxy/WebsitesUseProxy flags (migration). ScrapeHttpClients.For(s, useProxy) takes the source's own flag; a source is proxied only when its flag is on AND a proxy URL is set. Settings 'sources' tab: removed the global enable checkbox, kept the proxy address field, and added an «از پروکسی استفاده شود» checkbox under each source. Old IngestProxyEnabled column kept for compatibility but no longer gates routing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 18:46:48 +03:30
soroush.asadi cea27c8684 [Ingest] Route scraping through an optional V2Ray/Xray proxy (Telegram in Iran)
CI/CD / CI · dotnet build (push) Successful in 53s
CI/CD / Deploy · hamkadr (push) Successful in 1m12s
Telegram and some sources are filtered in Iran. .NET cannot speak vmess/vless/trojan, so add an Xray sidecar (compose service 'xray', behind the 'proxy' profile) that converts the admin's config into a local SOCKS5 proxy (xray:10808). New ScrapeHttpClients provider builds a proxied or direct HttpClient (WebProxy supports socks5/socks4/http) cached per proxy URL; all five ingestion sources (Telegram/Bale/Divar/Medjobs/Websites) now use it. Admin settings gain IngestProxyEnabled + IngestProxyUrl (migration; UI under sources). Added deploy/xray/config.json template + README with vmess/vless/trojan examples.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 17:53:17 +03:30
soroush.asadi 8fad9c1bb6 [Admin] Notification channel toggles (web/SMS/push active-deactive)
CI/CD / CI · dotnet build (push) Successful in 50s
CI/CD / Deploy · hamkadr (push) Successful in 1m1s
Add a 'notification channels' card at the top of admin Settings with three master on/off checkboxes: web/in-app (new WebNotificationsEnabled, default true), SMS (existing SmsEnabled), and Web Push (existing PushEnabled). Removed the duplicate enable checkboxes from the SMS and Push sections so each binds once. NotificationService now gates the in-app + live SSE channel on WebNotificationsEnabled; push self-gates on PushEnabled. Migration defaults the new column to true so existing installs keep web notifications on.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 15:56:40 +03:30
soroush.asadi 0c0449c2b9 [Demo] Add admin demo-mode toggle + generic website ingest source
CI/CD / CI · dotnet build (push) Failing after 1m40s
CI/CD / Deploy · hamkadr (push) Has been skipped
- AppSetting: DemoMode, WebsitesEnabled, WebsiteUrls
- Facility.IsDemo flag; SeedData split into SeedReferenceAsync (always)
  + SeedDemoAsync/ClearDemoAsync (idempotent, toggleable at runtime)
- WebsiteListingSource: scrape any admin-configured URL (og:title + content)
- Admin Settings: seed/clear demo card, demo-mode checkbox, website source
  fields; Program.cs seeds demo when DemoMode on (or in Development)
- EF migration DemoModeAndWebsites

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 13:43:07 +03:30
soroush.asadi a02eb6a985 PWA: installable app (web/win/android/ios) + download/help page + push notifications
CI/CD / CI · dotnet build (push) Successful in 40s
CI/CD / Deploy · hamkadr (push) Successful in 55s
- manifest.webmanifest + service worker (offline shell + push + notificationclick) + PNG icons (192/512/apple) + iOS meta + SW registration → installable everywhere
- /Download page: per-OS install help (web/windows/android/ios), install button (beforeinstallprompt), 'enable notifications' flow, usage guide, Bazaar/TWA note; nav + footer links
- Web Push foundation: WebPushSubscription entity + /push/subscribe (stores), VAPID + push settings in /Admin/Settings, on-device local notification; server broadcast documented (WebPush via Nexus)
- docs/PWA-TWA.md: VAPID keygen, server-push wiring, Bubblewrap→Cafe Bazaar + assetlinks steps
- Verified: manifest/sw/icons served, download page, subscribe stores (200), layout wired

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 11:23:13 +03:30
soroush.asadi 9a92da42e6 Facility location: click-to-pick Neshan map + 'my current location'
CI/CD / CI · dotnet build (push) Successful in 37s
CI/CD / Deploy · hamkadr (push) Successful in 40s
- RegisterFacility: '📍 موقعیت فعلی من' (browser geolocation, always available) + Neshan Leaflet map (click/drag marker → fills lat/lng) when a Neshan web key is set; graceful fallback to manual coords without a key
- AppSetting.NeshanMapKey configured in /Admin/Settings (Google Maps is blocked in Iran); migration
- Verified: location button + inputs render always; map + SDK render once the key is saved

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 10:47:33 +03:30
soroush.asadi 17d38431bf Add SEO sitemap/robots + real SMS OTP (Kavenegar, admin-configured)
CI/CD / CI · dotnet build (push) Successful in 31s
CI/CD / Deploy · hamkadr (push) Successful in 56s
- /sitemap.xml (static pages + open shifts + fresh jobs, respecting expiry) + /robots.txt (blocks /Admin,/Employer); base URL from forwarded request → https://hamkadr.ir in prod
- ISmsSender + KavenegarSmsSender (verify/lookup template, sms/send fallback); SMS settings (enabled/apikey/template/sender) in /Admin/Settings; OtpService.IssueAsync sends SMS and stops revealing the code when enabled (dev still shows it); migration

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 10:27:21 +03:30
soroush.asadi 6d2ad6f87e Hide + archive stale listings (old jobs, expired shifts)
CI/CD / CI · dotnet build (push) Successful in 37s
CI/CD / Deploy · hamkadr (push) Successful in 48s
- ListingPolicy.JobFreshnessDays=30: public /Jobs and home hide jobs older than the cutoff (shifts already require Date>=today)
- ListingArchiver flips stale Open→Expired: shifts past their date, jobs older than the cutoff. Runs at startup and on every IngestionWorker cycle (independent of ingestion being enabled)
- Verified: backdated job dropped off /Jobs (6→5) and was archived to Expired on the sweep

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 09:57:06 +03:30
soroush.asadi e2e26150cb Add medjobs.ir scraper + employer/employee choice at signup
CI/CD / CI · dotnet build (push) Successful in 1m22s
CI/CD / Deploy · hamkadr (push) Successful in 1m37s
- MedjobsListingSource: crawls medjobs.ir sitemaps (ad_listing-sitemapN) → fetches ad pages → title+description → engine (dedupe/parse/validate/publish as SEO job pages). Configured in /Admin/Settings (enable + max ads/run).
- Login/register now asks 'کادر درمان' vs 'کارفرما/مرکز': new accounts get Doctor vs FacilityAdmin role; post-login routes to /Me, /Employer, or /Admin accordingly.
- Verified live: medjobs run fetched real ads into the review queue; employer signup → /Employer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 06:12:10 +03:30
soroush.asadi 3c08c1a265 Move ingestion + Telegram/Bale/Divar config to DB-backed admin settings
CI/CD / CI · dotnet build (push) Successful in 6m22s
CI/CD / Deploy · hamkadr (push) Failing after 3s
- AppSetting gains source config: AutoIngestEnabled, IngestIntervalMinutes, Telegram/Bale/Divar enabled+channels/token/queries
- IListingSource.FetchAsync(AppSetting) — sources read config from DB, not IOptions/appsettings; sample source dev-only
- IngestionWorker reads AutoIngest+interval from DB each cycle (toggle at runtime, no redeploy)
- /Admin/Settings gets a 'منابع جمع‌آوری' section; removed Ingestion env/appsettings + compose env vars
- ENV_FILE shrinks to HOST_PORT + POSTGRES_* + ADMIN_PHONE (AI + sources are all in-admin); migration

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 00:44:11 +03:30
soroush.asadi 36bb165438 Real channel fetch (Telegram/Bale/Divar) + AI-audited automation engine + CI/CD
- Fetch: Telegram via t.me/s, Bale via Bot API, Divar via web-search (HttpClient, config-gated, graceful)
- AI layer: DB-backed AppSetting (mode auto/manual, thresholds, AI endpoint/model/key/prompt/framework, auto-approve); OpenAI-compatible IAiAuditor (self-host/Iranian endpoints; fails safe to manual)
- Pipeline: fetch → dedupe(hash) → parse → validate → AI audit → Discard/Flag/Queue/auto-publish (resolve-or-create facility)
- Admin: /Admin/Settings automation+AI panel; queue shows confidence + AI verdict; flagged section
- CI/CD: Dockerfile, docker-compose.prod.yml, .gitea/workflows/ci-cd.yml, nginx vhost, DEPLOY.md; forwarded headers + /healthz + prod reference-only seed; ports 22/80/443 only

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 17:41:02 +03:30
soroush.asadi 931b7b6ffb Add scrape/ingestion engine + validation, and 24h shift hour-range visualization
Scrape engine (Services/Scraping/): pluggable IListingSource (working sample + Telegram/Divar credential-ready stubs) → IngestionService (content-hash dedupe → parse → validate → review queue) → ListingValidator (completeness score + spam screen) → IngestionWorker (config-gated hosted service). RawListing gains ContentHash/Confidence/ValidationNotes; RawListingStatus.Flagged. Admin /Admin gets run-now, source list, confidence + flagged queue.

Hour-range viz: _HourBar 24h timeline bar (colored by type, overnight wrap) on shift cards, recommendation cards, and detail.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 08:18:19 +03:30