Commit Graph

2 Commits

Author SHA1 Message Date
soroush.asadi 018c0f0286 [Ingest] Tune parser/validator for real Divar+Medjobs data
CI/CD / CI · dotnet build (push) Successful in 2m53s
CI/CD / Deploy · hamkadr (push) Failing after 2m39s
Analyzed live Divar (POST search) and Medjobs (ad_listing sitemaps) data — both are free Persian text. Tighten the medical-relevance gate (drop generic «استخدام»/«شیفت» that match retail/restaurant ads; add clinical terms: بهیار/اتاق عمل/بیهوشی/رادیولوژی/آزمایشگاه/دیالیز/فوریت/تریاژ/… ) so off-topic Divar jobs get flagged, not treated as medical. Add clinical role synonyms in the heuristic parser (بهیار/کمک‌پرستار/سالمند→پرستار, اتاق عمل→تکنسین اتاق عمل, فوریت→فوریت‌های پزشکی, آزمایشگاه→کارشناس آزمایشگاه, فوق‌تخصص→پزشک متخصص…). Result on live data: Medjobs now yields ~9/30 queue-ready healthcare listings; Divar correctly flags ~72/75 noise for manual review.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-07 22:34:05 +03:30
soroush.asadi 931b7b6ffb Add scrape/ingestion engine + validation, and 24h shift hour-range visualization
Scrape engine (Services/Scraping/): pluggable IListingSource (working sample + Telegram/Divar credential-ready stubs) → IngestionService (content-hash dedupe → parse → validate → review queue) → ListingValidator (completeness score + spam screen) → IngestionWorker (config-gated hosted service). RawListing gains ContentHash/Confidence/ValidationNotes; RawListingStatus.Flagged. Admin /Admin gets run-now, source list, confidence + flagged queue.

Hour-range viz: _HourBar 24h timeline bar (colored by type, overnight wrap) on shift cards, recommendation cards, and detail.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-03 08:18:19 +03:30