Commit Graph

4 Commits

Author SHA1 Message Date
soroush.asadi 213af9db48 AI tag/category assignment + phone extraction from web ads
CI/CD / CI · dotnet build (push) Successful in 2m37s
CI/CD / Deploy · hamkadr (push) Successful in 1m11s
AI (when enabled, now that the server proxy is up):
- AiStructured gains phone, personName, yearsExperience, isLicensed.
- The auditor appends an authoritative output-schema to the admin prompt
  so classification stays correct even with an older stored prompt — it
  now classifies kind as shift|job|talent and extracts the contact phone
  and talent details.
- Ingestion publish prefers the AI's tags (kind/role/city/facility/phone +
  talent fields) over the heuristic parser when present.
- Default prompt updated to describe the three kinds + new fields.

Phone extraction from websites (Medjobs / generic sites), where the
number sits behind a "تماس با این آگهی" reveal:
- HtmlUtil.HarvestPhones scans the full markup for tel: links, JSON-LD
  "telephone", data-*phone* attributes, and inline Iranian mobile/landline
  numbers (Persian digits folded), normalized (mobiles 09…, landlines 0…).
- Medjobs + Website sources append harvested numbers to the ad text so the
  parser/AI capture them; manual review then prefills the phone too.
- Parser phone extraction now also captures a landline as a fallback.

Note: if a site loads the number purely via XHR (not in HTML), a
per-source reveal endpoint would be a follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-08 08:11:14 +03:30
soroush.asadi b1e474ba33 [Ingest] Per-source proxy toggle instead of one global switch
CI/CD / CI · dotnet build (push) Successful in 56s
CI/CD / Deploy · hamkadr (push) Successful in 1m6s
Each ingestion source now decides independently whether to route through the proxy: added TelegramUseProxy/BaleUseProxy/DivarUseProxy/MedjobsUseProxy/WebsitesUseProxy flags (migration). ScrapeHttpClients.For(s, useProxy) takes the source's own flag; a source is proxied only when its flag is on AND a proxy URL is set. Settings 'sources' tab: removed the global enable checkbox, kept the proxy address field, and added an «از پروکسی استفاده شود» checkbox under each source. Old IngestProxyEnabled column kept for compatibility but no longer gates routing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 18:46:48 +03:30
soroush.asadi cea27c8684 [Ingest] Route scraping through an optional V2Ray/Xray proxy (Telegram in Iran)
CI/CD / CI · dotnet build (push) Successful in 53s
CI/CD / Deploy · hamkadr (push) Successful in 1m12s
Telegram and some sources are filtered in Iran. .NET cannot speak vmess/vless/trojan, so add an Xray sidecar (compose service 'xray', behind the 'proxy' profile) that converts the admin's config into a local SOCKS5 proxy (xray:10808). New ScrapeHttpClients provider builds a proxied or direct HttpClient (WebProxy supports socks5/socks4/http) cached per proxy URL; all five ingestion sources (Telegram/Bale/Divar/Medjobs/Websites) now use it. Admin settings gain IngestProxyEnabled + IngestProxyUrl (migration; UI under sources). Added deploy/xray/config.json template + README with vmess/vless/trojan examples.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 17:53:17 +03:30
soroush.asadi e2e26150cb Add medjobs.ir scraper + employer/employee choice at signup
CI/CD / CI · dotnet build (push) Successful in 1m22s
CI/CD / Deploy · hamkadr (push) Successful in 1m37s
- MedjobsListingSource: crawls medjobs.ir sitemaps (ad_listing-sitemapN) → fetches ad pages → title+description → engine (dedupe/parse/validate/publish as SEO job pages). Configured in /Admin/Settings (enable + max ads/run).
- Login/register now asks 'کادر درمان' vs 'کارفرما/مرکز': new accounts get Doctor vs FacilityAdmin role; post-login routes to /Me, /Employer, or /Admin accordingly.
- Verified live: medjobs run fetched real ads into the review queue; employer signup → /Employer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-04 06:12:10 +03:30