Extract Iranian salary shorthand (X تومان = millions) + pay backfill
Parser: most jobs read «توافقی» ("negotiable") because the amount extractor only matched 6–10 digit numbers, missing the way Iranian ads actually state pay: «۱۵ تومان», «۴۰ تا ۵۰ تومان», «۲۰ میلیون», and «۲۰م» all mean MILLIONS of toman. Add colloquial detection: a 1–3 digit number followed by تومان/م/میلیون is multiplied by 1,000,000, and for a range the lower bound is used. The match is guarded so it never fires on dates, hours, or a long literal-toman figure. A stated amount now also wins over «توافقی», since ads often give a number AND «… بقیه توافقی» ("… the rest negotiable").

Backfill: BackfillPayAsync re-parses existing aggregated jobs/talent that have no salary and fills it in place (no AI, no ID/URL change). It is wired into the post-ingest auto-cleanup and exposed as an admin button. Existing «توافقی» listings with a stated number get their salary; genuinely negotiable ads stay «توافقی».

Also improves the baseSalary in JobPosting rich results.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
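The colloquial detection described above could look roughly like the following. This is a minimal sketch, not the parser from this commit: the names `ColloquialPay`, `TryExtractToman`, and `NormalizeDigits`, the regex, and the exact guard set are all assumptions made for illustration. Because the sketch never checks for «توافقی», a stated amount wins over it automatically.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; the real extractor in the repository may differ.
public static class ColloquialPay
{
    private const string Toman = "تومان";
    private const string Million = "\u0645\u06CC\u0644\u06CC\u0648\u0646"; // میلیون (Persian ye)
    private const string M = "\u0645";                                      // م
    private const string To = "\u062A\u0627";                               // تا ("to")

    // Assumed pattern: a 1-3 digit number, an optional "تا N" or "-N" range,
    // then a unit. The lookbehind rejects digits that are the tail of a longer
    // figure, a date, or a time; the lookahead rejects a unit that is only the
    // start of another word (for example the month name «مرداد»).
    private static readonly Regex Shorthand = new Regex(
        @"(?<![0-9,./:\u066C])" +
        @"([0-9]{1,3})" +
        @"(?:\s*(?:" + To + @"|-)\s*([0-9]{1,3}))?" +
        @"\s*(?:" + Million + "|" + Toman + "|" + M + ")" +
        @"(?![\p{L}\p{M}])",
        RegexOptions.CultureInvariant);

    // Maps Persian and Arabic-Indic digits to ASCII and Arabic ye to Persian ye.
    public static string NormalizeDigits(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (var c in s)
        {
            if (c >= '\u06F0' && c <= '\u06F9') sb.Append((char)('0' + (c - '\u06F0')));
            else if (c >= '\u0660' && c <= '\u0669') sb.Append((char)('0' + (c - '\u0660')));
            else if (c == '\u064A') sb.Append('\u06CC');
            else sb.Append(c);
        }
        return sb.ToString();
    }

    // Returns the amount in toman (lower bound of a range), or null when the
    // text contains no colloquial amount.
    public static long? TryExtractToman(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return null;
        var m = Shorthand.Match(NormalizeDigits(text));
        if (!m.Success) return null;
        long lower = long.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture);
        if (lower == 0) return null;
        return lower * 1_000_000L;
    }
}
```

Under these assumptions, `ColloquialPay.TryExtractToman("۴۰ تا ۵۰ تومان")` yields the lower bound in toman, while a fully written figure such as «۱۵,۰۰۰,۰۰۰ تومان» is left to the existing long-number extractor. The sketch is deliberately conservative: it skips decimal amounts and suffixed forms like «تومانی».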
@@ -139,6 +139,15 @@ public class IndexModel : PageModel
         return RedirectToPage();
     }
 
+    /// <summary>Fill missing salary on existing aggregated listings from the stored text (now reading
+    /// Iranian «X تومان» = millions shorthand). In place — no AI, no ID/URL change.</summary>
+    public async Task<IActionResult> OnPostBackfillPayAsync()
+    {
+        var n = await _ingest.BackfillPayAsync();
+        IngestMessage = $"حقوق برای {n} آگهیِ «توافقی» که در متن مبلغ داشت (مثل «۴۰ تا ۵۰ تومان») استخراج و ثبت شد. بدون تغییر شناسه/آدرس.";
+        return RedirectToPage();
+    }
+
     /// <summary>
     /// In-place cleanup of existing aggregated jobs/shifts: ARCHIVE (hide, keep the row) only the
     /// out-of-scope ones (domestic-helper / promotional / spam) per the current validator, plus