LLM Reasoning relevance: 6/10

VietJobs: A Vietnamese Job Advertisement Dataset

Hieu Pham Dinh, Hung Nguyen Huy, Mo El-Haj
arXiv: 2603.05262v1 Published: 2026-03-05 Updated: 2026-03-05

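AI Summary

Releases the first large-scale, publicly available dataset of Vietnamese job advertisements and evaluates LLMs on two recruitment tasks: job category classification and salary estimation.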

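Key Contributions

  • Builds and releases VietJobs, a large-scale Vietnamese job advertisement dataset (48,092 postings, over 15 million words)
  • Evaluates several LLMs on VietJobs for job category classification and salary estimation
  • Provides a new benchmark for Vietnamese NLP and labour market analysis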

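Methodology

After constructing the dataset, the authors evaluate several generative LLMs on job category classification and salary estimation under few-shot and fine-tuned settings, then analyse the results.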
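As a rough illustration of the few-shot setting described above, the sketch below assembles a k-shot classification prompt and scores predictions. It is not the authors' code: the `Posting` fields, the prompt wording, and the choice of accuracy and mean absolute error as metrics are all assumptions made for this example, and the actual prompts and metrics in the VietJobs repository may differ.

```python
from dataclasses import dataclass


@dataclass
class Posting:
    # Hypothetical record layout; the real VietJobs field names may differ.
    title: str
    description: str
    category: str = ""


def build_few_shot_prompt(examples, query, categories):
    """Assemble a k-shot classification prompt from labelled examples.

    The wording is illustrative only, not the prompt used in the paper.
    """
    lines = [
        "Classify the job advertisement into one of these categories: "
        + ", ".join(categories) + ".",
        "",
    ]
    for ex in examples:
        lines += [
            f"Title: {ex.title}",
            f"Description: {ex.description}",
            f"Category: {ex.category}",
            "",
        ]
    # The query posting is left unlabelled so the model completes the category.
    lines += [
        f"Title: {query.title}",
        f"Description: {query.description}",
        "Category:",
    ]
    return "\n".join(lines)


def accuracy(predicted, gold):
    """Fraction of exact label matches (assumed metric for classification)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def mean_absolute_error(predicted, gold):
    """Average absolute difference (assumed metric for salary estimation)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    return sum(abs(p - g) for p, g in zip(predicted, gold)) / len(gold)
```

The returned prompt string would be passed to whichever model is under evaluation; the generated text after the final "Category:" is then compared against the gold label. Salary estimation could follow the same pattern, with a numeric field in place of the category.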

Original Abstract

VietJobs is the first large-scale, publicly available corpus of Vietnamese job advertisements, comprising 48,092 postings and over 15 million words collected from all 34 provinces and municipalities across Vietnam. The dataset provides extensive linguistic and structured information, including job titles, categories, salaries, skills, and employment conditions, covering 16 occupational domains and multiple employment types (full-time, part-time, and internship). Designed to support research in natural language processing and labour market analytics, VietJobs captures substantial linguistic, regional, and socio-economic diversity. We benchmark several generative large language models (LLMs) on two core tasks: job category classification and salary estimation. Instruction-tuned models such as Qwen2.5-7B-Instruct and Llama-SEA-LION-v3-8B-IT demonstrate notable gains under few-shot and fine-tuned settings, while highlighting challenges in multilingual and Vietnamese-specific modelling for structured labour market prediction. VietJobs establishes a new benchmark for Vietnamese NLP and offers a valuable foundation for future research on recruitment language, socio-economic representation, and AI-driven labour market analysis. All code and resources are available at: https://github.com/VinNLP/VietJobs.

Tags

Vietnamese NLP, Labor Market Analysis, Job Advertisement, LLM Evaluation

arXiv Categories

cs.CL