Welo Data works with technology companies to provide high-quality, ethically sourced, relevant, diverse, and scalable datasets that supercharge their AI models. As a Welocalize brand, Welo Data leverages over 25 years of experience partnering with the world’s most innovative companies and brings together a curated global community of over 500,000 AI training and domain experts to offer services that span:
ANNOTATION & LABELING: Transcription, summarization, image and video classification and labeling.
ENHANCING LLMs: Prompt engineering, SFT, RLHF, red teaming and adversarial model training, model output ranking.
DATA COLLECTION & GENERATION: From institutional languages to remote field audio collection.
RELEVANCE & INTENT: Culturally nuanced and aware ranking, relevance, and evaluation to train models for search, ads, and LLM output.
Want to join our Welo Data team? We bring practical, applied AI expertise to projects, combining strong academic experience with a deep working knowledge of state-of-the-art AI tools, frameworks, and best practices. Help us elevate our clients' data at Welo Data.
Project Overview
We are seeking experienced bilingual evaluators to support a multilingual AI safety project focused on evaluating model responses across culturally specific prompt-image datasets.
This project involves applying a structured safety rubric to assess AI-generated responses for appropriateness, safety, and reliability within the target locale’s cultural context.
Each language stream will process approximately 1,000 prompt-image pairs. Every item will receive two independent evaluations, with arbitration applied in cases of disagreement. Evaluations will primarily be documented in English, with a defined in-language sample.
Project Details
Location: Remote – Singapore
Team: Welo Data – AI Services
Engagement Type: Freelance – Remote
Start Date:
Duration: 2–3 weeks
Weekly Commitment: 20–40 hours per week
Schedule Options:
• 4 hours per day, Monday–Friday
OR
• 2 hours per day, Monday–Friday + 10 weekend hours
Hourly Rate: 28 USD
Responsibilities
- Evaluate AI-generated responses using a structured safety rubric
- Complete two independent evaluations per item
- Provide concise, well-structured rationales in English
- Participate in calibration sessions
- Support arbitration when evaluation discrepancies occur
- Maintain quality and throughput targets during the evaluation window
Qualifications
- Fluency in the target language and English
- Deep cultural understanding of the target locale
- Strong written English skills for documentation and rationales
- Prior experience in safety evaluation, policy review, content moderation, or rubric-based assessment preferred
- Ability to apply detailed guidelines consistently
- Strong analytical skills and attention to nuance
- Reliable availability during the production window
- Priority may be given to contributors who have previously worked on prompt-image or similar evaluation projects, to reduce onboarding time and maintain continuity
Disclaimer: This role involves working with explicit and sensitive content. Applicants should be comfortable working with adult material in a professional capacity. Please apply only if you fully understand and are prepared for the nature of this role.