Responsibilities

Assess the performance of AI coding agents in real-world scenarios
Create authentic task environments using datasets and files in Czech to test true multilingual capabilities
Identify cases where AI systems fail to understand or generate correct output in Czech
Help develop strong reference solutions and write precise, deterministic verification scripts, using rubric-based evaluation only when essential
Review execution logs and adjust task difficulty levels from Easy to Very Hard using standard Terminal-Bench configurations across different model types
Engage in a four-stage human quality assurance process (task creation, human review, calibration review, and audit) combined with automated LLM checks to maintain fairness, linguistic correctness, and benchmark reliability

Benefits

Receive competitive compensation
Enjoy engaging and meaningful work
Contribute to advancements in artificial intelligence and language technology
Connect with professionals in a collaborative, global community
Apply through a simplified process designed for skilled contributors

Compensation

Paid work

Work Arrangement

Remote

Team

Global, multilingual team

Target Languages

Spanish, German, Czech, Turkish, Arabic (Egyptian), Korean, Japanese, Hausa, Hindi, Marathi

Work Flexibility

Work on varied projects remotely, at times convenient for you

Payment Terms

Receive timely and equitable payments

Application Requirements

Submit your CV in English

Hiring Process

AI and automated tools may assist in screening résumés, scoring assessments, and analyzing interviews
Final hiring decisions are made by human reviewers
Candidates can choose to opt out of AI-assisted hiring by contacting recruiting@lilt.com
The company follows fair, inclusive, and transparent hiring practices

Equal Opportunity Employer

LILT does not discriminate on the basis of race, religion, color, national origin, ancestry, sex, sexual orientation, gender identity, age, disability, medical condition, genetic characteristics, veteran status, marital status, pregnancy, or other protected categories

LILT is hiring an AI Benchmark Engineer - Native Language Specialist | Czech
