We are seeking a skilled software engineer fluent in Turkish to help develop and evaluate AI benchmarks that rigorously test multilingual language model performance. In this role, you will design and implement high-signal tasks in Turkish that challenge models to operate effectively without defaulting to English as a fallback. Your work will directly inform how AI systems handle real-world linguistic complexity.

Key Responsibilities

Develop and validate tasks for Terminal-Bench that assess AI behavior in Turkish-language environments
Create authentic, non-trivial challenges using native-language datasets and file structures
Identify failure modes in AI responses through targeted prompting and translation analysis
Construct deterministic verifier scripts to ensure consistent evaluation
Adjust task difficulty based on execution outcomes across model variants
Participate in a multi-stage quality assurance process including peer review and calibration audits
Maintain linguistic precision and technical correctness across all benchmark components
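
As a rough illustration of what a deterministic verifier script can involve, the sketch below compares a model's output against a reference after canonicalising both. The function names `canonical_digest` and `verify` are hypothetical and chosen for this example only; they are not part of Terminal-Bench, and the normalisation rules (Unicode NFC, trailing whitespace stripped, line endings unified) are assumptions about what a task might reasonably ignore.

```python
import hashlib
import unicodedata


def canonical_digest(text: str) -> str:
    """Return a SHA-256 digest of `text` after canonicalisation (hypothetical helper)."""
    # Compose characters so that "s" + combining cedilla equals precomposed "ş".
    normalized = unicodedata.normalize("NFC", text)
    # splitlines() handles \n and \r\n alike; rstrip() drops trailing whitespace.
    lines = [line.rstrip() for line in normalized.splitlines()]
    # Ignore leading and trailing blank lines.
    canonical = "\n".join(lines).strip("\n")
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def verify(expected: str, actual: str) -> bool:
    """Return True when both texts are identical after canonicalisation."""
    return canonical_digest(expected) == canonical_digest(actual)
```

Because the comparison depends only on the two input strings, repeated runs give the same verdict. For instance, `verify("Teşekkürler\n", "Tes\u0327ekku\u0308rler  \r\n")` returns `True`, while `verify("ılık", "ilik")` returns `False`, since dotless ı and dotted i are distinct letters.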

Qualifications

You should have substantial experience in software engineering and a deep command of Turkish, including its grammatical structure, stylistic registers, and cultural nuances. Proficiency in Python, shell scripting, and command-line workflows is essential. Familiarity with encoding issues, Unicode handling, locale-specific formatting, and text processing pipelines is required. You must also be comfortable working independently in a fully remote, asynchronous environment.
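
One concrete example of the Unicode and locale-specific issues mentioned above is Turkish letter casing. Python's built-in `str.lower()` and `str.upper()` are locale-independent, so they do not apply the Turkish dotted/dotless i rules. The sketch below shows one possible workaround; the helper names `turkish_lower` and `turkish_upper` are hypothetical, and the approach covers only the four i-variants, not every locale-sensitive rule.

```python
DOTTED_UPPER = "\u0130"   # İ  LATIN CAPITAL LETTER I WITH DOT ABOVE
DOTLESS_LOWER = "\u0131"  # ı  LATIN SMALL LETTER DOTLESS I


def turkish_lower(text: str) -> str:
    """Lowercase `text` using Turkish rules: İ -> i and I -> ı."""
    # Map the two special capitals first, then let str.lower() handle the rest.
    return text.replace(DOTTED_UPPER, "i").replace("I", DOTLESS_LOWER).lower()


def turkish_upper(text: str) -> str:
    """Uppercase `text` using Turkish rules: i -> İ and ı -> I."""
    # Map the two special lowercase letters first, then uppercase the rest.
    return text.replace("i", DOTTED_UPPER).replace(DOTLESS_LOWER, "I").upper()
```

With these helpers, `turkish_lower("ISPARTA")` gives `"ısparta"`, whereas plain `"ISPARTA".lower()` gives `"isparta"`. Likewise, plain `"\u0130".lower()` yields `"i\u0307"` (an `i` followed by a combining dot above, two code points), which is a common source of mismatches in string comparisons.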

Preferred candidates will have a background at recognized technology firms or elite engineering institutions. The ideal candidate combines strong technical implementation skills with a nuanced understanding of linguistic behavior in computational settings.

Work Environment

This is a freelance, remote position with complete scheduling flexibility. There are no mandatory check-ins or fixed hours, so you decide when and how much you work. You’ll collaborate with a global network of language specialists and engineers, contributing to foundational AI evaluation systems. Tasks are designed to be intellectually engaging and technically rigorous, offering opportunities to expand your expertise and professional connections.

Compensation is competitive, with prompt payment processing and no administrative overhead. All work contributes to advancing the reliability and inclusivity of human-AI interaction across languages.

LILT is hiring an AI Benchmark Engineer | Native Language Specialist - Turkish - Remote
