LILT is seeking experienced native Turkish-speaking software engineers to design, build, and validate a rigorous evaluation suite of Terminal-Bench tasks that test the limits of large language models on multilingual software challenges. This remote, freelance opportunity focuses on measuring how AI handles Turkish language nuances in coding environments, including locale-specific edge cases and encoding behaviors.
What You'll Do
- Evaluate coding agents through task engineering
- Build realistic task environments using datasets and files in your native language, ensuring assets remain in the target language to genuinely measure multilingual handling
- Identify failure points where AI models break down in your native language, using targeted prompting and translation
- Support the development of robust solutions with reference implementations
- Write highly reliable, deterministic verifier scripts, resorting to rubric-based judging only when strictly necessary
- Analyze execution logs and calibrate task difficulty (Easy to Very Hard) using standard Terminal-Bench run configurations against various model tiers (Haiku, Sonnet, Opus)
- Participate in a rigorous, 4-layer human quality control process (creation, human review, calibration review, and audit) alongside automated LLM-based checks to ensure fairness, grammatical accuracy, and benchmark integrity
What We're Looking For
- 5+ years of industry experience in software engineering
- Proven track record at leading technology companies and/or graduation from top-tier engineering universities
- Native or near-native fluency in Turkish, with a deep understanding of its grammar, register, and phrasing rules
- High English proficiency
- Strong proficiency in Python, standard shell scripting, and data processing
- Extensive experience with Terminal/CLI-based development workflows
- Working familiarity with coding agents
- Deep technical understanding of multilingual text processing pitfalls, including encoding/decoding robustness and Unicode normalization; locale-dependent conventions (collation, casing, non-Gregorian dates); and text I/O, toolchain interoperability, and safe string operations
- For specific languages: Bidirectional/RTL handling, font fallbacks, and rendering/typography in UI or artifacts
Technical Stack
- Python
- shell scripting
- data processing
- Terminal/CLI
- coding agents
- Unicode
- locale handling
- text encoding
Benefits & Compensation
- Work remotely with full schedule flexibility — work when you want, as much or as little as you want
- No fixed hours, no check-ins, no micromanaging
- Get paid quickly and fairly with competitive rates and prompt payments
- No chasing invoices
- Work on projects that actually matter in AI and language technology
- Be part of a global community of linguists, subject matter experts, and language professionals advancing human knowledge
- Access diverse, innovative projects that expand your portfolio and sharpen your skills across industries and domains
- Have fun doing what you love — bring your language skills to life on impactful projects
Work Mode
Fully remote, work from anywhere, any time
LILT is an equal opportunity employer. We extend equal opportunity to all individuals without regard to race, religion, color, national origin, ancestry, sex, sexual orientation, gender identity, age, physical or mental disability, medical condition, genetic characteristics, veteran or marital status, pregnancy, or any other classification protected by applicable local, state or federal laws. We are committed to fair employment and the elimination of discriminatory practices.




