Berlin, Germany | Hybrid | 60000-100000

ellamind GmbH is hiring an AI Engineer (all genders)

As an AI Engineer, you will play a central role in shaping how AI applications are tested, measured, and improved. Your work will focus on creating intelligent evaluation systems that assess the quality, safety, and consistency of large language model outputs across diverse use cases and industries.

What You’ll Do

Design and implement automated evaluation frameworks that analyze LLM responses for accuracy, safety, and alignment with user requirements.
Integrate with multiple LLM providers and manage scalable, reliable connections that support production-level workloads.
Develop testing workflows that allow teams to compare model versions, run batch evaluations, and track performance trends over time.
Optimize evaluation pipelines to balance speed, cost, and precision, enabling efficient yet thorough testing cycles.
Build infrastructure for prompt analysis and optimization, including A/B testing, failure pattern detection, and data-driven improvement loops.
Architect systems capable of processing thousands of test cases concurrently, ensuring stability and consistent results.
Research and apply new methodologies in AI evaluation, from scoring techniques to behavioral analysis, to raise the standard of what automated testing can achieve.
Experiment with emerging models and capabilities, integrating breakthroughs that enhance our platform’s effectiveness.
Ensure all systems meet enterprise reliability standards, with comprehensive monitoring, error resilience, and quality controls.

What We’re Looking For

You have strong experience in Python development and have built production-grade AI systems. You understand the nuances of working with LLM APIs—handling retries, parsing variable outputs, and ensuring robustness in real-world conditions. Your background includes distributed systems, async processing, and API design, and you think critically about model behavior and system reliability.

You care deeply about making AI systems more predictable, measurable, and trustworthy. You’re comfortable working across the stack, from backend services to evaluation logic, and thrive in environments where technical ownership and impact are central.

Technical Environment

Our stack is built around Python, with Django and FastAPI for backend services, PostgreSQL for data storage, and Docker and Kubernetes for orchestration. We work extensively with OpenAI, Anthropic, and local LLMs, and use CI/CD pipelines to maintain rapid, reliable deployment cycles.

Work Setup

This is a hybrid role requiring at least three days per week in Berlin or Bremen. Onboarding includes an initial period at our Bremen headquarters. Fluency in English (minimum B2) is required; German language skills are a plus.

What We Offer

Competitive compensation and a Virtual Stock Option Program (VSOP), giving you a direct stake in the company’s growth.
End-to-end ownership of features that shape the future of AI validation.
Opportunities to collaborate closely with AI researchers and full-stack engineers.
Immediate visibility into how your work improves real teams’ ability to deploy trustworthy AI.
Fast access to the latest AI tools and technologies.

Our Culture

We value technical depth, initiative, and a genuine drive to solve hard problems in AI evaluation. We believe diverse perspectives strengthen our work and welcome applicants from all backgrounds. If you’re motivated to build systems that make AI more reliable and effective, this role offers a unique chance to shape foundational infrastructure in the field.

Required Skills

Python, LLM APIs, OpenAI, Django, FastAPI, PostgreSQL, Docker, Kubernetes, CI/CD, Distributed Systems, API Design, Data Modeling

About the company

We build the infrastructure enterprises need to develop, evaluate, deploy, and monitor AI agents, safely and at scale.

Our products are built for organizations that cannot afford uncertainty, with compliance, security, and sovereignty from day one. We provide enterprise-grade tools for AI agent evaluation, optimization, and deployment with full data sovereignty, EU-first architecture, and automated compliance, from prototype to production.

The platform covers the full agent lifecycle: simulate, evaluate, deploy, and monitor — ensuring trust, security, and regulatory readiness, including compliance with the EU AI Act.

Job Details

Department: Engineering

Category: Data & ML

Posted: 9 months ago