Remote (Germany, Ireland, United Kingdom)

AssemblyAI is hiring a Senior Software Engineer, AI Data

About the Role

AssemblyAI is looking for a Senior Software Engineer, AI Data to build robust, scalable systems that power its AI data platform. In this role, you will drive technical execution, own significant features, and work on high-impact projects that directly influence the company's ability to train and evaluate AI models at scale.

What You'll Do

  • Design scalable, future-proof data platforms optimized for AI research workloads.
  • Build efficient self-serve data processing pipelines leveraging GCP's advanced services.
  • Implement cost-effective storage and monitoring solutions for ML at scale.
  • Create flexible training resource management with intelligent queuing.
  • Optimize resource allocation for maximum training efficiency.
  • Participate in an on-call rotation to ensure system reliability.
  • Lead adoption of cutting-edge ML tools and frameworks, continuously evaluating and integrating best-in-class solutions.
  • Streamline existing workflows while introducing new tooling that further reduces complexity.
  • Enhance tooling and documentation to accelerate team velocity.
  • Implement guardrails for cost, quality, and performance.
  • Identify and eliminate technical bottlenecks in the data processing and training pipelines.

What We're Looking For

  • 5+ years of professional software engineering experience.
  • Strong proficiency in Python and SQL with demonstrated ability to write production-quality code.
  • Solid understanding of software engineering fundamentals: data structures and algorithms; system design and architectural patterns; testing strategies (unit, integration, end-to-end); code review practices and technical collaboration.
  • Experience with RESTful APIs and distributed systems concepts.
  • Experience with containerization (Docker) and basic cloud infrastructure.
  • Track record of delivering high-quality software in a team environment.
  • Ability to thrive in a startup environment with changing priorities and rapid iteration.

Nice to Have

  • Experience with GCP services (BigQuery, GCS, Cloud Run, GKE).
  • Familiarity with distributed processing frameworks (Apache Beam, PySpark).
  • Experience with workflow orchestration tools (Airflow, Prefect, Dagster).
  • Understanding of ML/AI infrastructure and data pipelines.
  • Experience with monitoring and observability tools (Datadog).
  • Experience working with researchers directly.
  • Background in data engineering roles.

Technical Stack

  • Languages: Python, SQL
  • Infrastructure: Docker, GCP
  • GCP Services: BigQuery, GCS, Cloud Run, GKE
  • Frameworks & Tools: Apache Beam, PySpark, Airflow, Prefect, Dagster, Datadog

Team & Environment

You will be part of the AI Data team, collaborating with researchers, platform engineers, and other stakeholders.

Benefits & Compensation

  • Competitive equity grants.
  • 100% employer-paid benefits.
  • Salary range (plus equity): Germany/Ireland €141,267 – €184,512; United Kingdom £117,159 – £153,024.

Work Mode

This is a fully remote position open to candidates based in Germany, Ireland, or the United Kingdom.

We’re committed to creating a space where our employees can bring their full selves to work and have equal opportunity to succeed. No matter your race, gender identity or expression, sexual orientation, religion, origin, ability, age, or veteran status, if joining this mission speaks to you, we encourage you to apply.

Required Skills

Python, SQL, Docker, GCP, BigQuery, GCS, Apache Beam, PySpark, Data Engineering, Data Pipelines, ETL, Cloud Run, GKE, Distributed Systems

About company
AssemblyAI

AssemblyAI builds at the forefront of Speech AI, creating powerful models for speech-to-text and speech understanding available through a straightforward API. They serve over 200,000 developers and 5,000 paying customers, powering more than 2 billion end-user experiences daily.
