Costa Rica | Remote (Global) | Full-time

Databricks is hiring a Sr. IT Site Reliability Software Engineer

Responsibilities

  • Design and implement cloud-based infrastructure on AWS or Azure using Infrastructure as Code tools such as Terraform or Pulumi.
  • Enhance system reliability, performance, and scalability to maintain high availability and low latency for essential IT services.
  • Develop and manage CI/CD pipelines using platforms like GitHub Actions, supporting both hosted and self-hosted runners for specialized build needs.
  • Ensure all new internal applications are built with security, logging, monitoring, and alerting capabilities enabled from the start.
  • Develop internal AI-driven tools and automation scripts to improve developer productivity and operational efficiency.
  • Support incident management by analyzing data, refining response workflows, and building dashboards to track service health.
  • Take part in on-call rotations, leading fast resolution of production outages and technical issues.
  • Lead post-incident reviews to determine root causes and implement long-term engineering fixes.
  • Work closely with Security, Engineering, and Support teams to deliver measurable business impact.

Work Arrangement

Remote (Worldwide)

Compliance

If job responsibilities require access to export-controlled technology or source code, the employer may choose whether to apply for a U.S. government license. The employer may decline to proceed with a candidate based solely on this factor.

About the company
Databricks
Databricks is the data and AI company. More than 10,000 organizations worldwide rely on the Databricks Data Intelligence Platform to unify and democratize data, analytics and AI.
Job Details
Department: IT Infrastructure and Operations
Category: Infrastructure
Posted: 5 hours ago