Responsibilities
- Steward core platform services: Implement container orchestration, service mesh, ingress, and secrets management at scale.
- Cross-functional partnership: Collaborate with Product, Engineering, Data, and Security to deliver external and internal value.
- Harden reliability: Improve observability (logging, metrics, tracing) and automated remediation to increase availability and reduce latency.
- Automate everything: Use infrastructure-as-code and configuration management to make systems and processes repeatable, auditable, and secure.
- Scale cost-effectively: Optimize cluster utilization, autoscaling, and storage/networking to balance performance, reliability, and spend.
- Level-up developer experience: Build internal tooling, templates, and golden paths that reduce cognitive load and time-to-first-deploy for product teams.
- On-call & incident response: Participate in a sustainable on-call rotation, drive post-mortems, eliminate toil, and reduce MTTR via automation.
- Enable fast, safe delivery: Evolve CI/CD pipelines (build/test/release) and environment strategies (dev/stage/prod).
- AI: You instinctively build using agentic tools (Claude Code, Codex, etc.) and are invested in pushing the boundaries of what is possible with agentic development.
Requirements
- 5+ years of experience in software engineering with a focus on infrastructure, DevOps, and/or platform engineering.
- Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience. We believe great talent comes from many paths. We consider a combination of education, training and relevant professional experience when evaluating candidates.
- Team-focused mindset, with solid collaboration and communication skills and an emphasis on enabling others.
- Pragmatic problem-solver who communicates clearly, documents well, and thrives in fast-moving, high-ownership environments.
- Experience working with cloud infrastructure, specifically Kubernetes.
- Understanding of observability: metrics, logs, traces, and building actionable alerts/SLOs.
- Familiarity with infrastructure-as-code tools.
- Some programming experience in at least one modern programming language.
- Awareness of security fundamentals: IAM, workload identity, network policies, encryption, and secrets management.
Nice to Have
- Open source contributions
- Experience with a company transitioning from startup to high-growth
- Google Cloud Platform
- Terraform
- Python, Go, and/or JavaScript (TypeScript)
- Building and managing CI/CD systems and developer tooling
Work Arrangement
Remote (Worldwide)
Additional Information
- You must have valid work authorization in the USA to be considered for this position.
- Remote work: AcuityMD is committed to supporting full-remote flexibility for employees in the US.
- Flexible PTO: Generous time off and flexible hours give you the freedom to do your best work.
- Paid Health, Dental, and Vision Plans: We offer 100% paid health, dental, and vision plans for all employees and 75% paid for our employees' dependents.
- Home Office Stipend: $1,000 to invest in remote office equipment and WiFi reimbursement.
- Optional Team Retreats: We meet in-person multiple times per year for co-working and social gatherings.
- Parental Leave: 8-16 weeks of fully-paid, flexible parental leave.
- Learning Budget: Reimbursements for relevant learning and up-skilling opportunities.