About the Role
Responsibilities
- Provision and manage multi-tenant and dedicated customer environments on AWS using DuploCloud, ensuring isolation, scalability, and reliability
- Design, optimize, and maintain GitHub Actions pipelines to enable fast, reliable daily deployments with tight feedback loops
- Deploy, scale, and troubleshoot production Kubernetes clusters, managing autoscaling policies, resource allocation, and health monitoring
- Own PostgreSQL operations including provisioning, migrations, replication, performance tuning, and disaster recovery strategies
- Build and maintain reproducible, version-controlled infrastructure using Terraform, Docker, and Kubernetes manifests
- Partner with SecOps and CISO leadership to implement encryption policies, IAM configurations, VPC segmentation, logging, monitoring, and compliance-driven infrastructure controls across AWS and GovCloud
- Establish dashboards, alerts, and runbooks to maintain high availability and reduce incident frequency and recovery time
- Leverage AI-augmented workflows to generate infrastructure modules, refine deployment scripts, and accelerate operational improvements while maintaining rigorous validation standards
- Evaluate and integrate new tooling, improve provisioning strategies, and evolve infrastructure to support scale, compliance, and long-term system resilience
Requirements
- 7+ years of professional experience in DevOps, infrastructure engineering, or site reliability engineering with hands-on production ownership
- Bachelor’s degree in Computer Science or related field (or equivalent practical experience)
- Deep expertise in AWS (EC2, EKS/ECS, RDS, S3, IAM, VPC, CloudWatch) including experience in commercial and GovCloud environments
- Strong Kubernetes experience managing clusters in production environments
- Hands-on PostgreSQL operational experience including migrations, backup/recovery, replication, and performance tuning
- Experience designing and maintaining CI/CD pipelines, particularly GitHub Actions
- Proficiency with infrastructure as code (Terraform) and containerization tools such as Docker
- Strong systems thinking and ability to design scalable, secure, and maintainable infrastructure platforms
- Excellent written and verbal communication skills
- Comfort working autonomously in fast-paced environments
- A service-oriented mindset focused on enabling engineering teams to ship quickly and safely
Nice to Have
- Experience with DuploCloud or similar tenant management platforms
- Experience building infrastructure for PLM, PDM, or hardware/manufacturing industry software
- Background supporting on-premises or compliance-driven deployments
- Familiarity with event-driven systems (NATS, Redis, Kafka)
- Experience with observability tools such as Datadog, PostHog, or Sentry
- Knowledge of compliance frameworks such as SOC 2, FedRAMP, or ITAR
- Experience implementing zero-downtime deployment strategies
Additional Information
- This is a hybrid role based in Los Angeles, CA (3 days per week in office)
- Engineers leverage AI-powered development environments to orchestrate tasks, structure infrastructure context, and accelerate delivery