Responsibilities
- Learn by reading and writing designs, documentation, runbooks, and industry literature
- Partner with development teams to design and implement reliable and resilient services
- Build infrastructure automation that’s easy for other teams to use
- Develop observability processes, reports, and tooling to diagnose performance and stability issues
- Eliminate toil by automating manual processes
- Ensure we exceed our compliance and security commitments
- Act in an ethical and professional manner
Requirements
- 5+ years of experience in Software Engineering, SRE, or DevOps roles
- Strong written and verbal communication skills (We use Slack, Notion, and GitHub)
- Experience with Infrastructure as Code (We use Terraform and AWS)
- Experience with containers and container orchestration tools (We use ECS)
- Experience with authoring and maintaining code (We use Bash, Python, and Golang)
- Experience with using and helping others with observability tools and techniques (We use Datadog)
- Love for the Oxford comma (We use, love, and respect it)
Nice to Have
- Experience with cloud cost management and FinOps
- Experience in building, maintaining, and operating SaaS or web-based applications
- Experience with distributed systems principles and their application
- Experience building and operating multi-region or cell-based applications
- Experience with managing cloud vendor relationships
- Experience with compliance and regulated environments (We comply with SOC 2 and HIPAA)
Benefits
- (US-ONLY) 100% of medical, dental, and vision covered for employees, and 75% covered for dependents
- Vacation days and quarterly mental health days so you can recharge
- (US-ONLY) 401(k) plan to help you save for the future
- Apple products to help you do your best work
- Employee Resource Groups (ERGs) to support and celebrate the shared identities and life experiences of communities within CaptivateIQ; ERGs directly support our company-wide DEI goals as a space for developing and retaining diverse talent
Additional Information
- Participate in an on-call rotation to provide after-hours support, resolve critical issues promptly, and maintain system uptime
