Jobgether is hiring a Senior Site Reliability Engineer to ensure the reliability, scalability, and performance of our critical security infrastructure. In this fully remote, India-based role, you will lead operational excellence initiatives across mission-critical platforms such as intrusion detection and DDoS mitigation, defining service-level objectives and driving automation.

What You'll Do

Own reliability outcomes for security platforms, defining SLIs/SLOs, error budgets, alerting, dashboards, and runbooks.
Architect and implement high availability, capacity planning, and disaster recovery for IDS/IPS, DDoS mitigation, and supporting services.
Design zero/minimal-downtime maintenance and upgrade strategies for OS, firmware, and signature updates.
Automate deployments, configuration, and compliance using SaltStack, Python, and Infrastructure as Code practices.
Operate and optimize a heterogeneous stack including IPS, DDoS mitigation, HAProxy, Nginx, Juniper, and Palo Alto systems.
Lead incident response in a 24/7 on-call rotation, act as incident commander, and drive blameless postmortems with durable fixes.
Reduce operational toil through self-service tooling, automated health checks, and reliability reviews including game days and chaos testing.
Maintain audit-ready operations aligned with compliance standards; create and update SOPs, operational documentation, and architectural diagrams.
Mentor and provide technical guidance to contractors and junior engineers while collaborating with cross-functional teams.

What We're Looking For

5+ years of experience in site reliability, production operations, or platform engineering supporting large-scale, mission-critical systems.
Expert-level knowledge of SaltStack (or similar configuration management tools) for automation and deployments.
Strong Linux administration skills with deep understanding of TCP/IP, routing, load balancing (L4–L7), and network security concepts.
Proficiency in Python for automation, integrations, and operational tooling.
Experience with observability tools: Icinga, Grafana, InfluxDB, and rsyslog pipelines.
Familiarity with Git-based workflows, CI/CD pipelines, and Infrastructure as Code concepts.
Proven effectiveness in 24/7 operations environments with on-call responsibilities and incident management experience.
Excellent technical writing, documentation, and mentoring skills.
Bachelor’s degree in Computer Science, Information Technology, or related field.

Nice to Have

Hands-on experience with IDS/IPS and DDoS platforms (Trend Micro TippingPoint, Suricata, NetScout/Arbor), HAProxy/Nginx administration, and network devices (Juniper, Palo Alto).
Industry certifications such as Security+, CISSP, or Linux+.

Technical Stack

Automation & Config: SaltStack, Python
Platforms & Networks: Linux, IPS, DDoS mitigation, HAProxy, Nginx, Juniper, Palo Alto
Observability: Icinga, Grafana, InfluxDB, rsyslog
Tools & Processes: Git, CI/CD

Benefits & Compensation

Competitive salary with performance incentives
Comprehensive health and family-friendly benefits, including parental leave
Flexible remote working arrangements
Retirement savings and equity opportunities (vary by role)
Paid time off and bonus eligibility
Professional development support and mentorship
Collaborative and inclusive team culture embracing diversity and entrepreneurship