Responsibilities
- Investigate and diagnose production issues, bugs, and system failures
- Debug backend services, APIs, and distributed systems
- Troubleshoot issues in data ingestion pipelines, ETL workflows, and computation processes
- Use logs, monitoring tools, queries, and debugging techniques to analyze system behavior and isolate likely root causes
- Investigate ambiguous technical issues across services, data flows, and infrastructure components
- Perform root cause analysis (RCA) on production incidents and contribute to follow-up actions
- Communicate investigation status, impact, and next steps clearly during active incidents
- Coordinate with internal engineering teams when issues span multiple systems or require escalation
- Collaborate with engineers to implement fixes and improve system stability
- Identify recurring operational issues and propose improvements to increase system reliability and reduce manual effort
- Document troubleshooting processes, root causes, and common operational workflows
- Develop a broad understanding of MerQube’s systems and financial data infrastructure
Requirements
- Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience
- 3+ years of experience debugging, maintaining, or building production systems using at least one backend programming language such as Python, Go, C++, or Java
- Experience working with backend systems or data processing pipelines
- Strong debugging and analytical problem-solving skills
- Experience working with SQL and data systems
- Ability to investigate complex and sometimes ambiguous technical issues across multiple services and system layers
- Ability to prioritize effectively during incidents and make sound escalation decisions
- Strong written and verbal communication skills, with the ability to provide clear incident updates and collaborate effectively across teams
- Ability to document findings, root causes, and operational workflows clearly and concisely
- Interest in learning about financial systems and data processing
Nice to Have
- Experience working with ETL or data pipelines
- Familiarity with workflow orchestration tools such as Airflow
- Experience with distributed systems or microservices architectures
- Familiarity with cloud environments such as AWS or GCP
- Experience debugging production systems in live operational environments
- Familiarity with containerization and orchestration tools such as Docker or Kubernetes
- Experience with monitoring, logging, or observability tools
- Exposure to financial markets or financial data
Benefits
- Competitive compensation packages
- Comprehensive full-time benefits, including medical, dental, vision, and more
- Flexible working arrangements, including opportunities to work remotely
- A community-first environment
- A strong focus on health, wellness, and work-life balance
- Opportunities to learn, develop, and grow
- Paid time off (PTO), holidays, and sick time
Work Arrangement
Hybrid
Additional Information
- This role requires alignment with Pacific Time (PT) operating hours to support internal teams and production systems effectively. For team members based in India, this corresponds primarily to evening and night-time working hours.
- This role includes participation in a light on-call rotation focused on responding to critical production incidents. Most work takes place during regular business hours in support of internal teams. During on-call periods, engineers are expected to respond to high-priority incidents (P1 issues) in a timely manner by triaging the issue, driving investigation, and either resolving it directly or coordinating with the appropriate engineering teams. The on-call rotation is designed to provide coverage for critical incidents and is not intended to require continuous off-hours work.


