What You'll Do
Lead the development and execution of performance engineering strategies for large-scale, cloud-native telecom platforms. Guide a distributed team of SDETs across feature teams, ensuring performance testing is integrated into sprint cycles using tools such as k6.
Design and maintain backend performance frameworks in Python, Java, or TypeScript, focusing on API, database, and infrastructure performance. Implement production traffic replay using GoReplay and generate complex, multi-tenant test data to simulate real-world load conditions.
Own end-to-end performance validation, including load, stress, endurance, scalability, and resilience testing. Apply chaos engineering techniques using AWS Fault Injection Simulator to test system robustness under failure conditions.
Validate real-time communication systems and event-driven architectures built on Kafka and RabbitMQ under production-like loads. Integrate performance testing into CI/CD pipelines to enable automated feedback and prevent performance regressions in pull requests.
Use observability platforms such as Grafana, CloudWatch, Coralogix, and PMM to monitor system behavior, analyze performance data, and troubleshoot bottlenecks. Define performance benchmarks, SLAs, SLOs, and error budgets that align with business objectives.
Collaborate with architecture and data teams to evaluate system designs, validate database solutions such as MySQL and Tungsten, and recommend performance-driven improvements. Analyze and optimize performance across both legacy monoliths and modern microservices environments.
Requirements
- Proven experience in backend performance engineering with hands-on coding in Python, Java, or TypeScript.
- Track record of leading performance testing initiatives in AWS-hosted, cloud-native environments.
- Experience leading or mentoring distributed SDET teams, including code reviews and framework adoption.
- Strong knowledge of API performance testing, integration validation, and non-functional requirements at scale.
- Familiarity with distributed systems, message brokers (Kafka, RabbitMQ), and microservices architectures.
- Solid understanding of system design principles, including concurrency, throughput, latency, and fault tolerance.
- Experience with performance tools such as k6, JMeter, or Gatling for large-scale test design and analysis.
- Hands-on use of traffic replay tools like GoReplay to mirror production behavior.
- Proficiency with AWS services including EC2, ECS, RDS, and API Gateway for performance optimization.
- Experience integrating performance tests into CI/CD systems like Jenkins or GitHub Actions.
- Skilled in using observability tools including Grafana, Coralogix, PMM, and CloudWatch for diagnostics.
- Background in Agile environments with the ability to translate performance data into engineering actions.
Technical Stack
Primary tools and technologies include k6, Python, Java, TypeScript, JMeter, Gatling, GoReplay, AWS EC2, ECS, RDS, API Gateway, Fault Injection Simulator, Kafka, RabbitMQ, MySQL, Tungsten, Grafana, CloudWatch, Coralogix, VictoriaMetrics, PMM, Jenkins, GitHub Actions, and RESTful APIs.