Responsibilities
- Design and implement scalable AI service architectures in cloud environments (Azure preferred; AWS or GCP acceptable).
- Build event-driven systems using queues and messaging platforms (e.g., Azure Service Bus, RabbitMQ, SQS) to support asynchronous AI workloads.
- Implement event streaming and real-time processing pipelines (e.g., Kafka, Azure Event Hubs, Pub/Sub, Kinesis).
- Architect, maintain, and scale containerized AI services deployed to Kubernetes, with emphasis on Azure Kubernetes Service (AKS).
- Design orchestration layers that manage model calls, downstream services, retries, rate limits, and failure handling.
- Optimize system performance under load, including horizontal scaling, autoscaling policies, resource management, and cost control.
- Implement WebSocket-based or other real-time client communication patterns for interactive AI applications.
- Contribute to infrastructure-as-code and CI/CD practices for AI service deployment, collaborating with CloudOps, DevOps, and application engineering teams to ensure reliability, availability, and operational standards are met.
- Partner with Product and business stakeholders to translate projected traffic, adoption, and growth targets into scalable technical architectures and capacity plans, and debug production-level issues as needed.
Requirements
- 3-5 years of software engineering experience with strong fundamentals in object-oriented programming, design patterns, and distributed system design.
- Professional experience in Python, C#, Java, or a similar language used in production systems.
- Strong hands-on experience with containerization (Docker) and Kubernetes-based orchestration (AKS preferred).
- Experience integrating AI/LLM workloads into enterprise-grade distributed systems.
Nice to Have
- Experience designing APIs and backend systems that support high concurrency and real-time interactions.
- Experience designing event-driven architectures using messaging systems (Azure Service Bus, RabbitMQ, SQS).
- Experience implementing event streaming systems (Kafka, Azure Event Hubs, Pub/Sub, Kinesis).
- Experience deploying AI systems in cloud environments (AWS, Azure, GCP).
- Experience with Databricks (Model Serving endpoints, MLflow).


