Responsibilities
- Design and develop production-grade systems that power intelligent automation, agentic workflows, and large-scale retrieval services.
- Design, build, and maintain AI-powered services and APIs, leveraging LLMs (OpenAI, Anthropic, Qwen, open-source models) and custom ML models.
- Develop an enterprise-grade agentic framework that enables orchestration, retrieval, and collaboration between multiple AI agents.
- Implement and optimize knowledge retrieval systems and agentic search capabilities using vector databases such as Qdrant and Elasticsearch.
- Write well-structured, efficient, and testable Python code for production services, experimentation, and internal developer tools.
- Build and maintain shared Python libraries and SDKs used across multiple applications and microservices.
- Collaborate with cross-functional teams on architecture, internal protocols, and API standards to ensure consistency and reliability across the platform.
- Develop and enhance monitoring, validation, and observability for production-grade AI solutions.
- Drive the full software development lifecycle, from design and implementation to deployment, monitoring, and continuous improvement.
- Identify and resolve performance bottlenecks, reliability issues, and scaling challenges in complex, data-intensive environments.
- Participate in code reviews and technical discussions, mentoring other engineers and contributing to a culture of excellence.
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent practical experience.
- 5+ years of experience as a Software Engineer, with strong proficiency in Python.
- Proven track record of building and maintaining production-grade systems using Python.
- Strong understanding of distributed systems, API design, and data-driven architectures.
- Experience with relational and non-relational databases (PostgreSQL, Elasticsearch, Qdrant, or similar).
- Familiarity with AI/ML system design, including LLM integration and evaluation pipelines.
- Knowledge of DevOps and observability practices (CI/CD, monitoring, metrics, and model validation).
Nice to Have
- Experience working with multiple LLM providers (OpenAI, Anthropic, Qwen, open-source models).
- Background in developer platforms or AI infrastructure services.
- Familiarity with vector databases, semantic retrieval, and knowledge graph architectures.
- Exposure to Langfuse, LiteLLM, LangChain, or similar frameworks.
- Experience developing enterprise-scale SaaS or distributed backend systems.
- Contributions to open-source projects in Python, AI, or infrastructure engineering.