Requirements
- Master's or higher degree in Computer Science, Computer Engineering, or a comparable field, or equivalent professional background.
- Minimum of five years of hands-on experience developing Python libraries, with expertise in continuous integration (for example, GitHub Actions), integration testing, performance benchmarking, and code profiling.
- Strong working knowledge of large language models and retrieval-augmented generation workflows, including prompt design and frameworks like LangChain and LlamaIndex.
- In-depth familiarity with data science and machine learning libraries in Python, including RAPIDS, pandas, NumPy, scikit-learn, XGBoost, Numba, and PyTorch.
- Experience with distributed computing systems such as Dask, Apache Spark, or Ray for scaling data workflows.
- Demonstrable involvement in open-source software projects, with public contributions visible on GitHub.
Nice to Have
- Active participation in the data science field through published research, conference presentations, or technical blogging.
- Background in building and maintaining production-grade data pipelines, particularly those involving SQL.
- Hands-on experience with software distribution methods including pip, conda, and Docker container images.
- Knowledge of container orchestration tools such as Docker Compose and Kubernetes, as well as cloud deployment platforms.
- Understanding of parallel computing techniques, with exposure to CUDA C++ development.
Benefits
- Competitive compensation and a comprehensive benefits offering.
- Eligibility for equity awards and additional benefits based on role and location.
Compensation
Competitive salaries and a generous benefits package.
