Responsibilities
- Train and refine large language models using supervised fine-tuning techniques.
- Work with open-source model architectures such as LLaMA, Mistral, Qwen, and comparable variants.
- Develop LoRA and QLoRA pipelines to enable efficient model adaptation.
- Design and enhance data preprocessing systems, including tokenization and long-context management.
- Extend and utilize Hugging Face Transformers and Datasets libraries for training and inference tasks.
- Process structured and semi-structured data formats, including XML and XSD files.
- Implement document parsing for Microsoft Office file formats using tools such as python-docx and the Open XML format.
- Create end-to-end Retrieval-Augmented Generation (RAG) systems for accurate, document-based question answering.
- Construct and manage vector databases and embedding workflows using tools such as FAISS, Chroma, Weaviate, or pgvector.
- Improve retrieval performance through hybrid search, re-ranking, and domain-optimized chunking strategies.
- Develop and integrate Model Context Protocol (MCP) servers to connect LLMs with tools, APIs, and external data sources.
- Design agent workflows that use MCP for secure, auditable access to internal systems and contextual data.
- Deploy and operate models in fully offline and air-gapped environments.
- Apply model optimization and quantization using methods, formats, and libraries such as GPTQ, AWQ, GGUF, and bitsandbytes.
- Build and maintain inference infrastructure using frameworks such as vLLM, TGI, and Ollama.
- Maximize GPU efficiency through CUDA, cuDNN, and VRAM-aware batching techniques.
- Maintain local CI/CD pipelines for machine learning models without reliance on cloud services.
- Manage on-premise model registries, version control, and artifact storage.
- Ensure RAG and MCP components function reliably in disconnected or restricted network settings.
- Develop Python backend services to support machine learning training and inference operations.
- Work with relational databases such as PostgreSQL and MySQL, as well as vector databases for RAG storage.
- Utilize Docker and Git to support consistent development and deployment processes.
- Implement CI/CD workflows using Azure DevOps, including self-hosted agents where needed.
Work Arrangement
Remote (Worldwide)