Automate data classification to resolve existing issues and integrate classification processes into data workflows in collaboration with the Governance Lead.
Develop production-grade AI agents that extend beyond prototype examples to support real-world operational use cases on the agentic platform.
Define data requirements for the agentic platform, specifying necessary data inputs, formats, and quality standards in coordination with the Principal AI Engineer.
Build FastAPI-based services that wrap LLM APIs and support version-controlled prompt templates.
Design structured prompts for classification and briefing tasks that produce validated JSON outputs including tags, confidence scores, and source references.
Maintain prompt templates in configuration files to allow updates without requiring code changes.
Log all LLM interactions with details including input hash, model version, output result, response time, and token usage for monitoring and auditing.
Implement fallback strategies to ensure system resilience when LLM APIs are inaccessible or degraded.
Conduct regular evaluation of model outputs using precision and recall metrics against human-labeled samples, then refine prompts based on results.
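As a rough illustration of the evaluation work described above, the following minimal sketch computes precision and recall for one tag against human-labeled samples. The function name `precision_recall`, the tag names, and the sample data are hypothetical and chosen only for illustration. They do not describe the team's actual tooling.

```python
def precision_recall(predicted, labeled, tag):
    """Return (precision, recall) for one tag over paired samples.

    `predicted` and `labeled` are equal-length lists of tag sets:
    model output and human reviewer labels for the same documents.
    """
    pairs = list(zip(predicted, labeled))
    tp = sum(1 for p, h in pairs if tag in p and tag in h)      # true positives
    fp = sum(1 for p, h in pairs if tag in p and tag not in h)  # false positives
    fn = sum(1 for p, h in pairs if tag not in p and tag in h)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical sample: four documents, tagged by the model and by a reviewer.
model_tags = [{"pii"}, {"pii", "financial"}, {"financial"}, set()]
human_tags = [{"pii"}, {"financial"}, {"financial"}, {"pii"}]

pii_precision, pii_recall = precision_recall(model_tags, human_tags, "pii")
fin_precision, fin_recall = precision_recall(model_tags, human_tags, "financial")
```

Per-tag numbers like these would typically feed the prompt refinement step, for example by showing which tags the prompt over-assigns or misses.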

Competitive salary and benefits package.

Hybrid work model with flexibility for remote and on-site collaboration.

Collaborative engineering team focused on delivering scalable and robust AI solutions in production environments.

Data classification automation: implementing automated classification to remediate current failures and embedding classification into data pipelines alongside the Governance Lead
Operational AI agents: building production agents on top of the agentic platform, going beyond the sample agents the external partner delivers and into real operational workflows
Agentic platform data contracts: defining what data the platform needs, in what format, and with what quality guarantees, working with the Principal AI Engineer
AI service implementation: FastAPI service around LLM APIs with versioned prompt templates
Classification and briefing prompts: structured prompts returning validated JSON with tags, confidence levels, source attribution
Prompt versioning: templates in configuration, editable without code changes
Observability: every LLM call logged with input hash, model version, output, latency, token count
Fallback logic: graceful degradation when LLM APIs are unavailable
Quality evaluation: running precision/recall evaluations against human reviewer samples, reporting results, iterating prompts
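To make the observability and fallback items above more concrete, here is a minimal sketch of a wrapper that logs each LLM call and degrades gracefully on failure. Everything named here is an assumption for illustration: `call_with_logging`, the `call_model` callable (expected to return an output and a token count), the record field names, and the fallback value. A real service would call an actual LLM client and write the record to its logging backend.

```python
import hashlib
import time


def call_with_logging(call_model, prompt, model_version, fallback_output):
    """Call the model, fall back on any error, and return a log record."""
    start = time.perf_counter()
    try:
        output, token_count = call_model(prompt)
        used_fallback = False
    except Exception:
        # Graceful degradation: return a predefined result instead of failing.
        output, token_count = fallback_output, 0
        used_fallback = True
    latency_ms = (time.perf_counter() - start) * 1000
    return {
        # Hash the input so calls can be audited without storing raw text.
        "input_hash": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model_version": model_version,
        "output": output,
        "latency_ms": round(latency_ms, 2),
        "token_count": token_count,
        "used_fallback": used_fallback,
    }


def healthy_model(prompt):
    """Hypothetical stand-in for a working LLM API call."""
    return {"tags": ["pii"], "confidence": 0.9}, 42


def failing_model(prompt):
    """Hypothetical stand-in for an unavailable LLM API."""
    raise ConnectionError("LLM API unavailable")


fallback = {"tags": [], "confidence": 0.0}
ok_record = call_with_logging(healthy_model, "Classify: ...", "v1", fallback)
degraded_record = call_with_logging(failing_model, "Classify: ...", "v1", fallback)
```

Recording `used_fallback` alongside the other fields is one possible design choice: it lets later evaluation runs exclude degraded responses from quality metrics.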

Sponsorship is available for qualified candidates who require it.

BlackStone eIT is hiring an AI Engineer (Applied)