Responsibilities
- Design and operate three distinct data pipelines (scheduled job ingestion, event-driven course processing, and periodic knowledge graph construction), each with its own trigger rules and cost controls
- Generate and manage semantic embeddings with Amazon Bedrock's Titan v2 model, store them in MongoDB Atlas Vector Search, and tune similarity thresholds to keep matching accurate
- Build and maintain a knowledge graph linking jobs, courses, skills, and industries, using FP-Growth association rule mining and archetype-to-SOC-code mapping
- Build and improve a two-stage discovery and matching API on AWS Lambda: vector search retrieves candidates, then detailed eligibility evaluation and LLM-based re-ranking refine the results
- Right-size Fargate Spot tasks and build fault-tolerant processing loops that recover gracefully from Spot interruptions, keeping costs efficient as data volume grows
- Maintain and improve daily job scrapers across multiple sources, and build institutional data scrapers backed by robust HTML normalization pipelines
Compensation
Competitive salary and equity package
Work Arrangement
Remote-friendly with team coordination across time zones
Team
Small, high-impact engineering team focused on AI-driven data systems
Technologies
AWS Lambda, Fargate, Amazon Bedrock (Titan v2), MongoDB Atlas Vector Search, FP-Growth algorithm, SOC code mapping, HTML parsing and cleaning tools
Data Scale
Millions of job and course records, with steadily growing volume that demands resilient, scalable pipeline design
Performance Goals
Maintain low-latency API responses, ensure high accuracy in semantic matching, and minimize infrastructure spend per processed record
Available for qualified candidates