Responsibilities
- Architect, build, and operate scalable backend services for a media intelligence platform, with a focus on clean, maintainable, and production-ready systems.
- Own critical backend components end to end, from system design and API contracts through implementation, deployment, monitoring, and iteration.
- Drive architectural decisions across APIs, processing pipelines, distributed compute, storage, search, observability, cloud infrastructure, and model-serving workflows.
- Design data models and storage patterns for media assets, generated metadata, embeddings, processing jobs, model outputs, search indexes, and audit trails.
- Design high-throughput media ingestion and processing pipelines for large volumes of video, audio, image, and text content.
- Build distributed, event-driven workflows for media processing using queues, streams, and pub/sub systems such as Amazon SQS, Kafka, Google Cloud Pub/Sub, or equivalent technologies.
- Implement reliable asynchronous processing patterns, including retries, idempotency, dead-letter queues, backpressure handling, and fault-tolerant job execution.
- Lead the development and optimization of metadata extraction, content analysis, scene detection, transcription, embedding generation, and multimodal AI inference workflows.
- Integrate and optimize AI/ML services within backend workflows, including model APIs, embedding pipelines, OCR, speech-to-text, scene analysis, and multimodal inference, using batching, caching, and fallback strategies.
- Collaborate with ML engineers, data scientists, and external model providers to benchmark models, compare quality/latency trade-offs, and safely roll out model upgrades.
- Optimize AI/ML inference workflows for latency, throughput, reliability, and cost across both real-time and batch-processing paths.
- Work with model-serving systems such as vLLM, Triton, TGI, SageMaker, Vertex AI, or custom inference services to improve batching, concurrency, warmup behavior, timeout handling, autoscaling, and GPU utilization.
- Evaluate and apply practical model optimization techniques such as quantization, model distillation, batching, caching, prompt optimization, and routing to smaller or cheaper models where appropriate.
- Design and maintain vector search and indexing systems using technologies such as Pinecone, Weaviate, Qdrant, Elasticsearch, FAISS, pgvector, or similar tools.
- Build retrieval workflows that support semantic search, similarity matching, duplicate detection, media discovery, and structured metadata search.
- Monitor model and system performance in production, including API latency, queue depth, processing time, model error rates, GPU utilization, confidence distributions, drift signals, and cost per processed item.
- Deploy and operate systems on AWS, GCP, Azure, or equivalent cloud platforms, including compute, storage, networking, queues, model-serving infrastructure, and monitoring systems.
- Ensure system reliability through logging, metrics, tracing, alerting, dashboards, operational runbooks, and incident-response best practices.
- Collaborate with product, design, data, and ML teams to deliver media-rich, AI-powered product features.
- Mentor junior and mid-level engineers, review designs, support technical planning, and raise engineering quality across the team.
- Participate in code reviews and contribute to documentation and the continuous improvement of engineering practices.
- Ensure code quality through testing, clear documentation, and maintainable implementation patterns.
Requirements
- Excellent English communication skills.
Work Arrangement
- Remote (worldwide)