Responsibilities
- Architect, build, and operate scalable backend services for a media intelligence platform with a focus on clean, maintainable, and production-ready systems.
- Own critical backend components end to end, from system design and API contracts through implementation, deployment, monitoring, and iteration.
- Drive architectural decisions across APIs, processing pipelines, distributed compute, storage, search, observability, cloud infrastructure, and model-serving workflows.
- Design data models and storage patterns for media assets, generated metadata, embeddings, processing jobs, model outputs, search indexes, and audit trails.
- Design high-throughput media ingestion and processing pipelines for large volumes of video, audio, image, and text content.
- Build distributed, event-driven workflows for media processing using queues and pub/sub systems such as SQS, Kafka, Pub/Sub, or equivalent technologies.
- Implement reliable asynchronous processing patterns, including retries, idempotency, dead-letter queues, backpressure handling, and fault-tolerant job execution.
- Lead the development and optimization of metadata extraction, content analysis, scene detection, transcription, embedding generation, and multimodal AI inference workflows.
- Integrate and optimize AI/ML services within backend workflows, including model APIs, embedding pipelines, OCR, speech-to-text, scene analysis, multimodal inference, batching, caching, and fallback strategies.
- Collaborate with ML engineers, data scientists, or external model providers to benchmark models, compare quality/latency trade-offs, and safely roll out model upgrades.
- Optimize AI/ML inference workflows for latency, throughput, reliability, and cost across both real-time and batch-processing paths.
- Work with model-serving systems such as vLLM, Triton, TGI, SageMaker, Vertex AI, or custom inference services to improve batching, concurrency, warmup behavior, timeout handling, autoscaling, and GPU utilization.
- Evaluate and apply practical model optimization techniques such as quantization, model distillation, batching, caching, prompt optimization, and routing to smaller or cheaper models where appropriate.
- Design and maintain vector search and indexing systems using technologies such as Pinecone, Weaviate, Qdrant, Elasticsearch vector search, FAISS, pgvector, or similar tools.
- Build retrieval workflows that support semantic search, similarity matching, duplicate detection, media discovery, and structured metadata search.
- Monitor model and system performance in production, including API latency, queue depth, processing time, model error rates, GPU utilization, confidence distributions, drift signals, and cost per processed item.
- Deploy and operate systems on AWS, GCP, Azure, or equivalent cloud platforms, including compute, storage, networking, queues, model-serving infrastructure, and monitoring systems.
- Ensure system reliability through logging, metrics, tracing, alerting, dashboards, operational runbooks, and incident-response best practices.
- Collaborate with product, design, data, and ML teams to deliver media-rich, AI-powered product features.
- Mentor junior and mid-level engineers, support technical planning, review designs, and raise engineering quality across the team.
- Participate in code reviews, documentation, technical planning, and continuous improvement of engineering practices.
- Ensure code quality through testing, peer review, clear documentation, and maintainable implementation patterns.
Requirements
- Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
- 5-7+ years of backend engineering experience, ideally building scalable distributed systems, media platforms, data pipelines, or high-throughput backend services.
- Prior experience owning major backend modules end to end, including architecture, implementation, deployment, monitoring, and production operations.
- 3+ years of experience integrating AI/ML inference systems into backend workflows, including model APIs, embedding pipelines, OCR, speech-to-text, scene detection, or multimodal model outputs.
- Hands-on experience creating AI-powered processing pipelines for image, video, audio, or text analysis.
- Practical experience with production model optimization, especially for image, video, embedding, or multimodal models, including batching, caching, quantization, prompt optimization, routing strategies, latency reduction, and cost optimization.
- Strong expertise in Python and/or Node.js.
- Deep understanding of building scalable RESTful APIs and backend architectures.
- Experience with the Hugging Face Transformers ecosystem and deep learning frameworks such as PyTorch and TensorFlow.
- Strong experience with SQL/NoSQL databases, schema design, and data modeling.
Nice to Have
- Prior experience with vector search, semantic search, media retrieval, or similarity-matching systems.
- Experience mentoring engineers, leading technical discussions, and influencing architectural decisions across backend, infrastructure, and AI/ML workflows.
- Exposure to distributed systems, microservices, asynchronous processing, and event-driven patterns with SQS, Pub/Sub, Kafka, or other queueing/pub-sub systems.
- Experience deploying production systems on AWS, GCP, or similar cloud platforms.
- Knowledge of infrastructure patterns (compute, storage, networking, observability).
- Experience orchestrating AI/ML integrations, including embedding generation, scene detection, OCR, speech-to-text, image classification, video analysis, and multimodal models.
- Experience optimizing inference workflows for latency, throughput, reliability, and cost.
- Experience configuring inference settings for scalable, optimized serving, including tuning sampling parameters, managing output length and format, and configuring reasoning-related behaviors.
- Familiarity with practical model optimization techniques such as batching, caching, quantization, model distillation, prompt optimization, fallback routing, and use of smaller models where appropriate.
- Experience working with model-serving systems such as vLLM, Triton, TGI, SageMaker, Vertex AI, or custom inference services.
- Experience with LLM and multimodal evaluation and benchmarking frameworks, including domain-specific benchmarks, with the ability to interpret results and optimize model performance accordingly.
- Understanding of distributed systems, scaling patterns, and performance engineering.
- Ability to design modular, maintainable, and efficient architectures.
- Experience with API versioning, modularization, and designing long-running workflows.
- Understanding of performance bottlenecks and low-latency backend patterns.
Work Arrangement
Remote (Worldwide)
Additional Information
- Excellent English communication skills required.
- Recruitment scam warning: Apply only through official channels at https://tether.recruitee.com/.
- All communication from Tether will come from emails ending in @tether.to or @tether.io.
- Tether does not conduct interviews over WhatsApp, Telegram, or SMS.
- Tether will never request payment or financial details during the hiring process.