Design and operate the data backbone for advanced AI systems in a remote-first environment. This role is central to building and scaling data infrastructure that powers retrieval-augmented generation (RAG), agent memory, knowledge graphs, and AI observability on AWS.
What You’ll Do
- Build and maintain scalable pipelines for ingesting and integrating structured, unstructured, and multi-modal data from databases, documents, APIs, and event streams—ensuring data is secure, governed, and ready for AI use.
- Develop retrieval systems using vector databases, hybrid search (vector + keyword), and graph-based knowledge models, with a focus on optimizing data structure for high-quality results.
- Design and manage agent memory pipelines that capture AI system signals—traces, feedback, evaluations, and usage patterns—to support observability and ongoing model improvement.
- Implement data quality, validation, and monitoring frameworks across production workflows, enforcing access controls, lineage tracking, and compliance with security standards.
- Optimize AI data pipelines for reliability, performance, and cost-efficiency, applying resilient processing patterns and observability practices.
- Support evaluation frameworks for AI systems, enabling benchmarking, offline testing, and long-term performance tracking of retrieval and agent behaviors.
- Lead platform improvements by defining clear data contracts between services, reducing dependencies, and enhancing developer experience across AI systems.
What We’re Looking For
- Proven software engineering experience, with a focus on building maintainable, testable systems end-to-end.
- Hands-on work with distributed systems—async processing, retry logic, failure handling, and eventual consistency.
- Production experience with AI data pipelines: embeddings, retrieval workflows, or feedback data processing.
- Familiarity with vector databases, search platforms, or graph-based storage solutions.
- Experience with RAG, hybrid search, or embedding pipelines.
- Knowledge of agent frameworks, agent memory, or orchestration of tool-using AI agents.
- Background in observability data systems—traces, logs, metrics, or feedback loops for AI evaluation.
- Cloud experience on AWS, particularly with storage, messaging, and orchestration services.
- Proficiency with infrastructure as code (CDK preferred; CloudFormation or Terraform acceptable).
- Strong communication and collaboration skills, with a track record of mentoring and improving team engineering practices.
- Fluency in English, both written and verbal.
Preferred Qualifications
- Experience with JVM-based data processing (Java, Kotlin, Scala) and Spark workloads.
- Knowledge of schema evolution and data contract management—versioning, backfills, compatibility.
- Operational focus on pipeline reliability: replay safety, dead-letter queues, reconciliation, and data lineage.
Work Environment
This is a 100% remote position for candidates based in Colombia, offering a flexible schedule and strong work-life balance. You’ll work within a global SaaS environment that values innovation, independence, and accountability. The role includes a competitive salary above market average, an indefinite-term contract, and comprehensive benefits including health coverage, life insurance, home office and internet allowances, and a training budget to support your growth.
You’ll join a culture that encourages creativity, continuous learning, and strategic thinking, with opportunities for mentorship, recognition awards, and vacation upgrades after five years of service. We value diverse perspectives and are committed to equitable practices across hiring and team development.