Responsibilities
- Design, build, and optimize data pipelines for Agentic and Generative AI systems, enabling context retrieval, multi-step reasoning, and adaptive knowledge updates.
- Develop and manage knowledge bases, vector stores, and graph databases to organize and retrieve information across diverse regulatory, product, and supplier domains.
- Engineer retrieval-augmented generation (RAG) and retrieval-augmented reasoning pipelines, integrating embedding generation, contextual chunking, and retrieval orchestration for LLM-driven agents.
- Collaborate cross-functionally with AI/ML, MLOps, Data, and Product teams to define data ingestion, transformation, and retrieval strategies aligned with evolving AI agent capabilities.
- Implement and automate workflows that ingest structured and unstructured content (documents, emails, APIs, metadata) into searchable, continuously enriched data stores.
- Design feedback and reinforcement loops that allow AI agents to validate, correct, and refine their knowledge sources over time.
- Ensure data quality, consistency, and traceability through schema validation, metadata tagging, and lineage tracking within knowledge and vector systems.
- Integrate monitoring and observability to measure retrieval performance, coverage, and model-data alignment for deployed agents.
- Collaborate with data governance and security teams to enforce compliance, access control, and Responsible AI data handling standards.
- Document schemas, pipelines, and data models to ensure reproducibility, knowledge sharing, and long-term maintainability.
- Stay at the forefront of AI data innovation, evaluating new technologies in graph reasoning, embedding architectures, autonomous data agents, and memory frameworks.
- Be familiar with corporate security policies and follow the guidance set out in Assent's processes and procedures.