LVT is looking for a Principal Vector Data Engineer to lead technical development at the intersection of AI and digital health. You will architect vector embedding pipelines and foundation models that support longitudinal data integration and therapeutic R&D innovation.
What You'll Do
- Lead the design, development, and optimization of vector embedding models for diverse biomedical modalities including clinical, regulatory, imaging, and digital health data.
- Architect scalable, compliant embedding pipelines using modern vector database technologies.
- Establish robust quality-control frameworks for mobile-captured images and convert pixel-level data into high-fidelity vector representations.
- Drive the adaptation of state-of-the-art academic methods into production-ready, GxP-aware foundation models.
- Oversee multimodal data integration efforts to enable semantic search, retrieval-augmented analysis, and clinical insight generation.
- Collaborate with data scientists, clinicians, engineering teams, and regulatory partners to ensure models and pipelines align with GxP, clinical governance, and documentation standards.
- Contribute to digital biomarker discovery and predictive modeling for neurodegenerative, neuropsychiatric, oncologic, and immunologic conditions.
- Mentor junior engineers and contribute to technical roadmap planning, architectural reviews, and AI strategy development.
What We're Looking For
- MS/PhD in Computer Science, Electrical Engineering, Biomedical Engineering, or a related discipline.
- 3+ years of experience in multimodal ML, vector representation learning, biomedical signal processing, or large-scale embedding systems.
- Expertise in Python, PyTorch/TensorFlow, Hugging Face, and multimodal or domain-specific embedding architectures such as CLIP, MedCLIP, BioBERT, and TimeSformer.
- Hands-on experience with vector indexing and search systems such as FAISS, Pinecone, Weaviate, Milvus, Qdrant, and Chroma.
- Familiarity with sentence-transformers, LangChain, or LlamaIndex for semantic search and RAG workflows.
- Understanding of clinical trial data structures, longitudinal monitoring, GxP system requirements, and compliant data lifecycle management.
Technical Stack
- Programming: Python
- ML Frameworks: PyTorch, TensorFlow, Hugging Face
- Vector Databases: FAISS, Pinecone, Weaviate, Milvus, Chroma, Qdrant
- Tools: sentence-transformers, LangChain, LlamaIndex
Work Mode
This is a hybrid position based in either Cornellà de Llobregat (Barcelona) or Madrid, Spain.
LVT is an equal opportunity employer.




