Clarivate is hiring a Senior Data Scientist (NLP) to design and implement large-scale AI-enabled solutions that modernize our content delivery systems. You will specialize in Natural Language Processing (NLP) and modern retrieval-augmented generation (RAG) architectures, focusing on text processing pipelines, indexing, vectorization, prompting, fine-tuning, and context management.
What You'll Do
- Design scalable NLP workflows for text ingestion, cleaning, normalization, and tokenization.
- Implement and maintain robust indexing systems and vector databases for semantic search and retrieval.
- Develop reusable prompting strategies and lead fine-tuning initiatives for LLMs tailored to business tasks.
- Build dynamic knowledge systems and agentic workflows using LangChain and LangGraph.
- Integrate advanced RAG architectures such as VRAG and GraphRAG to enrich information retrieval.
- Conduct benchmark testing and model evaluations to improve accuracy, efficiency, and scalability of NLP systems.
- Collaborate with engineering, product, and research stakeholders to deliver integrated AI-driven features.
- Mentor junior data scientists, guide best practices, and drive innovation across AI projects.
What We're Looking For
- Bachelor’s degree in Computer Science, Data Science, Computational Linguistics, or a related field.
- At least 5 years of hands-on experience in data science, focused on natural language processing (NLP).
- At least 5 years of experience using Python, with expertise in NLP libraries such as LangChain, LangGraph, or similar toolkits.
- Proven experience in model development and applying machine learning techniques to real-world problems.
Nice to Have
- Expertise in retrieval-based LLM workflows (RAG, VRAG, GraphRAG).
- Deep understanding of embedding models, semantic search, and vector stores (e.g., FAISS, Pinecone).
- Experience with document loaders and text splitters/document splitting strategies.
- Familiarity with MLOps practices and production-level deployment of AI pipelines.
- Experience with cloud platforms (e.g., AWS, Azure, or GCP).
- Experience applying Graph Neural Networks (GNNs) to retrieval-augmented generation.
- Knowledge of LangSmith and vector orchestration platforms.
- Familiarity with multilingual NLP and cross-lingual embeddings.
- Exposure to real-time knowledge graphs and stream-based RAG systems.
- A Master’s degree or PhD in a technical field (Computer Science, Data Science, etc.).
Technical Stack
- Python
- LangChain
- LangGraph
- FAISS
- Pinecone
- AWS
- Azure
- GCP
Team & Environment
This role is part of the Life Sciences & Healthcare (LS&H) segment under the Content Technology team. You will work closely with the VP of Content Technology, Solutions Architects, and internal SMEs, reporting directly to the VP of AI, Content.
Benefits & Compensation
- Medical insurance
- Dental insurance
- Prescription drug coverage
- Life insurance
- 401(k) with match
- Long-term disability coverage
- Vacation
- Sick time
- Volunteer time
- Discount programs
- Annual salary range: $117,000 - $147,000 USD
Work Mode
This is a global position open to candidates in the US.
At Clarivate, we are committed to providing equal employment opportunities for all qualified persons with respect to hiring, compensation, promotion, training, and other terms, conditions, and privileges of employment. We comply with applicable laws and regulations governing non-discrimination in all locations.




