Elsevier is looking for a Data Scientist to play a pivotal role in developing and deploying Generative AI models and solutions. You will be responsible for building, testing, and maintaining our Generative AI, RAG, and NLP solutions, engaging in the entire life cycle of data science projects.
What You'll Do
- Collect data, perform analysis, develop models, and present findings to stakeholders.
- Create production-ready Python packages for data science pipeline components and coordinate their deployment.
- Design, develop, and deploy Generative AI models and solutions that meet specific business needs.
- Optimize and customize Retrieval Augmented Generation (RAG) pipelines.
- Ingest, preprocess, and transform large-scale multilingual content for model input.
- Build Agentic RAG systems using tools like LangChain, AutoGen, or Haystack.
- Fine-tune large language models (LLMs) and transformer models to enhance accuracy.
- Implement guardrails and evaluation mechanisms to ensure responsible and ethical AI usage.
- Conduct rigorous testing and evaluation of AI models to ensure performance and reliability.
- Integrate data science components and maintain pipeline robustness against model drift.
- Establish reporting processes and develop automatic re-training strategies.
- Work collaboratively with cross-functional teams to integrate AI solutions into products.
- Mentor junior data scientists and contribute to team knowledge-sharing.
- Stay current with the latest advancements in AI, machine learning, and NLP.
What We're Looking For
- A Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, or a related field.
- 3+ years of applied experience in data science, with a focus on Generative AI, NLP, and machine learning.
- Proficiency in Python for data analysis, model development, and deployment.
- Strong experience with transformer models and fine-tuning techniques for LLMs.
- Proficiency in Generative AI technologies, including LLM APIs, evaluation tools, and prompt engineering.
- Knowledge of, and hands-on experience implementing, a variety of RAG pipelines.
- Experience with advanced algorithms in deep learning, neural networks, reinforcement learning, and transfer learning.
- Familiarity with traditional ML algorithms for model building, validation, and testing.
- Understanding of AI ethics, guardrail implementation, and evaluation metrics.
- Familiarity with cloud platforms for model deployment and production-ready pipelines.
- Proficiency in data visualization tools and techniques.
- Experience with version control systems, Jira, and working in an Agile environment.
- Proficiency with *nix systems, open-source software, Jupyter Notebook, and cloud computing.
- Excellent problem-solving and analytical skills with strong attention to detail.
- Strong communication skills and ability to work effectively in a team.
Nice to Have
- Experience with end-to-end model deployment, leveraging AI agents, Model Context Protocol (MCP), and cloud platforms like AWS (including Bedrock) or Azure.
- Experience in Java.
Technical Stack
- Python, Generative AI, Machine Learning, NLP
- Transformer models, LLMs, RAG
- LangChain, AutoGen, Haystack, MCP
- AWS, AWS Bedrock, Azure
- GitLab, GitHub, Jira, Jupyter Notebook
Team & Environment
You will be part of the Data Operations team, collaborating closely with Subject Matter Experts (SMEs) and the technology team.
Benefits & Compensation
- Flexible work environment (remote, hybrid, on-site)
- Shared parental leave
- Study assistance
- Sabbaticals
- Access to wellness programs
- Health insurance options for you and your family
- Group life and accident insurance
- Employee assistance programs and mental health resources
- Flexible working arrangements
- Paid time-off options, including sick leave, vacation, and public holidays
- Subsidized meals and free transportation in select locations
Work Mode
This role offers a hybrid work arrangement.
Elsevier is an equal opportunity employer: qualified applicants are considered for and treated during employment without regard to race, color, creed, religion, sex, national origin, citizenship status, disability status, protected veteran status, age, marital status, sexual orientation, gender identity, genetic information, or any other characteristic protected by law.