Honeywell is seeking a Principal AI Data Engineer to architect and implement the sophisticated data foundation for our enterprise AI initiatives. In this role, you will design advanced data solutions that drive business insights, enhance decision-making, and empower our diverse portfolio of AI applications.
What You'll Do
- Support end-to-end data needs for all AI modalities, including classic ML, GenAI/LLMs, and agentic AI systems.
- Build robust, scalable data pipelines for structured, semi-structured, and unstructured data, including text, documents, images, audio, video, and logs.
- Develop feature engineering pipelines for classic ML, including feature extraction, transformation, and feature store management.
- Build and optimize GenAI and LLM data pipelines, including embedding generation, vectorization, chunking, metadata extraction, and document enrichment for RAG and context retrieval.
- Develop data ingestion and orchestration workflows that support agentic AI, including memory stores, event-driven pipelines, tool-use data flows, and real-time retrieval services.
- Design and implement advanced data solutions using AWS (S3, Glue, Lambda, EMR, Kinesis), Databricks (Spark, Delta Lake, Vector Search), and Dataiku to enable intelligent systems at scale.
- Implement data governance, quality, lineage, monitoring, and observability to support high-performance, trustworthy AI.
- Partner with data scientists, ML engineers, and AI product teams to deliver datasets for model development, fine-tuning, evaluation, and production inference.
- Optimize pipelines for latency, cost, reliability, and throughput, ensuring that all AI systems, from batch ML to real-time agents, have the data they need.
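To give a flavor of the GenAI data-pipeline work described above, the sketch below shows a minimal document-chunking step of the kind used when preparing text for embedding and RAG retrieval. All names and parameter values are illustrative assumptions, not part of Honeywell's stack.

```python
def chunk_document(text: str, chunk_size: int = 500, overlap: int = 50) -> list[dict]:
    """Split text into overlapping chunks, attaching positional metadata
    so each chunk can be embedded and traced back to its source document.

    Hypothetical sketch; real pipelines typically chunk on token or
    sentence boundaries rather than raw character offsets.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, max(len(text), 1), step):
        piece = text[start:start + chunk_size]
        if not piece:
            break
        chunks.append({
            "text": piece,
            "start": start,           # metadata for lineage / citation
            "end": start + len(piece),
        })
    return chunks
```

The overlap keeps context that straddles a chunk boundary retrievable from at least one chunk, at the cost of some duplicated embeddings.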
What We're Looking For
- Bachelor’s degree in a technical field (CS, Engineering, Math, or related).
- Experience supporting AI at scale across classic ML, GenAI/LLM, and agentic AI systems.
- Experience with vector databases and semantic search (Databricks Vector Search, Pinecone, FAISS, Milvus, OpenSearch).
- Familiarity with LLM and GenAI data preparation, including text processing, tokenization, chunking strategies, and prompt/context formatting.
- Experience with unstructured data technologies (OCR, NLP pipelines, computer vision data processing).
- Hands-on experience with Dataiku for automation, workflow orchestration, and AI project management.
- Knowledge of MLOps tooling: MLflow, Delta Lake, experiment tracking, CI/CD for ML.
- Understanding of agentic AI system patterns, such as memory architectures, tool APIs, event-driven workflows, and reasoning chain data requirements.
- Strong analytical mindset, attention to detail, and commitment to high data quality.
- Ability to thrive in a fast-paced, evolving AI environment and collaborate across cross-functional teams.
- Must be a US Citizen due to contractual requirements.
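As background on the vector-database and semantic-search experience listed above, the pure-Python sketch below shows the core lookup such systems perform: ranking stored embeddings by cosine similarity to a query vector. Engines like FAISS, Pinecone, Milvus, and OpenSearch do this at scale with approximate-nearest-neighbor indexes; the example vectors here are made up for illustration.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k stored vectors most similar to the query.
    A brute-force stand-in for what a vector database optimizes."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in index.items()]
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]
```

Brute-force scan is O(n) per query; production systems trade exactness for speed with ANN structures such as HNSW graphs or IVF partitions.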
Technical Stack
- Cloud & Compute: AWS (S3, Glue, Lambda, EMR, Kinesis)
- Data Platforms: Databricks (Spark, Delta Lake, Vector Search), Dataiku
- Vector Databases: Pinecone, FAISS, Milvus, OpenSearch
- MLOps & Orchestration: MLflow, Delta Lake
- Unstructured Data Processing: OCR, NLP pipelines, computer vision data processing
Team & Environment
You will report directly to the AI Director, partnering closely with data scientists, ML engineers, and product teams to deliver critical data infrastructure for AI solutions.
Benefits & Compensation
- Employer-subsidized Medical, Dental, Vision, and Life Insurance
- Short-Term and Long-Term Disability
- 401(k) match
- Flexible Spending Accounts and Health Savings Accounts
- EAP and Educational Assistance
- Parental Leave
- Paid Time Off (for vacation, personal business, and sick time)
- 12 Paid Holidays
Work Mode
This is a hybrid position based in Phoenix, AZ.
Honeywell is an equal opportunity employer.