Fraunhofer-Gesellschaft is looking for a Working Student in Computer Vision to join Fraunhofer SIT. You will implement and apply computer vision and machine learning approaches in security-relevant contexts, focusing on hands-on development, experimentation, and UI creation for research and industrial projects.
What You'll Do
- Implement and apply computer vision approaches, including: visual age estimation, object detection (e.g., YOLO, RT-DETRv2), image classification (e.g., Vision Transformers, CNNs, GNNs), image captioning, visual question answering, multimodal search using Vision-Language Models, image segmentation (e.g., SAM-3, DINO-3), and interactive/explainable classification systems (e.g., RISE, GradCAM).
- Implement and conduct Machine Learning experiments: clean, prepare, split, and visualize data; crawl and scrape data where required; implement common ML procedures (e.g., hyperparameter optimization, binary/multi-class/multi-label classification, ensemble methods).
- Evaluate and benchmark models using standard ML metrics (e.g., accuracy, balanced accuracy (BAC), Brier score, mAP, ROC curves, AUC) and functional tests.
- Develop UIs and WebApps using common frameworks (fastHTML, Streamlit, Gradio, Svelte, Flask/FastAPI).
- Collaborate on publicly funded and/or directly commissioned projects from industry partners.
What We're Looking For
- Enrolled in a degree program in Computer Science, Mathematics, or a related field with a focus on Machine Learning and ideally Computer Vision.
- Solid knowledge of ML: familiarity with various neural network architectures (especially Vision Transformers and CNNs) and with basic ML terms and concepts (especially classification, hyperparameter optimization, fine-tuning, model evaluation).
- Solid knowledge of Python is mandatory and will be tested in the interview.
- Willingness to take on new challenges.
- Strong analytical thinking.
Nice to Have
- Ability to independently implement methods and procedures from scientific publications.
- Knowledge and experience in the field of cybersecurity.
Technical Stack
- Languages: Python
- Vision Models: YOLO, RT-DETRv2, Vision Transformers (ViT, Swin, DeiT), CNNs (ResNet, EfficientNet), GNNs (GraphConv, GAT), Vision-Language Models, SAM-3, DINO-3
- Explainable AI: RISE, GradCAM, EigenCAM, Integrated Gradients, ViT Shapley
- UI/Frameworks: fastHTML, Streamlit, Gradio, Svelte, Flask, FastAPI
Benefits & Compensation
- Flexible working hours that can be easily reconciled with your studies.
- An inspiring work environment with state-of-the-art infrastructure.
- The opportunity to gain practical experience and build valuable contacts in research.
- Possibility of writing a subsequent Bachelor's or Master's thesis.
We value and promote the diversity of our employees' skills and therefore welcome all applications, regardless of age, gender, nationality, ethnic and social origin, religion, worldview, disability, or sexual orientation and identity. Severely disabled persons will be given preferential consideration if equally qualified.


