ObviouslyAI is looking for a creative and curious AI Test Engineer to join our team. This is a hands-on role where you'll work directly with AI agents and backend services, writing code, debugging, scripting tests, and evaluating LLM prompts to ensure our systems behave reliably. It's not a traditional backend or QA role; you'll learn new tools, explore unfamiliar apps, and design tests from a user's perspective to improve cutting-edge AI systems.
What You'll Do
- Write, debug, and maintain backend code in Python or JavaScript, building test cases and backend scripts.
- Implement APIs and ensure authentication workflows work as expected.
- Design and execute creative test strategies for AI agent behavior, particularly around LLM-based agents.
- Evaluate AI agent outputs and prompts, contributing to LLM evaluation and metrics using tools like Deepchecks.
- Write scripts and automation to test AI agents and backend workflows.
- Build lightweight automation frameworks and develop or extend test infrastructure.
- Dive into new, unfamiliar apps and services quickly, learning them and building tests as if you were an end-user.
- Anticipate edge cases and potential failure modes by getting into the user's perspective.
- Work hands-on with integrations and modern collaboration tools.
- Test and validate backend workflows that connect user data across systems.
- Collaborate closely with cross-functional teams in a startup environment.
What We're Looking For
- 0–3 years of experience in backend development or testing, ideally in a startup or experimental role.
- Able to write backend code, debug, and write test cases in Python or JavaScript.
- Not a traditional tester: you think creatively, experiment boldly, and approach testing the way you would teach a child.
- Exposure to or experience testing and prompting AI agents, especially GenAI/LLM-based systems.
- Comfortable writing backend scripts, automating tests, and creating testing frameworks.
- Exposure to integrations and an understanding of API interactions and authentication.
- A curious, fast learner who is comfortable diving into completely unknown tools or apps.
- Highly creative in designing tests that go beyond clicking around.
- Willing to experiment, work on ambiguous problems, and wear multiple hats.
Nice to Have
- Familiarity with tools like Cursor or Windsurf.
- Knowledge of modern automation frameworks (e.g., Selenium).
- Experience in B2B product environments.
- Experience with LLM evaluation frameworks, metrics, or MCPs.
- Experience with integrations like HubSpot or Salesforce.
Technical Stack
- Python, JavaScript
- Deepchecks, Selenium
Team & Environment
You'll join cross-functional teams in a startup environment. We are a small, scrappy group that fails fast, has a bias for action, and pays attention to detail.
Benefits & Compensation
- Work at the intersection of backend engineering, AI, and creative testing.
- A fast-moving, supportive startup culture that values experimentation and creativity.
- Opportunity to work with cutting-edge AI systems, tools, and frameworks.
- Learn and grow rapidly alongside a talented and collaborative team.
Work Mode
This is a global role with team members in San Francisco, CA, and Bangalore, India.
ObviouslyAI is an equal opportunity employer.



