🥝 AI Evaluations Engineer

ConnexAI | Manchester, United Kingdom | Posted May 31, 2026

Job Description

This role sits at the centre of how we measure and improve AI systems in production. 
You’ll define what good performance means across LLMs, ASR, TTS, and full speech-to-speech pipelines, and build the datasets, metrics, and evaluation systems that make AI quality measurable and comparable in the real world. 
You’ll work closely with engineering and product teams to ensure model changes lead to real improvements in user experience, not just better offline benchmarks. 
What you’ll do

Design and run evaluations across LLM, ASR, TTS, and speech-to-speech systems
Build real-world datasets and test cases from production behaviour and edge cases 
Define metrics and scorecards for model and system quality 
Benchmark internal models against external and frontier systems 
Build Python tools to automate evaluation workflows 
Create internal leaderboards, red-teaming setups, and regression tests 
Work with engi...
        
