AI Evaluations Engineer

ConnexAI | Manchester, United Kingdom | Posted May 31, 2026

Job Description

This role sits at the centre of how we measure and improve AI systems in production.

You’ll define what good performance means across LLMs, ASR, TTS, and full speech-to-speech pipelines, and build the datasets, metrics, and evaluation systems that make AI quality measurable and comparable in the real world.

You’ll work closely with engineering and product teams to ensure model changes lead to real improvements in user experience, not just better offline benchmarks.

What you’ll do

  • Design and run evaluations across LLM, ASR, TTS, and speech-to-speech systems
  • Build real-world datasets and test cases from production behaviour and edge cases
  • Define metrics and scorecards for model and system quality
  • Benchmark internal models against external and frontier systems
  • Build Python tools to automate evaluation workflows
  • Create internal leaderboards, red-teaming setups, and regression tests
  • Work with engi...
