Job Description
Okara is a private, faster, and safer chat interface for accessing the latest open-source AI models.

What you'll do
Integrate and serve open-source LLMs; build a fast, reliable chat pipeline.
Tune prompts and parameters, and add guardrails, evals, and fallbacks.
Build retrieval and memory (RAG) with clean context handling.
Add research tools and productivity connectors; handle auth and rate limits safely.
Instrument performance and costs; drive steady latency and reliability wins.
Ship features end-to-end with product and design; maintain clear APIs and docs.

You're a fit if you:
Have shipped LLM features to production and can debug messy edge cases.
Understand model behavior (sampling, context windows, function/tool use).
Know how to measure and reduce latency, errors, and token spend.
Are comfortable with data stores, queues, and observability basics.
Care deeply about privacy, security, and responsible data handling.
Move fast, own outcomes, and communicate crisply.

Nice to have
Experience with model...