Project description
Join the Data Engineering team to help maintain and improve an internal LLM‑powered assistant built on hosted LLM APIs and internal knowledge sources, with a focus on reliability, retrieval quality, and operational excellence.
Responsibilities
- Maintain and enhance ingestion/enrichment pipelines for internal content (parsing/extraction, normalization, metadata enrichment, deduplication, and quality monitoring)
- Improve indexing and retrieval performance and quality (chunking/segmentation refinements, embedding/index update workflows, metadata filtering, caching) and support hybrid retrieval capabilities (vector + keyword/BM25 + metadata)
- Implement and maintain access‑aware retrieval by propagating and enforcing document permissions across indexing and query‑time filters, with audit logs and validation tests
- Improve source attribution so responses reliably point to t...