The Ingestion team is responsible for everything that happens between content arriving from a connector and that content being ready for search and retrieval. This means document processing pipelines that handle parsing, text extraction, chunking, metadata enrichment, embedding generation, and index population — across every file format and content type our customers throw at us.
We’re in the middle of a significant architectural evolution: migrating from a legacy pipeline to a modern, workflow-orchestrated architecture with cleanly separated processing stages (intake, transformation, enrichment, and indexing). The team is also actively designing the next iteration of the pipeline to push further on throughput and resilience.
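To make the staged design concrete, here is a minimal sketch of how cleanly separated stages can compose into a pipeline. Everything here is hypothetical and illustrative, not our actual codebase: the `Document` model, the stage functions, and the toy chunker are stand-ins for the real intake, transformation, enrichment, and indexing stages.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical document model; real documents carry far richer state.
    raw: bytes
    text: str = ""
    chunks: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)
    indexed: bool = False


def intake(doc: Document) -> Document:
    # Intake: decode raw bytes into text (real parsing is format-specific).
    doc.text = doc.raw.decode("utf-8", errors="replace")
    return doc


def transform(doc: Document) -> Document:
    # Transformation: split text into fixed-size chunks (a toy chunker).
    size = 20
    doc.chunks = [doc.text[i:i + size] for i in range(0, len(doc.text), size)]
    return doc


def enrich(doc: Document) -> Document:
    # Enrichment: attach simple derived metadata.
    doc.metadata["char_count"] = len(doc.text)
    doc.metadata["chunk_count"] = len(doc.chunks)
    return doc


def index(doc: Document) -> Document:
    # Indexing: mark the document done (a real stage would write to a search index).
    doc.indexed = True
    return doc


STAGES = [intake, transform, enrich, index]


def run_pipeline(doc: Document) -> Document:
    # Because stages are independent, an orchestrator can retry or
    # resume each one in isolation rather than rerunning the whole pipeline.
    for stage in STAGES:
        doc = stage(doc)
    return doc
```

The point of the separation is that each stage has a single responsibility and a well-defined input and output, which is what lets a workflow orchestrator schedule, retry, and scale them independently.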
This is real systems engineering: the problems are about scale, reliability, and the messy realities of processing millions of documents with wildly different structures.