Each project here is built to production standards: typed boundaries, full observability, and an evaluation harness with published results. Source code and benchmark data are public wherever possible.

RAG Starter
A RAG pipeline built on production patterns, with a full evaluation harness: faithfulness, relevance, and Precision@3 scoring; error rate tracked as a first-class metric; and end-to-end Langfuse tracing.
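For readers unfamiliar with the retrieval metric, Precision@3 is the fraction of the top three retrieved documents that are actually relevant to the query. The sketch below is a minimal illustration of that definition, not the project's own harness code: the function name `precision_at_k`, the use of string document IDs, and the choice to divide by `k` even when fewer than `k` results come back are all assumptions made for the example.

```python
from typing import Sequence, Set


def precision_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 3) -> float:
    """Fraction of the top-k retrieved document IDs that are relevant.

    Hypothetical helper for illustration; the project's harness may differ.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = retrieved[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    # Assumption: divide by k rather than len(top_k), so a retriever that
    # returns fewer than k results is penalized for the missing slots.
    return hits / k


# Example: two of the top three retrieved IDs are in the relevant set.
score = precision_at_k(["d1", "d7", "d3", "d9"], {"d1", "d3", "d4"}, k=3)
print(round(score, 3))
```

Averaging this score over a labeled query set gives a single retrieval-quality number that can be reported alongside faithfulness, relevance, and error rate.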