My Eval Dashboard Showed 1 Trace. I Expected 120.

Adding Langfuse to a RAG pipeline looked done until the dashboard showed one trace and zero scores. The real problem was trace structure, not instrumentation. Here is the gotcha, the fix, and the numbers.

May 28, 2026 · 5 min · Jack Monte

I Built an Eval Harness for My RAG Pipeline. Here's What the Numbers Revealed.

An automated eval harness with 40 golden questions and three scorers turned 'looks reasonable' into a precise diagnosis of where my RAG pipeline actually breaks.

May 21, 2026 · 6 min · Jack Monte