Muster by Elitery
Ragas

Evaluate RAG pipelines with Ragas and feed scores back into Muster traces.

Ragas is an open-source tool for model-based evaluation of Retrieval-Augmented Generation (RAG) pipelines. Combined with Muster's tracing, scoring, and analytics, it closes the loop from instrumentation to evaluation to reporting.

What you can do

Ragas:

  • Generate synthetic test sets for assessing your pipeline.
  • Evaluate reference-free, without ground-truth data.
  • Measure performance with metrics such as faithfulness, answer relevancy, and context precision.
  • Optimize custom prompts with automatic adaptation.
  • Integrate with CI/CD pipelines via Pytest.
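
The Pytest integration listed above usually comes down to failing the build when an aggregate metric drops below a threshold. The following is a minimal sketch of that idea using only the standard library. It assumes the scores have already been computed and collected as a list of per-sample dicts; the helper names `mean_scores` and `failing_metrics`, the `THRESHOLDS` values, and the sample data are illustrative assumptions, not part of Ragas.

```python
from statistics import mean

# Hypothetical minimum acceptable averages per metric; tune for your pipeline.
THRESHOLDS = {"faithfulness": 0.8, "answer_relevancy": 0.7}


def mean_scores(rows):
    """Average each metric across a list of per-sample score dicts."""
    metrics = {name for row in rows for name in row}
    return {
        name: mean(row[name] for row in rows if name in row)
        for name in metrics
    }


def failing_metrics(rows, thresholds=THRESHOLDS):
    """Return {metric: average} for metrics whose average is below its threshold."""
    averages = mean_scores(rows)
    return {
        name: averages[name]
        for name, minimum in thresholds.items()
        if name in averages and averages[name] < minimum
    }


def test_rag_quality_gate():
    # Illustrative data; in CI these rows would come from a real evaluation run.
    rows = [
        {"faithfulness": 0.9, "answer_relevancy": 0.8},
        {"faithfulness": 0.85, "answer_relevancy": 0.75},
    ]
    assert failing_metrics(rows) == {}
```

Returning the failing metrics as a dict, rather than a bare boolean, means the assertion message shows which metric regressed and by how much.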

Muster:

  • Span- and trace-level scoring.
  • Segmentation and analytics to identify performance gaps.
  • Detailed reporting per use case and user segment.
  • Integrations with OpenAI, LangChain, LlamaIndex, and more.
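
To make "segmentation to identify performance gaps" concrete, here is a small local illustration of the underlying computation: group scores by segment, average them, and look for the weakest group. Muster does this in its own analytics; the record layout below (the `segment` and `scores` fields) and the data are assumptions made purely for this sketch.

```python
from collections import defaultdict
from statistics import mean


def mean_by_segment(records, metric):
    """Group score records by segment and average one metric per group."""
    grouped = defaultdict(list)
    for record in records:
        if metric in record["scores"]:
            grouped[record["segment"]].append(record["scores"][metric])
    return {segment: mean(values) for segment, values in grouped.items()}


# Illustrative records; field names and values are made up.
records = [
    {"segment": "support", "scores": {"faithfulness": 0.9}},
    {"segment": "support", "scores": {"faithfulness": 0.7}},
    {"segment": "sales", "scores": {"faithfulness": 0.5}},
]

averages = mean_by_segment(records, "faithfulness")
# The segment with the lowest average is the first candidate for investigation.
weakest = min(averages, key=averages.get)
print(weakest, averages[weakest])
```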

How it fits together

Typical workflow:

  1. Instrument your RAG pipeline with Muster (via the OpenAI integration, LangChain callback, or LlamaIndex instrumentor).
  2. Run a Ragas evaluation against a sample of your traces or a dedicated test set.
  3. Push Ragas scores back to the corresponding Muster traces using langfuse.create_score(...).
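  4. Break down the resulting score distribution by use case or user segment in Muster's analytics.

A minimal example of step 3:

from langfuse import get_client

langfuse = get_client()

# ragas_results is assumed to map each trace ID to a dict of metric values
# produced by your Ragas run, e.g. {"<trace_id>": {"faithfulness": 0.9}}.
for trace_id, ragas_result in ragas_results.items():
    # Attach the faithfulness score to the trace it was computed for
    langfuse.create_score(
        trace_id=trace_id,
        name="ragas-faithfulness",
        value=ragas_result["faithfulness"],
    )

# Send any buffered scores before a short-lived script exits
langfuse.flush()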
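
When you track more than one metric, it can help to flatten the per-trace results into one payload per score before sending them. The sketch below is a hedged, standard-library-only illustration: `build_score_payloads` is a hypothetical helper, the input data is invented, and skipping `None` or NaN values rests on the assumption that a metric which could not be computed is reported that way.

```python
import math


def build_score_payloads(ragas_results, prefix="ragas-"):
    """Turn {trace_id: {metric: value}} into one keyword-argument dict per score."""
    payloads = []
    for trace_id, metrics in ragas_results.items():
        for metric, value in metrics.items():
            # Skip missing or NaN values (assumed to mark a failed metric computation).
            if value is None or (isinstance(value, float) and math.isnan(value)):
                continue
            payloads.append(
                {
                    "trace_id": trace_id,
                    "name": f"{prefix}{metric}",
                    "value": float(value),
                }
            )
    return payloads


# Illustrative input; trace IDs and values are made up.
ragas_results = {
    "trace-1": {"faithfulness": 0.92, "answer_relevancy": 0.81},
    "trace-2": {"faithfulness": float("nan"), "answer_relevancy": 0.64},
}

payloads = build_score_payloads(ragas_results)
```

Each payload uses the same three keyword arguments as the snippet above (`trace_id`, `name`, `value`), so it can be passed along as `langfuse.create_score(**payload)`.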

See also