Integrations
Ragas
Evaluate RAG pipelines with Ragas and feed scores back into Muster traces.
Ragas is an open-source tool for model-based evaluation of Retrieval-Augmented Generation (RAG) pipelines. Combined with Muster's tracing, scoring, and analytics, it gives you a complete loop from instrumentation to evaluation to reporting.
What you can do
Ragas:
- Generate synthetic test sets for pipeline assessment.
- Reference-free evaluation without ground-truth data.
- Performance metrics including faithfulness, answer relevancy, and context precision.
- Custom prompt optimization with automatic adaptation.
- CI/CD pipeline integration via Pytest.
Muster:
- Span- and trace-level scoring.
- Segmentation and analytics to identify performance gaps.
- Detailed reporting per use case and user segment.
- Integrations with OpenAI, LangChain, LlamaIndex, and more.
How it fits together
Typical workflow:
- Instrument your RAG pipeline with Muster (via the OpenAI integration, LangChain callback, or LlamaIndex instrumentor).
- Run a Ragas evaluation against a sample of your traces or a dedicated test set.
- Push Ragas scores back to the corresponding Muster traces using langfuse.create_score(...).
- Slice and dice the resulting score distribution in Muster's analytics.
from langfuse import get_client

langfuse = get_client()

# Suppose you ran Ragas and collected per-trace results in `ragas_results`,
# a dict mapping each trace ID to a dict of metric name -> score.
for trace_id, ragas_result in ragas_results.items():
    langfuse.create_score(
        trace_id=trace_id,
        name="ragas-faithfulness",
        value=ragas_result["faithfulness"],
    )
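The loop above pushes a single metric. If your Ragas run produces several metrics per trace, one way to extend it is sketched below. The helper name push_ragas_scores, the default metric list, and the "ragas-<metric>" naming scheme are assumptions made for illustration, not part of either library. The sketch also skips missing or NaN values, on the assumption that a failed metric computation may surface as NaN in your results.

```python
import math
from typing import Any, Mapping, Sequence


def push_ragas_scores(
    client: Any,
    ragas_results: Mapping[str, Mapping[str, float]],
    metrics: Sequence[str] = ("faithfulness", "answer_relevancy", "context_precision"),
) -> int:
    """Attach one score per (trace, metric) pair; return how many were sent.

    `client` is any object exposing create_score(trace_id=..., name=..., value=...),
    such as the object returned by get_client() in the example above.
    `ragas_results` is assumed to map trace ID -> {metric name: score}.
    """
    sent = 0
    for trace_id, result in ragas_results.items():
        for metric in metrics:
            value = result.get(metric)
            # Skip metrics that were not computed or came back as NaN.
            if value is None or math.isnan(value):
                continue
            client.create_score(
                trace_id=trace_id,
                # Hypothetical naming scheme: "ragas-" prefix, hyphens throughout.
                name="ragas-" + metric.replace("_", "-"),
                value=float(value),
            )
            sent += 1
    return sent
```

With the client from the earlier example, a call would look like push_ragas_scores(langfuse, ragas_results). Using one score name per metric keeps each metric's distribution separate when you segment scores in Muster's analytics.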