Testable Minds

Crowdsource human evaluation of Muster traces using Testable Minds' pre-screened participant pool.

Testable Minds connects Muster (Langfuse) traces to a pre-screened participant pool for human evaluation. This is useful when LLM-as-a-judge scoring isn't enough and you need real users to score outputs.

Setup

1. Create accounts

Register at testable.org/ai/langfuse and select Langfuse during sign-up.

2. Define score configs in Muster

Create at least one score configuration in Muster. Participants rate traces against the score configurations you define.
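If you would rather create the score configuration from a script than from the UI, the following is a minimal sketch. It assumes Muster exposes the same public REST API as Langfuse, where score configs are created with `POST /api/public/score-configs` and requests use HTTP Basic auth (public key as username, secret key as password). The base URL, both keys, and the `helpfulness` config are placeholders invented for illustration; check your Muster instance before relying on any of them.

```python
import base64
import json
import urllib.request


def build_score_config_request(base_url, public_key, secret_key, config):
    """Build (but do not send) a request that would create a score config.

    Assumption: the endpoint path and Basic-auth scheme mirror Langfuse's
    public API. Verify both against your Muster deployment.
    """
    token = base64.b64encode(f"{public_key}:{secret_key}".encode()).decode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/public/score-configs",
        data=json.dumps(config).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical 1-5 numeric rating that participants would score against.
helpfulness = {
    "name": "helpfulness",
    "dataType": "NUMERIC",
    "minValue": 1,
    "maxValue": 5,
    "description": "How helpful is the response to the user's request?",
}

request = build_score_config_request(
    "https://muster.example.com",  # placeholder base URL
    "pk-placeholder",              # placeholder public key
    "sk-placeholder",              # placeholder secret key
    helpfulness,
)

# Nothing is sent until you call: urllib.request.urlopen(request)
```

Building the request separately from sending it lets you inspect the URL and payload before anything reaches the server.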

3. Connect

In Testable's Account → Connections, paste your Muster Secret Key and Public Key, then select your server region. Verify the connection.

4. Create a study

Configure participant demographics, choose which Muster score configs to evaluate, optionally filter traces by tags or environment, and set a minimum trace threshold before launching evaluation sessions.

5. Monitor results

As participants complete evaluations, scores automatically populate the corresponding Muster traces.
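Because several participants may rate the same trace, you will often want one summary value per trace. The sketch below averages scores per trace for a single score name. It assumes you have already exported the scores as a list of dictionaries with `traceId`, `name`, and `value` fields (modelled on Langfuse score objects); the sample data is made up, and how you export scores from Muster is outside the scope of this example.

```python
from collections import defaultdict
from statistics import mean


def mean_score_per_trace(scores, name):
    """Average all values of the score called `name`, grouped by trace."""
    by_trace = defaultdict(list)
    for score in scores:
        if score["name"] == name:
            by_trace[score["traceId"]].append(score["value"])
    return {trace_id: mean(values) for trace_id, values in by_trace.items()}


# Made-up export: two participants rated trace-a, one rated trace-b.
sample_scores = [
    {"traceId": "trace-a", "name": "helpfulness", "value": 4},
    {"traceId": "trace-a", "name": "helpfulness", "value": 5},
    {"traceId": "trace-b", "name": "helpfulness", "value": 2},
    {"traceId": "trace-b", "name": "tone", "value": 1},
]

summary = mean_score_per_trace(sample_scores, "helpfulness")
print(summary)
```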

Key features

  • Quality assurance — built-in attention checks filter out low-quality responses.
  • Flexible filtering — target traces by tags and environments with AND/OR logic.
  • Multi-config support — evaluate traces against multiple score configurations simultaneously.
  • Scalable — run multiple independent studies concurrently.

The platform automatically batches traces into evaluation sessions once your minimum count is reached.
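To make the filtering and batching behaviour concrete, here is an illustrative sketch, not Testable's actual implementation. It assumes that AND/OR applies to the list of tags (AND means a trace carries every tag, OR means at least one), that a trace must also belong to one of the selected environments, and that batching starts only once the number of matching traces reaches the minimum. The `session_size` parameter and the trace fields are hypothetical.

```python
def matches(trace, tags, environments, mode="AND"):
    """Return True if the trace passes the tag and environment filters."""
    trace_tags = set(trace.get("tags", []))
    if tags:
        hits = [tag in trace_tags for tag in tags]
        tag_ok = all(hits) if mode == "AND" else any(hits)
    else:
        tag_ok = True  # no tag filter configured
    env_ok = not environments or trace.get("environment") in environments
    return tag_ok and env_ok


def batch_when_ready(traces, minimum, session_size):
    """Split traces into sessions, but only once the minimum is reached."""
    if len(traces) < minimum:
        return []  # keep waiting for more traces
    return [
        traces[i:i + session_size]
        for i in range(0, len(traces), session_size)
    ]


traces = [
    {"id": "t1", "tags": ["rag", "prod-chat"], "environment": "production"},
    {"id": "t2", "tags": ["rag"], "environment": "production"},
    {"id": "t3", "tags": ["prod-chat"], "environment": "staging"},
    {"id": "t4", "tags": ["rag", "prod-chat"], "environment": "production"},
]

wanted_tags = ["rag", "prod-chat"]
selected_and = [t for t in traces if matches(t, wanted_tags, ["production"], "AND")]
selected_or = [t for t in traces if matches(t, wanted_tags, ["production"], "OR")]

sessions_and = batch_when_ready(selected_and, minimum=3, session_size=2)
sessions_or = batch_when_ready(selected_or, minimum=3, session_size=2)
```

With this sample data the AND filter matches only two traces, so no sessions are created yet, while the OR filter matches three and yields two sessions.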
