Testable Minds
Crowdsource human evaluation of Muster traces using Testable Minds' pre-screened participant pool.
Testable Minds connects Muster (Langfuse) traces to a pre-screened participant pool for human evaluation. It is useful when LLM-as-a-judge scoring isn't enough and you need real users to score outputs.
Setup
1. Create accounts
Register at testable.org/ai/langfuse and select Langfuse during sign-up.
2. Define score configs in Muster
Create at least one score configuration in Muster. Participants rate each trace against the score configs you define.
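To make the idea of a score configuration concrete, here is a minimal sketch in plain Python. It is not Muster's or Testable's API: the `ScoreConfig` class, its field names, and the 1 to 5 range are all hypothetical and only illustrate what "rating against a config" means for a numeric score.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ScoreConfig:
    """Hypothetical, simplified stand-in for a numeric score configuration."""
    name: str
    min_value: float
    max_value: float
    description: Optional[str] = None

    def accepts(self, value: float) -> bool:
        # A participant's rating is valid only if it falls inside the configured range.
        return self.min_value <= value <= self.max_value


# Example config (assumed values, for illustration only).
helpfulness = ScoreConfig(
    name="helpfulness",
    min_value=1,
    max_value=5,
    description="How helpful is the response to the user's request?",
)

print(helpfulness.accepts(4))  # True
print(helpfulness.accepts(6))  # False
```

The actual fields and data types available for a score config are defined in Muster itself, so check them there before designing a study.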
3. Connect
In Testable's Account → Connections, paste your Muster Secret Key and Public Key, then select your server region. Verify the connection.
4. Create a study
Configure participant demographics, choose which Muster score configs to evaluate, and optionally filter traces by tags or environment. Then set the minimum number of traces that must be available before evaluation sessions launch.
5. Monitor results
As participants complete evaluations, scores automatically populate the corresponding Muster traces.
Key features
- Quality assurance — built-in attention checks filter out low-quality responses.
- Flexible filtering — target traces by tags and environments with AND/OR logic.
- Multi-config support — evaluate traces against multiple score configurations simultaneously.
- Scalable — run multiple independent studies concurrently.
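The AND/OR filtering mentioned above can be sketched as follows. This is one plausible reading, not Testable's implementation: the `Trace` class, the `matches` function, and the rule that the environment filter is always combined with the tag filter are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Trace:
    """Hypothetical minimal trace record: an id, a set of tags, an environment."""
    id: str
    tags: Set[str] = field(default_factory=set)
    environment: str = "default"


def matches(trace: Trace, tags: Set[str], environments: Set[str], tag_mode: str = "OR") -> bool:
    """Return True if the trace passes the tag and environment filters.

    An empty filter set means "no restriction" (assumed behavior).
    """
    if environments and trace.environment not in environments:
        return False
    if not tags:
        return True
    if tag_mode == "AND":
        # AND: the trace must carry every requested tag.
        return tags <= trace.tags
    # OR: the trace must carry at least one requested tag.
    return bool(tags & trace.tags)


def select(traces: List[Trace], tags: Set[str], environments: Set[str], tag_mode: str = "OR") -> List[str]:
    """Return the ids of all traces that pass the filters, in input order."""
    return [t.id for t in traces if matches(t, tags, environments, tag_mode)]


traces = [
    Trace("t1", {"prod", "checkout"}, "production"),
    Trace("t2", {"checkout"}, "staging"),
    Trace("t3", {"search"}, "production"),
]

print(select(traces, {"prod", "checkout"}, {"production"}, "AND"))  # ['t1']
print(select(traces, {"checkout", "search"}, set(), "OR"))          # ['t1', 't2', 't3']
```

With AND, a trace must carry every listed tag. With OR, any one of them is enough.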
The platform automatically batches traces into evaluation sessions once the study's minimum trace threshold is reached.
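As a rough illustration of this threshold-then-batch behavior, the sketch below holds traces back until the minimum count is met and then splits them into fixed-size sessions. The function name and the `session_size` parameter are hypothetical. How Testable actually sizes sessions is not described here.

```python
from typing import List, Sequence


def batch_into_sessions(trace_ids: Sequence[str], min_traces: int, session_size: int) -> List[List[str]]:
    """Split trace ids into sessions, but only once enough traces are available."""
    if session_size <= 0:
        raise ValueError("session_size must be positive")
    if len(trace_ids) < min_traces:
        # Below the threshold: no sessions are created yet.
        return []
    return [
        list(trace_ids[i:i + session_size])
        for i in range(0, len(trace_ids), session_size)
    ]


print(batch_into_sessions(["a", "b"], min_traces=3, session_size=2))
# []
print(batch_into_sessions(["a", "b", "c", "d", "e"], min_traces=3, session_size=2))
# [['a', 'b'], ['c', 'd'], ['e']]
```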