[FR] Additional built-in LLM judges for safety, coherence, agent planning, ... · mlflow/mlflow#19061

(3 commentaires) (0 réactions) (1 assigné)Python (3 904 forks)batch import

area/evaluationdomain/genaienhancementhelp wanted

Métriques du dépôt

I cannot contribute this myself, and am requesting help from other contributors

Expand MLflow's built-in judge library.

These judges should be production-ready, well-tested, and work out-of-the-box with minimal configuration.

Add the following built-in judges to mlflow.genai.scorers:

1. Conversational Safety (multi-turn)

mlflow.genai.scorers.ConversationSafety()

2. Conversational Tool Call Efficiency (multi-turn)

mlflow.genai.scorers.ConversationalToolCallEfficiency()

3. Conversational Role Adherence (multi-turn)

mlflow.genai.scorers.ConversationalRoleAdherence()

5. Conversational Coherence (multi-turn)

mlflow.genai.scorers.ConversationalCoherence()

6. Agent Plan Quality (multi-turn)

mlflow.genai.scorers.AgentPlanQuality()

Consistent API: All judges should follow the same calling convention as existing built-in judges
Documentation: Comprehensive examples showing when to use each judge

Direction de recherche: Examinez les juges intégrés existants dans `mlflow.genai.scorers` et implémentez de nouveaux juges en suivant le même modèle d'API. Assurez des tests et une documentation adéquats.
Stack technique: python
Domaine: aimachine learning
Type d'issue: Fonctionnalité
Difficulté: 3
Temps estimé: 1-2 jours
Statut d'activité: Active
Clarté: Claire
Prérequis: PythonGit
Accessibilité débutant: 60