This project applies the LLM-Eval framework to the PersonaChat dataset to assess response quality in a conversational context. Using GPT-4o-mini via the OpenAI API, the system scores each response on a 0-5 or 0-100 scale across four evaluation metrics: context, grammar, relevance, and appropriateness.
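
As a rough illustration of this flow, the sketch below scores a single response with GPT-4o-mini on a 0-5 scale. The prompt wording, the `score_response` helper, and the sample dialogue are hypothetical, not the project's actual prompt; only the model name, the four metrics, and the OpenAI chat-completions API come from the description above.

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt; the exact instruction text used by this project
# may differ. Doubled braces survive .format() as literal JSON braces.
EVAL_PROMPT = """\
Score the following response to a dialogue on a scale of 0 to 5 for
each metric: context, grammar, relevance, appropriateness.
Return only a JSON object, e.g. {{"context": 4, "grammar": 5, ...}}.

Dialogue history:
{history}

Response to evaluate:
{response}
"""

def score_response(history: str, response: str) -> dict:
    """Ask gpt-4o-mini for per-metric scores and parse the JSON reply."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,  # deterministic scoring
        response_format={"type": "json_object"},  # force valid JSON output
        messages=[{
            "role": "user",
            "content": EVAL_PROMPT.format(history=history, response=response),
        }],
    )
    return json.loads(completion.choices[0].message.content)

if __name__ == "__main__":
    scores = score_response(
        history="A: I love hiking on weekends.\nB: What trails do you like?",
        response="I usually hike the ridge trail near my house.",
    )
    print(scores)  # e.g. {"context": 4, "grammar": 5, ...}
```

Pinning `temperature=0` and requesting a JSON object keeps the per-metric scores easy to parse and reasonably stable across repeated runs; switching to the 0-100 scale would only require changing the prompt text.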