fix: trace LLMEvaluator runs #1185
base: main
Conversation
```diff
 if isinstance(self.score_config, CategoricalScoreConfig):
     value = output["score"]
-    explanation = output.get("explanation", None)
+    explanation = output.get(self.reasoning_key, None)
```
I think we should handle the case where any of (`self.reasoning_key`, `"reasoning"`, `"explanation"`, `"comment"`) is provided.
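A rough sketch of that fallback (only `self.reasoning_key` comes from the diff; the other keys are the literal fallbacks suggested in this comment):

```python
# Check each candidate key in order of preference; the tuple below is
# the fallback order proposed in this comment, not existing code.
candidates = (self.reasoning_key, "reasoning", "explanation", "comment")
explanation = next(
    (output[key] for key in candidates if key is not None and key in output),
    None,
)
```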
I'm slightly confused about how something other than `self.score_config.reasoning_key` could exist here. In `_create_score_json_schema`, don't we only set `score_config.reasoning_key`? I'm probably just missing something simple.
Nice! Just a few nits left, then this should be good.
```diff
-    include_explanation: bool = False
-    explanation_description: Optional[str] = None
+    reasoning_key: Optional[str] = None
+    reasoning_description: Optional[str] = None
```
This is a breaking change, so we'll need to handle accepting the other value in the init and log a deprecation warning.
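For instance, something along these lines (a sketch assuming a pydantic-style model; mapping the deprecated flag to `reasoning_key="explanation"` is illustrative, not the PR's actual choice):

```python
import warnings
from typing import Optional

from pydantic import BaseModel


class CategoricalScoreConfig(BaseModel):  # illustrative subset of the real class
    key: str
    reasoning_key: Optional[str] = None

    def __init__(self, **data):
        # Accept the deprecated kwarg, warn, and translate it to the new field.
        if data.pop("include_explanation", False):
            warnings.warn(
                "include_explanation is deprecated; use reasoning_key instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            data.setdefault("reasoning_key", "explanation")
        super().__init__(**data)
```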
```diff
 if isinstance(score_config, CategoricalScoreConfig):
-    properties["score"] = {
+    properties["value"] = {
         "type": "string",
         "enum": score_config.choices,
         "description": f"The score for the evaluation, one of "
```
It isn't a score anymore, is it? It's the selected category.
The descriptions here matter a lot.
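For example, the description could read (suggested wording only, not the PR's final text):

```python
properties["value"] = {
    "type": "string",
    "enum": score_config.choices,
    # Framed as a category selection rather than a score.
    "description": (
        "The selected category for the evaluation, one of "
        f"{', '.join(score_config.choices)}."
    ),
}
```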
```diff
         await self.runnable.ainvoke(variables, config={"run_id": source_run_id}),
     )

+    return self._parse_output(output, str(source_run_id))
```
nit: Could leave as a UUID
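i.e., roughly (sketch):

```python
# Pass the UUID through unchanged instead of stringifying it.
return self._parse_output(output, source_run_id)
```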
```diff
@@ -15,7 +15,7 @@ def test_llm_evaluator_init() -> None:
         key="vagueness",
         choices=["Y", "N"],
         description="Whether the response is vague. Y for yes, N for no.",
-        include_explanation=True,
```
Should keep a couple of these tests with `include_explanation` around for backward-compat testing.
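For example, a legacy-path test along these lines could stay (hypothetical test name; values mirror the existing test above):

```python
def test_llm_evaluator_init_legacy_explanation() -> None:
    # Deliberately uses the deprecated include_explanation flag so a
    # regression in the compat shim shows up here.
    config = CategoricalScoreConfig(
        key="vagueness",
        choices=["Y", "N"],
        description="Whether the response is vague. Y for yes, N for no.",
        include_explanation=True,
    )
    assert config.key == "vagueness"
```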
added some backward compatibility tests
```python
    description (str): Detailed description provided to the LLM judge of what
        this score evaluates.
    reasoning_key (Optional[str]): Key used to store the reasoning/explanation
        for the score. Defaults to None.
```
Say what None means (don't include a reasoning/CoT field).
Ditto with description (defaults to "The explanation for the score.").
I still think we should have a better default than "The explanation for the score." - like "Think step-by-step about what the correct score should be."
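So the defaults might look like this (wording taken from this comment, not the merged text):

```python
# None means: don't include a reasoning/CoT field in the output schema.
reasoning_key: Optional[str] = None
reasoning_description: str = (
    "Think step-by-step about what the correct score should be."
)
```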
Almost there!
```python
if data.get("include_explanation") and data.get("reasoning_key"):
    raise ValueError(
        "Cannot include both include_explanation and reasoning_key, "
        "please just use reasoning_key - include_explanation has been deprecated"  # noqa: E501
```
nit: If we're already splitting across lines, let's split them within the limit rather than using noqa.
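e.g. (illustrative):

```python
raise ValueError(
    "Cannot include both include_explanation and reasoning_key; "
    "please just use reasoning_key - include_explanation has been "
    "deprecated"
)
```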
```diff
     include_explanation: bool = False  # Deprecated
     explanation_description: Optional[str] = None  # Deprecated

+    def __init__(self, **data):
```
A sad byproduct of doing this is that you lose IDE completion. Could we add the keys here?
Ditto below.
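Something like an explicit keyword signature would restore completion (a sketch; the parameter list mirrors the fields shown in the diffs above):

```python
def __init__(
    self,
    *,
    reasoning_key: Optional[str] = None,
    reasoning_description: Optional[str] = None,
    include_explanation: bool = False,  # Deprecated
    explanation_description: Optional[str] = None,  # Deprecated
    **data: Any,
) -> None:
    super().__init__(
        reasoning_key=reasoning_key,
        reasoning_description=reasoning_description,
        include_explanation=include_explanation,
        explanation_description=explanation_description,
        **data,
    )
```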
```python
@pytest.mark.parametrize(
    "config_class", [CategoricalScoreConfig, ContinuousScoreConfig]
)
def test_backwards_compatibility(config_class) -> None:
```
nit: These could probably go in unit_tests/* rather than in integration tests.