KeyError: 'user_input' when calculating RAGAS metric #2056

Closed
PierreMesure opened this issue Oct 29, 2024 · 7 comments
Labels: bug

@PierreMesure (Contributor) commented Oct 29, 2024

Issue Type

Bug

Source

source

Giskard Library Version

2.15.3

OS Platform and Distribution

No response

Python version

No response

Installed python packages

ragas==0.2.2

Current Behaviour?

When trying to evaluate a RAG assistant with some RAGAS metrics (context recall), the evaluation fails. See the stack trace below.
This happens when providing the answer as an AgentAnswer. We're not entirely clear about what should go in the documents parameter; the documentation doesn't give a clear example. We're using LlamaIndex, so agent_output.source_nodes doesn't return a list of strings. Here's what we've tried:

Standalone code OR list down the steps to reproduce the issue

from typing import List

from giskard.rag import AgentAnswer, evaluate
from giskard.rag.metrics.ragas_metrics import (
    ragas_answer_relevancy,
    ragas_context_precision,
    ragas_context_recall,
    ragas_faithfulness,
)
from llama_index.core.llms import ChatMessage, MessageRole

# chat_engine, testset and knowledge_base are defined elsewhere in our code.

def answer_fn(question: str, history: List[dict] = []) -> AgentAnswer:
    chat_history = [ChatMessage(role=MessageRole.USER, content=msg["content"]) for msg in history]
    agent_output = chat_engine.chat(question, chat_history=chat_history)

    answer = agent_output.response
    # agent_output.source_nodes is a list of NodeWithScore objects, not strings,
    # so pull the text out of each underlying node.
    documents = [node.node.text for node in agent_output.source_nodes if hasattr(node.node, "text")]

    return AgentAnswer(message=answer, documents=documents)

evaluate(
    answer_fn,
    testset=testset,
    knowledge_base=knowledge_base,
    metrics=[ragas_context_recall, ragas_context_precision, ragas_faithfulness, ragas_answer_relevancy],
)

Relevant log output

  File "/evaluation_manager.py", line 310, in _run_giskard_evaluation_and_return_generated_report
    return evaluate(
           ^^^^^^^^^
  File "/giskard/rag/evaluate.py", line 105, in evaluate
    metrics_results[sample["id"]].update(metric(sample, answer))
                                         ^^^^^^^^^^^^^^^^^^^^^^
  File "/giskard/rag/metrics/ragas_metrics.py", line 119, in __call__
    return {self.name: self.metric.score(ragas_sample)}
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragas/utils.py", line 159, in emit_warning
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/ragas/metrics/base.py", line 121, in score
    raise e
  File "/ragas/metrics/base.py", line 117, in score
    score = loop.run_until_complete(self._ascore(row=row, callbacks=group_cm))
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/nest_asyncio.py", line 98, in run_until_complete
    return f.result()
           ^^^^^^^^^^
  File "/asyncio/futures.py", line 203, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "/asyncio/tasks.py", line 314, in __step_run_and_handle_result
    result = coro.send(None)
             ^^^^^^^^^^^^^^^
  File "/ragas/metrics/_context_recall.py", line 191, in _ascore
    return await super()._ascore(row, callbacks)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/ragas/metrics/_context_recall.py", line 156, in _ascore
    question=row["user_input"],
             ~~~^^^^^^^^^^^^^^
KeyError: 'user_input'
@Snow31ind

I'm experiencing the same issue, and I believe this is quite a critical bug. I couldn't run evaluations with additional metrics. Any update on this?

@alexcombessie (Member)

Hey @PierreMesure and @Snow31ind - This should be solved by #2052

Can you try again with the latest Giskard release?

@Snow31ind commented Oct 31, 2024

@alexcombessie Thanks for replying. Let me add more context on this issue. The versions of giskard and ragas in my requirements.txt file are:

giskard==2.15.3
ragas==0.2.2

I believe 2.15.3 is the latest release, and I still see the same error as above.

After inspecting the stack trace, I'm wondering whether the ragas sample built by the Giskard ragas metric wrapper matches the interface required by the base ragas metric's score method, since the sample doesn't contain the user_input key. That's why I strongly believe this is the root cause.

Could you help double-check that? And are there any tests being run to make sure there's no data-interface mismatch?

@PierreMesure (Contributor, Author) commented Oct 31, 2024

I just reverted ragas to 0.1.21 and it works. 😊
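If anyone wants the same workaround, the pin in requirements.txt looks like this (giskard version as discussed above):

giskard==2.15.3
ragas==0.1.21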

@alexcombessie, #2052 fixes another problem I reported; I don't think that PR will fix this one.
I think the problem stems from RAGAS renaming its parameters. In this commit, you can see the change in the documentation; I believe the renaming comes from this PR.
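For reference, here is the rename side by side. The v0.2 key names below come from RAGAS's SingleTurnSample schema; worth double-checking against the installed version:

# Keys a RAGAS v0.1 metric expects in a row:
row_v01 = {
    "question": "...",
    "contexts": ["..."],
    "answer": "...",
    "ground_truth": "...",
}

# The same row under RAGAS v0.2 (hence the KeyError: 'user_input'):
row_v02 = {
    "user_input": "...",
    "retrieved_contexts": ["..."],
    "response": "...",
    "reference": "...",
}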

@Snow31ind

@PierreMesure Awesome! You made my day. Anyway, this issue deserves a fix soon. Thanks, team!

@alexcombessie (Member)

Thanks! @henchaves, could you have a look when you have time next week? 🙏

@henchaves (Member)

Hello @PierreMesure and @Snow31ind, thanks a lot for reporting this bug.
Indeed, RAGAS v0.2 changed the parameter names, which breaks the RagasMetric call.
I opened a PR to make Giskard compatible with both the old version (v0.1) and the latest one (v0.2): #2073
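Roughly, the idea is to build the row with whichever key names the installed ragas expects. A simplified sketch of that approach (hypothetical, not the actual code from #2073):

from importlib.metadata import version

RAGAS_V2 = not version("ragas").startswith("0.1")

def to_ragas_row(question, contexts, answer, ground_truth):
    # Use the v0.2 field names (user_input, retrieved_contexts, response,
    # reference) when a recent ragas is installed; otherwise the v0.1 names.
    if RAGAS_V2:
        return {
            "user_input": question,
            "retrieved_contexts": contexts,
            "response": answer,
            "reference": ground_truth,
        }
    return {
        "question": question,
        "contexts": contexts,
        "answer": answer,
        "ground_truth": ground_truth,
    }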

It should be reviewed and merged soon!
Thanks again and sorry for the delay!

@henchaves henchaves added the bug Something isn't working label Nov 14, 2024