prompt attack #87

zl-comment · 2024-10-16T08:07:56Z

For example:
"Determine if the given pair of sentences displays entailment, neutral, or contradiction. Respond with 'entailment', 'neutral', or 'contradiction'\nQuestion: {content}\nAnswer:"
The words "Answer" and "Question" in this prompt should not be modified. If they are attacked, accuracy drops sharply. I did not see this mentioned for the prompts in your conclusion.

Immortalise · 2024-10-16T17:43:19Z

Yes, in our setup, terms like ‘entailment’ and ‘Question’ are fixed and unmodifiable because we believe changing them is unnecessary and would cause a significant shift in meaning compared to the original clean prompts.

In this example, you can define your own unmodifiable words.
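
As a rough illustration of the idea (not the project's actual attack code), the sketch below applies a toy character-swap perturbation to every word in a prompt except those in a protected set. The names `PROTECTED`, `perturb_word`, and `attack_prompt` are hypothetical and chosen only for this example.

```python
import random
import re

# Hypothetical protected set: words the attack must leave untouched.
# Edit this set to define your own unmodifiable words.
PROTECTED = {"question", "answer", "content",
             "entailment", "neutral", "contradiction"}


def perturb_word(word, rng):
    """Toy character-level attack: swap one pair of adjacent characters."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def attack_prompt(prompt, protected=PROTECTED, seed=0):
    """Perturb every alphabetic word that is not in the protected set."""
    rng = random.Random(seed)

    def replace(match):
        word = match.group(0)
        if word.lower() in protected:
            return word  # protected word: keep as-is
        return perturb_word(word, rng)

    return re.sub(r"[A-Za-z]+", replace, prompt)


prompt = (
    "Determine if the given pair of sentences displays entailment, neutral, "
    "or contradiction. Respond with 'entailment', 'neutral', or "
    "'contradiction'\nQuestion: {content}\nAnswer:"
)
attacked = attack_prompt(prompt)
print(attacked)
```

The `Question: {content}\nAnswer:` scaffold and the three label words survive the attack unchanged, while the remaining instruction words are perturbed.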
