Adding reasoning to your AI? Take these resources; they may help you on your way.
AGI/causality/formal grammar | ||
---|---|---|
Deepmind Chomsky Hierarchy | Problems crafted for FSM/PDA/TM | [1] |
automata | a neurallambda tool to gen from grammars | [1] |
im a strange dataset | Tough for LLMs because of self-references. | [1] |
DiagGSM8k | NL Reasoning Benchmark | [1] |
CLadder | Causal reasoning | [1] |
Cause-Effect Pairs | 108 datasets of 2 var dynamics (not NL) | [1] |
MNLI Entailment | sentence parsing + entailment | [1] |
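
For a sense of what "problems crafted for FSM/PDA/TM" can look like, here is a minimal sketch that samples labeled strings from three textbook languages, one per level of the Chomsky hierarchy. It is not taken from the DeepMind benchmark or from the `automata` tool; the function names (`sample_language`, `is_member`, `labeled_example`) and the level names are hypothetical, chosen only for illustration.

```python
import random


def sample_language(level: str, n: int) -> str:
    """Return the size-n member of a toy language at the given level (hypothetical names)."""
    if level == "regular":            # (ab)^n: a finite-state machine suffices
        return "ab" * n
    if level == "context_free":       # a^n b^n: needs a stack (pushdown automaton)
        return "a" * n + "b" * n
    if level == "context_sensitive":  # a^n b^n c^n: beyond pushdown automata
        return "a" * n + "b" * n + "c" * n
    raise ValueError(f"unknown level: {level}")


def is_member(level: str, s: str) -> bool:
    """Check membership by rebuilding the only candidate string of the same length."""
    parts = {"regular": 2, "context_free": 2, "context_sensitive": 3}[level]
    if len(s) % parts != 0:
        return False
    return s == sample_language(level, len(s) // parts)


def labeled_example(level: str, rng: random.Random, max_n: int = 5) -> dict:
    """Build one {"input", "label"} record; about half are corrupted strings."""
    s = sample_language(level, rng.randint(1, max_n))
    if rng.random() < 0.5:
        # Corrupt one position; the label is recomputed, so it is always correct.
        i = rng.randrange(len(s))
        s = s[:i] + rng.choice("abc") + s[i + 1:]
    return {"input": s, "label": is_member(level, s)}


if __name__ == "__main__":
    rng = random.Random(0)
    for level in ("regular", "context_free", "context_sensitive"):
        print(level, labeled_example(level, rng))
```

Because the label is always recomputed with `is_member`, a corruption that happens to leave the string valid is still labeled correctly.
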
AGENT/TOOL | ||
---|---|---|
THUDM AgentInstruct | long form dialogs | [1] |
WANG AgentInstruct | gpt3 synthesized instructions | [1] |
KnowLM Tool | prompt + tool call + answer | [1] |
Glaive Tool Usage | sys prompt says tools + prompt + answer | [1] |
opentoolformer retrieval | prompt + tool call | [1] |
CODE | ||
---|---|---|
rosetta | same program, many diff languages | [1] |
EvoEval Tool Use | 100 prompt + code + tests | [1] |
MATH/LOGIC | ||
---|---|---|
gsm8k | Grade School Math 8k | [1] |
MetaMath | one-shot math | [1] |
MetaMathFewShot | few-shot math | [1] |
MathPile | 9B tok from filtered internet | [1] |
LogiQA | NL multi choice, requires abstraction | [1] |
Logic-LM | a model combining auto theorem provers and llms | [1] |
Coq Facts | 270k Coq theorem prover programs | [1] |
NATURAL LANGUAGE | ||
---|---|---|
Nous Open Reasoning | community contrib tasks | [1] |
UltraInteract_sft | GPT generated iterated reasoning dialogs | [1] |
CoGnition | NL compositional generalization | [1] |
Winogrande | ambiguous sentences, fill in 1 word | [1] |
Winograd_wsc | ambiguous sentences, choose the right word | [1] |
Contradiction | 2 phrases, do they contradict | [1] |
Recognizing Textual Entailment | 2 phrases, do they entail each other | [1] |
Textual Entailment Pool | more entailment | [1] |
Answer Validation | 2 phrases, does the answer solve question | [1] |
Monotonicity Entailment | x is true, does y follow | [1] |
entailment | passage, question -> T/F | [1] |
Commonsense QA | multi choice QA | [1] |
GLUE | several datasets | [1] |
custom multi-hop | use wikipedia's graph of articles | |
MUD videogames | (various could be training data) | |
skunkworks/reasoning | wide variety of NL tasks | [1] |
TOY PROBLEMS | ||
---|---|---|
arc-like | 1D visual puzzles, great seq reasoning | [1] |
re-arc | 2D reverse engineered ARC | [1] |
ARC | competition | [1] |
(misc) | xLSTM paper lists several in appendix | [1] |
expand polynomials | algebraic expansion | [Abstractor] |
linear eq | solve algebraic eqs | [Abstractor] |
Match-To-Sample | cogsci test for relational reasoning | [1] MLPs Learn In Context |
Oddball Detection | cogsci test for relational reasoning | [1] MLPs Learn In Context |
regression | with incontext learning, good reasoning test | [1] MLPs Learn In Context |
clustering | with incontext learning, good reasoning test | [1] MLPs Learn In Context |
COGS | compositional generalization | [1] |
SCAN | systematicity, "$x to the left" | [1] [2] |
clevr | 2d img of 3d shapes + natural language QA | [1] [2] |
lambda calc + beta reductions | generator code, single+multistep | [1] |
lichess-puzzles | chess puzzles | [1] |
pointer net problems | convex hull, TSP, triangulation | [1] |
Big Bench Hard | 23 challenges (only 6k datapoints) | [1] |
logical entailment dataset | logic strings by deepmind | [1] |
logical entailment dataset code | (generate it yourself) | [1] |
FSM Game | generate strings according to grammar | |
Adaptive Grammar | grammar rule might change | |
String/Graph Rewriting | string_rewriting.py | |
LibraryOfLogic | generate NL from multiple games | [1] |
AB-XY Game | ||
word ladder | ||
parser | ||
longest common subsequence | ||
string reversal | ||
wisconsin card sorting | ||
anagram | ||
palindrome | ||
permutation composition | ||
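
Several of the unlinked toy problems above are easy to generate yourself. The following is a minimal sketch, assuming a simple `{"task", "input", "target"}` record format; the task names, the record layout, and the helper `make_example` are hypothetical and not taken from any of the linked repos.

```python
import random
import string


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def compose(p: list, q: list) -> list:
    """Apply q first, then p: result[i] == p[q[i]]."""
    return [p[i] for i in q]


def make_example(task: str, rng: random.Random, length: int = 6) -> dict:
    """Build one record for a (hypothetical) task name."""
    def rand_str(alphabet: str) -> str:
        return "".join(rng.choice(alphabet) for _ in range(length))

    if task == "string_reversal":
        s = rand_str(string.ascii_lowercase)
        return {"task": task, "input": s, "target": s[::-1]}
    if task == "palindrome":
        half = rand_str("ab")[: length // 2]
        # Roughly half true palindromes; the target is computed from s either way.
        s = half + half[::-1] if rng.random() < 0.5 else rand_str("ab")
        return {"task": task, "input": s, "target": str(s == s[::-1])}
    if task == "permutation_composition":
        p = rng.sample(range(length), length)
        q = rng.sample(range(length), length)
        return {"task": task, "input": f"{p} . {q}", "target": str(compose(p, q))}
    if task == "longest_common_subsequence":
        a, b = rand_str("abc"), rand_str("abc")
        return {"task": task, "input": f"{a} {b}", "target": str(lcs_length(a, b))}
    raise ValueError(f"unknown task: {task}")


if __name__ == "__main__":
    rng = random.Random(0)
    for name in ("string_reversal", "palindrome",
                 "permutation_composition", "longest_common_subsequence"):
        print(make_example(name, rng))
```

Seeding `random.Random` keeps the generated set reproducible, and `length` gives a crude difficulty knob for testing length generalization.
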
TOKEN AUGMENTED REASONING | ||
---|---|---|
Reasoning tokens | Self-Reasoning Tokens, teaching models to think ahead | [1] |
Quiet-STaR | LLMs Can Teach Themselves to Think Before Speaking | [1] |