add benchmark mme-realworld-lite #606

yfzhang114 · 2024-11-15T02:59:09Z

PR Summary

This pull request introduces one key update:

Updated the download link for MME-RealWorld and added a new version, MME-RealWorld-lite, which samples 50 instances per task from MME-RealWorld to speed up inference.
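For illustration only, per-task subsampling of this kind can be sketched as follows. This is not the script used to build MME-RealWorld-lite: the function name `sample_per_task`, the `"task"` record field, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict


def sample_per_task(records, per_task=50, seed=0, task_key="task"):
    """Return at most `per_task` records for each distinct task.

    `records` is a list of dicts; `task_key` names the (hypothetical)
    field that identifies which task a record belongs to.
    """
    # Group records by their task label.
    by_task = defaultdict(list)
    for rec in records:
        by_task[rec[task_key]].append(rec)

    # A seeded generator keeps the subset reproducible across runs.
    rng = random.Random(seed)

    subset = []
    # Iterate tasks in sorted order so the output order is deterministic.
    for task in sorted(by_task):
        group = by_task[task]
        k = min(per_task, len(group))  # small tasks are kept whole
        subset.extend(rng.sample(group, k))
    return subset
```

Fixing the seed means repeated runs select the same instances, which keeps scores on the reduced set comparable between evaluations.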

Performance Results on MME-RealWorld-lite

Below are the performance results of the Slime model, along with other models, on the MME-RealWorld-lite dataset.

Method	LLM	Overall	Perception OCR	Perception RS	Perception DT	Perception MO	Perception AD	Perception Avg	Reasoning OCR	Reasoning DT	Reasoning MO	Reasoning AD	Reasoning Avg
GPT-4o	-	46.4	81	45	65	34	37	49.1	72	50	42	33	42.1
GPT-4o-mini	-	37.4	70	23	62	19	34	38.8	57	39	19	35	35.2
Qwen2-VL	Qwen2-7B	46.7	86	40	74	28	36	48.2	73	46	47	36	44.4
LLaVA-OV	Qwen2-7B	45.8	82	51	64	34	45	52.8	71	43	45	35	42.7
Slime	Llama3-8B	34.7	58	36	51	29	33	37.7	51	27	38	34	36.4

yfzhang114 added 3 commits November 15, 2024 02:56

add mme-realworld-lite

eae448e

add MME-RealWorld-Lite

5bbdf8a

add benchmark mme-realworld-lite

53c5e1c
