question in Visual Grounding, what format of the region should i give? #164

LokiXun · 2024-07-23T08:13:48Z

Hi, I have a question about visual grounding.
I have a 720x1280 image and I want to describe the region [0, 0, 512, 512] (x1, y1, x2, y2), so I followed CogVLM1's suggestion for converting the coordinates ( https://github.com/THUDM/CogVLM?tab=readme-ov-file#cookbook ):

Format of coordination: The bounding box coordinates in the model's input and output use the format [[x1, y1, x2, y2]], with the origin at the top left corner, the x-axis to the right, and the y-axis downward. (x1, y1) and (x2, y2) are the top-left and bottom-right corners, respectively, with values as relative coordinates multiplied by 1000 (prefixed with zeros to three digits).

So my prompt is the following, but the model tends to give me a description of the whole image. Is my prompt right?

Tell me what you see within the designated area [[000,000,400,712]] in the picture

# This is how I get the region value
original region: [0,0,512,512]
target format: [[000,000,512/1280*1000,512/720*1000]] >> [[000,000,400,712]]
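For reference, here is a minimal sketch of that conversion in Python. It assumes the image is 1280 pixels wide and 720 pixels high (this matches the division above, where x is divided by 1280 and y by 720). The function name `to_grounding_box` is hypothetical and not part of CogVLM, and clamping values to 0..999 so they always fit in three digits is my own assumption, not something the README states. Note that rounding 512/720*1000 = 711.1 gives 711 rather than 712.

```python
def to_grounding_box(box, width, height):
    """Convert a pixel box (x1, y1, x2, y2) into a [[x1,y1,x2,y2]] string.

    Values are relative coordinates multiplied by 1000 and zero-padded
    to three digits, as described in the quoted README text.
    """
    x1, y1, x2, y2 = box
    scaled = [
        round(x1 * 1000 / width),   # x values are relative to the image width
        round(y1 * 1000 / height),  # y values are relative to the image height
        round(x2 * 1000 / width),
        round(y2 * 1000 / height),
    ]
    # Assumption: clamp to 0..999 so every value fits in three digits.
    scaled = [min(max(v, 0), 999) for v in scaled]
    return "[[" + ",".join(f"{v:03d}" for v in scaled) + "]]"


region = to_grounding_box((0, 0, 512, 512), width=1280, height=720)
prompt = f"Tell me what you see within the designated area {region} in the picture"
print(prompt)
```

If the image were instead 720 wide and 1280 high, the two divisors would swap and the resulting box would be different, so it is worth double-checking which dimension is which.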

example:

Tell me what you see within the designated area [[000,000,400,712]] in the picture. Describe each object in a simple sentence is enough.

image

CogVLM2's result

CogVLM2: Within the designated area, the foreground displays a green bus, parked cars, and a pedestrian crossing sign, while the background includes a blue bus stop sign, trees, and a building, all under a clear sky.<|end_of_text|>
