Auto labeling using GPT-4 #35

Draft: wants to merge 12 commits into base: main

imenelydiaker
Collaborator

@imenelydiaker imenelydiaker commented Jun 18, 2024

  • Prompt for labeling with GPT-4 variants
  • Script to compare labels from GPT with human annotations
  • Update all entries with username: "auto" or "gpt"

Summary by Sourcery

This pull request adds a new script for pre-labeling datasets using GPT-4. The script downloads metadata and votes, processes images, generates labels using the OpenAI API, and saves or uploads the results to the Hugging Face Hub.

  • New Features:
    • Introduced a script for pre-labeling the dataset using GPT-4, which includes functionality to download metadata and votes, process images, and generate labels using the OpenAI API.
  • Enhancements:
    • Added functionality to save and upload metadata and votes to the Hugging Face Hub.

Contributor

sourcery-ai bot commented Jun 18, 2024

Reviewer's Guide by Sourcery

This pull request introduces a new script auto_labeling_using_llm.py to automate the pre-labeling of datasets using GPT-4. The script includes functions for handling metadata, retrieving votes, computing concepts, and making requests to the OpenAI API. The main function orchestrates the entire labeling process and saves the results locally or pushes them to the hub.

File-Level Changes

Files Changes
scripts/auto_labeling_using_llm.py Introduced a new script to automate dataset labeling using GPT-4, including functions for metadata handling, OpenAI API requests, and result processing.

@imenelydiaker imenelydiaker marked this pull request as draft June 18, 2024 11:58
Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey @imenelydiaker - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 5 issues found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

"""

import os
import random
Contributor

issue: Unused import 'random'

The 'random' module is imported but not used anywhere in the script. Consider removing it to keep the code clean.

scripts/auto_labeling_using_llm.py
)

pred = response.choices[0].message.content
pred = pred[pred.rfind("{"):pred.rfind("}")]
Contributor

issue (bug_risk): Potential off-by-one error

Python slice end indices are exclusive, so this slice drops the closing brace '}'. Consider using 'pred.rfind("}") + 1' as the end index to include it.
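A minimal sketch of the suggested fix, for illustration only. The helper name `extract_json_object` and the sample reply string are hypothetical and not taken from the script; the sketch assumes the model reply ends with a single flat JSON object.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the last flat {...} span in a model reply (hypothetical helper)."""
    start = text.rfind("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # Slice end indices are exclusive, so add 1 to keep the closing brace.
    return json.loads(text[start:end + 1])


# Made-up reply, used only to exercise the helper.
reply = 'Here are the labels: {"cat": true, "dog": false}'
labels = extract_json_object(reply)

# The slice from the reviewed line loses the closing brace.
truncated = reply[reply.rfind("{"):reply.rfind("}")]
```

Because `rfind("{")` locates the last opening brace, this approach would pick up only the innermost object if the reply contained nested JSON; that limitation is inherited from the original slicing logic.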

Owner

@Xmaster6y Xmaster6y left a comment

Additionally, add an image limit for debugging, e.g. labelling only the first 10 (or 10 random) images.
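One way such a limit could look, as a rough sketch. The function `select_images` and its parameters (`limit`, `shuffle`, `seed`) are hypothetical names, and the sketch assumes the script has a list of image identifiers available before calling the API.

```python
import random
from typing import List, Optional, Sequence


def select_images(
    image_ids: Sequence[str],
    limit: Optional[int] = None,
    shuffle: bool = False,
    seed: int = 0,
) -> List[str]:
    """Return at most `limit` image ids, optionally sampled at random."""
    ids = list(image_ids)
    if shuffle:
        # A seeded generator keeps debug runs reproducible.
        random.Random(seed).shuffle(ids)
    if limit is None:
        # No limit requested: label everything.
        return ids
    return ids[:limit]


# Debug run: only the first 10 images of a made-up id list.
all_ids = [f"img_{i}" for i in range(25)]
debug_ids = select_images(all_ids, limit=10)
```

Passing `shuffle=True` gives the "random" variant mentioned above, while `limit=None` keeps the full run unchanged.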

Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>