Auto labeling using GPT-4 #35
base: main
Reviewer's Guide by Sourcery
This pull request introduces a new script
File-Level Changes
Hey @imenelydiaker - I've reviewed your changes and they look great!
Here's what I looked at during the review
- 🟡 General issues: 5 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
"""

import os
import random
issue: Unused import 'random'
The 'random' module is imported but not used anywhere in the script. Consider removing it to keep the code clean.
)

pred = response.choices[0].message.content
pred = pred[pred.rfind("{"):pred.rfind("}")]
issue (bug_risk): Potential off-by-one error
The end index of a Python slice is exclusive, so this slice drops the closing brace '}'. Consider using 'pred.rfind("}") + 1' to include it.
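The off-by-one issue can be illustrated with a short sketch. This is not the script's actual code: `extract_json_object` is a hypothetical helper and the sample reply string is invented for the example. It assumes the model output contains a single flat JSON object, as the `rfind`-based slice in the diff does.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the span from the last '{' to the last '}' as JSON.

    Hypothetical helper for illustration only.
    """
    start = text.rfind("{")
    end = text.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # Slice end indices are exclusive, so add 1 to keep the closing brace.
    return json.loads(text[start:end + 1])


# Invented sample reply, standing in for response.choices[0].message.content.
reply = 'Here is the label: {"label": "cat", "confidence": 0.9}'

# The original slice from the diff: the closing brace is lost.
buggy = reply[reply.rfind("{"):reply.rfind("}")]

# The corrected slice parses cleanly.
fixed = extract_json_object(reply)
```

Because `rfind("{")` locates the last opening brace, this approach would not recover a whole object that contains nested objects; that limitation is shared with the original slice.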
Additionally, add an image limit for debugging, e.g. labelling only the first 10 (or 10 random) images.
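One way such a debug limit might look is sketched below. The function name `limit_items` and its parameters (`limit`, `shuffle`, `seed`) are assumptions made for this example, not names taken from the pull request.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def limit_items(
    items: Sequence[T],
    limit: Optional[int] = None,
    shuffle: bool = False,
    seed: Optional[int] = None,
) -> List[T]:
    """Return at most `limit` items: the first ones, or a random sample.

    Hypothetical helper; `limit=None` means no limit.
    """
    if limit is not None and limit < 0:
        raise ValueError("limit must be non-negative")
    if limit is None or limit >= len(items):
        return list(items)
    if shuffle:
        # A seeded generator keeps debug runs reproducible.
        return random.Random(seed).sample(list(items), limit)
    return list(items[:limit])


# Placeholder file names, invented for the example.
image_paths = [f"image_{i}.png" for i in range(100)]
first_ten = limit_items(image_paths, limit=10)
random_ten = limit_items(image_paths, limit=10, shuffle=True, seed=0)
```

A seeded sample would also give the `random` import flagged earlier in the review an actual use.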
Co-authored-by: Yoann Poupart <[email protected]>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Summary by Sourcery
This pull request adds a new script for pre-labeling datasets using GPT-4. The script downloads metadata and votes, processes images, generates labels using the OpenAI API, and saves or uploads the results to the Hugging Face Hub.