The config isn't consistent between chunks #370
Comments
Hi @AugustDev, sorry that it failed at ~80%. Btw, were you using And,
Yes, LitData encodes each leaf of the pytree as a separate object, so it doesn't know that this is a single sample. You can convert it to a numpy array or torch tensor directly to tell LitData that this is a single item and not a list of items.
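To illustrate the point about leaves, here is a minimal sketch. It does not call LitData; `count_leaves` is a hypothetical stand-in for pytree flattening and `make_sample` and the `"col"` key are made up for the example. It only shows why a variable-length Python list produces a different number of leaves per sample, while a numpy array stays a single leaf.

```python
import numpy as np


def count_leaves(obj):
    """Hypothetical stand-in for pytree flattening: count the leaves of nested containers."""
    if isinstance(obj, dict):
        return sum(count_leaves(v) for v in obj.values())
    if isinstance(obj, (list, tuple)):
        return sum(count_leaves(v) for v in obj)
    # Anything else (numpy array, scalar, string) is treated as one leaf.
    return 1


def make_sample(values):
    """Build the same sample twice: once with a list column, once with an array column."""
    as_list = {"col": list(values)}  # one leaf per element
    as_array = {"col": np.asarray(values, dtype=np.float32)}  # one leaf in total
    return as_list, as_array


short_list, short_arr = make_sample([1.0, 2.0])
long_list, long_arr = make_sample([1.0, 2.0, 3.0, 4.0, 5.0])

# With lists, the leaf count follows the length, so it differs between samples.
list_leaf_counts = (count_leaves(short_list), count_leaves(long_list))
# With arrays, every sample has the same structure regardless of length.
array_leaf_counts = (count_leaves(short_arr), count_leaves(long_arr))

print(list_leaf_counts, array_leaf_counts)
```

If the per-sample structure is what gets recorded in the chunk config, keeping each column as one array should keep that structure identical across samples, which is the assumption behind the suggestion above.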
Hi @AugustDev, I wanted to follow up and see if the solution recommended by @tchaton was helpful for you.
Yes, @AugustDev. Let us know how it goes. Also, if you could recommend any similar publicly available data for testing on my end, that would be helpful.
I was processing large files and received the following error. It failed at roughly 80% of the data, after about 1 h 20 min. The full error is really long, but this is the beginning of it. I'm essentially storing 5 columns, where each column is a numpy array. The arrays are of variable length.
🐛 Bug
To Reproduce
Unfortunately, I'm not sure how to show how to reproduce this without sharing a ~100 GB dataset.
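As a possible stand-in for the private data, the sketch below generates synthetic samples with the shape described in this issue: 5 columns, each a variable-length numpy array. The function name, column names (`col_0` ... `col_4`), dtype, and length range are assumptions made for illustration, and it is not known whether data like this triggers the same error.

```python
import numpy as np


def make_synthetic_sample(index, num_columns=5, max_len=1000):
    """Return one sample: a dict of 1-D float32 arrays with random lengths."""
    # Seed with the index so each sample is reproducible.
    rng = np.random.default_rng(index)
    sample = {}
    for c in range(num_columns):
        # Upper bound is exclusive, so lengths fall in [1, max_len].
        length = int(rng.integers(1, max_len + 1))
        sample[f"col_{c}"] = rng.standard_normal(length).astype(np.float32)
    return sample


samples = [make_synthetic_sample(i) for i in range(4)]
for s in samples:
    print({name: arr.shape for name, arr in s.items()})
```

A function like this could be used as the per-item function of a dataset-writing job, with the index range as inputs, to test at a smaller scale.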
Additional context
Environment detail