Support JSON lines files #120
Comments
Fantastic idea! No timeline yet on implementation, but definitely a very useful feature. I've run into this myself :)
Actually @ivbeg, would you be able to describe your ideal interface for such a feature? Would the program run the query over each JSON line individually, or treat the whole file as a large array?
@evinism It would be great to support both ways of processing JSON Lines files, but streaming would be more important, since there are huge JSON Lines files, up to 100GB+ compressed. I could provide several examples from public datasets if needed. It's nearly impossible to process such files as one large array. I've developed a command-line tool, undatum (https://github.com/datacoon/undatum), that supports data processing and conversion of JSON Lines and BSON files. BSON is a binary format used by the MongoDB NoSQL database, very similar to JSON Lines. So I would like to integrate a query language into undatum to use it with data processing/conversion operations. I've already used dictquery (https://github.com/cyberlis/dictquery), but it's good for filtering only.
Streaming mode for processing jsonl sounds right to me too. Not sure when I'll get to this, but definitely something I want to tackle.
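For illustration, the two modes discussed above (query per line vs. whole file as one array) could be sketched roughly like this in Python. This is only a sketch, not the project's implementation: `stream_query`, `array_query`, and the `query_fn` callable are hypothetical names standing in for whatever query engine is actually used.

```python
import io
import json
from typing import Any, Callable, Iterable, Iterator


def stream_query(lines: Iterable[str], query_fn: Callable[[Any], Any]) -> Iterator[Any]:
    """Streaming mode: apply query_fn to each JSON Lines record in turn.

    Only one record is held in memory at a time, so this also works for
    very large inputs. query_fn is a hypothetical stand-in for a query.
    """
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines between records
            continue
        yield query_fn(json.loads(line))


def array_query(lines: Iterable[str], query_fn: Callable[[Any], Any]) -> Any:
    """Array mode: load every record into one list, then query it once.

    Memory use grows with the input size, so this suits small files only.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    return query_fn(records)


if __name__ == "__main__":
    # A file object opened on a .jsonl file could be passed the same way.
    sample = io.StringIO('{"name": "a", "size": 1}\n{"name": "b", "size": 2}\n')
    for result in stream_query(sample, lambda record: record["name"]):
        print(result)
```

Because `stream_query` is a generator, results can be written out as they are produced; `array_query` is only needed when the query has to see all records at once (for example, sorting or counting).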
@evinism I've added experimental support for mistql to undatum. It's available in main at https://github.com/datacoon/undatum, version 1.0.13. I hope it helps.
Adding @ilan-pinto to this thread. For now, let's work on getting this up and running in Python.
Hi
For reference, a possible interface for this feature could be as such:
Note that the query is performed in a streaming manner -- for each JSON line in
Please add support for JSON Lines files (https://jsonlines.org/).
Many such files are published and in use. Some of them are huge and hard to convert to JSON.