-
Notifications
You must be signed in to change notification settings - Fork 217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Passing through comments with --pass-comments
may modify comments in CSV data
#1346
Comments
--pass-comments
may modify comments
@kusalananda I wish I had a better answer here. The way CSV records are detected are by handing them off to the CSV-parser library. And what it gives back is its own output, with no access to the underlying/original data. In Miller 5 and below, when Miller was in C, I wrote my own CSV parser and could do whatever with it. In Miller 6 and above, which is in Go, I chose to use the https://pkg.go.dev/encoding/csv package, because hey, standard libraries are better, right? But in fact I've had to do all manner of workarounds for weird issues with it. It tempts me to go back and hand-roll my own CSV parser in Go --- basically, porting the Miller <= 5 CSV parser from C to Go. That would be a fun project, and I'd enjoy it a lot. However, it would take a bit of time .... |
Hi there John, My initial reason for submitting this issue here was me noticing it in passing while helping someone with an issue on the Unix&Linux StackExchange site (where I am a moderator and a frequent proponent of Miller). I noticed that the user's comment was modified in the way I mentioned here when I applied my suggested Miller solution to their data. So, since it's an issue that I noticed in the result of a one-off short throwaway command when applied to what I think is example data, I'm not personally invested in or concerned about this. It's more of a "by the way, did you know" sort of bug report. If you want to re-implement the CSV parser in our own Go code (after making sure no other CSV parsing library provides you with any hooks you can use for solving this), go ahead and do that. That might be an interesting project, but it may also give you more future headaches. I don't know. Thanks, and be well! |
@kusalananda thanks! I do want to make this change -- at some point in time -- when time allows. And I really appreciate you surfacing it. The software should not mutate comments. :) |
Why not report the issue upstream so the library can be fixed instead? And when it's updated, miller can get the fix by updating its dependency. |
@janxkoci 🤔 generally I would agree, but here, I'm using the Go CSV-parsing library in a way it really wasn't intended ... I think this codemod should stay in Miller. |
--pass-comments
may modify comments--pass-comments
may modify comments in CSV data
When parsing a CSV data set that contains comments (lines starting with
#
), the--pass-comments
option should ("I expect it to") cause Miller to pass the comment through unchanged. However, if such a comment line contains the input field delimiter the comment is changed to include quotes.Example command:
This outputs the following (note the changed comment line at the end):
Version tested is
mlr version 6.8.0-dev for openbsd/amd64/go1.20.1
.The text was updated successfully, but these errors were encountered: