"embedded nul in string" Issue reading GTFS files downloaded or saved on MacOS #217

Open
ttalVlatt opened this issue Oct 16, 2024 · 2 comments


@ttalVlatt

Hi there,

When macOS downloads files, it adds attributes for security etc. that seem to throw off the fread() call used in this package. For instance, a common attribute macOS adds when you download a file from the internet (e.g., from mobilitydatabase.org) is com.apple.quarantine. When you try to read a GTFS file that has this attribute, you get this error:

Error in data.table::fread(fs::path(tmpdir, filename), nrows = 1, colClasses = "character") : 
  embedded nul in string: '\005\026\a\0\002\0\0Mac OS X        \0\002\0\0\0\t\0\0\02\0\0\0\xa2\0\0\0\002\0\0\0\xd4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0ATTR\0\0\0\0\0\0\0\xd4\0\0\0\x98\0\0\0<\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\001\0\0\0\x98\0\0\0<\0\0\025com.apple.quarantine\0q/0083'

You can remove it by hand with xattr -dr com.apple.quarantine minneapolis-metro-transit, but even after removing it you still get:

Error in data.table::fread(fs::path(tmpdir, filename), nrows = 1, colClasses = "character") : 
  embedded nul in string: '\005\026\a\0\002\0\0Mac OS X        \0\002'

I played around with passing different encoding options to fread(), but none of them seemed to work. Based on this Stack Overflow thread there seem to be a few workarounds, but none that I can see an easy way to implement here. Any ideas?

I don't know if they will be much help if you're not on macOS, but I've attached the files with the problematic attributes here:

minneapolis-mvta-gtfs.zip

@polettif
Contributor

When macOS downloads files, it adds attributes for security etc. that seem to throw off fread() call used in this package.

I don't think that's the real issue here. It looks like some macOS "helper" files were zipped into the feed, and those lead to the error in fread(). I'd suggest the following workaround: unzip the feed and re-zip it without the extra files.

library(tidytransit)

# Unzip feed to a local directory
zip::unzip("~/Downloads/minneapolis-mvta-gtfs.zip", exdir = "feed_unzipped")

# There's a "__MACOSX" directory with macOS attribute files of some kind in the zipped feed
list.files("feed_unzipped/__MACOSX", full.names = TRUE, all.files = TRUE)
#>  [1] "feed_unzipped/__MACOSX/."                    
#>  [2] "feed_unzipped/__MACOSX/._agency.txt"         
#>  [3] "feed_unzipped/__MACOSX/._calendar_dates.txt" 
#>  [4] "feed_unzipped/__MACOSX/._fare_attributes.txt"
#>  [5] "feed_unzipped/__MACOSX/._feed_info.txt"      
#>  [6] "feed_unzipped/__MACOSX/._frequencies.txt"    
#>  [7] "feed_unzipped/__MACOSX/._routes.txt"         
#>  [8] "feed_unzipped/__MACOSX/._shapes.txt"         
#>  [9] "feed_unzipped/__MACOSX/._stop_times.txt"     
#> [10] "feed_unzipped/__MACOSX/._stops.txt"          
#> [11] "feed_unzipped/__MACOSX/._transfers.txt"      
#> [12] "feed_unzipped/__MACOSX/._trips.txt"          
#> [13] "feed_unzipped/__MACOSX/.."

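# Re-zip only the txt/geojson files from the local directory.
# mode = "cherry-pick" stores them at the root of the archive (as GTFS requires)
# instead of under a "feed_unzipped/" folder
files_in_zip <- list.files("feed_unzipped", full.names = TRUE)
files_in_zip <- files_in_zip[tools::file_ext(files_in_zip) %in% c("txt", "geojson")]
zip::zip("minneapolis-mvta-gtfs-fixed.zip", files_in_zip, mode = "cherry-pick")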

# reading the fixed feed works
g = read_gtfs("minneapolis-mvta-gtfs-fixed.zip")

Created on 2024-10-17 with reprex v2.1.1

Strictly speaking, the minneapolis-mvta-gtfs.zip feed violates the GTFS specification, but tidytransit::read_gtfs() (or rather gtfsio::import_gtfs()) should still be able to handle this somewhat gracefully.
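For context, the "._*" files under __MACOSX are AppleDouble files, a binary format macOS uses to carry extended attributes (such as com.apple.quarantine) alongside files in archives. Their header starts with the magic number 0x00051607, a version field (0x00020000) and a 16-byte filler that macOS fills with "Mac OS X" plus spaces, which appears to match the bytes quoted in the error message above; the NUL bytes in that header are what fread() trips over. The base R sketch below shows one way such files could be recognized by their first four bytes. The function name is_appledouble() is hypothetical and not part of tidytransit or gtfsio.

```r
# Hypothetical helper (not part of tidytransit or gtfsio): returns TRUE if the
# file at `path` starts with the AppleDouble magic number 0x00051607.
is_appledouble <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  magic <- readBin(con, what = "raw", n = 4)
  # A file shorter than 4 bytes yields fewer bytes, so identical() is FALSE
  identical(magic, as.raw(c(0x00, 0x05, 0x16, 0x07)))
}

# Example: list AppleDouble files in the unzipped feed from the reprex above.
# recursive = TRUE descends into "__MACOSX"; all.files = TRUE includes "._*" names.
if (dir.exists("feed_unzipped")) {
  paths <- list.files("feed_unzipped", recursive = TRUE, all.files = TRUE,
                      full.names = TRUE)
  print(paths[vapply(paths, is_appledouble, logical(1))])
}
```

Checking the header rather than the name also catches stray "._" files that sit outside a __MACOSX folder, although filtering by name, as both workarounds in this thread do, is simpler and usually enough.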

@polettif
Contributor

I remembered a similar issue from some time ago. There's another workaround that avoids unzipping and re-zipping the feed: explicitly list the files to read in the files parameter:

zipfiles = zip::zip_list("~/Downloads/minneapolis-mvta-gtfs.zip")$filename
files = tools::file_path_sans_ext(zipfiles[!startsWith(zipfiles, "__MACOSX")])

gtfs = tidytransit::read_gtfs("~/Downloads/minneapolis-mvta-gtfs.zip", files = files)

Created on 2024-10-17 with reprex v2.1.1
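Both workarounds amount to the same filtering step: keep the GTFS tables and skip the macOS metadata entries. As a minimal sketch, assuming read_gtfs() accepts extension-less table names in `files` as in the snippet above, that step could be wrapped in a small helper. gtfs_table_names() is a hypothetical name; beyond the __MACOSX check it also drops "._" files elsewhere in the archive and, mirroring the first workaround, anything that is not .txt or .geojson.

```r
# Hypothetical helper (not part of tidytransit): converts zip entry names into
# table names for read_gtfs(files = ...), skipping macOS metadata.
gtfs_table_names <- function(entries) {
  # "__MACOSX/..." entries and AppleDouble "._*" files are macOS metadata
  is_macos <- startsWith(entries, "__MACOSX") | startsWith(basename(entries), "._")
  # Keep only the file types GTFS uses (this also drops directory entries)
  is_gtfs <- tools::file_ext(entries) %in% c("txt", "geojson")
  tools::file_path_sans_ext(entries[!is_macos & is_gtfs])
}

# Usage (requires the zip and tidytransit packages):
# entries <- zip::zip_list("~/Downloads/minneapolis-mvta-gtfs.zip")$filename
# gtfs <- tidytransit::read_gtfs("~/Downloads/minneapolis-mvta-gtfs.zip",
#                                files = gtfs_table_names(entries))
```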
