-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Spatialite and MySQL WKB dialects #153
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great addition, thanks a lot @Oreilles !
Please do a cargo fmt
and we're good to merge.
Thanks for the review, the latest commit fixes the formatting issues! |
Thanks! Now we just have to make clippy happy as well. To test locally, you just have to run |
I think it's good now! |
geozero/src/wkb/wkb_reader.rs
Outdated
@@ -293,17 +424,59 @@ pub(crate) fn process_wkb_geom_n<R: Read, P: GeomProcessor>( | |||
} | |||
} | |||
|
|||
fn emit_coord<P: GeomProcessor>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is a good approach here - you first create coordinate tuple on line 462, and then you discover you don't even need it if multi_dim is false. I would inline this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed with latest commit
@@ -198,3 +200,12 @@ pub(crate) enum WKBByteOrder { | |||
Xdr = 0, // Big Endian | |||
Ndr = 1, // Little Endian | |||
} | |||
|
|||
impl From<scroll::Endian> for WKBByteOrder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good addition, thanks! I wonder if we need a reverse as well?
geozero/src/wkb/wkb_reader.rs
Outdated
} else { | ||
scroll::LE | ||
}; | ||
let endian = Endian::from(byte_order != 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sigh, Endian::from(bool)
is not the most readable interface. I had to look up what true
means :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. this one is not exactly actionable - just a note in case you can think of a cleaner API usage here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, I agree that Endian::from(bool)
is not the very explicit at first glance... If we implement the reverse of From<scroll::Endian> for WKBByteOrder
as you suggested, then maybe we could also implement From<u8> for WKBByteOrder
and chain the two. That's a lot of code for basically a bool check though so...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, probably not worth the complexity - maybe just add an extra line to make it more readable, like this?
let is_little_endian = byte_order != 0;
let endian = Endian::from(is_little_endian);
P.S. you can actually inline the second line down below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better, I'll do that 👍
lol, seems like @pka and I were reviewing it at the same time... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see inline
} else { | ||
scroll::LE | ||
}; | ||
let is_little_endian = byte_order != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
turns out there is a much cleaner solution, but it requires a bit of research. https://docs.rs/scroll/latest/scroll/ has a lot of docs for that, but essentially instead of using raw.ioread::<u8>()
you can use raw.ioread::<WKBByteOrder>
. In order for that to work, you would have to implement impl IntoCtx<scroll::Endian> for WKBByteOrder {
or something similar (I never used this crate, just saw some docs about cread vs gread vs pread - not sure which is which). At the end of the day, it should offer a very easy way to just do all the on-the-fly conversions cleanly
P.S. Note that you cannot do raw.ioread::<Encoding>
because you can only implement a trait for the type in the same crate, and Encoding
enum is not part of the geozero
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.P.S. note that we don't need to fix ALL such issues at the same time, can be done in another PR. For more docs on this - https://github.com/m4b/scroll#stdio-api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few minor nits, but looks almost ready to merge. Thanks for all the patience!! :)
b0071c0
to
274a333
Compare
Thanks for your time, it's very pleasing contributing to open source projects with responsive maintainers 👍 |
please fix the conflict and lets merge |
Solved |
At first i made a few minor comments here, but then just created a PR for your PR - please merge if agree -- Oreilles#1 |
a few minor suggestions
I merge the two PR! |
The reason for having two methods was performance. This method is the innermost loop of any geometry operation (except for points) and passing additional optional params costs performance for the most common xy case.
Your concerns are valid, if the reader implementing the geozero/geozero/src/geojson/geojson_reader.rs Lines 238 to 251 in ece8dfe
WkbReader looks also fine, but there are maybe readers which don't respect GeomProcessor::dimensions resp. GeomProcessor::multi_dim , which would be a bug.
In your WKB xy to WKB xyzm example, I don't think that the result is invalid, you just loose the additional dimensions. |
If we process a WKB that doesn't have ZM coordinates, it will only call geozero/geozero/src/wkb/wkb_reader.rs Lines 315 to 319 in ece8dfe
And then only X and Y will be written: geozero/geozero/src/wkb/wkb_writer.rs Lines 163 to 169 in ece8dfe
When a parser will try to read the WKB emitted, il will try to read four
You might have tested it already (I haven't) but in this case, maybe the branching caused by having to handle four different outcomes outweight the performance gain of only calling the method with the least amout of parameters:
In any case, I'm more concerned about the bug opportunities here than performance 😅 |
@Oreilles i think this one is ready to go once conflict is fixed. One thing I forgot to ask about the other PR (which affect this PR as well) -- why do you need to call |
@nyurik calling processor.srid(info.srid) allowed for inferring the EWKT geometry srid from the WKB blob without having to manually specify it. The trait geozero/geozero/src/wkt/wkt_writer.rs Lines 82 to 85 in 5e92027
|
Awesome work, thanks!! |
This PR adds support for Spatialite and MySQL WKB dialects.
I'd happily discuss the implementation. Quirks of the spatialite dialect that required some refactoring:
0xFE
. In order to know wether thepolygon_end
(or alike) is the last one, we keep track of the collection entity nesting level. If the nesting level is 0 when a geometry ends, it means it is the last one of the blob and the terminating byte0xFE
must be appended. Even thoughMULTI*
geometries cannot be part of aGEOMETRYCOLLECTION
in Spatialite at the moment, this was my easiest go at the problem - and it also makes it more future-proof in case that feature is added at some point.read_header
parameter ofprocess_wkb_geom_n
so that the parent geometry info can be passed down to its children.f32
values, relative to the previous one (with the exception of the first and the last one), which required some refactoring toprocess_coord
to avoid code duplication between the two implementations.Side notes:
WkbWriter
implementGeomProcessor::srid()
?[minx, miny, maxx, maxy]
(like GeoJSON) whereas gpkg stores it as[minx, maxx, miny, maxy]
I also refactored the roundtrip tests so they can be shared by the 5 dialects.