[Question] RFC3986 and RFC3987 - Internationalized Resource Identifier (IRI) vs ZnUrl #14

Open
tomasklapka opened this issue Jan 7, 2015 · 3 comments

Comments

@tomasklapka

Hi Sven

I'm working on an RDF library for Pharo. An essential piece of RDF is the IRI, and the ZnUrl class already does a lot of what an IRI should do. ZnUrl is also widely used and is becoming the officially recommended URL class.

I'm still a beginner in Pharo, so I am not sure what the optimal approach would be. By optimal I mean one that is acceptable to and usable by the community and that follows best practices.

For RDF I need the IRI class to be as RFC compliant as possible.
I would like IRI to be compatible (interchangeable) with ZnUrl if possible.
I want IRI to resolve references (ZnUrl's inContextOf: does not merge paths).
I would like IRI to be able to hold relative IRIs, including relative IRIs with a relative path (allowing ".." segments at the beginning of the path).
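
To make the reference-resolution requirement concrete, here is a minimal Python sketch of the algorithm in RFC 3986 section 5.2 (split, merge paths, remove dot segments, recompose). It is an illustration only, not Pharo code and not the behaviour of ZnUrl or of the IRI class in the linked repository; the function names `split`, `merge`, `remove_dot_segments` and `resolve` are chosen here for the example.

```python
import re

# Regular expression from RFC 3986, Appendix B, for splitting a URI reference.
_URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')


def split(ref):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = _URI_RE.match(ref)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path):
    """RFC 3986 section 5.2.4: interpret and drop '.' and '..' segments."""
    out = []
    while path:
        if path.startswith('../'):
            path = path[3:]
        elif path.startswith('./') or path.startswith('/./'):
            path = path[2:]
        elif path == '/.':
            path = '/'
        elif path.startswith('/../') or path == '/..':
            path = '/' + path[4:]
            if out:
                out.pop()
        elif path in ('.', '..'):
            path = ''
        else:
            # Move the first segment (with its leading '/', if any) to the output.
            idx = path.find('/', 1)
            if idx == -1:
                out.append(path)
                path = ''
            else:
                out.append(path[:idx])
                path = path[idx:]
    return ''.join(out)


def merge(base_authority, base_path, ref_path):
    """RFC 3986 section 5.2.3: merge a relative path with the base path."""
    if base_authority is not None and base_path == '':
        return '/' + ref_path
    return base_path[:base_path.rfind('/') + 1] + ref_path


def resolve(base, ref):
    """RFC 3986 section 5.2.2: resolve ref against an absolute base."""
    bs, ba, bp, bq, _ = split(base)
    rs, ra, rp, rq, rf = split(ref)
    if rs is not None:
        s, a, p, q = rs, ra, remove_dot_segments(rp), rq
    elif ra is not None:
        s, a, p, q = bs, ra, remove_dot_segments(rp), rq
    elif rp == '':
        s, a, p, q = bs, ba, bp, (rq if rq is not None else bq)
    else:
        merged = rp if rp.startswith('/') else merge(ba, bp, rp)
        s, a, p, q = bs, ba, remove_dot_segments(merged), rq
    # RFC 3986 section 5.3: recompose the components.
    result = ''
    if s is not None:
        result += s + ':'
    if a is not None:
        result += '//' + a
    result += p
    if q is not None:
        result += '?' + q
    if rf is not None:
        result += '#' + rf
    return result


base = 'http://a/b/c/d;p?q'        # base URI used in the RFC 3986 section 5.4 examples
print(resolve(base, '../g'))       # dot segments are merged against the base path
print(split('../g')[2])            # an unresolved reference keeps its leading '..'
```

Note that the relative reference `../g` is a valid value in its own right before resolution: its path component still starts with `..`, which is the case described in the last requirement above.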

I was thinking about inheriting from ZnUrl and overriding everything that does not work according to the URI and IRI RFCs.

Basically, I have created a general IRI parser (with PetitParser) using the grammar from the RFC to cover all possible syntax, overridden ZnUrl's parsing with the new parser, and debugged most of the messages so that the tests pass.

It's still a work in progress, but I have already encountered some issues:

  • According to the RFCs, a double slash precedes an authority; if there is no authority, the double slash should not be present. ZnUrl has mailto and telnet hardcoded as schemes without double slashes, though telnet should contain them and mailto should not, because the mailbox is parsed as a path (using the grammar from the RFC).
  • Not all schemes use '/' as a path segment separator.
  • ZnUrl does not allow relative URIs that point above the first non-dot segment, i.e. a '../' path (is there a special reason for this, or was it simply not required?).
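
The first two points can be illustrated with the generic component regular expression from RFC 3986, Appendix B. The following Python sketch is only an illustration of the generic grammar, not of ZnUrl or of the PetitParser-based parser; the helper name `components` and the example addresses are assumptions made for the example.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r'^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?')


def components(uri):
    """Split a URI into scheme, authority and path using the generic syntax.

    The authority is None when the URI has no '//' part at all.
    """
    m = URI_RE.match(uri)
    return {'scheme': m.group(2), 'authority': m.group(4), 'path': m.group(5)}


# mailto has no authority: the mailbox ends up in the path, so no '//' appears.
print(components('mailto:user@example.com'))

# telnet has an authority, so the '//' is required.
print(components('telnet://example.com:23/'))

# A URN has no authority either, and its path is not '/'-separated;
# splitting on ':' here is only meant to show a different separator.
print(components('urn:isbn:0451450523')['path'].split(':'))
```

Under the generic grammar, whether `//` appears follows from whether an authority is present, so no per-scheme list is needed for that decision.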

Are there any motivations to improve ZnUrl in any way?

What would you suggest? I can see these options:

  • Override a lot of ZnUrl and try to align it with the RFCs as much as possible.
  • Make IRI not inherit anything from ZnUrl, give it a similar message API, and just provide IRI>>#asZnUrl and ZnUrl>>#asIRI for converting.
  • Improve ZnUrl itself, if there is motivation to make it more general, internationalized and RFC compliant.
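
For the second option, the IRI-to-URL direction of such a conversion could follow the mapping in RFC 3987 section 3.1, which percent-encodes non-ASCII characters as UTF-8 octets. The Python sketch below shows only that step and is a simplified assumption of what an asZnUrl-style conversion might do: the function name `iri_to_uri` is hypothetical, and normalization and IDNA handling of the host are deliberately left out.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character of an IRI as UTF-8 octets.

    Simplified sketch of the RFC 3987 section 3.1 mapping: ASCII characters
    are copied unchanged, and no host-specific (IDNA) processing is done.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend('%{:02X}'.format(octet) for octet in ch.encode('utf-8'))
    return ''.join(out)


print(iri_to_uri('http://example.org/r\u00e9sum\u00e9'))
```

The reverse direction (URL to IRI) is lossy in general, since a percent-encoded octet sequence does not say whether it was originally a non-ASCII character, which is one reason to keep the two conversions as explicit messages.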

If anyone is interested in the development version of IRI (and RDF), there is a filetree repo in my playground: https://github.com/tomasklapka/my-playground/tree/eca1c3db6e7f97da9ec57bda95592b01f4c3bfc2/rdfTalk/pharo. Reference resolution and some character processing do not fully pass the tests yet, and the whole code needs some refactoring and cleaning.

IRI depends on PetitParser. I'm using Moose for development (it contains PetitParser by default).

@svenvc
Owner

svenvc commented Jan 7, 2015

Hi Tomáš,

First off, thank you for the feedback, this is always nice.

ZnUrl is the way it is because it started life as an HTTP(S) URL; File URL support came next, and later some more schemes. Of course, not all specifics of all schemes are present. Some implementation decisions were made because of common usage and user expectations.

As a start and to make it possible for you to move forward faster, I would suggest you make your own independent class.

Regards,

Sven

PS: If there are clear RFC violations in ZnUrl I am all ears.

@tomasklapka
Author

Thanks for the suggestion, Sven. I've just made the class independent, and the whole code is clearer to work with now that I'm not thinking about overriding all the time. If I keep the message API aligned with ZnUrl, I can always play with interchangeability later.

PS: It's not exactly violating the RFC. It's mostly fine in its domain. I'll do more tests, including comparison and interchangeability with ZnUrl, and we'll see ;)

@svenvc
Owner

svenvc commented Jan 9, 2015

Ah, that is good to hear.

Yes, for now, API compatibility is the way to go.

svenvc added a commit that referenced this issue May 30, 2024
Update Zinc: Add an option #ignoreByteOrderMark to ZnUTFEncoders (true by default)