Capturing matching values as the original input #157
randomeizer started this conversation in Ideas
Replies: 1 comment 5 replies
-
@randomeizer this was one of the motivations for the

My comment parser looked like this:

    let comment = Parse(.struct(Comment.init(rawValue:))) {
      "<!--".utf8
      PrefixBy { input, _ in
        guard !input.starts(with: "--".utf8), let scalar = input.firstScalar, isLegalCharacter(scalar)
        else { return 0 }
        return UTF8.width(scalar)
      }
      .map(.string)
      "-->".utf8
    }
-
Ok, so this one is a little off to the side. I've often come across situations where I want to parse a slightly more complicated string into a single string value. This was actually one of the inspirations for the `Not` parser.

There are a bunch of low-level parsers which are quite useful for parsing through bits of strings, but they all break up the results into individual pieces.

The case that brought this home in particular was the `Comment` grammar for XML:

I can do this to parse it correctly:

However, having to map the internal `Parse` into a `String` and back to a `Substring` is not ideal, and I end up with a massive array of `Substring`s. What I really want is a single `Substring` with the whole comment. That could be mapped again, but it's a lot of `String`/`Substring` conversion, and I've lost connection to the original input `Substring`.

My current solution is a `Capture` parser (name inspired by capture groups in regex):

With this in hand, my comment parser becomes:

It's simpler, and actually gives me a single `Substring` (or other valid input collection) which is still linked to the original input source.

There is a lot of parsing conversion work being done by the `Match` parser that's getting thrown away, so this is perhaps not the most efficient way of doing this kind of job, and I'm happy to get other thoughts on it. I could certainly write a custom parser for Comment (which I'll probably do in this case), but the issue of having multiple parts of a string broken up in the parse process, which I then want back as a single string, has come up enough times that I think it's worth figuring out.

Any thoughts?
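
To make the capture idea concrete, here is a minimal sketch that uses only the Swift standard library. It does not use the library's parser protocols and is not the `Capture` implementation described above. The names `capture`, `skipPrefix`, `xmlComment`, and `ParseError` are hypothetical, and the comment body rule is simplified to "everything up to the next `--`" with no check for legal XML characters. It assumes the inner parser only ever drops a prefix of its input, so the remaining input still shares indices with the original.

```swift
// Hypothetical error type for this sketch.
struct ParseError: Error {}

// Hypothetical helper: runs `parse`, then returns the slice of the ORIGINAL
// input that `parse` consumed. Assumes `parse` only removes a prefix, so the
// leftover input is a suffix of `original` and shares its indices.
func capture(
  _ input: inout Substring,
  _ parse: (inout Substring) throws -> Void
) rethrows -> Substring {
  let original = input
  do {
    try parse(&input)
  } catch {
    input = original  // restore the input on failure so callers can backtrack
    throw error
  }
  return original[original.startIndex..<input.startIndex]
}

// Consumes `prefix` from the front of `input`, or throws if it is not there.
func skipPrefix(_ prefix: String, _ input: inout Substring) throws {
  guard input.starts(with: prefix) else { throw ParseError() }
  input = input.dropFirst(prefix.count)
}

// Simplified XML comment: "<!--", a body that stops at the next "--", then "-->".
// Returns the body as one Substring backed by the original storage.
func xmlComment(_ input: inout Substring) throws -> Substring {
  try skipPrefix("<!--", &input)
  let body = capture(&input) { rest in
    // The inner work happens piece by piece, but its pieces are discarded;
    // only the consumed range matters.
    while !rest.isEmpty, !rest.starts(with: "--") {
      rest = rest.dropFirst()
    }
  }
  try skipPrefix("-->", &input)
  return body
}
```

Calling `xmlComment` on a `Substring` of `"<!-- hello -->rest"` returns `" hello "` as a single slice of the original string and leaves `"rest"` as the remaining input. The inner closure can be arbitrarily fine-grained without producing an array of pieces or any `String` round trip, at the cost of discarding whatever values the inner parsing produced.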