-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make sure paragraphs are properly created #18
Make sure paragraphs are properly created #18
Conversation
R/rd2markdown.R
Outdated
# We replace one new line sign with two to make sure that proper | ||
# paragraphs are always applied. New line without paragraphs is not something | ||
# used in Rd2 objects and thus we do not account for such situations. | ||
gsub("\n{1}", "\n\n", x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main change in this PR. Rationale explained in the description
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just doing some quick tests, and single newlines are permitted in Rd
s just like in markdown, so this will introduce separations between all the lines. For example, in this package we have this documentation, which uses newlines to keep the Rd width to 80 chars.
rd2markdown/man/documentation_to_markdown.Rd
Lines 25 to 31 in 3d72b8d
\details{ | |
The function provides a vectorized way of extracting multiple documentation | |
objects at once based on paths to the .Rd files sources. The file path will | |
be truncated with \code{\link[base]{basename}}, and used as a name for the | |
output list. Use \code{fragments} parameter whenever you want to limit | |
extracted documentation only to some parts. | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made a couple minor tweaks in this branch:
https://github.com/Genentech/rd2markdown/tree/17-paragraph-splits-2
Notable changes
rd2markdown.TEXT <- function(x, fragments = c(), ...) {
if (grepl("^\\s*\n\\s*$", x)) return(rd2markdown.cr())
trim_extra_newlines(x)
}
trim_extra_newlines <- function(x) {
# non-newline whitespace
ws <- "[^\\S\r\n]"
re1 <- paste0(ws, "+", "(\n\\b)?")
re2 <- paste0("(\n", ws, "*){2,}")
gsub(re1, " ", gsub(re2, "\n\n", x, perl = TRUE), perl = TRUE)
}
Let me know what you think of that style. If you like it, feel free to merge the changes in with this branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I agree it looks better now. Big cudos for refining that!
if (!missing(x) && is.null(package) && !is.null(x) && exists(x, envir = envir)) | ||
package <- getNamespaceName(environment(get(x, envir = envir))) | ||
if (!missing(topic) && is.null(package) && !is.null(topic) && exists(topic, envir = envir)) | ||
package <- getNamespaceName(environment(get(topic, envir = envir))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized it was using the wrong parameter all along. It's fixed now
rd2markdown.character function works completely different than I would like it to. Will fix it right away |
@@ -237,7 +239,7 @@ rd2markdown.dots <- function(x, fragments = c(), ...) { | |||
#' @exportS3Method | |||
#' @rdname rd2markdown | |||
rd2markdown.TEXT <- function(x, fragments = c(), ...) { | |||
gsub("\\s+", " ", gsub("\n", " ", x)) | |||
trim_extra_newlines(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change that made the function discover standalone new line signs was a great idea, but unfortunately, there was one gap that was not possible to overcome which made me revert that. Namely, developers sometimes break the line once in roxygen documentation to make it readable and shorten lines. However if one does it just after a special documentation block (like \code{}
) it will always appear as a solo \n
TEXT tag and therefore would be falsely recognized as a new paragraph. Therefore I decided to look for a different solution to this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good catch! I hadn't considered this. Definitely complicates things.
out <- paste0(out, collapse = collapse) | ||
# Irrelevant form the rendering point of view, just for styling reasons | ||
out <- trim_extra_newlines(out) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that this issue fixes the underlying problem and hopefully most of the edge cases. For some reason we had that trimws()
call in the filter that made us drop alone \n
that otherwise would produce proper paragraphs. The reason was that \n\n
signs are splitted into separate TEXT tags and in some cases were dropped because of that filter. Now it should all work just fine! (fingers crossed, thumbs hold) @dgkf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do think we want to trim most newlines. Using Rd2txt
as a reference, lines re-flowed. It is only when we have two consecutive \n
that we want to introduce a paragraph break, similar to markdown.
\\name{test}
\\title{test}
\\description{
one
two
}
Will render as
test
Description:
one two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Frankly, I don't see what's wrong with this example, it looks correct. Also, I don't see how a change to that filter function affects that. The change (keep in mind change is a little bit above, here I marked it a bit incorrectly) makes us simply take all signs that were already present in the rd file, we don't introduce new ones. Or am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now I see what you mean, so this should be fine:
## test
one
two
because it still renders as:
test
one two
Unfortunately, I think there are still cases where this will be too assumptive. For example, an Rd can contain trailing whitespace before a newline, which would be considered a newline in markdown but is wrapped with Rd2txt
:
% using "•" to indicate spaces
\\name{test}
\\title{test}
\\description{
••one••
••two••
}
test
Description:
one two
but if we just kept all the whitespace as-is, it would display in markdown as
test
one
two
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry but I still don't see the problem in your example. I've prepared a dummy Rd like the one above (had to manually put spaces as roxygen removes them) and run it through the current version of rd2markdown
from this branch. This is the raw result
"\n\n# test\n\none \n two \n\n"
and rendered file:
So we have indeed one new line sign between one and two in the raw md. But this is only one. Pandoc will ignore it while rendering. Whatever I tried, I couldn't recreate the output you provided, while using Pandoc Obviously, it will work differently in gitlab, as it's using a pseudo-markdown where every new line is rendered as is and I don't think we should account for that.
Would you mind sharing how you created that markdown display at the end of the previous message? I'd like to know what I'm missing here.
nchar(trimws(i)) | ||
} | ||
}, out) | ||
if (!is.null(collapse)) out <- paste0(out, collapse = collapse) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the actual change @dgkf
So, after digging into this, I think this PR has highlighted a bit of hackiness that always felt uncomfortable to me. Whitespace has always been a challenge in this package, and the way it has been solved in the past has been to try to solve it locally, within each tag as it's processed. For example, each tag handles its own whitespace, adding extra whitespace around sections and blocks of markdown. Then we collapse down to just two newlines at most. This is all a bit clunky. I've tried to take a step back and re-approach this problem. The main focus of the approach is to process groups of This approach means that each individual rd tag should not add its own newlines. Instead, they are added by the parent tag. So far I feel like this simplifies things quite a bit, but I'm curious to hear your thoughts. It still has issues, especially with nested blocks. I'm still exploring whether it's worth pursuing. |
Closing, as this was superseded by #20 |
PR address the identified issues. I realized the problem was deep in the
rd2markdown.TEXT
function. In this function, we were trimming all white signs and replacing them with" "
. After consideration, I realized this does not necessarily have much sense, as markdown ignores all such signs if there are too many of them. Therefore I refactored the function to do something opposite. Now it doubles each single "\n" sign in order to ensure proper paragraphs display. This is a slightly arbitrary decision, but I truly believe it is a correct one. Line breaks are not really used in the documentation, and are definitely not commonly used in markdown documents. In my opinion ensuring that whenever a new line is forced, new paragraphs start will have a positive impact on produced markdown's files visibility.