Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure paragraphs are properly created #18

Conversation

maksymiuks
Copy link
Collaborator

PR address the identified issues. I realized the problem was deep in the rd2markdown.TEXT function. In this function, we were trimming all white signs and replacing them with " ". After consideration, I realized this does not necessarily have much sense, as markdown ignores all such signs if there are too many of them. Therefore I refactored the function to do something opposite. Now it doubles each single "\n" sign in order to ensure proper paragraphs display. This is a slightly arbitrary decision, but I truly believe it is a correct one. Line breaks are not really used in the documentation, and are definitely not commonly used in markdown documents. In my opinion ensuring that whenever a new line is forced, new paragraphs start will have a positive impact on produced markdown's files visibility.

@maksymiuks maksymiuks linked an issue Aug 3, 2022 that may be closed by this pull request
R/rd2markdown.R Outdated
# We replace one new line sign with two to make sure that proper
# paragraphs are always applied. New line without paragraphs is not something
# used in Rd2 objects and thus we do not account for such situations.
gsub("\n{1}", "\n\n", x)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main change in this PR. Rationale explained in the description

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just doing some quick tests, and single newlines are permitted in Rds just like in markdown, so this will introduce separations between all the lines. For example, in this package we have this documentation, which uses newlines to keep the Rd width to 80 chars.

\details{
The function provides a vectorized way of extracting multiple documentation
objects at once based on paths to the .Rd files sources. The file path will
be truncated with \code{\link[base]{basename}}, and used as a name for the
output list. Use \code{fragments} parameter whenever you want to limit
extracted documentation only to some parts.
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made a couple minor tweaks in this branch:

https://github.com/Genentech/rd2markdown/tree/17-paragraph-splits-2

Notable changes

rd2markdown.TEXT <- function(x, fragments = c(), ...) {
  if (grepl("^\\s*\n\\s*$", x)) return(rd2markdown.cr())
  trim_extra_newlines(x)
}

trim_extra_newlines <- function(x) {
  # non-newline whitespace
  ws <- "[^\\S\r\n]"
  re1 <- paste0(ws, "+", "(\n\\b)?")
  re2 <- paste0("(\n", ws, "*){2,}")
  gsub(re1, " ", gsub(re2, "\n\n", x, perl = TRUE), perl = TRUE)
}

Let me know what you think of that style. If you like it, feel free to merge the changes in with this branch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I agree it looks better now. Big cudos for refining that!

if (!missing(x) && is.null(package) && !is.null(x) && exists(x, envir = envir))
package <- getNamespaceName(environment(get(x, envir = envir)))
if (!missing(topic) && is.null(package) && !is.null(topic) && exists(topic, envir = envir))
package <- getNamespaceName(environment(get(topic, envir = envir)))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized it was using the wrong parameter all along. It's fixed now

@maksymiuks maksymiuks requested a review from dgkf August 3, 2022 12:38
@maksymiuks
Copy link
Collaborator Author

rd2markdown.character function works completely different than I would like it to. Will fix it right away

@@ -237,7 +239,7 @@ rd2markdown.dots <- function(x, fragments = c(), ...) {
#' @exportS3Method
#' @rdname rd2markdown
rd2markdown.TEXT <- function(x, fragments = c(), ...) {
gsub("\\s+", " ", gsub("\n", " ", x))
trim_extra_newlines(x)
Copy link
Collaborator Author

@maksymiuks maksymiuks Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change that made the function discover standalone new line signs was a great idea, but unfortunately, there was one gap that was not possible to overcome which made me revert that. Namely, developers sometimes break the line once in roxygen documentation to make it readable and shorten lines. However if one does it just after a special documentation block (like \code{}) it will always appear as a solo \n TEXT tag and therefore would be falsely recognized as a new paragraph. Therefore I decided to look for a different solution to this issue.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good catch! I hadn't considered this. Definitely complicates things.

out <- paste0(out, collapse = collapse)
# Irrelevant form the rendering point of view, just for styling reasons
out <- trim_extra_newlines(out)
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that this issue fixes the underlying problem and hopefully most of the edge cases. For some reason we had that trimws() call in the filter that made us drop alone \n that otherwise would produce proper paragraphs. The reason was that \n\n signs are splitted into separate TEXT tags and in some cases were dropped because of that filter. Now it should all work just fine! (fingers crossed, thumbs hold) @dgkf

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think we want to trim most newlines. Using Rd2txt as a reference, lines re-flowed. It is only when we have two consecutive \n that we want to introduce a paragraph break, similar to markdown.

\\name{test}
\\title{test}
\\description{
  one 
  two
}

Will render as

test

Description:

     one two

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Frankly, I don't see what's wrong with this example, it looks correct. Also, I don't see how a change to that filter function affects that. The change (keep in mind change is a little bit above, here I marked it a bit incorrectly) makes us simply take all signs that were already present in the rd file, we don't introduce new ones. Or am I missing something?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see what you mean, so this should be fine:

## test

one
two

because it still renders as:

test

one two


Unfortunately, I think there are still cases where this will be too assumptive. For example, an Rd can contain trailing whitespace before a newline, which would be considered a newline in markdown but is wrapped with Rd2txt:

% using "•" to indicate spaces
\\name{test}
\\title{test}
\\description{
••one••
••two••
}
test

Description:

     one two
     

but if we just kept all the whitespace as-is, it would display in markdown as

test

one
two

Copy link
Collaborator Author

@maksymiuks maksymiuks Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry but I still don't see the problem in your example. I've prepared a dummy Rd like the one above (had to manually put spaces as roxygen removes them) and run it through the current version of rd2markdown from this branch. This is the raw result

"\n\n# test\n\none \n two \n\n"

and rendered file:

image

So we have indeed one new line sign between one and two in the raw md. But this is only one. Pandoc will ignore it while rendering. Whatever I tried, I couldn't recreate the output you provided, while using Pandoc Obviously, it will work differently in gitlab, as it's using a pseudo-markdown where every new line is rendered as is and I don't think we should account for that.

Would you mind sharing how you created that markdown display at the end of the previous message? I'd like to know what I'm missing here.

nchar(trimws(i))
}
}, out)
if (!is.null(collapse)) out <- paste0(out, collapse = collapse)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the actual change @dgkf

@dgkf
Copy link
Collaborator

dgkf commented Aug 4, 2022

So, after digging into this, I think this PR has highlighted a bit of hackiness that always felt uncomfortable to me. Whitespace has always been a challenge in this package, and the way it has been solved in the past has been to try to solve it locally, within each tag as it's processed.

For example, each tag handles its own whitespace, adding extra whitespace around sections and blocks of markdown. Then we collapse down to just two newlines at most. This is all a bit clunky.

I've tried to take a step back and re-approach this problem. The main focus of the approach is to process groups of TEXT tags at once, processing newlines into a fixed set of cr tags. See clean_text_newlines. I also added some extra testing helpers to try to keep this all straight.

This approach means that each individual rd tag should not add its own newlines. Instead, they are added by the parent tag. So far I feel like this simplifies things quite a bit, but I'm curious to hear your thoughts. It still has issues, especially with nested blocks. I'm still exploring whether it's worth pursuing.

@dgkf
Copy link
Collaborator

dgkf commented Aug 17, 2022

Closing, as this was superseded by #20

@dgkf dgkf closed this Aug 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ensure paragraph splits are retained in long descriptions
2 participants