Make sure paragraphs are properly created #18

maksymiuks · 2022-08-03T12:24:39Z

PR address the identified issues. I realized the problem was deep in the rd2markdown.TEXT function. In this function, we were trimming all white signs and replacing them with " ". After consideration, I realized this does not necessarily have much sense, as markdown ignores all such signs if there are too many of them. Therefore I refactored the function to do something opposite. Now it doubles each single "\n" sign in order to ensure proper paragraphs display. This is a slightly arbitrary decision, but I truly believe it is a correct one. Line breaks are not really used in the documentation, and are definitely not commonly used in markdown documents. In my opinion ensuring that whenever a new line is forced, new paragraphs start will have a positive impact on produced markdown's files visibility.

maksymiuks · 2022-08-03T12:32:58Z

R/rd2markdown.R

+  # We replace one new line sign with two to make sure that proper
+  # paragraphs are always applied. New line without paragraphs is not something
+  # used in Rd2 objects and thus we do not account for such situations.
+  gsub("\n{1}", "\n\n", x)


The main change in this PR. Rationale explained in the description

Just doing some quick tests, and single newlines are permitted in Rds just like in markdown, so this will introduce separations between all the lines. For example, in this package we have this documentation, which uses newlines to keep the Rd width to 80 chars.

rd2markdown/man/documentation_to_markdown.Rd

Lines 25 to 31 in 3d72b8d

\details{

The function provides a vectorized way of extracting multiple documentation

objects at once based on paths to the .Rd files sources. The file path will

be truncated with \code{\link[base]{basename}}, and used as a name for the

output list. Use \code{fragments} parameter whenever you want to limit

extracted documentation only to some parts.

}

I made a couple minor tweaks in this branch:

https://github.com/Genentech/rd2markdown/tree/17-paragraph-splits-2

Notable changes

rd2markdown.TEXT <- function(x, fragments = c(), ...) { if (grepl("^\\s*\n\\s*$", x)) return(rd2markdown.cr()) trim_extra_newlines(x) } trim_extra_newlines <- function(x) { # non-newline whitespace ws <- "[^\\S\r\n]" re1 <- paste0(ws, "+", "(\n\\b)?") re2 <- paste0("(\n", ws, "*){2,}") gsub(re1, " ", gsub(re2, "\n\n", x, perl = TRUE), perl = TRUE) }

Let me know what you think of that style. If you like it, feel free to merge the changes in with this branch.

Thanks! I agree it looks better now. Big cudos for refining that!

maksymiuks · 2022-08-03T12:33:25Z

R/rd2markdown.R

-  if (!missing(x) && is.null(package) && !is.null(x) && exists(x, envir = envir))
-    package <- getNamespaceName(environment(get(x, envir = envir)))
+  if (!missing(topic) && is.null(package) && !is.null(topic) && exists(topic, envir = envir))
+    package <- getNamespaceName(environment(get(topic, envir = envir)))


I realized it was using the wrong parameter all along. It's fixed now

maksymiuks · 2022-08-04T08:23:13Z

rd2markdown.character function works completely different than I would like it to. Will fix it right away

maksymiuks · 2022-08-04T13:35:55Z

R/rd2markdown.R

@@ -237,7 +239,7 @@ rd2markdown.dots <- function(x, fragments = c(), ...) {
 #' @exportS3Method
 #' @rdname rd2markdown
 rd2markdown.TEXT <- function(x, fragments = c(), ...) {
-  gsub("\\s+", " ", gsub("\n", " ", x))
+  trim_extra_newlines(x)


The change that made the function discover standalone new line signs was a great idea, but unfortunately, there was one gap that was not possible to overcome which made me revert that. Namely, developers sometimes break the line once in roxygen documentation to make it readable and shorten lines. However if one does it just after a special documentation block (like \code{}) it will always appear as a solo \n TEXT tag and therefore would be falsely recognized as a new paragraph. Therefore I decided to look for a different solution to this issue.

Ah, good catch! I hadn't considered this. Definitely complicates things.

maksymiuks · 2022-08-04T13:40:29Z

R/map_rd2markdown.R

+    out <- paste0(out, collapse = collapse)
+    # Irrelevant form the rendering point of view, just for styling reasons
+    out <- trim_extra_newlines(out)
+  } 


I believe that this issue fixes the underlying problem and hopefully most of the edge cases. For some reason we had that trimws() call in the filter that made us drop alone \n that otherwise would produce proper paragraphs. The reason was that \n\n signs are splitted into separate TEXT tags and in some cases were dropped because of that filter. Now it should all work just fine! (fingers crossed, thumbs hold) @dgkf

I do think we want to trim most newlines. Using Rd2txt as a reference, lines re-flowed. It is only when we have two consecutive \n that we want to introduce a paragraph break, similar to markdown.

\\name{test} \\title{test} \\description{ one two }

Will render as

test Description: one two

Frankly, I don't see what's wrong with this example, it looks correct. Also, I don't see how a change to that filter function affects that. The change (keep in mind change is a little bit above, here I marked it a bit incorrectly) makes us simply take all signs that were already present in the rd file, we don't introduce new ones. Or am I missing something?

Now I see what you mean, so this should be fine:

## test one two

because it still renders as:

test

one two

Unfortunately, I think there are still cases where this will be too assumptive. For example, an Rd can contain trailing whitespace before a newline, which would be considered a newline in markdown but is wrapped with Rd2txt:

% using "•" to indicate spaces \\name{test} \\title{test} \\description{ ••one•• ••two•• }

test Description: one two

but if we just kept all the whitespace as-is, it would display in markdown as

test

one
two

I'm sorry but I still don't see the problem in your example. I've prepared a dummy Rd like the one above (had to manually put spaces as roxygen removes them) and run it through the current version of rd2markdown from this branch. This is the raw result

"\n\n# test\n\none \n two \n\n"

and rendered file:

So we have indeed one new line sign between one and two in the raw md. But this is only one. Pandoc will ignore it while rendering. Whatever I tried, I couldn't recreate the output you provided, while using Pandoc Obviously, it will work differently in gitlab, as it's using a pseudo-markdown where every new line is rendered as is and I don't think we should account for that.

Would you mind sharing how you created that markdown display at the end of the previous message? I'd like to know what I'm missing here.

maksymiuks · 2022-08-04T19:14:24Z

R/map_rd2markdown.R

-      nchar(trimws(i))
-    }
-  }, out)
-  if (!is.null(collapse)) out <- paste0(out, collapse = collapse)


This is the actual change @dgkf

dgkf · 2022-08-04T20:55:44Z

So, after digging into this, I think this PR has highlighted a bit of hackiness that always felt uncomfortable to me. Whitespace has always been a challenge in this package, and the way it has been solved in the past has been to try to solve it locally, within each tag as it's processed.

For example, each tag handles its own whitespace, adding extra whitespace around sections and blocks of markdown. Then we collapse down to just two newlines at most. This is all a bit clunky.

I've tried to take a step back and re-approach this problem. The main focus of the approach is to process groups of TEXT tags at once, processing newlines into a fixed set of cr tags. See clean_text_newlines. I also added some extra testing helpers to try to keep this all straight.

This approach means that each individual rd tag should not add its own newlines. Instead, they are added by the parent tag. So far I feel like this simplifies things quite a bit, but I'm curious to hear your thoughts. It still has issues, especially with nested blocks. I'm still exploring whether it's worth pursuing.

dgkf · 2022-08-17T18:03:33Z

Closing, as this was superseded by #20

Make sure paragraphs are properly created

2283180

maksymiuks linked an issue Aug 3, 2022 that may be closed by this pull request

Ensure paragraph splits are retained in long descriptions #17

Open

maksymiuks added 2 commits August 3, 2022 14:27

Fix typos

c1a803d

Fix typos

c75540c

maksymiuks commented Aug 3, 2022

View reviewed changes

maksymiuks requested a review from dgkf August 3, 2022 12:38

maksymiuks and others added 2 commits August 3, 2022 14:42

Update Readme with new enhanced functions

ef3f6b7

slight rework to paragraph splits

b0480a2

maksymiuks added 4 commits August 4, 2022 10:58

Enhance docuemntation and rd2markdown.character

e9b7ec2

Further enhancements to rd2markdown.character

6ae776c

Update news

89d1b83

Change filtering in map_rd2markdown

f0b4a70

maksymiuks commented Aug 4, 2022

View reviewed changes

update news

22b659d

maksymiuks commented Aug 4, 2022

View reviewed changes

dgkf closed this Aug 17, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make sure paragraphs are properly created #18

Make sure paragraphs are properly created #18

maksymiuks commented Aug 3, 2022

maksymiuks Aug 3, 2022

dgkf Aug 3, 2022

dgkf Aug 3, 2022

maksymiuks Aug 4, 2022

maksymiuks Aug 3, 2022

maksymiuks commented Aug 4, 2022

maksymiuks Aug 4, 2022 •

edited

Loading

dgkf Aug 4, 2022

maksymiuks Aug 4, 2022

dgkf Aug 4, 2022

maksymiuks Aug 4, 2022

dgkf Aug 4, 2022

maksymiuks Aug 5, 2022 •

edited

Loading

maksymiuks Aug 4, 2022

dgkf commented Aug 4, 2022 •

edited

Loading

dgkf commented Aug 17, 2022

	\details{
	The function provides a vectorized way of extracting multiple documentation
	objects at once based on paths to the .Rd files sources. The file path will
	be truncated with \code{\link[base]{basename}}, and used as a name for the
	output list. Use \code{fragments} parameter whenever you want to limit
	extracted documentation only to some parts.
	}

Make sure paragraphs are properly created #18

Make sure paragraphs are properly created #18

Conversation

maksymiuks commented Aug 3, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

maksymiuks commented Aug 4, 2022

maksymiuks Aug 4, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

test

test

maksymiuks Aug 5, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

dgkf commented Aug 4, 2022 • edited Loading

dgkf commented Aug 17, 2022

maksymiuks Aug 4, 2022 •

edited

Loading

maksymiuks Aug 5, 2022 •

edited

Loading

dgkf commented Aug 4, 2022 •

edited

Loading