Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46437][DOCS] Remove cruft from the built-in SQL functions documentation #44393

Closed
wants to merge 4 commits into from

Conversation

nchammas
Copy link
Contributor

@nchammas nchammas commented Dec 18, 2023

What changes were proposed in this pull request?

  • Remove a bunch of Liquid directives that are not necessary.
  • Add a table of contents to the built-in SQL functions page.
  • Move the generated HTML for built-in SQL functions to a subdirectory.

Why are the changes needed?

To reduce confusion for maintainers.

Does this PR introduce any user-facing change?

Yes. It adds a table of contents and change the heading style of the examples.

Otherwise, the generated docs are identical.

How was this patch tested?

I built Spark, ran ./sql/create-docs.sh, and reviewed the generated docs in my browser.

The page is too long to screenshot completely, but here are a couple of screenshots.

Screenshot 2023-12-20 at 7 06 36 PM Screenshot 2023-12-20 at 7 06 48 PM

Was this patch authored or co-authored using generative AI tooling?

No.

@nchammas
Copy link
Contributor Author

cc @maropu for review, since you authored #28224.

@nchammas
Copy link
Contributor Author

@huaxingao, you may also be interested in reviewing this since you reviewed #28224.

@huaxingao
Copy link
Contributor

cc @HyukjinKwon

@HyukjinKwon
Copy link
Member

Merged to master.

@nchammas nchammas deleted the sql-builtin-funcs-cruft branch December 22, 2023 02:54
@@ -17,202 +17,125 @@ license: |
limitations under the License.
---

{% for static_file in site.static_files %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it was my bad. we still need this if-else because of SKIP_API=1 bundle exec jekyll build. It fails as below

Configuration file: /.../workspace/forked/spark/docs/_config.yml
            Source: /.../workspace/forked/spark/docs
       Destination: /.../workspace/forked/spark/docs/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
  Liquid Exception: Could not locate the included file '_generated_function_html/agg-funcs-table.html' in any of ["/.../workspace/forked/spark/docs/"]. Ensure it exists in one of those directories and, if it is a symlink, does not point outside your site source. in sql-ref-functions-builtin.md
...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me revert this for now (as it technically breaks the documentation build)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's strange. Does the PR build not build documentation? Because the build was passing as of a9ae743 (the latest commit in the PR), and the merge into master at e4b5977 also had a passing build.

Here's the documentation build after the merge: https://github.com/apache/spark/actions/runs/7294850973/job/19880408450

What am I missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason is that the canonical bundle exec jekyll build would pass because it will trigger the full build of API. If SKIP_API=1 is specified, it will skip the API build (which will skip SQL API generation, then the HTML files would not exist).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just revive the ifs there, and I think we can just go ahead and merge. Let me reopen this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, the branch was removed so I can't reopen the PR 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I will reopen the PR.

I will also tweak things, if possible, so that SKIP_API=1 will work even if the SQL API docs have never been generated before, without requiring all of the conditional includes, because they really clutter things up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #44627 to address the build issue so we are not caught off guard like this again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just revive the ifs there, and I think we can just go ahead and merge. Let me reopen this PR.

@HyukjinKwon - Actually, thinking about this a bit more , is this really the behavior we want to have?

It's technically a type of silent failure if Jekyll successfully builds our docs but can't find some file that was referenced via include or include_relative. The way docs/sql-ref-functions-builtin.md is currently setup makes it so that Jekyll will never detect these missing includes.

Isn't that actually a bad thing? Using this pattern, how will developers know that the documentation built correctly without manually reviewing every rendered site where they wrote a conditional include?

It seems like we need a way to ignore missing includes only when SKIP_API=1. Do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah .. as long as those options work (e.g., SKIP_API=1.), it should be fine.

HyukjinKwon pushed a commit that referenced this pull request Jan 10, 2024
### What changes were proposed in this pull request?

Add [custom Jekyll tags][custom] to enable us to conditionally include files in our documentation build in a more user-friendly manner. [This example][example] demonstrates how a custom tag can build on one of Jekyll's built-in tags.

[custom]: https://github.com/Shopify/liquid/wiki/Liquid-for-Programmers#create-your-own-tags
[example]: Shopify/liquid#370 (comment)

Without this change, files have to be included as follows:

```liquid
{% for static_file in site.static_files %}
    {% if static_file.name == 'generated-agg-funcs-table.html' %}
        {% include_relative generated-agg-funcs-table.html %}
        {% break %}
    {% endif %}
{% endfor %}
```

With this change, they can be included more intuitively in one of two ways:

```liquid
{% include_relative_if_exists generated-agg-funcs-table.html %}
{% include_api_gen generated-agg-funcs-table.html %}
```

`include_relative_if_exists` includes a file if it exists and substitutes an HTML comment if not. Use this tag when it's always OK for an include not to exist.

`include_api_gen` includes a file if it exists. If it doesn't, it tolerates the missing file only if one of the `SKIP_` flags is set. Otherwise it raises an error. Use this tag for includes that are generated for the language APIs. These files are required to generate complete documentation, but we tolerate their absence during development---i.e. when a skip flag is set.

`include_api_gen` will place a visible text placeholder in the document and post a warning to the console to indicate that missing API files are being tolerated.

```sh
$ SKIP_API=1 bundle exec jekyll build
Configuration file: /Users/nchammas/dev/nchammas/spark/docs/_config.yml
            Source: /Users/nchammas/dev/nchammas/spark/docs
       Destination: /Users/nchammas/dev/nchammas/spark/docs/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
Warning: Tolerating missing API files because the following skip flags are set: SKIP_API
                    done in 1.703 seconds.
 Auto-regeneration: disabled. Use --watch to enable.
```

This PR supersedes #44393.

### Why are the changes needed?

Jekyll does not have a succinct way to [check if a file exists][check], so the required directives to implement such functionality are very cumbersome.

We need the ability to do this so that we can [build the docs successfully with `SKIP_API=1`][build], since many includes reference files that are only generated when `SKIP_API` is _not_ set.

[check]: jekyll/jekyll#7528
[build]: #44627

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manually building and reviewing the docs, both with and without `SKIP_API=1`.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #44630 from nchammas/SPARK-46437-conditional-jekyll-include.

Authored-by: Nicholas Chammas <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants