Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-46437][DOCS] Remove cruft from the built-in SQL functions documentation #44393

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/.gitignore
Original file line number Diff line number Diff line change
@@ -1 +1,2 @@
generated-*.html
_generated_function_html/
183 changes: 43 additions & 140 deletions docs/sql-ref-functions-builtin.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,202 +17,105 @@ license: |
limitations under the License.
---

{% for static_file in site.static_files %}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it was my bad. we still need this if-else because of SKIP_API=1 bundle exec jekyll build. It fails as below

Configuration file: /.../workspace/forked/spark/docs/_config.yml
            Source: /.../workspace/forked/spark/docs
       Destination: /.../workspace/forked/spark/docs/_site
 Incremental build: disabled. Enable with --incremental
      Generating...
  Liquid Exception: Could not locate the included file '_generated_function_html/agg-funcs-table.html' in any of ["/.../workspace/forked/spark/docs/"]. Ensure it exists in one of those directories and, if it is a symlink, does not point outside your site source. in sql-ref-functions-builtin.md
...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me revert this for now (as it technically breaks the documentation build)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's strange. Does the PR build not build documentation? Because the build was passing as of a9ae743 (the latest commit in the PR), and the merge into master at e4b5977 also had a passing build.

Here's the documentation build after the merge: https://github.com/apache/spark/actions/runs/7294850973/job/19880408450

What am I missing?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the reason is that the canonical bundle exec jekyll build would pass because it will trigger the full build of API. If SKIP_API=1 is specified, it will skip the API build (which will skip SQL API generation, then the HTML files would not exist).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just revive the ifs there, and I think we can just go ahead and merge. Let me reopen this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, the branch was removed so I can't reopen the PR 😢

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries. I will reopen the PR.

I will also tweak things, if possible, so that SKIP_API=1 will work even if the SQL API docs have never been generated before, without requiring all of the conditional includes, because they really clutter things up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #44627 to address the build issue so we are not caught off guard like this again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just revive the ifs there, and I think we can just go ahead and merge. Let me reopen this PR.

@HyukjinKwon - Actually, thinking about this a bit more , is this really the behavior we want to have?

It's technically a type of silent failure if Jekyll successfully builds our docs but can't find some file that was referenced via include or include_relative. The way docs/sql-ref-functions-builtin.md is currently setup makes it so that Jekyll will never detect these missing includes.

Isn't that actually a bad thing? Using this pattern, how will developers know that the documentation built correctly without manually reviewing every rendered site where they wrote a conditional include?

It seems like we need a way to ignore missing includes only when SKIP_API=1. Do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah .. as long as those options work (e.g., SKIP_API=1.), it should be fine.

{% if static_file.name == 'generated-agg-funcs-table.html' %}
* Table of contents
{:toc}
HyukjinKwon marked this conversation as resolved.
Show resolved Hide resolved

### Aggregate Functions
{% include_relative generated-agg-funcs-table.html %}
{% include_relative _generated_function_html/agg-funcs-table.html %}
#### Examples
{% include_relative generated-agg-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/agg-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-window-funcs-table.html' %}
### Window Functions
{% include_relative generated-window-funcs-table.html %}
{% include_relative _generated_function_html/window-funcs-table.html %}
#### Examples
{% include_relative generated-window-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/window-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-array-funcs-table.html' %}
### Array Functions
{% include_relative generated-array-funcs-table.html %}
{% include_relative _generated_function_html/array-funcs-table.html %}
#### Examples
{% include_relative generated-array-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/array-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-collection-funcs-table.html' %}
### Collection Functions
{% include_relative generated-collection-funcs-table.html %}
{% include_relative _generated_function_html/collection-funcs-table.html %}
#### Examples
{% include_relative generated-collection-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/collection-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-struct-funcs-table.html' %}
### STRUCT Functions
{% include_relative generated-struct-funcs-table.html %}
{% include_relative _generated_function_html/struct-funcs-table.html %}
#### Examples
{% include_relative generated-struct-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/struct-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-map-funcs-table.html' %}
### Map Functions
{% include_relative generated-map-funcs-table.html %}
{% include_relative _generated_function_html/map-funcs-table.html %}
#### Examples
{% include_relative generated-map-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/map-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-datetime-funcs-table.html' %}
### Date and Timestamp Functions
{% include_relative generated-datetime-funcs-table.html %}
{% include_relative _generated_function_html/datetime-funcs-table.html %}
#### Examples
{% include_relative generated-datetime-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/datetime-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-math-funcs-table.html' %}
### Mathematical Functions
{% include_relative generated-math-funcs-table.html %}
{% include_relative _generated_function_html/math-funcs-table.html %}
#### Examples
{% include_relative generated-math-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/math-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-string-funcs-table.html' %}
### String Functions
{% include_relative generated-string-funcs-table.html %}
{% include_relative _generated_function_html/string-funcs-table.html %}
#### Examples
{% include_relative generated-string-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/string-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-conditional-funcs-table.html' %}
### Conditional Functions
{% include_relative generated-conditional-funcs-table.html %}
{% include_relative _generated_function_html/conditional-funcs-table.html %}
#### Examples
{% include_relative generated-conditional-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/conditional-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-hash-funcs-table.html' %}
### Hash Functions
{% include_relative generated-hash-funcs-table.html %}
{% include_relative _generated_function_html/hash-funcs-table.html %}
#### Examples
{% include_relative generated-hash-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/hash-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-csv-funcs-table.html' %}
### CSV Functions
{% include_relative generated-csv-funcs-table.html %}
{% include_relative _generated_function_html/csv-funcs-table.html %}
#### Examples
{% include_relative generated-csv-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/csv-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-json-funcs-table.html' %}
### JSON Functions
{% include_relative generated-json-funcs-table.html %}
{% include_relative _generated_function_html/json-funcs-table.html %}
#### Examples
{% include_relative generated-json-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/json-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-xml-funcs-table.html' %}
### XML Functions
{% include_relative generated-xml-funcs-table.html %}
{% include_relative _generated_function_html/xml-funcs-table.html %}
#### Examples
{% include_relative generated-xml-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/xml-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-url-funcs-table.html' %}
### URL Functions
{% include_relative generated-url-funcs-table.html %}
{% include_relative _generated_function_html/url-funcs-table.html %}
#### Examples
{% include_relative generated-url-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/url-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-bitwise-funcs-table.html' %}
### Bitwise Functions
{% include_relative generated-bitwise-funcs-table.html %}
{% include_relative _generated_function_html/bitwise-funcs-table.html %}
#### Examples
{% include_relative generated-bitwise-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/bitwise-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-conversion-funcs-table.html' %}
### Conversion Functions
{% include_relative generated-conversion-funcs-table.html %}
{% include_relative _generated_function_html/conversion-funcs-table.html %}
#### Examples
{% include_relative generated-conversion-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/conversion-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-predicate-funcs-table.html' %}
### Predicate Functions
{% include_relative generated-predicate-funcs-table.html %}
{% include_relative _generated_function_html/predicate-funcs-table.html %}
#### Examples
{% include_relative generated-predicate-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/predicate-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-misc-funcs-table.html' %}
### Misc Functions
{% include_relative generated-misc-funcs-table.html %}
{% include_relative _generated_function_html/misc-funcs-table.html %}
#### Examples
{% include_relative generated-misc-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/misc-funcs-examples.html %}

{% for static_file in site.static_files %}
{% if static_file.name == 'generated-generator-funcs-table.html' %}
### Generator Functions
{% include_relative generated-generator-funcs-table.html %}
{% include_relative _generated_function_html/generator-funcs-table.html %}
#### Examples
{% include_relative generated-generator-funcs-examples.html %}
{% break %}
{% endif %}
{% endfor %}
{% include_relative _generated_function_html/generator-funcs-examples.html %}
15 changes: 9 additions & 6 deletions sql/gen-sql-functions-docs.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,16 +16,19 @@
#

import itertools
import os
import re
import shutil
from collections import namedtuple
from pathlib import Path

# To avoid adding a new direct dependency, we import markdown from within mkdocs.
from mkdocs.structure.pages import markdown

from pyspark.java_gateway import launch_gateway


SPARK_PROJECT_ROOT = Path(__file__).parents[1]

ExpressionInfo = namedtuple("ExpressionInfo", "name usage examples group")

groups = {
Expand Down Expand Up @@ -210,7 +213,7 @@ def generate_functions_table_html(jvm, html_output_dir):
for key, infos in _list_grouped_function_infos(jvm):
function_table = _make_pretty_usage(infos)
key = key.replace("_", "-")
with open("%s/generated-%s-table.html" % (html_output_dir, key), 'w') as table_html:
with open(html_output_dir / f"{key}-table.html", 'w') as table_html:
table_html.write(function_table)


Expand All @@ -233,16 +236,16 @@ def generate_functions_examples_html(jvm, jspark, html_output_dir):
examples = _make_pretty_examples(jspark, infos)
key = key.replace("_", "-")
if examples is not None:
with open("%s/generated-%s-examples.html" % (
html_output_dir, key), 'w') as examples_html:
with open(html_output_dir / f"{key}-examples.html", 'w') as examples_html:
examples_html.write(examples)


if __name__ == "__main__":
jvm = launch_gateway().jvm
jspark = jvm.org.apache.spark.sql.SparkSession.builder().getOrCreate()
jspark.sparkContext().setLogLevel("ERROR") # Make it less noisy.
spark_root_dir = os.path.dirname(os.path.dirname(__file__))
html_output_dir = os.path.join(spark_root_dir, "docs")
html_output_dir = SPARK_PROJECT_ROOT / "docs" / "_generated_function_html"
shutil.rmtree(html_output_dir, ignore_errors=True)
html_output_dir.mkdir()
generate_functions_table_html(jvm, html_output_dir)
generate_functions_examples_html(jvm, jspark, html_output_dir)