Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace middleman-search with lunr with Algolia DocSearch #706

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

tnir
Copy link
Collaborator

@tnir tnir commented Jul 18, 2022

What was the end-user problem that led to this PR?

As middleman-search (RubyGems, GitHub) has not been maintained for more than 5.5 years (for example manastech/middleman-search#38), bundler-site maintainers/contributors cannot upgrade lunr to the latest without their additional efforts (or other community's efforts than maintainers/contributors of this repo).

Closes #691

What was your diagnosis of the problem?

UI can be replaced by https://github.com/algolia/docsearch or its forks. Backend (current /search/lunr-index.json) can be replaced by Algolia DocSearch with automated crawling or Typesense cloud with manual crawling by us.

What is your fix for the problem, implemented in this PR?

  • 🎉 Removes middleman-search gem and relevant gems (mini_racer and libv8-node).
  • 🎉 Removes JS/CSS codes and Middleman configuration for middleman-search.
  • 🎉 Removes HTML snippets thanks to new UI from DocSeach v3.
  • 🎉 Removes (the load of) Popover CSS from Bootstrap 5 along with the above JS code removal.
  • 🏗️ Also the load of /application.min.js is moved to outside of <head>.

The crawler is manually run by @tnir at any time.

Screenshots

PC (as-is)

bundler-site-tnir-algol-j2zyh5 herokuapp com_v2

PC (searching)

bundler-site-tnir-algol-j2zyh5 herokuapp com_search_v2

Mobile (as is) Mobile (searching)
bundler-site-tnir-algol-j2zyh5 herokuapp com_(iPhone 12 Pro)_v2 bundler-site-tnir-algol-j2zyh5 herokuapp com_(iPhone 12 Pro)_search_v2
Old screenshots for previous changeset

bundler-site-tnir-algol-j2zyh5 herokuapp com_
bundler-site-tnir-algol-j2zyh5 herokuapp com_search

Mobile (as is) Mobile (searching)
bundler-site-tnir-algol-j2zyh5 herokuapp com_(iPhone 12 Pro) bundler-site-tnir-algol-j2zyh5 herokuapp com_(iPhone 12 Pro)_search

Checklist

Why did you choose this fix out of the possible options?

Algolia's DocSearch for open source projects gets ready in a few days. If Typesense offers

  • a some tiny instance to us at no cost
  • and the permission to the UI
  • DocSearch v3-ready UI for enduser,

we might be able to use #702, but this change looks clear at this moment.

Signed-off-by: Takuya Noguchi [email protected]

@tnir tnir self-assigned this Jul 18, 2022
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 18, 2022 13:58 Inactive
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from f34b2cf to 5351a54 Compare July 18, 2022 16:05
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 18, 2022 16:05 Inactive
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from 5351a54 to bb85a7e Compare July 18, 2022 16:23
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 18, 2022 16:23 Inactive
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from bb85a7e to 9abd7c2 Compare July 18, 2022 16:29
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 18, 2022 16:30 Inactive
@shortcuts
Copy link

shortcuts commented Jul 18, 2022

Hey, small feedback to improve your DocSearch results, you can update the Crawler config to better scope the lvl0 (e.g. h1), so you have more sections appearing in the modal (like on https://docsearch.algolia.com/)

@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from 9abd7c2 to ba24618 Compare July 19, 2022 01:17
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 19, 2022 01:17 Inactive
@tnir
Copy link
Collaborator Author

tnir commented Jul 19, 2022

@shortcuts Thanks, but I could not see any good element for lvl0 at this moment. I planned to update them by just adding section name after information architecting 😁 What do you think?

@shortcuts
Copy link

@shortcuts Thanks, but I could not see any good element for lvl0 at this moment. I planned to update them by just adding section name after information architecting 😁 What do you think?

The main title of the page, the navbar section/active subsection is what we recommend

Copy link
Member

@deivid-rodriguez deivid-rodriguez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice!

I agree that Algolia Docsearch seems easier for the moment.

Is it possible to recrawl the website on every deploy instead of daily?

While testing I noticed some extra search results which should not be needed. For example, when looking for "docker", one of the results is a link to the sidebar: https://bundler.io/guides/bundler_docker_guide.html#sidebar-wrapper. Can we avoid generating these redundant results?

assets/javascripts/docsearch.js Show resolved Hide resolved
@@ -30,3 +30,6 @@

.footer
= partial 'layouts/footer'
= javascript_include_tag "https://cdn.jsdelivr.net/npm/@docsearch/[email protected]/dist/umd/index.min.js"
= javascript_include_tag "https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.min.js"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this? Is it related to not longer needing Popover?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used to think this is not needed. By removing the final load import { Popover } from 'bootstrap'; from assets/javascripts/search.js, it looks like bootstrap esm JS will not loaded any more, which will break navbar collapse. So with the current situation of slowness of asset (https://bundler.io/application.min.js) downloading from GitHub Pages.

The below image shows how fast JavaScripts are delivered from CDN (DocSearch JS, Bootstgrap JS, and our own JS from top to bottom):
algolia

See also https://getbootstrap.com/docs/5.1/getting-started/webpack/#importing-javascript

@simi
Copy link
Member

simi commented Jul 19, 2022

Is the search branded right now? That would be big 👎 for me.

@tnir
Copy link
Collaborator Author

tnir commented Jul 19, 2022

@simi Yes I see you, but Typesense will cause the same result...

@tnir
Copy link
Collaborator Author

tnir commented Jul 19, 2022

@simi As the next step to this PR, we can prepare our own cluster on Algolia (or Typesense cloud). We can estimate how much it will be if some sponsors pay for it.

If this is not possible for you, you're free to open your own Algolia account and run DocSearch on your own without this limitation. In that case, though, depending on the size of your documentation, you might need a paid account (free accounts can hold as much as 10k records).

https://docsearch.algolia.com/docs/DocSearch-program

@deivid-rodriguez
Copy link
Member

In any case, it seems best for us to be in full control of when we run the crawler (after every deploy), so running our own cluster would seem like the way to go.

@shortcuts
Copy link

shortcuts commented Jul 19, 2022

Is it possible to recrawl the website on every deploy instead of daily?

Sure, we have a GitHub action available: https://github.com/algolia/algoliasearch-crawler-github-actions (which works with DocSearch :))

@tnir
Copy link
Collaborator Author

tnir commented Jul 19, 2022

In any case, it seems best for us to be in full control of when we run the crawler (after every deploy), so running our own cluster would seem like the way to go.

Is it possible to recrawl the website on every deploy instead of daily?

My previous comment is at least now incorrect (my memory might be from years ago...). As described in the doc below, we can schedule it more frequently and trigger it manually. Not sure we can do it via API/CLI though:

Crawls are scheduled at a random time once a week. You can configure this schedule from the config file or trigger one manually from the Crawler interface.

https://www.algolia.com/doc/tools/crawler/apis/configuration/schedule/ still says every 24 hours: schedule: 'every 1 day at 3:00 pm',

While testing I noticed some extra search results which should not be needed. For example, when looking for "docker", one of the results is a link to the sidebar: https://bundler.io/guides/bundler_docker_guide.html#sidebar-wrapper. Can we avoid generating these redundant results?

Crawler configuration can be modified at https://crawler.algolia.com/admin/crawlers/7f4b6579-bba3-4c1f-a0ff-462d9f408281/configuration/edit (I sent an invitation to you). Just hours ago, I updated it as follows:

new Crawler({
  rateLimit: 8,
  startUrls: ["https://bundler.io"],
  renderJavaScript: false,
  sitemaps: [],
  ignoreCanonicalTo: false,
  discoveryPatterns: ["https://bundler.io/**"],
  schedule: "at 5:44 PM on Sunday",
  actions: [
    {
      indexName: "bundler",
      pathsToMatch: ["https://bundler.io/**"],
      recordExtractor: ({ helpers }) => {
        return helpers.docsearch({
          recordProps: {
            lvl1: [
              ".commands h2#NAME",
              "h1",
              "#page-content-wrapper h2",
              "head > title",
            ],
            content: [
              "#page-content-wrapper p, #page-content-wrapper li",
              "p",
              ".team .card-body",
              "span.contributor",
            ],
            lvl0: {
              selectors: "",
              defaultValue: "Documentation",
            },
            lvl2: ["h3"],
            lvl3: ["h4"],
            lvl4: ["h5"],
            lvl5: ["article h5", "main h5", "h5"],
            lvl6: ["article h6", "main h6", "h6"],
          },
          aggregateContent: true,
          recordVersion: "v3",
        });
      },
    },
  ],
  initialIndexSettings: {
    bundler: {
      attributesForFaceting: ["type", "lang"],
      attributesToRetrieve: [
        "hierarchy",
        "content",
        "anchor",
        "url",
        "url_without_anchor",
        "type",
      ],
      attributesToHighlight: ["hierarchy", "content"],
      attributesToSnippet: ["content:10"],
      camelCaseAttributes: ["hierarchy", "content"],
      searchableAttributes: [
        "unordered(hierarchy.lvl0)",
        "unordered(hierarchy.lvl1)",
        "unordered(hierarchy.lvl2)",
        "unordered(hierarchy.lvl3)",
        "unordered(hierarchy.lvl4)",
        "unordered(hierarchy.lvl5)",
        "unordered(hierarchy.lvl6)",
        "content",
      ],
      distinct: true,
      attributeForDistinct: "url",
      customRanking: [
        "desc(weight.pageRank)",
        "desc(weight.level)",
        "asc(weight.position)",
      ],
      ranking: [
        "words",
        "filters",
        "typo",
        "attribute",
        "proximity",
        "exact",
        "custom",
      ],
      highlightPreTag: '<span class="algolia-docsearch-suggestion--highlight">',
      highlightPostTag: "</span>",
      minWordSizefor1Typo: 3,
      minWordSizefor2Typos: 7,
      allowTyposOnNumericTokens: false,
      minProximity: 1,
      ignorePlurals: true,
      advancedSyntax: true,
      attributeCriteriaComputedByMinProximity: true,
      removeWordsIfNoResults: "allOptional",
    },
  },
  appId: "3JA5LRH987",
  apiKey: "masked",
});

@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from ba24618 to eff461e Compare July 19, 2022 12:12
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 19, 2022 12:13 Inactive
@tnir
Copy link
Collaborator Author

tnir commented Jul 19, 2022

Updated the primary color for box-shadow etc. in searchbox and modal as I have forgotten the adaption of Bundler branding color.

@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from eff461e to 846f7c4 Compare July 19, 2022 14:42
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 19, 2022 14:42 Inactive
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from 846f7c4 to e85d9be Compare July 19, 2022 14:44
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 19, 2022 14:44 Inactive
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from e85d9be to dd60648 Compare July 19, 2022 15:28
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 19, 2022 15:28 Inactive
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from dd60648 to d2fdc7e Compare July 22, 2022 00:16
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 22, 2022 00:16 Inactive
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from d2fdc7e to 2b5f07e Compare July 22, 2022 02:02
@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-j2zyh5 July 22, 2022 02:02 Inactive
@tnir
Copy link
Collaborator Author

tnir commented Jul 22, 2022

Admin role is now requested to Org members/Repo admins in #730 to set up periodic indexing on Actions 🙏

@tnir tnir mentioned this pull request Jul 22, 2022
6 tasks
@tnir tnir marked this pull request as draft July 23, 2022 23:43
@tnir tnir removed the request for review from deivid-rodriguez July 23, 2022 23:43
@tnir tnir added this to the Architecture overhaul milestone Jul 23, 2022
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from 2b5f07e to 093b187 Compare July 27, 2022 08:54
Copy link
Collaborator Author

@tnir tnir left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it soon.

source/layouts/base.haml Outdated Show resolved Hide resolved
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch 2 times, most recently from 3f2bf0e to 5238184 Compare July 27, 2022 09:08
@tnir tnir force-pushed the tnir-algolia-docsearch-691 branch from 5238184 to 2c82de3 Compare July 27, 2022 09:10
@tnir tnir marked this pull request as ready for review July 27, 2022 09:10
@tnir
Copy link
Collaborator Author

tnir commented Jul 27, 2022

@deivid-rodriguez How can I trigger a review app for this (old) PR (just for development purpose)?

@deivid-rodriguez deivid-rodriguez temporarily deployed to bundler-site-tnir-algol-qcaxiy July 29, 2022 06:37 Inactive
@deivid-rodriguez
Copy link
Member

I do it manually from Heroku UI, I created it for you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

meta: Replace lunr (middleman-search) with DocSearch
4 participants