Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML search: check for query terms presence in index term properties #13097

Open
wants to merge 14 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,9 @@ Bugs fixed

* #13060: HTML Search: use ``Map`` to store per-file term scores.
Patch by James Addison
* #13097: HTML Search: add a precautionary check for query term
presence in index properties before accessing them.
Patch by James Addison

Testing
-------
4 changes: 2 additions & 2 deletions sphinx/search/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -164,8 +164,8 @@ class _JavaScriptIndex:
on the documentation search object to register the index.
"""

PREFIX = 'Search.setIndex('
SUFFIX = ')'
PREFIX = "Search.setIndex('"
SUFFIX = "')"

def dumps(self, data: Any) -> str:
data_json = json.dumps(data, separators=(',', ':'), sort_keys=True)
Expand Down
9 changes: 6 additions & 3 deletions sphinx/themes/basic/static/searchtools.js
Original file line number Diff line number Diff line change
Expand Up @@ -219,7 +219,8 @@ const Search = {
(document.body.appendChild(document.createElement("script")).src = url),

setIndex: (index) => {
Search._index = index;
const reviver = (k, v) => (typeof v === "object" && v !== null) ? Object.freeze(v) : v;
Search._index = JSON.parse(index, reviver);
Comment on lines +222 to +223
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @wlach is right that immutability is orthoganal here. I think 'simply' using JSON.parse rather than an object literal would solve the __proto__ case. Would we then need to keep the hasOwnProperty checks?

An alternative would be to use a reviver function that constructs Map objects/serialise the index with Map literals, and use .get(), but that may also be expensive.

Suggested change
const reviver = (k, v) => (typeof v === "object" && v !== null) ? Object.freeze(v) : v;
Search._index = JSON.parse(index, reviver);
Search._index = JSON.parse(index);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @wlach is right that immutability is orthoganal here. I think 'simply' using JSON.parse rather than an object literal would solve the __proto__ case.

Ok, yep - I can revert 443fd21 to remove the use of Object.freeze / the reviver function.

Would we then need to keep the hasOwnProperty checks?

We should keep those, I think. Without them, queries for __proto__ may continue to inadvertently retrieve object prototypes instead of getting an undefined result.

An alternative would be to use a reviver function that constructs Map objects/serialise the index with Map literals, and use .get(), but that may also be expensive.

What do you think about constructing those Map objects directly in the searchindex.js file format, instead of using JSON.parse and a reviver?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about constructing those Map objects directly in the searchindex.js file format, instead of using JSON.parse and a reviver?

IMO this is definitely worth a shot, I suspect it'll be quite fast.

if (Search._queued_query !== null) {
const query = Search._queued_query;
Search._queued_query = null;
Expand Down Expand Up @@ -513,9 +514,11 @@ const Search = {
// perform the search on the required terms
searchTerms.forEach((word) => {
const files = [];
// find documents, if any, containing the query word in their text/title term indices
// use Object.hasOwnProperty to avoid mismatching against prototype properties
const arr = [
{ files: terms[word], score: Scorer.term },
{ files: titleTerms[word], score: Scorer.title },
{ files: terms.hasOwnProperty(word) ? terms[word] : undefined, score: Scorer.term },
{ files: titleTerms.hasOwnProperty(word) ? titleTerms[word] : undefined, score: Scorer.title },
];
// add support for partial matches
if (word.length > 2) {
Expand Down
2 changes: 1 addition & 1 deletion tests/js/fixtures/cpp/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions tests/js/fixtures/ecmascript/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion tests/js/fixtures/multiterm/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion tests/js/fixtures/partial/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion tests/js/fixtures/titles/searchindex.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Empty file.
6 changes: 6 additions & 0 deletions tests/js/roots/ecmascript/index.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
ECMAScript
----------

This is a sample JavaScript (aka ``ECMAScript``) project used to generate a search engine index fixture.

Use the `__proto__` property to access the `prototype <https://developer.mozilla.org/en-US/docs/Learn/JavaScript/Objects/Object_prototypes>`_ (if any) of an object instance.
32 changes: 32 additions & 0 deletions tests/js/searchtools.spec.js
Original file line number Diff line number Diff line change
Expand Up @@ -209,6 +209,38 @@ describe('Basic html theme search', function() {

});

describe('can handle edge-case search queries', function() {

it('can search for the javascript prototype property', function() {
eval(loadFixture("ecmascript/searchindex.js"));

searchParameters = Search._parseQuery('__proto__');

hits = [
[
'index',
'ECMAScript',
'',
null,
5,
'index.rst',
'text'
]
];
expect(Search._performSearch(...searchParameters)).toEqual(hits);
});

it('does not find the javascript prototype property in unrelated documents', function() {
eval(loadFixture("partial/searchindex.js"));

searchParameters = Search._parseQuery('__proto__');

hits = [];
expect(Search._performSearch(...searchParameters)).toEqual(hits);
});

});

});

describe("htmlToText", function() {
Expand Down
2 changes: 1 addition & 1 deletion tests/test_search.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ def load_searchindex(path: Path) -> Any:
assert searchindex.startswith('Search.setIndex(')
assert searchindex.endswith(')')

return json.loads(searchindex[16:-1])
return json.loads(searchindex[17:-2])


def is_registered_term(index: Any, keyword: str) -> bool:
Expand Down
Loading