Keyword search not fully functional #24

Open
lydiahughes opened this issue Apr 5, 2022 · 3 comments
Labels
bug (Something isn't working), search (Issues with search)

Comments

@lydiahughes
Collaborator

Email from researcher: There is also something not right with the keyword search compared to the old site. I just searched for the term ‘capitalism’ in Black Dwarf and it brings up no results! For 7 Days it brings up only sections headed ‘Capitalism’.

@conatus added the bug and search labels on Jul 6, 2022
@lydiahughes
Collaborator Author

The second part of this, "For 7 Days it brings up only sections headed ‘Capitalism’", is no longer true. However, it is still true that Black Dwarf shows no results, not only for 'capitalism' but for any search term: the search function does not work for Black Dwarf.

@chrisdevereux
Contributor

chrisdevereux commented Jul 18, 2022

Hi Hanna, the immediate reason for this is that none of the Black Dwarf issues are tagged with 'capitalism' as a keyword.

This has happened for a couple of reasons:

  1. The keywords associated with articles are automatically generated using an algorithm which looks for words and phrases that occur more frequently in that article than in other articles.

    We did this because we didn't have a list of keywords associated with each article and wanted to populate the articles with a 'good enough' set of keywords at first. The idea was always that the keywords would need to be edited by hand, as the algorithm is very far from perfect: it works well for words that are less evenly distributed (like 'guevara'), but not for words that show up pretty much everywhere (like 'capitalism').

  2. What we mean by keywords is a little different from the old archive. From using it, the old archive seemed to treat keywords as phrases that appeared in the article. In the new archive, keywords are better understood as categories or topics, although it also gives people the option of searching by phrase.

    We have two ways of searching articles: by keyword (which is the default in advanced search) or by literal phrase (which happens in the basic search and is an option in advanced search). If you use the 'includes phrase' option to search Black Dwarf, you'll see a much bigger set of results. E.g.: https://banmarchive.org.uk/search/?mode=advanced&publication=188&decade=&author=&bools=AND&ops=phrase&values=capitalism
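To make the two points above concrete, here is a minimal sketch in Python. It is not the archive's actual code: it assumes a TF-IDF-style score stands in for "words that occur more frequently in that article than in other articles", and the article titles, texts, and function names (`distinctive_keywords`, `search_by_keyword`, `search_by_phrase`) are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z']+", text.lower())


def distinctive_keywords(articles, top_n=1):
    """Pick the top_n words per article by a TF-IDF-style score.

    A word that appears in every article gets log(1) = 0 as its
    inverse-document-frequency factor, so it is never chosen,
    however often it occurs.
    """
    tokenized = {title: tokenize(body) for title, body in articles.items()}
    n_docs = len(tokenized)
    # Number of articles each word appears in.
    doc_freq = Counter(w for words in tokenized.values() for w in set(words))

    keywords = {}
    for title, words in tokenized.items():
        counts = Counter(words)
        scores = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
        # Highest score first; ties broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[title] = [w for w in ranked[:top_n] if scores[w] > 0]
    return keywords


def search_by_keyword(keywords, term):
    """'Matches keyword': only articles tagged with the term."""
    return sorted(t for t, kws in keywords.items() if term.lower() in kws)


def search_by_phrase(articles, phrase):
    """'Includes phrase': any article whose text contains the phrase."""
    return sorted(t for t, body in articles.items() if phrase.lower() in body.lower())


# Hypothetical articles: 'capitalism' is everywhere, 'guevara' is not.
ARTICLES = {
    "article-1": "Capitalism and Guevara: what Guevara said about capitalism",
    "article-2": "Capitalism and unions: unions against capitalism",
    "article-3": "Capitalism and students: students occupy against capitalism",
}
KEYWORDS = distinctive_keywords(ARTICLES)

print(search_by_keyword(KEYWORDS, "capitalism"))  # no article is tagged with it
print(search_by_keyword(KEYWORDS, "guevara"))
print(search_by_phrase(ARTICLES, "capitalism"))   # every article contains it
```

In this toy data the keyword search for 'capitalism' comes back empty while the phrase search matches everything, which is the same shape as the Black Dwarf problem.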

What can we do about it? I can see a few options (presented in order of how much work they would be). I'm open to suggestions from you as well!

  • Remove the 'matches keyword' search option or make 'includes phrase' the default.
  • Change the keyword algorithm to be more permissive about whether it tags an article with a keyword (maybe every word in its content except common ones like 'the').
  • Change the language we use in the search feature to talk about categories instead of keywords and gradually go through every article, categorising them by hand.
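As a rough illustration of the second option, a more permissive tagger could look something like the sketch below. The stop-word list and the `permissive_keywords` name are assumptions for the example, not what the archive uses.

```python
import re

# A tiny, illustrative stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def permissive_keywords(body, stop_words=STOP_WORDS):
    """Tag an article with every distinct word in it except stop words."""
    words = re.findall(r"[a-z']+", body.lower())
    return sorted(set(words) - stop_words)


# Hypothetical article text.
tags = permissive_keywords("The crisis of capitalism and the role of the unions")
print(tags)
```

With this approach 'capitalism' would be a keyword for every article that mentions it, so keyword search would behave much more like phrase search for single words, at the cost of keywords being less useful as categories.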

@lydiahughes
Collaborator Author

Hanna: Hi Chris, thanks, this is really useful to understand.
I think my order of preference would be your option 2 first: changing the algorithm to see if it helps make the search better. Then, if that didn't significantly improve results, your option 1, making 'includes phrase' the default. Thanks
