Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update ElasticSearch and Elastica to 6.x #1880

Open
wants to merge 14 commits into
base: qa/2.x
Choose a base branch
from

Conversation

anvit
Copy link
Contributor

@anvit anvit commented Oct 9, 2024

Update ElasticSearch and Elastica and address issues introduced by breaking changes in ElasticSearch 6.x.

The most notable changes are:

  • Dropped support for multiple mapping types: ES 6.x no longer allows using multiple indices, They suggest handling this by creating multiple indices, each with its own type.
  • Mapping changes: The include_in_all mapping parameter is disallowed and the _all field is disabled by default.

Changes in Elastica:

  • Elastica only has a addDocuments, deleteDocuments, and updateDocuments methods in 6.x and there are no longer addDocument, deleteDocument, updateDocument methods for handling a singular documents.

@anvit anvit self-assigned this Oct 10, 2024
@anvit anvit requested a review from a team October 10, 2024 16:22
@anvit anvit added this to the 2.9.0 milestone Oct 10, 2024
Copy link
Contributor

@jraddaoui jraddaoui left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great @anvit!

I have not been able to look at everything in detail yet, but I have added a few comments that I think are important and may require some work already. Let me know your thoughts ;)

Comment on lines 80 to 81
$itemType = QubitSearch::getInstance()->index->getIndexTypeName($item['type']);
$search->addIndex($index)->addType($index->getType($itemType));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using a different type on each index, I'd suggest to use the dummy type name _doc, as it will be used in 7.x:

https://www.elastic.co/blog/removal-of-mapping-types-elasticsearch#schedule

Using the same type for all indices should simplify many places modified in this PR.

$bulk->setType('QubitInformationObject');
$type = 'QubitInformationObject';
$bulk->setIndex(QubitSearch::getInstance()->index->getType($type)->getName());
$bulk->setType(QubitSearch::getInstance()->index->getIndexTypeName($type));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be just $bulk->setType("_doc");

{
$this->_instance = $instance;
$this->_instance = [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels strange to use an index decorator to manage multiple indexes. I'd remove this class and move the logic altogether to arElasticSearchPlugin.class.php.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked at removing this at first but realized that one reason we'd kept this file was that since we introduced multiple indices, we needed to use the ES index name set in the AtoM config as the prefix for all index names, and we would be repeating a lot of code for adding in the prefix if we didn't include a function for doing this.

I've renamed this class and refactored some code to clarify what this class is for.

Comment on lines 485 to 486
!isset($propertyProperties['copy_to'])
|| ('_all' == $propertyProperties['copy_to'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition will now copy to _all all the mapping fields where you removed include_in_all: false. It may be better to add copy_to: _all to the fields we really want to include and change this condition to isset($propertyProperties['copy_to']) && ('_all' == $propertyProperties['copy_to']).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the mappings.yml file to copy all relevant mappings to _all

anvit and others added 10 commits November 15, 2024 15:53
Update ElasticSearch to 6.8.23 in the Dockerfile and update Elastica to
6.x in composer.
- 'inline' scripts are deprecated, changed to 'source'
Update arElasticSearchPlugin to use multiple indices instead of multiple
types since ES 6.x removed being able to add multiple types. Also update
mapping to remove include_in_all and add that to the _all field using
copy_to instead since include_in_all was also removed in ES 6.x
Update autocompleteAction to use multiple indices
Add the appropriate index types for ES requests that did not need to
specify explicit types in ES 5.x
Add missing document type to partialUpdate in arElasticSearchPlugin
Update condition in arElasticSearchPluginUtil to only get string
fields included in _all if copy_to is explicitly set to _all in the
mappping.
Renamed arElasticSearchIndexDecorator as
arElasticSearchMultiIndexWrapper for clarity and added a few comments
about changes needed for ElasticSearch/Elastica 7.x.
Rename getType from arElasticSearchMultiIndexWrapper to getIndex for
clarity.
Add a dummy ElasticSearch index type since one is still required for
ES 6.x, instead of having a seperate type name for each of the
multiple indices.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants