Improve taxonomy parser performance by serializing writes #249

alexgarel · 2023-09-07T08:31:58Z

In parser/openfoodfacts_taxonomy_parser/parser.py, create_nodes creates nodes one by one, even though it never needs to query the database to decide what to create.

So instead of creating nodes one by one, we should:

- build a list of all the queries we need to run
- run them in batches
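
A minimal sketch of these two steps, assuming each parsed entry is a dict with an "id" and an optional "props" map. The names build_node_queries, chunks and run_batched, the ENTRY label and the batch size are hypothetical illustrations, not code from parser.py. Queries are first described without any database access, then handed over chunk by chunk so that each chunk can be written inside a single transaction.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

# (Cypher text, parameters) for a single write
Query = Tuple[str, Dict[str, Any]]

# Hypothetical query: the ENTRY label and the props layout are assumptions,
# not taken from parser.py.
CREATE_ENTRY = "CREATE (n:ENTRY {id: $id}) SET n += $props"


def build_node_queries(entries: List[Dict[str, Any]]) -> List[Query]:
    """Step 1: describe every node creation up front, with no database access."""
    return [
        (CREATE_ENTRY, {"id": entry["id"], "props": entry.get("props", {})})
        for entry in entries
    ]


def chunks(queries: List[Query], size: int) -> Iterator[List[Query]]:
    """Yield consecutive slices of at most `size` queries."""
    for start in range(0, len(queries), size):
        yield queries[start:start + size]


def run_batched(
    queries: List[Query],
    run_in_transaction: Callable[[List[Query]], None],
    batch_size: int = 500,
) -> int:
    """Step 2: hand the queries over chunk by chunk; returns the number of chunks."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    count = 0
    for batch in chunks(queries, batch_size):
        run_in_transaction(batch)
        count += 1
    return count


# Possible wiring with the official neo4j 5.x driver (not executed here):
#
# def write_batch(tx, batch):
#     for query, params in batch:
#         tx.run(query, params)
#
# with driver.session() as session:
#     run_batched(queries, lambda batch: session.execute_write(write_batch, batch))
```

Passing the transaction runner in as a callback keeps the planning step testable without a running Neo4j instance; the batch size would need tuning against real taxonomies.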

The same applies to create_child_link, whose requests could also be run in batches.
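
A possible sketch for the links, assuming each parsed entry lists its parents' ids under a "parents" key and that links are IS_CHILD_OF relationships between ENTRY nodes matched by id (all of these are assumptions about the graph model, and collect_child_links and link_queries are hypothetical helpers). All pairs are gathered in memory first, duplicates dropped, and only then turned into queries that can be sent together.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical query: the ENTRY label, the id property and the IS_CHILD_OF
# relationship type are assumptions about the graph model.
LINK_QUERY = (
    "MATCH (child:ENTRY {id: $child_id}) "
    "MATCH (parent:ENTRY {id: $parent_id}) "
    "CREATE (child)-[:IS_CHILD_OF]->(parent)"
)


def collect_child_links(entries: List[Dict[str, Any]]) -> List[Tuple[str, str]]:
    """Gather every (child_id, parent_id) pair in memory, dropping duplicates."""
    seen = set()
    links = []
    for entry in entries:
        for parent_id in entry.get("parents", []):
            pair = (entry["id"], parent_id)
            if pair not in seen:
                seen.add(pair)
                links.append(pair)
    return links


def link_queries(links: List[Tuple[str, str]]) -> List[Tuple[str, Dict[str, Any]]]:
    """Turn the pairs into parameterized queries, ready to be run as one batch."""
    return [(LINK_QUERY, {"child_id": c, "parent_id": p}) for c, p in links]
```

These queries should only run once every node batch has been written: if either MATCH finds nothing, the query returns no rows and the CREATE silently does nothing.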

This could improve performance by a significant factor.

One elegant way to issue this kind of batch request is Cypher's UNWIND clause (https://neo4j.com/docs/cypher-manual/current/clauses/unwind/), but I'm not sure we need it.
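
For comparison, a sketch of the UNWIND variant, assuming nodes are plain property maps and carry an ENTRY label (both assumptions; unwind_batches is a hypothetical helper). Instead of one statement per node, a single statement receives a list parameter and UNWIND expands it into one row per node, so each batch costs one query.

```python
from typing import Any, Dict, Iterator, List, Tuple

# One statement creates a whole batch of nodes: UNWIND turns the $rows list
# into one row per element. The ENTRY label is an assumption.
CREATE_NODES = "UNWIND $rows AS row CREATE (n:ENTRY) SET n = row"


def unwind_batches(
    rows: List[Dict[str, Any]], batch_size: int = 1000
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield one (query, parameters) pair per slice of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(rows), batch_size):
        yield CREATE_NODES, {"rows": rows[start:start + batch_size]}


# Possible use with the official neo4j driver (not executed here):
#
# with driver.session() as session:
#     for query, params in unwind_batches(rows):
#         session.run(query, params)
```

Each row's values must be valid Neo4j property values (primitives or lists of primitives, not nested maps). Child links could follow the same pattern with UNWIND $links AS link followed by the MATCH and CREATE clauses.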

teolemon added this to 🧬✎ Taxonomy Editor (General) Apr 9, 2024

github-project-automation bot moved this to Todo in 🧬✎ Taxonomy Editor (General) Apr 9, 2024

teolemon added the 🚅 Performance label Aug 1, 2024
