Author: Aadarsh Anantha Ramakrishnan
Organization: Open Food Facts
Project: Build a Taxonomy Editor
Mentors: Alex Garel, Stephane Gigandet, Pierre Slamich, Charles Nepote
The Open Food Facts database contains a lot of information on food products, such as ingredients, labels and additives. To link this information to useful properties such as Nutri-Score, Agribalyse and many more, taxonomies are used within the database.
A taxonomy in Open Food Facts is a raw text file containing a Directed Acyclic Graph (DAG) where each non-root node has one or more parent nodes. It is mainly used for classification and translation of various food products within the database. Hence, taxonomies are at the heart of data structures in the Open Food Facts database and must be maintained properly.
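As a rough illustration of the structure described above, here is a minimal in-memory sketch of a DAG in which a node can have several parents. The `Taxonomy` class, its methods and the node ids are hypothetical and chosen only for this example; they are not taken from the Open Food Facts codebase.

```python
from collections import defaultdict


class Taxonomy:
    """Tiny in-memory DAG: every node id maps to the set of its parent ids."""

    def __init__(self):
        self.parents = defaultdict(set)
        self.children = defaultdict(set)

    def add_node(self, node_id, parent_ids=()):
        # Register the node even when it has no parents (a root node).
        self.parents[node_id]
        self.children[node_id]
        for parent_id in parent_ids:
            # Linking parent -> node is illegal if node is already above parent.
            if parent_id == node_id or node_id in self.ancestors(parent_id):
                raise ValueError(f"{parent_id} -> {node_id} would create a cycle")
            self.parents[parent_id]  # make sure the parent exists as a node
            self.parents[node_id].add(parent_id)
            self.children[parent_id].add(node_id)

    def ancestors(self, node_id):
        """Return every node reachable by following parent links upwards."""
        seen = set()
        stack = list(self.parents.get(node_id, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self.parents.get(current, ()))
        return seen

    def roots(self):
        """Nodes without parents, in alphabetical order."""
        return sorted(n for n, p in self.parents.items() if not p)


# Illustrative node ids (assumptions, not real taxonomy content).
tax = Taxonomy()
tax.add_node("en:dairies")
tax.add_node("en:fermented-foods")
tax.add_node("en:yogurts", ["en:dairies", "en:fermented-foods"])
tax.add_node("en:greek-yogurts", ["en:yogurts"])
print(tax.roots())
print(sorted(tax.ancestors("en:greek-yogurts")))
```

Storing both `parents` and `children` makes it cheap to show a node's neighbours in either direction, which is the kind of view an editor needs.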
The taxonomy files in Open Food Facts are long (the ingredients.txt taxonomy alone has around 80000 lines!) and cumbersome for contributors to edit.
This project provides a user-friendly interface for editing taxonomies with ease. The tool helps contributors visualize a node's translations, properties, parents and children on a single page. The editor allows users to perform CRUD operations on the taxonomy and on its nodes. A fast search mechanism for finding nodes within the taxonomy has also been implemented.
The introduction of this Taxonomy Editor will help existing contributors edit taxonomies seamlessly and will encourage more contributions from the wonderful community of Open Food Facts.
- Designed a brand-new architecture for the entire project by decoupling the editor from the source taxonomy files in Product Opener.
- Created a React frontend with the following features:
  - Displaying the root nodes present in a taxonomy
  - Creating and deleting root nodes in a taxonomy
  - Listing and updating the translations, properties, parents and children of a node in a taxonomy
  - Searching nodes across different translations
  - Importing taxonomies directly from GitHub and starting a new project
- Implemented a Python API (created using FastAPI) to interface with the frontend and perform database operations in Neo4j.
- Helped create a specification for parsing a taxonomy text file.
- Reviewed parser and converter programs written in Python that convert a taxonomy from a raw text file to Neo4j.
- Reviewed and created Docker-related files for setting up the project in development and production.
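To give a feel for what parsing a taxonomy text file involves, here is a minimal sketch. It is not the project's parser and does not follow the project's specification: the line shapes it accepts (listed in the leading comment), the sample text and the `parse_entries` helper are all simplifying assumptions made for this example.

```python
import re

# ASSUMED, simplified line shapes (not the project's actual specification):
#   "<en:parent name"        -> declares a parent of the current entry
#   "en:name, synonym, ..."  -> names of the entry in one language
#   "# ..."                  -> comment, ignored
#   blank line               -> separates entries
SAMPLE = """\
en:yogurts, yoghurts
fr:yaourts, yogourts

# a child entry
<en:yogurts
en:greek yogurts
fr:yaourts grecs
"""

NAMES_LINE = re.compile(r"^([a-z]{2}):(.+)$")


def parse_entries(text):
    """Split the text into entries holding 'parents' and per-language 'names'."""
    entries = []
    current = None
    # The extra "" acts as a final blank line that flushes the last entry.
    for raw_line in text.splitlines() + [""]:
        line = raw_line.strip()
        if not line:
            if current is not None:
                entries.append(current)
                current = None
            continue
        if line.startswith("#"):
            continue
        if current is None:
            current = {"parents": [], "names": {}}
        if line.startswith("<"):
            current["parents"].append(line[1:].strip())
            continue
        match = NAMES_LINE.match(line)
        if match is None:
            raise ValueError(f"unrecognised line: {raw_line!r}")
        lang, names = match.groups()
        current["names"][lang] = [name.strip() for name in names.split(",")]
    return entries


for entry in parse_entries(SAMPLE):
    print(entry)
```

Each resulting dictionary corresponds to one node, and its `parents` list gives the edges to create when the entries are loaded into a graph database.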
The Taxonomy Editor is an MVP that can gauge community interest and bring many more taxonomy contributions to Open Food Facts. All the core features a contributor needs to edit a taxonomy have been implemented within the given time frame. Some of the features related to integration with GitHub are being fine-tuned.
There is a lot of scope for the next phase of this project in turning the MVP into a full-blown version. I have documented these plans on GitHub within the project.
- I invested a good amount of time at the start of the project in understanding the overall architecture and tech stack of the Taxonomy Editor. Thanks to my mentors' guidance, I was able to move faster on the project's deliverables.
- One of the most exciting parts of this project was what I learned about time management. I had to balance my priorities between college classes, semester exams, hackathons and projects. My mentors were supportive throughout and helped me complete all the planned deliverables with good quality.
- Before GSoC, I had very little knowledge of developing websites with React. Through this project, I was able to use React effectively to build many complex and reusable components for the Taxonomy Editor. I enjoyed the process of learning a new tool.
- Being a Pythonista, I love creating new programs and tools using Python. This project helped me learn a lot about creating asynchronous APIs and integrating them with database transactions to enable concurrency.
- Since taxonomies are essentially DAGs at their core, my mentors and I decided to use Neo4j as the database for the editor. Even though I hadn't used Neo4j before, I was surprised by the user-friendliness of the Cypher Query Language and was able to create complex queries in no time!
- My code quality and documentation skills have improved drastically, thanks to detailed reviews from mentors over the course of this project.
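The idea of pairing asynchronous handlers with database transactions can be sketched with the standard library alone. The example below deliberately uses neither FastAPI nor the Neo4j driver: `FakeGraphStore`, `update_node` and the node ids are hypothetical stand-ins, and the in-memory "transaction" only mimics the commit-or-discard behaviour a real database would provide.

```python
import asyncio
import copy
from contextlib import asynccontextmanager


class FakeGraphStore:
    """Hypothetical in-memory stand-in for a graph database."""

    def __init__(self):
        self.nodes = {}
        self._write_lock = asyncio.Lock()

    @asynccontextmanager
    async def transaction(self):
        # One writer at a time; work on a copy and commit only on success.
        async with self._write_lock:
            draft = copy.deepcopy(self.nodes)
            yield draft  # an exception raised by the caller skips the commit
            self.nodes = draft


async def update_node(store, node_id, names):
    """Write one node inside a transaction; invalid input rolls back."""
    async with store.transaction() as draft:
        await asyncio.sleep(0)  # yield control, as real database I/O would
        if not names:
            raise ValueError("a node needs at least one name")
        draft[node_id] = {"names": names}


async def main():
    store = FakeGraphStore()
    # Three concurrent "requests"; the last one is invalid on purpose.
    results = await asyncio.gather(
        update_node(store, "en:yogurts", ["yogurts"]),
        update_node(store, "en:cheeses", ["cheeses"]),
        update_node(store, "en:broken", []),
        return_exceptions=True,
    )
    return store.nodes, results


if __name__ == "__main__":
    nodes, results = asyncio.run(main())
    print(nodes)
    print(results)
```

The failed update leaves the store untouched while the two valid updates are both committed, which is the property that makes concurrent requests safe to serve.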
I'd like to thank all my mentors for helping me throughout this project. A special shoutout to Alex Garel and Daniel T for their comprehensive code reviews and guidance. I would also like to thank Bryan Han for his work on the parser and its related components.
Contributing to Open Food Facts has been amazing, and I am looking forward to contributing more in the future! I am glad to have worked with the amazing set of people in Open Food Facts. I am extremely thankful to Google Summer of Code for providing me with this opportunity to enhance my programming skills and learn a lot about open-source development along the way.
- #88 - feat: Automatic release to PyPI + fix: failing flake8 tests
- #89 - fix: Automate PyPI workflow
- #90 - build: Generate automatic documentation
- #92 - fix: Dependabot error during CodeQL workflow + fix: Disable labeler on forks
- #94 - build: Package for Conda
- #99 - build: Changed branch for release-please
- #101 - docs: Rework documentation in README.md
- #102 - docs: Update PR template
- #9 - docs: Update README.md
- #10 - feat: Setup basic FastAPI project
- #19 - fix: Changed JSON according to spec
- #26 - feat: New paths for backend API
- #35 - feat: Add GET paths for parents and children + fix: Update Neo4J query
- #41 - feat: Add CRUD features for entries and root nodes
- #42 - feat: Add home screen
- #43 - feat: All entries page + Navbar
- #44 - feat: Initialize edit entry page
- #45 - feat: List all properties of an entry
- #46 - feat: Edit synonyms, stopwords page
- #47 - feat: Display parents and children of an entry
- #74 - fix: Error while updating a node (backend)
- #76 - feat: New paths for API
- #86 - build: Change parser workflow
- #91 - feat: Search functionality - Backend
- #92 - feat: Search functionality - Frontend
- #93 - fix: Using Neo4J transactions
- #97 - fix: Change session to transactions
- #100 - fix: Add UUIDs after fetching node
- #101 - fix: Add multiple labels in backend
- #102 - fix: Add score > 0 condition for search
- #103 - fix: Update requirements.txt for backend
- #104 - feat: Add new endpoint for fetching root nodes