Kernel Planckster

This repository contains the core management system for Max Planck Institute Data Systems Group's Satellite Data Augmentation Project. It is being developed by DAD (Dream, Aim, Deliver) as part of the collaboration with the Max Planck Institute.

Development

Forks

Please fork this repository to your own GitHub account and clone it to your local machine. Please avoid pushing directly to this repository, whether to the main branch or to any other branch.

gh repo clone <your-username>/kernel-planckster

git remote add upstream https://github.com/dream-aim-deliver/kernel-planckster.git

Then head over to

https://github.com/<your_username>/kernel-planckster/settings/actions

and enable GitHub Actions for your fork by selecting the Allow all actions and reusable workflows option.

Database Models

Setup

# AT THE ROOT OF THE PROJECT

python3 -m venv .venv
source .venv/bin/activate

# Install poetry
pip install poetry

On ARM architectures (like Apple Silicon), you might need to install psycopg2 manually with the following commands:

brew install libpq --build-from-source
brew install openssl

export LDFLAGS="-L/opt/homebrew/opt/[email protected]/lib -L/opt/homebrew/opt/libpq/lib"
export CPPFLAGS="-I/opt/homebrew/opt/[email protected]/include -I/opt/homebrew/opt/libpq/include"

pip3 install psycopg2

Continue with the setup as follows:

# Install dependencies
poetry install

# Setup pre-commit
pre-commit install
pre-commit run --all-files

# Set up environment variables for pytest in pyproject.toml as needed, but the defaults should work

Configuration

Kernel Planckster is configured via environment variables directly. These environment variables are NOT loaded from any .env file. You can set these environment variables in your shell or in your IDE.

Name Default Value
KP_ROOT_DIRECTORY ./tests/mocks
KP_SOURCE_DATA_DIR source_data
KP_RDBMS_HOST localhost
KP_RDBMS_PORT 5435
KP_RDBMS_DBNAME kp-db
KP_RDBMS_USERNAME postgres
KP_RDBMS_PASSWORD postgres
KP_FASTAPI_PORT 8005
KP_OBJECT_STORE_HOST localhost
KP_OBJECT_STORE_PORT 9002
KP_OBJECT_STORE_ACCESS_KEY minio
KP_OBJECT_STORE_SECRET_KEY minio123
KP_OBJECT_STORE_BUCKET default
KP_OBJECT_STORE_SIGNED_URL_EXPIRY 60
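As an illustration of the pattern, each KP_* variable is read straight from the environment with a fallback to the default in the table above. The following is a minimal sketch, not the repository's actual configuration code:

```python
import os

# Illustrative sketch: read a KP_* setting from the environment,
# falling back to the documented default when it is unset.
def get_setting(name: str, default: str) -> str:
    return os.environ.get(name, default)

rdbms_host = get_setting("KP_RDBMS_HOST", "localhost")
rdbms_port = int(get_setting("KP_RDBMS_PORT", "5435"))
print(f"RDBMS at {rdbms_host}:{rdbms_port}")
```

Because no .env file is loaded, anything you do not export in your shell (or IDE) falls back to these defaults.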

Autogenerate Alembic Migrations

You can use docker containers to spin up a SQL database and autogenerate migrations with Alembic. From the root of the project, run:

docker compose --profile dev up -d
alembic upgrade head
alembic revision --autogenerate -m "migration message"
alembic upgrade head
alembic downgrade base
alembic upgrade head
docker compose --profile dev down

Make sure to fix any errors reported by each alembic command above before executing the next one and committing the changes. In particular, you might need to fix the alembic version files.
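For reference, an autogenerated version file has roughly the following shape (the revision ids, table, and column names in this sketch are made up, not taken from this repository):

```python
"""add description column

Revision ID: abc123def456
Revises: 0123456789ab
"""
from alembic import op
import sqlalchemy as sa

revision = "abc123def456"
down_revision = "0123456789ab"

def upgrade() -> None:
    # Autogenerated operations end up here; review them before committing,
    # since autogenerate can miss or misdetect some schema changes.
    op.add_column("source_data", sa.Column("description", sa.String(), nullable=True))

def downgrade() -> None:
    op.drop_column("source_data", "description")
```

If an autogenerated revision is wrong, this is the file to edit by hand before running alembic upgrade head again.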

Accessing the server and database

For development, the environment variable KP_MODE has to be set to development. This is the default value used by the docker-compose.yml and launch.json files.

In development mode, postgres, fastapi and adminer are all running in docker containers. These containers will be started automatically when you start the FastAPI launch configuration in VSCode or when you run the dev script at the root of the project, as shown below:

poetry run dev

The containers will be removed when you stop the dev script or stop the FastAPI launch configuration in VSCode.

Service Host/Port Mode
FastAPI http://localhost:8000 development
Postgres 0.0.0.0:5432 development
Adminer UI http://localhost:8080 development

FastAPI provides the Swagger UI at http://localhost:8000/docs for manually testing the API endpoints, and the ReDoc UI at http://localhost:8000/redoc for a documentation view of the endpoints. In development mode, you can access the Adminer interface at http://localhost:8080 to check the database. The credentials are:

System: PostgreSQL
Server: db
Username: postgres
Password: postgres
Database: kp-db

To test the object store in dev mode, you can run:

poetry run dev:storage

This will do the same as poetry run dev but also start a minio container:

Service Host/Port Mode
MinIO http://localhost:9001 development

You can access MinIO at http://localhost:9001 to check the object storage. The credentials are:

Access Key: minio
Secret Key: minio123

Testing

You can run tests on the command line with

poetry run pytest -s

Or in the VSCode test explorer UI.

In test mode, postgres, fastapi, adminer, and minio are all running in docker containers and will be automatically removed once the tests are finished.

DANGLING CONTAINER WARNING: If you are debugging tests in VSCode and a failing test triggers a breakpoint, DO NOT STOP the debugger. Just press F5 and let the tests finish; the containers will then be removed automatically. Otherwise, you will end up with dangling test containers that have to be removed manually. To remove these containers manually:

cd tests
docker compose down

In testing mode, you can access the services as follows:

Service Host/Port Mode
FastAPI http://localhost:8005 test
Postgres 0.0.0.0:5435 test
Adminer UI http://localhost:8085 test
MinIO http://localhost:9002 test

In test mode, you can access the Adminer interface at http://localhost:8085 to check the database. The credentials are:

System: PostgreSQL
Server: db
Username: postgres
Password: postgres
Database: kp-db

You can also access MinIO at http://localhost:9002 to check the object storage. The credentials are defined in the pyproject.toml file, under the pytest ini_options section.
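For orientation, that section of pyproject.toml has roughly this shape (a hypothetical sketch; the actual keys and values live in the repository's own file, and env-variable entries like these typically rely on the pytest-env plugin):

```toml
[tool.pytest.ini_options]
env = [
    "KP_OBJECT_STORE_ACCESS_KEY=minio",
    "KP_OBJECT_STORE_SECRET_KEY=minio123",
]
```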

Running the production server (FastAPI)

In production mode, you must configure the dependencies like MinIO, Postgres, Kafka, etc. via environment variables.

See the configuration section above for the list of environment variables.

Below is an example of the environment variables you can set in your shell if you are starting the services from the provided docker-compose.yml file:

 export KP_MODE=production
 export KP_FASTAPI_HOST=0.0.0.0
 export KP_FASTAPI_PORT=80
 export KP_FASTAPI_RELOAD=false
 export KP_RDBMS_HOST=localhost
 export KP_RDBMS_PORT=5435
 export KP_RDBMS_DBNAME=kp-db
 export KP_RDBMS_USERNAME=postgres
 export KP_RDBMS_PASSWORD=postgres
 export KP_OBJECT_STORE_HOST=localhost
 export KP_OBJECT_STORE_PORT=9002
 export KP_OBJECT_STORE_ACCESS_KEY=minio
 export KP_OBJECT_STORE_SECRET_KEY=minio123
 export KP_OBJECT_STORE_BUCKET=default
 export KP_OBJECT_STORE_SIGNED_URL_EXPIRY=60

Then you can run the server with:

poetry run start

In production mode, the Uvicorn server is not started with the --reload flag, so you will have to restart the server manually if you make any changes to the code.

Additionally, the --proxy-headers flag is enabled by default in production mode. This tells Uvicorn to trust the X-Forwarded-* headers sent by a reverse proxy, so that the correct client IP address appears in the server logs and the application knows it is being served over HTTPS, etc.

Contributing

We use VSCode as our IDE. If you use VSCode, please install the recommended extensions.

Issues

Please use Issues to report any bugs or feature requests.

Once you have been assigned an issue, please create a branch with the following naming convention:

feature-<issue number>-<short description>

We recommend using the provided create-feature-branch utility to create a branch with the correct name. This script will also pull the latest changes from the remote repository and create the new branch for you to work in.

./tools/create-feature-branch <issue number> <short description>
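If you prefer to create the branch by hand, the naming convention amounts to the following sketch (the issue number and description below are made up; substitute your own):

```shell
# Hypothetical example values for the branch-naming convention.
issue=42
desc="short-description"
branch="feature-${issue}-${desc}"

# The helper script also pulls the latest changes first; done manually,
# that would be roughly: git pull upstream main && git checkout -b "$branch"
echo "$branch"
```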

Commits

Commit your change. The commit message must follow a specific format:

git commit -m "<component>: <change_message> #<issue number>"

Valid component names are listed in the label list and are usually specified on the issue of the change.
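As an illustration, the expected message shape can be expressed as a pattern (the component name and issue number below are made up; this is a sketch of the convention, not an official validator):

```python
import re

# "<component>: <change_message> #<issue number>"
COMMIT_PATTERN = re.compile(r"^[\w-]+: .+ #\d+$")

def matches_convention(message: str) -> bool:
    return COMMIT_PATTERN.match(message) is not None

print(matches_convention("storage: add signed URL expiry option #42"))  # True
```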

Add additional explanations to the body of the commit, such as motivation for certain decisions and background information. Here are some general rules: https://cbea.ms/git-commit/.

Using multiple commits is allowed as long as they each achieve an independent, well-defined change and are well described. Otherwise, multiple commits should be squashed.

Pull Requests

Before submitting a pull request, please:

  1. Run pytest at the root of the project and fix all the errors:
     poetry run pytest -s
  2. Run mypy at the root of the project and fix all type errors:
     poetry run mypy .
  3. Run black at the root of the project:
     poetry run black .

Push the commit to your forked repository and create the pull request. Try to keep the pull request simple: it should achieve the single objective described in the issue. Multiple enhancements/fixes should be split into multiple pull requests.

Watch the pull request for comments and reviews. For any pull request updates, please try to squash/amend your commits to avoid "in-between" commits.

If you add a GitHub-recognised keyword to the pull request name or description, the associated issue can be closed automatically once the pull request is merged, e.g.:

<component>: <change_message> Fix #<issue number>