diff --git a/docs/runtime_suite/audit-trail/10_overview.md b/docs/runtime_suite/audit-trail/10_overview.md
new file mode 100644
index 0000000000..6fccd32daf
--- /dev/null
+++ b/docs/runtime_suite/audit-trail/10_overview.md
@@ -0,0 +1,268 @@
---
id: overview
title: Audit Trail
sidebar_label: Overview
---

The audit trail collects events happening in the systems inside your environment to provide observability and accountability.

Each audit log represents an event that happened within a system and should include the following information, to answer the five Ws:

- **who** did it: the user or system who performed the operation and other users involved;
- **what** happened: which operation was performed (data is accessed or changed, etc.) and which resources were affected (a record was created, read, updated or deleted, etc.);
- **why** it happened: the operation scope (creation, deletion, consultation, aggregation, etc.);
- **where** it happened: which service triggered the event and generated the audit log and which one, if different, carried out the operation;
- **when** it happened: when the operation was performed (a timestamp).

For example, if a doctor creates an appointment with a patient using the [Calendar component][am-calendar] and the [Appointment Manager][appointment-manager], the audit log should include at least:

- **who**: the account ID of the doctor;
- **what**: the details about the insert operation performed on the [CRUD Service][crud-service], including:
  - an identifier of the [CRUD Service][crud-service] instance called by the [Appointment Manager][appointment-manager];
  - the name of the collection where the appointment was created;
  - the unique id of the appointment record created in the CRUD collection;
  - the unique id of the patient involved in the appointment;
- **why**: the creation of an appointment;
- **where**: the services involved in the operation, including:
  - the [Calendar component][am-calendar], used by the doctor to create the appointment;
  - the [Appointment Manager][appointment-manager], called by the component;
  - the [CRUD Service][crud-service], called by the Appointment Manager to create the appointment record.

The audit trail is meant to provide enough information to answer common questions like:

- who accessed the medical records of a given patient in the last month;
- who changed a system configuration in the last 24 hours;
- which medical records were accessed or modified by a doctor in the last week;
- etc.

The following table provides a glossary for the most common terms and expressions used throughout this page.
Unless stated otherwise, when you encounter any of these terms or expressions you should assume they have the stated meaning.

| Term          | Definition                                                                                                      |
|---------------|-----------------------------------------------------------------------------------------------------------------|
| *Audit Trail* | The entire collection of audit logs.                                                                            |
| *Data Store*  | The system where the audit logs are stored (database, message queue, etc.).                                     |
| *Id*          | Unique identifier, globally or within a given *namespace*.                                                      |
| *GCP*         | Acronym of Google Cloud Platform.                                                                               |
| *Namespace*   | One or more fields identifying a subsystem of the infrastructure (service, database, table, collection, etc.).  |
| *Operation*   | An activity performed on a system, by a user or another system acting as a client.                              |
| *Resource*    | A database record or system component accessed or modified by an operation.                                     |
| *Source*      | The system where the event happens and is recorded in the audit trail.                                          |
| *System*      | A software service running in your cluster.                                                                     |
| *User*        | A physical person interacting with a software system.                                                           |

## Architecture

![High-level architecture](img/audit-trail-architecture.png)

Here's what happens when an audit log is generated by your microservice:

1. the Audit Trail sidecar collects the audit logs generated by your application through a shared volume, enriches them and sends them to a Kafka topic;
2. the [Kafka2Rest][kafka2rest] service processes the logs from the Kafka topic and sends a `POST` request to the [CRUD Service][crud-service];
3. the [CRUD Service][crud-service] saves the audit log in a [MongoDB][mongodb] collection.

From here, you can easily query the audit logs according to your needs, for example by building a frontend application with the [Microfrontend composer][microfrontend-composer].

### Monitoring and alerting

Under certain conditions, especially when a lot of logs are generated, some logs may be lost (including audit logs).

Since the log file is stored on an ephemeral volume, if the pod is restarted the log file could be lost before the sidecar is able to finish processing its content.
The sidecar is designed to automatically resume watching the application log from the beginning after it is restarted, so it should correctly collect any log generated by the main application container while it was not available.

To avoid the log file becoming too large, the sidecar rotates logs automatically every day and keeps only the last five. Each time, the original log file is truncated in place after creating a copy, to minimize the risk of incoming logs not being written to the file. There is still a chance that logs are written to the original file during the few milliseconds between the creation of the copy and the truncation of the original one.

Therefore, we recommend setting up proper alarms on your infrastructure to ensure the health of all the architecture components.
If you are using the PaaS, you can leverage [Grafana Alerting][paas-grafana-alerting] to monitor the health of the logging stack, the sidecars, [Kafka2Rest][kafka2rest] and the [CRUD Service][crud-service].

For additional performance considerations, please take a look at the [*Performance tuning*][performance] section.

## Data model

The audit logs are enriched with metadata and normalized by the sidecar. This section provides an overview of the resulting data model, which is inspired by the following standards and solutions:

- [RFC 3881][rfc-3881]
- [FHIR AuditEvent][fhir-audit-event]
- [OpenTelemetry Logs][open-telemetry-logs-data-model]
- [GCP Cloud Logging][gcp-cloud-logging-data-model]

To ensure semantic consistency across the logs generated by the different services running inside your projects, and to make it easier to query and aggregate logs from different sources, we provide a reference data model that you can use as a starting point when passing structured data to your logs.

The main goal of having a shared data model is to enable aggregating and querying audit logs generated by heterogeneous systems through a unified interface.

We encourage you to customize the data model to suit your specific needs, while ensuring it includes enough information to quickly and effectively answer the most common questions mentioned at the beginning of this page.
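For instance, assuming the audit logs end up in a MongoDB collection named `audit-logs` and follow the reference data model described in the next sections, a query answering "who accessed the medical records of a given patient in the last month" could look like the sketch below; the collection name, the `metadata.operation` value and the resource identifier format are illustrative assumptions to adapt to your own model.

```js
// Illustrative MongoDB query (mongosh syntax): who accessed the medical
// records of a given patient in the last month. Collection name, operation
// value and resource identifier format are assumptions, not part of the model.
const oneMonthAgo = new Date();
oneMonthAgo.setMonth(oneMonthAgo.getMonth() - 1);

db.getCollection('audit-logs').find(
  {
    timestamp: { $gte: oneMonthAgo },                               // assumes `timestamp` is stored as a Date
    'metadata.operation': 'read',                                   // assumed value for read/access operations
    'metadata.resource': { $regex: '^EHR/MedicalRecord/patient-67890' }, // hypothetical resource id format
  },
  { timestamp: 1, 'metadata.user': 1, 'metadata.source': 1 }        // who accessed it, from where, and when
).sort({ timestamp: -1 })
```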
+ +### Version + +| Field name | Type | Required | RFC 3881 | FHIR | OpenTelemetry | GCP | +|------------|--------|----------|----------|------|---------------|-----| +| `version` | String | Yes | - | - | - | - | + +The version of the audit log data model, to ensure backward and forward compatibility. + +The value should adhere to [semantic versioning][semantic-versioning]. + +```json +{ + "version": "1.0.0" +} +``` + +### Timestamp + +| Field name | Type | Required | RFC 3881 | FHIR | OpenTelemetry | GCP | +|-------------|--------|----------|--------------------------------------------|---------------------------------------|--------------------------------------------|-------------------------------------------| +| `timestamp` | String | Yes | [Event Date/Time][rfc-3881-event-datetime] | [recorded][fhir-audit-event-recorded] | [Timestamp][open-telemetry-logs-timestamp] | [timestamp][gcp-cloud-logging-data-model] | + +A timestamp indicating when the event happened. + +```json +{ + "timestamp": "2023-12-01T09:34:56.789Z" +} +``` + +### Checksum + +| Field name | Type | Required | RFC 3881 | FHIR | OpenTelemetry | GCP | +|----------------------|--------|----------|----------|------|---------------|-----| +| `checksum` | Object | Yes | - | - | - | - | +| `checksum.algorithm` | String | Yes | - | - | - | - | +| `checksum.value` | String | Yes | - | - | - | - | + +An integrity checksum (`checksum.value`) computed using preferably the SHA-512 algorithm (`checksum.algorithm`) on the other log fields. + +```json +{ + "checksum": { + "algorithm": "sha512", + "value": "b1f4aaa6b51c19ffbe4b1b6fa107be09c8acafd7c768106a3faf475b1e27a940d3c075fda671eadf46c68f93d7eabcf604bcbf7055da0dc4eae6743607a2fc3f" + } +} +``` + +### Message + +| Field name | Type | Required | RFC 3881 | FHIR | OpenTelemetry | GCP | +|------------|--------|----------|----------|------|---------------|-----| +| `message` | String | No | - | - | - | - | + +The log message. 
```json
{
  "message": "A log message"
}
```

### Metadata

| Field name           | Type   | Required | RFC 3881                                                                             | FHIR                                  | OpenTelemetry                                     | GCP                                           |
|----------------------|--------|----------|--------------------------------------------------------------------------------------|---------------------------------------|----------------------------------------------------|------------------------------------------------|
| `metadata`           | Object | Yes      | -                                                                                    | -                                     | [Attributes][open-telemetry-logs-attributes]      | -                                             |
| `metadata.event`     | String | No       | [Event ID][rfc-3881-event-id]                                                        | [code][fhir-audit-event-code]         | [Attributes][open-telemetry-logs-attributes]      | -                                             |
| `metadata.severity`  | String | No       | -                                                                                    | [severity][fhir-audit-event-severity] | [SeverityText][open-telemetry-logs-severity-text] | [severity][gcp-cloud-logging-log-severity]    |
| `metadata.operation` | String | No       | [Event Action Code][rfc-3881-event-action-code]                                      | [action][fhir-audit-event-action]     | [Attributes][open-telemetry-logs-attributes]      | -                                             |
| `metadata.request`   | String | No       | [Network Access Point Identification][rfc-3881-network-access-point-identification]  | -                                     | [Attributes][open-telemetry-logs-attributes]      | [HttpRequest][gcp-cloud-logging-http-request] |
| `metadata.resource`  | String | No       | -                                                                                    | -                                     | [Attributes][open-telemetry-logs-attributes]      | -                                             |
| `metadata.source`    | String | Yes      | [Audit Source ID][rfc-3881-audit-source-id]                                          | -                                     | [Attributes][open-telemetry-logs-attributes]      | -                                             |
| `metadata.user`      | String | No       | [User ID][rfc-3881-user-id]                                                          | -                                     | [Attributes][open-telemetry-logs-attributes]      | -                                             |

The `metadata` field is designed to contain structured data passed to the logger, representing event metadata you can later query on MongoDB, like:

```js
logger.audit({
  event: 'AM/AppointmentCreated',
  severity: 'info',
  source: 'appointment-manager',
  resource: 'AM/Appointment/appointment-12345',
  user: 'dr.john.watson'
}, 'Appointment created')
```

which would be stored as:

```json
{
  "metadata": {
    "event": "AM/AppointmentCreated",
    "severity": "info",
    "source": "appointment-manager",
    "resource": "AM/Appointment/appointment-12345",
    "user": "dr.john.watson"
  },
  "message": "Appointment created"
}
```

We recommend enforcing a common data model, to ensure you can correlate events and metadata originating from different sources.
Each service can then add custom fields to provide context-specific details.

In our plugins, we try to follow the common data schema specified in the table, with the following semantics:

- `event`: type of event (API called, job executed, medical record updated, etc.);
- `severity`: the log level associated with the event, like `debug`, `info`, `warning`, `error` and so on;
- `operation`: type of operation performed (record created, read, updated or deleted, etc.);
- `request`: the ID of the request triggering or originating the event;
- `resource`: unique identifier of the main resource affected by the operation (medical record ID, etc.);
- `source`: unique identifier of the application or system where the event occurs and the audit log is generated;
- `user`: unique identifier of the user who triggered the request.
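As an example of how the `checksum` field described above can be consumed, here is a minimal Node.js sketch that verifies the integrity of a stored audit log; it assumes the digest is computed over the raw log line emitted by the application (the `rawLog` field of the stored document), so adapt the input if your checksum covers a different set of fields.

```js
// Minimal integrity check for a stored audit log. Assumes `checksum.value`
// is the hex-encoded digest of the raw application log line (`rawLog`);
// adjust the input if your checksum covers different fields.
const { createHash } = require('node:crypto');

function verifyChecksum(auditLog) {
  const { algorithm, value } = auditLog.checksum; // e.g. 'sha512'
  const computed = createHash(algorithm).update(auditLog.rawLog).digest('hex');
  return computed === value;
}
```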
+ + +[performance]: 20_configuration.md#performance-tuning + +[am-calendar]: /runtime_suite/care-kit/20_components/10_am-calendar.md +[appointment-manager]: /runtime_suite/appointment-manager/10_overview.md +[paas-grafana-alerting]: /infrastructure/paas/tools/grafana.md#alerting +[crud-service]: /runtime_suite/crud-service/10_overview_and_usage.md +[kafka2rest]: /runtime_suite/crud-service/10_overview_and_usage.md +[microfrontend-composer]: /microfrontend-composer/overview.md + +[fhir-audit-event]: https://www.hl7.org/fhir/auditevent.html "AuditEvent | FHIR" +[fhir-audit-event-action]: https://www.hl7.org/fhir/auditevent-definitions.html#AuditEvent.action +[fhir-audit-event-agent]: https://www.hl7.org/fhir/auditevent-definitions.html#AuditEvent.agent +[fhir-audit-event-code]: https://www.hl7.org/fhir/auditevent-definitions.html#AuditEvent.code +[fhir-audit-event-entity]: https://www.hl7.org/fhir/auditevent-definitions.html#AuditEvent.entity +[fhir-audit-event-recorded]: https://www.hl7.org/fhir/auditevent-definitions.html#AuditEvent.recorded +[fhir-audit-event-severity]: https://www.hl7.org/fhir/auditevent-definitions.html#AuditEvent.severity +[fhir-audit-event-source]: https://www.hl7.org/fhir/auditevent-definitions.html#AuditEvent.source + +[gcp-cloud-logging-data-model]: https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry "Log Entry | Cloud Logging | Google Cloud" +[gcp-cloud-logging-http-request]: https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#HttpRequest +[gcp-cloud-logging-log-severity]: https://cloud.google.com/logging/docs/reference/v2/rest/v2/LogEntry#logseverity + +[open-telemetry-logs-data-model]: https://opentelemetry.io/docs/specs/otel/logs/data-model/ "Logs Data Model | OpenTelemetry" +[open-telemetry-logs-attributes]: https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-attributes +[open-telemetry-logs-body]: https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-body +[open-telemetry-logs-resource]: https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-resource +[open-telemetry-logs-severity-text]: https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-severitytext +[open-telemetry-logs-timestamp]: https://opentelemetry.io/docs/specs/otel/logs/data-model/#field-timestamp + +[rfc-3881]: https://www.rfc-editor.org/rfc/rfc3881 "Security Audit and Access Accountability Message XML Data Definitions for Healthcare Applications" +[rfc-3881-audit-source-id]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.4.2 +[rfc-3881-event-datetime]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.1.3 +[rfc-3881-event-action-code]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.1.2 +[rfc-3881-event-outcome-indicator]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.1.4 +[rfc-3881-event-id]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.1.1 +[rfc-3881-network-access-point-identification]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.3 +[rfc-3881-participant-object-data-life-cycle]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.5.3 +[rfc-3881-participant-object-detail]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.5.9 +[rfc-3881-participant-object-id]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.5.6 +[rfc-3881-participant-object-id-type-code]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.5.4 +[rfc-3881-participant-object-identification]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.5 
[rfc-3881-participant-object-query]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.5.8
[rfc-3881-participant-object-type-code]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.5.1
[rfc-3881-participant-object-type-code-role]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.5.2
[rfc-3881-user-id]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.2.1
[rfc-3881-active-participant-identification]: https://datatracker.ietf.org/doc/html/rfc3881#section-5.2

[mongodb]: https://www.mongodb.com
[semantic-versioning]: https://semver.org/spec/v2.0.0.html
diff --git a/docs/runtime_suite/audit-trail/20_configuration.md b/docs/runtime_suite/audit-trail/20_configuration.md
new file mode 100644
index 0000000000..d7bb586afe
--- /dev/null
+++ b/docs/runtime_suite/audit-trail/20_configuration.md
@@ -0,0 +1,299 @@
---
id: configuration
title: Configuration
sidebar_label: Configuration
---

To set up the architecture described previously, you need to configure the following components:

- the Audit Trail sidecar;
- a [Kafka][kafka] cluster;
- the [Kafka2Rest][kafka2rest] service;
- the [CRUD Service][crud-service];
- a [MongoDB][mongodb] server.

## Sidecar

The `Audit Trail` sidecar is available in the marketplace and can be attached to any microservice in your project.

The only requirement for the sidecar to work is to [share a volume][k8s-shared-volume] with the main application container, so that logs are written to a shared log file, which the sidecar reads to filter the audit logs based on their log level, enrich them with additional metadata and send them to the Kafka topic.

To configure your sidecar and start collecting the audit logs, you must:

1. Attach the sidecar to the microservice (see the [official documentation][console-managing-sidecars])
2. [Add a shared volume](#shared-volume)
3. Configure the sidecar [environment variables](#environment-variables)

### Shared volume

Once you have attached a sidecar to your microservice, you have two options:

- customize your project configuration with [Kustomize][console-kustomize];
- customize the service deployment file using [Console Raw Manifest configuration][console-service-raw-manifest-configuration].

#### Customize with Kustomize

Open the GitLab repository containing your Console project configuration and perform the following steps for each project environment where you want to make the changes:

1. Create a new YAML file with an arbitrary name - e.g. `patch.yaml` - under the `overlays/<environment>/` folder.

2. Open the `overlays/<environment>/kustomization.yaml` file and add a new entry in the `patches` section, specifying the name of your microservice under `target/name` and the name of the file created at the previous step under `path`:

```yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
patches:
  - target:
      kind: Deployment
      name: <microservice-name>
    path: patch.yaml
```

3. In the `patch.yaml` file, add a new shared volume:

```yaml
- op: add
  path: /spec/template/spec/volumes/-
  value:
    name: logs
    emptyDir: {}
```

4. In the `patch.yaml` file, attach the shared volume to each container, ensuring the `name` property matches the name you chose for the shared volume defined previously.
```yaml
# Let's assume the first container listed under `spec/template/spec/containers` in the deployment file is the main application container
- op: add
  path: "/spec/template/spec/containers/0/volumeMounts/-"
  value:
    mountPath: "/logs/"
    name: "logs"

# Let's assume the second container listed under `spec/template/spec/containers` in the deployment file is the sidecar container
- op: add
  path: "/spec/template/spec/containers/1/volumeMounts/-"
  value:
    mountPath: "/logs/"
    name: "logs"
```

5. In the `patch.yaml` file, override the service Dockerfile command to send logs both to standard output and to the file specified in the sidecar `LOG_FILE_PATH` environment variable.

```yaml
# Let's assume the first container listed under `spec/template/spec/containers` in the deployment file is the main application container
- op: add
  path: "/spec/template/spec/containers/0/command"
  value: ["/bin/sh", "-c"]
- op: add
  path: "/spec/template/spec/containers/0/args"
  value: ["node ./dist/index.js | tee /logs/application.log"]
```

6. From the Console, set the `LOG_FILE_PATH` environment variable of the sidecar to the absolute path of the log file (`/logs/application.log` in the example).

#### Customize with Console Raw Manifest

After enabling the [Raw Manifest configuration][console-service-raw-manifest-configuration] on your service, you should be able to edit the deployment file, called `configuration/<service-name>.deployment.yml`.

1. Under `spec/template/spec/volumes`, add a new shared volume.

```yml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      volumes:
        - name: logs
          emptyDir: {}
```

2. Attach the shared volume to each container listed under `spec/template/spec/containers`, ensuring the `name` property matches the name you chose for the shared volume defined previously.

```yml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: your-service
          image: ...
          volumeMounts:
            - name: logs
              mountPath: /logs/
        - name: audit-trail
          image: nexus.mia-platform.eu/mia-care/plugins/audit-trail:1.0.0
          volumeMounts:
            - name: logs
              mountPath: /logs/
      volumes:
        - name: logs
          emptyDir: {}
```

3. Set the `LOG_FILE_PATH` environment variable to the absolute path of a file located inside the mount path (`/logs/` in our example); you can choose any name you want for the file, like `application.log`.

```yml
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - name: your-service
          image: ...
          volumeMounts:
            - name: logs
              mountPath: /logs/
        - name: audit-trail
          image: nexus.mia-platform.eu/mia-care/plugins/audit-trail:1.0.0
          volumeMounts:
            - name: logs
              mountPath: /logs/
          env:
            - name: LOG_FILE_PATH
              value: /logs/application.log
      volumes:
        - name: logs
          emptyDir: {}
```

4. Ensure your service sends logs both to standard output and to the file specified in the sidecar `LOG_FILE_PATH` environment variable. You can simply change the Dockerfile default command (`CMD`) so it sends its output both to the standard output (as it does by default) and to the log file using `tee`. If you cannot edit the Dockerfile directly, for example because you are using a marketplace plugin, you can override the Dockerfile default command in the [microservice configuration `args` field][console-microservices-args].

### Environment variables

To configure the sidecar you must set the following environment variables.
The required environment variables are highlighted in **bold**.
We strongly recommend using secrets to load sensitive information (like Kafka credentials and broker addresses) into environment variables.

| Name                         | Required | Default | Version | Description                                                                                  |
|------------------------------|----------|---------|---------|----------------------------------------------------------------------------------------------|
| AUDIT_TRAIL_LOG_LEVEL        | No       | 1100    | *       | The log level associated with audit logs.                                                    |
| **KAFKA_BROKERS**            | Yes      | -       | *       | A comma-separated list of Kafka broker addresses.                                            |
| **KAFKA_CLIENT_ID**          | Yes      | -       | *       | The Kafka client ID, for example the identifier of the service the sidecar is attached to.   |
| KAFKA_TOPIC                  | No       | -       | *       | Name of the Kafka topic to send audit logs to.                                               |
| KAFKA_AUTH_MECHANISM         | No       | PLAIN   | *       | Authentication mechanism, used only if `KAFKA_USERNAME` and `KAFKA_PASSWORD` are set.        |
| KAFKA_USERNAME               | No       | -       | *       | Username of the Kafka credentials.                                                           |
| KAFKA_PASSWORD               | No       | -       | *       | Password of the Kafka credentials.                                                           |
| KAFKA_CONNECTION_TIMEOUT     | No       | 1000    | *       | Time in milliseconds to wait for a successful connection.                                    |
| KAFKA_AUTHENTICATION_TIMEOUT | No       | 10000   | *       | Time in milliseconds to wait for a successful authentication.                                |
| **LOG_FILE_PATH**            | Yes      | -       | *       | Absolute path of the log file inside the shared volume.                                      |
| **LOG_LEVEL**                | Yes      | -       | *       | Logging level for the sidecar.                                                               |

:::danger

When configuring your microservice, be careful to set a log level that is lower than or equal to the `AUDIT_TRAIL_LOG_LEVEL`, otherwise the audit logs will be suppressed by the logger.

:::

## Kafka

To collect the logs you just need to create a dedicated topic in a Kafka cluster.

After creating a new topic, you should configure the retention period, taking into account the amount of audit logs generated by the systems. When choosing the retention period, you need to find a balance between the availability of logs for later processing, accounting for the unavailability of the downstream services ([Kafka2Rest][kafka2rest] and the [CRUD Service][crud-service]), and the amount of logs generated by your services, which can vary between environments depending on the level of traffic and the configured log level.

A log will appear in Kafka as a message with a payload looking like this:

```json
{
  "version": "1.0.0",
  "timestamp": "2024-04-30T09:12:06.095Z",
  "checksum": {
    "algorithm": "sha512",
    "value": "e474e95adfb31ef4cac7d992732a43d65e3313c852bd5e318dd84087b39ab62b19ff4c5590a6d5d5055ee5e3669c384c55eff0f36fe090205bd67d92d4aa9381"
  },
  "metadata": {
    "level": 1100,
    "msg": "Hello World Audit Log",
    "event": "MiaCare/HelloWorld/v1",
    "severity": "INFO"
  },
  "message": "Hello World Audit Log",
  "rawLog": "{\"level\":1100,\"msg\":\"Hello World Audit Log\", ...}"
}
```

The audit logs are enriched with several fields before being sent from the sidecar to the Kafka topic:

- `version`: the version of the audit logs reference schema;
- `timestamp`: when the audit log was recorded;
- `checksum`: this checksum is generated automatically from the original log (available in the `rawLog` field);
- `metadata`: this field contains all the log fields, including the ones passed as the first argument to the logger;
- `message`: this field contains the original log message (currently the message must be in the log `msg` field);
- `rawLog`: the original log, as emitted by the application logger.

For more details, check the [data model overview][overview-data-model].
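As a reference, here is a minimal sketch of how an application could emit a log like the one above. It assumes the service logs with Pino (the `logger.audit(...)` call shown in the overview is not a Pino built-in) and registers a custom `audit` level matching the default `AUDIT_TRAIL_LOG_LEVEL` of 1100; adapt the level name and logger setup to your own logging library.

```js
// Illustrative sketch: a Pino logger with a custom `audit` level at 1100,
// matching the default AUDIT_TRAIL_LOG_LEVEL, so audit logs are never
// filtered out while regular logs keep their usual levels.
const pino = require('pino');

const logger = pino({
  level: 'info',
  customLevels: { audit: 1100 },
});

// Produces a JSON line such as:
// {"level":1100,"time":...,"msg":"Hello World Audit Log","event":"MiaCare/HelloWorld/v1","severity":"INFO"}
// which the sidecar picks up from the shared log file and enriches as shown above.
logger.audit({ event: 'MiaCare/HelloWorld/v1', severity: 'INFO' }, 'Hello World Audit Log');
```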
:::tip

If you need to record when the event occurred, you should pass it explicitly as a field of the object passed as the first argument to the logger, so it's recorded in the metadata and available later for querying.

:::

## Kafka2Rest

The [Kafka2Rest][kafka2rest] service should authenticate with Kafka using [SASL/SCRAM][kafka-sasl-scram] with Transport Layer Security (TLS), in combination with dedicated credentials that have exclusive access to the topic.

You can also configure a [validator processor][kafka2rest-validator-processor], to perform additional filtering on the audit logs, and a [body processor][kafka2rest-body-processor], to manipulate the audit log before sending it as the payload of the `POST` request to the [CRUD Service][crud-service].

## CRUD Service

The [CRUD Service][crud-service] should connect to MongoDB using dedicated credentials, which only allow insert or read operations, and must not be exposed.

You must create a CRUD collection with the custom fields described in the following table, or you can import the fields from the JSON file provided with this documentation (`audit_logs.json`).

| Name        | Type      | Required | Nullable |
|-------------|-----------|----------|----------|
| `version`   | String    | No       | No       |
| `timestamp` | Date      | No       | No       |
| `metadata`  | RawObject | No       | No       |
| `checksum`  | RawObject | No       | No       |
| `message`   | String    | No       | No       |
| `rawLog`    | String    | No       | No       |

## Performance tuning

After you have set up all the components of the architecture, you need to estimate the amount of logs generated by your microservices and appropriately scale your infrastructure.

Once you have established your overall performance requirements, in terms of expected incoming traffic and amount of logs generated by each service, you should carry out a load test to assign the proper resources to each component, starting with the sidecar, which has to read all the logs of the main application container, filter the audit logs and forward them to Kafka.

We recommend starting with vertical scaling, trying to assign more CPU resources to process incoming logs faster, and, if needed, resorting to horizontal scaling with multiple replicas of the service.
According to our internal benchmarks, a setup with the following resources should be able to process around 1000 logs/second:

| Service      | Version | CPU (req/lim) | RAM (req/lim) |
|--------------|---------|---------------|---------------|
| CRUD Service | 5.4.2   | 150/300       | 100/150       |
| Kafka2Rest   | 1.1.1   | 150/300       | 50/100        |
| Sidecar      | 1.0.0   | 150/300       | 50/100        |

Then you can focus on appropriately scaling the [Kafka2Rest][kafka2rest] and [CRUD Service][crud-service] services.

Starting with [Kafka2Rest][kafka2rest], you can use a conservative configuration, since logs forwarded to the Kafka topic can be processed asynchronously. If you opt for multiple replicas, for optimal performance you should configure the topic to have the same number of partitions, so each [Kafka2Rest][kafka2rest] replica can parallelize work by processing logs from different partitions.

Finally, you can look at the [CRUD Service][crud-service] and scale it to match the amount of requests generated by [Kafka2Rest][kafka2rest].
We expect that, in most cases, the default configuration will work fine.
For additional information and guidance, take a look at the [CRUD Service performance documentation][crud-service-performance].
[kafka]: https://kafka.apache.org/
[k8s-shared-volume]: https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/
[kafka-sasl-scram]: https://docs.confluent.io/platform/current/kafka/authentication_sasl/authentication_sasl_scram.html
[mongodb]: https://www.mongodb.com

[console-kustomize]: /console/project-configuration/kustomize-your-configurations/index.md
[console-service-raw-manifest-configuration]: /development_suite/api-console/api-design/services.md#raw-manifest-configuration
[console-managing-sidecars]: /console/design-your-projects/sidecars.md
[console-microservices-args]: /development_suite/api-console/api-design/services.md#microservice-configuration
[crud-service]: /runtime_suite/crud-service/10_overview_and_usage.md
[crud-service-performance]: /runtime_suite/crud-service/40_performance_overview.md
[kafka2rest]: /runtime_suite/crud-service/10_overview_and_usage.md
[kafka2rest-body-processor]: /runtime_suite/kafka2rest-service/02_configuration.md#body-processors
[kafka2rest-validator-processor]: /runtime_suite/kafka2rest-service/02_configuration.md#validator-processors

[overview-data-model]: 10_overview.md#data-model
diff --git a/docs/runtime_suite/audit-trail/30_security.md b/docs/runtime_suite/audit-trail/30_security.md
new file mode 100644
index 0000000000..1fe36cf1ac
--- /dev/null
+++ b/docs/runtime_suite/audit-trail/30_security.md
@@ -0,0 +1,95 @@
---
id: security
title: Security
sidebar_label: Security
---

Audit logs may contain sensitive information that is not encrypted until it is stored in MongoDB, so you must ensure that:

1. Data in transit is always encrypted, by using secure connections between [Kafka][kafka], [Kafka2Rest][kafka2rest], the [CRUD Service][crud-service] and [MongoDB][mongodb]
2. Only these services can access the audit trail, by using dedicated credentials with exclusive access to the [Kafka][kafka] topic and the [MongoDB][mongodb] collection
3. The audit trail cannot be altered, deleted or tampered with after being stored into [MongoDB][mongodb]

The following sections illustrate the technical measures you should implement to satisfy these requirements.

## Security checklist

- [ ] Ensure the log level of your microservice is lower than or equal to the `AUDIT_TRAIL_LOG_LEVEL`, otherwise the audit logs will be suppressed by the logger.
- [ ] Ensure your microservice does not log any personal or sensitive information, or encrypts it client-side; for a detailed list of the data you should not include in your logs, please take a look at the [OWASP Logging Cheat Sheet][owasp-logging-cheat-sheet-data-to-exclude].
- [ ] Configure monitoring and automatic alerting mechanisms on Kafka or MongoDB to promptly notify your team about relevant events, like potential incidents or suspicious activities (see also the [monitoring and alerting overview][overview-monitoring-alerting]).
- [ ] Restrict access to audit logs to prevent alteration or tampering, and regularly review access policies.
- [ ] Use dedicated MongoDB credentials with exclusive access to the audit logs collection and limited permissions, to prevent update or delete operations on the records or the collection itself.
- [ ] Consider using a separate MongoDB instance dedicated exclusively to storing audit logs, especially if they contain sensitive information.
- [ ] Define audit logs retention policies for Kafka, Grafana and MongoDB.

For more detailed security guidelines, we recommend reading the [OWASP Logging Cheat Sheet][owasp-logging-cheat-sheet] and [OWASP Top 10][owasp-top-10-logging-monitoring-failures].

The remaining sections delve into the security of the architecture components.

## Logs redundancy and retention policies

To ensure redundancy and protect the availability and integrity of the audit trail, we recommend sending a copy of the audit logs, together with the other application logs, to Grafana. This is the default behavior if you configure the Audit Trail in the PaaS.

This measure provides an additional layer of security against any technical issues and malicious attempts at altering or tampering with the audit logs, since, depending on the various retention policies, you are going to have up to three different read-only copies of the audit logs on Kafka, Grafana and MongoDB respectively.

The following table provides a starting point to figure out the best retention policies for your specific use case, giving enough room to find an adequate balance of security and resource usage.

| Component | Retention                   |
|-----------|-----------------------------|
| Grafana   | 30-60 days (45 in the PaaS) |
| Kafka     | 1-5 days                    |
| MongoDB   | Forever                     |

## Encryption of data in transit

The sidecar and [Kafka2Rest][kafka2rest] should authenticate with Kafka using [SASL/SCRAM][kafka-sasl-scram] with Transport Layer Security (TLS) and use dedicated credentials, granting exclusive access to the topic and ensuring no other system can write to the Kafka topic.

For additional information on how to configure a Kafka cluster to encrypt data in transit, please take a look at the Confluent [official documentation][kafka-encryption] and [security course][kafka-security-course].

All HTTP services should communicate with each other only using HTTPS connections with Transport Layer Security (TLS), unless internal endpoints are used.

## Encryption of data at rest

Kafka does not natively support data encryption at rest, so you may need to perform client-side encryption on sensitive information before including it in the log.
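As a minimal, illustrative sketch of such client-side encryption (the key management, cipher choice and field names are assumptions to adapt to your own setup), you could encrypt a sensitive value with an AEAD cipher before passing it to the logger:

```js
// Illustrative sketch: encrypt a sensitive field before logging it, so the
// value stored on Kafka (and anywhere else the log is copied) is not readable
// in clear text. Key management and field names are assumptions.
const { createCipheriv, randomBytes } = require('node:crypto');

// 32-byte key, e.g. loaded from a secret via an environment variable; never hard-code it.
const key = Buffer.from(process.env.AUDIT_FIELD_ENCRYPTION_KEY, 'base64');

function encryptField(plaintext) {
  const iv = randomBytes(12); // unique nonce per value
  const cipher = createCipheriv('aes-256-gcm', key, iv);
  const encrypted = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const authTag = cipher.getAuthTag();
  // Store IV and auth tag next to the ciphertext so the value can be decrypted later if needed.
  return [iv, authTag, encrypted].map((buf) => buf.toString('base64')).join('.');
}

// The patient identifier would then be logged only in its encrypted form,
// e.g. as the value of a metadata field passed to your audit logger.
const encryptedPatientId = encryptField('patient-67890');
```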
+ +To ensure audit logs cannot be altered, deleted and tampered with once they have been stored inside a MongoDB collection, you should: + +- Use a dedicated MongoDB database, where only audit logs are stored +- Create a user with [custom roles][mongodb-user-roles], allowing only insert or read operations (update or delete operations must be forbidden) +- Review roles and permissions of default users to prevent them from accessing, updating or deleting audit logs or altering or deleting the collection +- Enable [Client-side field level encryption (CSFLE)][mongodb-csfle] on the fields which may contain sensitive information + +## Client-side encryption + +The [CRUD Service][crud-service] should use [client-side encryption][crud-service-csfle] on all fields that may contain sensitive or personal information and use [Google Key Management Service][crud-service-google-kms] to safely store the master encryption key. + +:::note + +Client-side encryption requires an enterprise version of MongoDB supporting [Client-side field level encryption (CSFLE)][mongodb-csfle] (v4.2 or later). + +::: + + +[kafka]: https://kafka.apache.org/ +[kafka-encryption]: https://docs.confluent.io/platform/current/kafka/encryption.html +[kafka-sasl-scram]: https://docs.confluent.io/platform/current/kafka/authentication_sasl/authentication_sasl_scram.html +[kafka-security-course]: https://developer.confluent.io/courses/security/intro/ +[mongodb]: https://www.mongodb.com +[mongodb-csfle]: https://www.mongodb.com/docs/v7.0/core/csfle/ +[mongodb-user-roles]: https://www.mongodb.com/docs/manual/core/security-user-defined-roles/ +[owasp-logging-cheat-sheet]: https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html +[owasp-logging-cheat-sheet-data-to-exclude]: https://cheatsheetseries.owasp.org/cheatsheets/Logging_Cheat_Sheet.html#data-to-exclude +[owasp-top-10-logging-monitoring-failures]: https://owasp.org/Top10/A09_2021-Security_Logging_and_Monitoring_Failures/ + +[crud-service]: /runtime_suite/crud-service/10_overview_and_usage.md +[crud-service-csfle]: /development_suite/api-console/api-design/gdpr.md#client-side-encryption +[crud-service-google-kms]: /runtime_suite/crud-service/30_encryption_configuration.md#configure-csfle-with-the-google-cloud-platform-gcp +[kafka2rest]: /runtime_suite/crud-service/10_overview_and_usage.md + +[overview-monitoring-alerting]: 10_overview.md#monitoring-and-alerting diff --git a/docs/runtime_suite/audit-trail/changelog.md b/docs/runtime_suite/audit-trail/changelog.md new file mode 100644 index 0000000000..4e2c283c7c --- /dev/null +++ b/docs/runtime_suite/audit-trail/changelog.md @@ -0,0 +1,18 @@ +--- +id: changelog +title: Changelog +sidebar_label: CHANGELOG +--- + + + +All notable changes to this project will be documented in this file. + +The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/), +and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). 
+ +## [1.0.0] 2024-11-06 diff --git a/docs/runtime_suite/audit-trail/img/audit-trail-architecture.png b/docs/runtime_suite/audit-trail/img/audit-trail-architecture.png new file mode 100644 index 0000000000..fecb0878a1 Binary files /dev/null and b/docs/runtime_suite/audit-trail/img/audit-trail-architecture.png differ diff --git a/static/docs_files_to_download/audit-trail/audit_logs.json b/static/docs_files_to_download/audit-trail/audit_logs.json new file mode 100644 index 0000000000..90939db15a --- /dev/null +++ b/static/docs_files_to_download/audit-trail/audit_logs.json @@ -0,0 +1,98 @@ +[ + { + "name": "_id", + "type": "ObjectId", + "required": true, + "nullable": false, + "description": "_id" + }, + { + "name": "creatorId", + "type": "string", + "required": true, + "nullable": false, + "description": "creatorId" + }, + { + "name": "createdAt", + "type": "Date", + "required": true, + "nullable": false, + "description": "createdAt" + }, + { + "name": "updaterId", + "type": "string", + "required": true, + "nullable": false, + "description": "updaterId" + }, + { + "name": "updatedAt", + "type": "Date", + "required": true, + "nullable": false, + "description": "updatedAt" + }, + { + "name": "__STATE__", + "type": "string", + "required": true, + "nullable": false, + "description": "__STATE__" + }, + { + "name": "version", + "type": "string", + "required": false, + "nullable": false, + "sensitivityValue": 0, + "encryptionEnabled": false, + "encryptionSearchable": false + }, + { + "name": "timestamp", + "type": "Date", + "required": false, + "nullable": false, + "sensitivityValue": 0, + "encryptionEnabled": false, + "encryptionSearchable": false + }, + { + "name": "metadata", + "type": "RawObject", + "required": false, + "nullable": false, + "sensitivityValue": 0, + "encryptionEnabled": false, + "encryptionSearchable": false + }, + { + "name": "checksum", + "type": "RawObject", + "required": false, + "nullable": false, + "sensitivityValue": 0, + "encryptionEnabled": false, + "encryptionSearchable": false + }, + { + "name": "message", + "type": "string", + "required": false, + "nullable": false, + "sensitivityValue": 0, + "encryptionEnabled": false, + "encryptionSearchable": false + }, + { + "name": "rawLog", + "type": "string", + "required": false, + "nullable": false, + "sensitivityValue": 0, + "encryptionEnabled": false, + "encryptionSearchable": false + } +] \ No newline at end of file