This Helm chart deploys the vertica-kafka-scheduler with two modes:

  • initializer: Configuration mode. Starts a container so that you can exec into it and configure it.
  • launcher: Launch mode. Launches the vkconfig scheduler. Starts a container that calls vkconfig launch automatically. Run this mode after you configure the container in initializer mode.

Install the charts

Add the chart repository to Helm, then install the chart. The following helm install command uses the image.tag parameter to install version 24.1.0:

$ helm repo add vertica-charts https://vertica.github.io/charts
$ helm repo update
$ helm install vkscheduler vertica-charts/vertica-kafka-scheduler \
    --set "image.tag=24.1.0"
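The image.tag value must match the version of the Vertica server that the scheduler connects to (see Parameters). As a rough sketch, a small shell helper can derive the tag from a server image reference. The scheduler_tag name is hypothetical, and the helper assumes the server tag has the form <version>-<build>, as in the vertica/vertica-k8s:12.0.3-0 image from the sample manifests:

```shell
# Hypothetical helper: derive a scheduler image tag from a Vertica server image.
# Assumes the server tag looks like <version>-<build> (for example 12.0.3-0).
scheduler_tag() {
  server_tag="${1##*:}"              # drop the repository, keep "12.0.3-0"
  printf '%s\n' "${server_tag%%-*}"  # drop the build suffix, keep "12.0.3"
}

# Print the install command instead of running it, so you can review it first.
echo helm install vkscheduler vertica-charts/vertica-kafka-scheduler \
  --set "image.tag=$(scheduler_tag vertica/vertica-k8s:12.0.3-0)"
```

Printing the command first keeps the sketch safe to run on a machine without cluster access; drop the echo once the tag looks right.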

Sample manifests

The following dropdowns provide sample manifests for a Kafka cluster, VerticaDB operator and custom resource (CR), and vkconfig scheduler. These manifests are applied in Usage to demonstrate a simple deployment:

kafka-cluster.yaml (with Strimzi operator)
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  namespace: kafka
  name: my-cluster
spec:
  kafka:
    version: 3.6.0
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      default.replication.factor: 1
      min.insync.replicas: 1
      inter.broker.protocol.version: "3.6"
    storage:
      type: jbod
      volumes:
      - id: 0
        type: persistent-claim
        size: 100Gi
        deleteClaim: false
  zookeeper:
    replicas: 1
    storage:
      type: persistent-claim
      size: 100Gi
      deleteClaim: false
  entityOperator:
    topicOperator: {}
    userOperator: {}
vdb-op-cr.yaml
apiVersion: vertica.com/v1
kind: VerticaDB
metadata:
  annotations:
    vertica.com/include-uid-in-path: "false"
    vertica.com/vcluster-ops: "false"
  name: vdb-1203
spec:
  communal:
    credentialSecret: ""
    endpoint: https://s3.amazonaws.com
    path: s3://<path>/<to>/<s3-bucket>
  image: vertica/vertica-k8s:12.0.3-0
  initPolicy: Create
  subclusters:
  - name: sc0
    size: 3
    type: primary
vertica-kafka-scheduler.yaml
 image:
   repository: opentext/kafka-scheduler
   pullPolicy: IfNotPresent
   tag: 12.0.3
 launcherEnabled: false
 replicaCount: 1
 initializerEnabled: true
 conf:
   generate: true
   content:
     config-schema: Scheduler
     username: dbadmin
     dbport: '5433'
     enable-ssl: 'false'
     dbhost: 10.20.30.40
 tls:
   enabled: false
 serviceAccount:
   create: true

Usage

The following sections deploy a Kafka cluster and a VerticaDB operator and CR on Kubernetes. Then, they show you how to configure Vertica to consume data from Kafka by setting up the necessary tables and configuring the scheduler. Finally, you launch the scheduler and send data on the command line to test the implementation.

Deploy the manifests

Apply manifests on Kubernetes to create a Kafka cluster, VerticaDB operator, and VerticaDB CR:

  1. Create a namespace. The following command creates a namespace named kafka:

    kubectl create namespace kafka
  2. Create the Kafka custom resource. Apply the kafka-cluster.yaml manifest:

    kubectl apply -f kafka-cluster.yaml
  3. Deploy the VerticaDB operator and custom resource. The vdb-op-cr.yaml manifest deploys version 12.0.3. Before you apply the manifest, edit spec.communal.path to provide a path to an existing S3 bucket:

    kubectl apply -f vdb-op-cr.yaml
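Before you continue, it can help to confirm that the resources came up. The following is a minimal sketch that wraps kubectl wait and kubectl get. The function names are hypothetical, Ready is the condition that Strimzi reports on the Kafka resource, and the VerticaDB lookup assumes the CR was applied to your current namespace because the sample manifest does not set one:

```shell
# Block until Strimzi marks the Kafka resource Ready (default timeout: 300s).
wait_for_kafka() {
  kubectl wait kafka/my-cluster --namespace kafka \
    --for=condition=Ready --timeout="${1:-300s}"
}

# Show the VerticaDB custom resource in the current namespace.
show_vdb() {
  kubectl get verticadb vdb-1203
}
```

For example, run wait_for_kafka 600s and then show_vdb after you apply both manifests.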

Set up Vertica

Create tables and resources so that Vertica can consume data from a Kafka topic:

  1. Create a flex table to store the Kafka messages:
    CREATE FLEX TABLE KafkaFlex();
  2. Create the Kafka user:
    CREATE USER KafkaUser;
  3. Create a resource pool:
    CREATE RESOURCE POOL scheduler_pool PLANNEDCONCURRENCY 1;
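The statements above can also be run non-interactively. This sketch assumes the Vertica pod is named vdb-1203-sc0-0 (a guess based on the CR and subcluster names in the sample manifest), that vsql is on the PATH inside the container, and that dbadmin can connect without a password prompt. The run_sql helper and the VERTICA_POD variable are hypothetical:

```shell
# Pod to run vsql in; override VERTICA_POD if your pod is named differently.
VERTICA_POD="${VERTICA_POD:-vdb-1203-sc0-0}"

# Run one SQL statement through vsql inside the Vertica pod.
run_sql() {
  kubectl exec "$VERTICA_POD" -- vsql -U dbadmin -c "$1"
}

# Example calls (uncomment to run against your cluster):
# run_sql "CREATE FLEX TABLE KafkaFlex();"
# run_sql "CREATE USER KafkaUser;"
# run_sql "CREATE RESOURCE POOL scheduler_pool PLANNEDCONCURRENCY 1;"
```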

Create a Kafka topic

Start the Kafka service, and create a Kafka topic that the scheduler can consume data from:

  1. Open a new shell and start a pod from the Strimzi Kafka image. You run the Kafka producer from this pod:
    kubectl --namespace kafka run kafka-producer -ti --image=quay.io/strimzi/kafka:0.38.0-kafka-3.6.0 --rm=true --restart=Never -- bash
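  2. In the pod, start a console producer for the Kafka topic that the scheduler subscribes to. If the broker allows automatic topic creation, the topic is created when the producer first uses it: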
    bin/kafka-console-producer.sh --bootstrap-server my-cluster-kafka-bootstrap.kafka:9092 --topic KafkaTopic1
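If you prefer to create the topic explicitly instead of depending on automatic topic creation, a sketch using kafka-topics.sh from the same Strimzi image might look like the following. The create_topic helper and the KAFKA_BIN and BOOTSTRAP variables are hypothetical. The default of one partition mirrors the --partitions value used for the source later on, and the replication factor of 1 fits the single-broker sample cluster:

```shell
# Directory that holds the Kafka scripts; "bin" matches the producer command above.
KAFKA_BIN="${KAFKA_BIN:-bin}"
BOOTSTRAP="my-cluster-kafka-bootstrap.kafka:9092"

# Create a topic unless it already exists.
# $1 = topic name, $2 = number of partitions (default 1).
create_topic() {
  "$KAFKA_BIN/kafka-topics.sh" --bootstrap-server "$BOOTSTRAP" \
    --create --if-not-exists --topic "$1" \
    --partitions "${2:-1}" --replication-factor 1
}
```

Inside the producer pod, create_topic KafkaTopic1 would run before you start the console producer.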

Configure the scheduler

Deploy the scheduler container in initializer mode, and configure the scheduler to consume data from the Kafka topic:

  1. Deploy the vertica-kafka-scheduler Helm chart with the values in vertica-kafka-scheduler.yaml. This file sets initializerEnabled to true so you can configure the vkconfig container before you launch the scheduler:

    helm install vk1 vertica-charts/vertica-kafka-scheduler --namespace main \
      -f vertica-kafka-scheduler.yaml
  2. Use kubectl exec to get a shell in the initializer pod:

    kubectl exec --namespace main -it vk1-vertica-kafka-scheduler-initializer -- bash
  3. Set configuration options for the scheduler. For descriptions of each of the following options, see vkconfig script options:

    # scheduler options 
    vkconfig scheduler --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --frame-duration 00:00:10 \
     --create --operator KafkaUser \
     --eof-timeout-ms 2000 \
     --config-refresh 00:01:00 \
     --new-source-policy START \
     --resource-pool scheduler_pool
    
    # target options 
    vkconfig target --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --target-schema public \
     --target-table KafkaFlex
    
    # load spec options 
    vkconfig load-spec --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --load-spec KafkaSpec \
     --parser kafkajsonparser \
     --load-method DIRECT \
     --message-max-bytes 1000000
    
    # cluster options 
    vkconfig cluster --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --cluster KafkaCluster \
     --hosts my-cluster-kafka-bootstrap.kafka:9092
    
    # source options 
    vkconfig source --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --cluster KafkaCluster \
     --source KafkaTopic1 \
     --partitions 1
    
    # microbatch options 
    vkconfig microbatch --add --conf /opt/vertica/packages/kafka/config/vkconfig.conf \
     --microbatch KafkaBatch1 \
     --add-source KafkaTopic1 \
     --add-source-cluster KafkaCluster \
     --target-schema public \
     --target-table KafkaFlex \
     --rejection-schema public \
     --rejection-table KafkaFlex_rej \
     --load-spec KafkaSpec
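To review what was stored, you can ask each vkconfig tool to print its current settings. The sketch below loops over the tools configured above and calls each one with --read. The show_config helper and the CONF variable are hypothetical, and the output format may differ between vkconfig versions:

```shell
CONF="/opt/vertica/packages/kafka/config/vkconfig.conf"

# Print the stored settings for every tool configured in the previous step.
show_config() {
  for tool in scheduler target load-spec cluster source microbatch; do
    echo "== $tool =="
    vkconfig "$tool" --read --conf "$CONF"
  done
}
```

Run show_config in the initializer pod before you switch to launcher mode, so that a typo in a table or topic name surfaces early.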

Launch the scheduler

After you configure the scheduler options, you can deploy it in launcher mode:

helm upgrade --namespace main vk1 vertica-charts/vertica-kafka-scheduler \
  --set "launcherEnabled=true"
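Depending on your Helm version and flags, passing --set to helm upgrade may not carry over values that were supplied when the release was installed. A hedged variant adds --reuse-values so that settings such as conf.content are kept, and then prints the values in effect. The launch_scheduler helper is hypothetical:

```shell
# Enable launcher mode while keeping the values from the previous release,
# then show the user-supplied values that are now in effect.
launch_scheduler() {
  helm upgrade vk1 vertica-charts/vertica-kafka-scheduler --namespace main \
    --reuse-values --set "launcherEnabled=true" &&
  helm get values vk1 --namespace main
}
```

Checking the output of helm get values is a quick way to confirm that launcherEnabled is true and that the connection settings are still present.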

Test the deployment

Now that you have a containerized Kafka cluster and VerticaDB CR running, you can test that the scheduler is automatically sending data from the Kafka producer to Vertica:

  1. In the terminal that is running your Kafka producer, send sample JSON data:

    >{"a": 1}
    >{"a": 1000}
  2. In a different terminal, open vsql and query the KafkaFlex table for the data:

    => SELECT compute_flextable_keys_and_build_view('KafkaFlex');
                                     compute_flextable_keys_and_build_view                    
    --------------------------------------------------------------------------------------------------------
     Please see public.KafkaFlex_keys for updated keys
    The view public.KafkaFlex_view is ready for querying
    (1 row)
     
    => SELECT a from KafkaFlex_view;
     a
    -----
     1
     1000
    (2 rows)
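For a larger test than two hand-typed messages, you could generate the JSON lines with a small helper and pipe them to the console producer. This is only a sketch: sample_messages is a hypothetical name, and the message shape mirrors the {"a": 1} records above:

```shell
# Print N JSON messages, one per line, with "a" counting up from 1.
# $1 = number of messages (default 1).
sample_messages() {
  count="${1:-1}"
  i=1
  while [ "$i" -le "$count" ]; do
    printf '{"a": %d}\n' "$i"
    i=$((i + 1))
  done
}

# Example (run inside the producer pod):
# sample_messages 100 | bin/kafka-console-producer.sh \
#   --bootstrap-server my-cluster-kafka-bootstrap.kafka:9092 --topic KafkaTopic1
```

After the scheduler loads the next frame, a row count on KafkaFlex_view should reflect the additional messages.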

Parameters

affinity
Applies affinity rules that constrain the scheduler to specific nodes.
conf.configMapName
Name of the ConfigMap to use and optionally generate. If omitted, the chart picks a suitable default.
conf.content
Set of key-value pairs in the generated ConfigMap. If conf.generate is false, this setting is ignored.
conf.generate
When set to true, the Helm chart controls the creation of the vkconfig.conf ConfigMap.
Default: true
fullNameOverride
Gives the Helm chart full control over the name of the objects that get created. This takes precedence over nameOverride.
initializerEnabled
When set to true, the initializer pod is created. This can be used to run any setup tasks needed.
Default: true
image.pullPolicy
How often Kubernetes pulls the image for an object. For details, see Updating Images in the Kubernetes documentation.
Default: IfNotPresent
image.repository
The image repository and name that contains the Vertica Kafka Scheduler.
Default: opentext/kafka-scheduler
image.tag
Version of the Vertica Kafka Scheduler. This setting must match the version of the Vertica server that the scheduler connects to.
Default: Helm chart's appVersion
imagePullSecrets
List of Secrets that contain the required credentials to pull the image.
launcherEnabled
When set to true, the Helm chart creates the launch deployment. Enable this setting after you configure the scheduler options in the container.
Default: true
jvmOpts
Values to assign to the VKCONFIG_JVM_OPTS environment variable in the pods.

NOTE You can omit most truststore and keystore settings because they are set by tls.* parameters.

nameOverride
Controls the name of the objects that get created. This is combined with the Helm chart release to form the name.
nodeSelector
nodeSelector that controls where the pod is scheduled.
podAnnotations
Annotations that you want to attach to the pods.
podSecurityContext
Security context for the pods.
replicaCount
Number of launch pods that the chart deploys.
Default: 1
resources
Host resources to use for the pod.
securityContext
Security context for the container in the pod.
serviceAccount.annotations
Annotations to attach to the ServiceAccount.
serviceAccount.create
When set to true, a ServiceAccount is created as part of the deployment.
Default: true
serviceAccount.name
Name of the service account. If this parameter is not set and serviceAccount.create is set to true, a name is generated using the fullname template.
timezone
Time zone that the logger uses. Logging is handled by log4j, so specify a time zone ID that Java recognizes. For a list of available IDs, see https://docs.oracle.com/middleware/1221/wcs/tag-ref/MISC/TimeZones.html
Default: UTC
tls.enabled
When set to true, the scheduler is set up for TLS authentication.
Default: false
tls.keyStoreMountPath
Directory name where the keystore is mounted in the pod. This setting controls the name of the keystore within the pod. The full path to the keystore is constructed by combining this parameter and tls.keyStoreSecretKey.
tls.keyStorePassword
Password that protects the keystore. If this setting is omitted, then no password is used.
tls.keyStoreSecretKey
Key within tls.keyStoreSecretName that is used as the keystore file name. This setting and tls.keyStoreMountPath form the full path to the key in the pod.
tls.keyStoreSecretName
Name of an existing Secret that contains the keystore. If this setting is omitted, no keystore information is included.
tls.trustStoreMountPath
Directory name where the truststore is mounted in the pod. This setting controls the name of the truststore within the pod. The full path to the truststore is constructed by combining this parameter with tls.trustStoreSecretKey.
tls.trustStorePassword
Password that protects the truststore. If this setting is omitted, then no password is used.
tls.trustStoreSecretKey
Key within tls.trustStoreSecretName that is used as the truststore file name. This is used with tls.trustStoreMountPath to form the full path to the key in the pod.
tls.trustStoreSecretName
Name of an existing Secret that contains the truststore. If this setting is omitted, then no truststore information is included.
tolerations
Applies tolerations that control where the pod is scheduled.
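
As an illustration of how the tls.*MountPath and tls.*SecretKey parameters relate, the sketch below joins a mount directory and a Secret key into a file path. It assumes the two parts are joined with a single slash, which may not match the chart's templates exactly. The store_path helper and the sample directory and file names are hypothetical:

```shell
# Join a mount directory and a Secret key into the file path inside the pod.
# Assumption: the chart combines them as <mountPath>/<secretKey>.
store_path() {
  printf '%s/%s\n' "${1%/}" "$2"   # strip one trailing slash to avoid "//"
}

store_path /etc/keystore keystore.jks
store_path /etc/truststore/ truststore.jks
```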