Kudu source connector

Supported Kudu Versions

  • 1.11.1/1.12.0/1.13.0/1.14.0/1.15.0

Supported Engines

Spark
Flink
SeaTunnel Zeta

Key features

Description

Used to read data from Kudu.

The tested Kudu version is 1.11.1.

Data Type Mapping

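| Kudu Data Type      | SeaTunnel Data Type |
|---------------------|---------------------|
| BOOL                | BOOLEAN             |
| INT8, INT16, INT32  | INT                 |
| INT64               | BIGINT              |
| DECIMAL             | DECIMAL             |
| FLOAT               | FLOAT               |
| DOUBLE              | DOUBLE              |
| STRING              | STRING              |
| UNIXTIME_MICROS     | TIMESTAMP           |
| BINARY              | BYTES               |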
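
As a rough illustration of the mapping above, a SeaTunnel schema block for a hypothetical Kudu table might declare its columns as follows. The column names are invented for this sketch, and the exact type keywords the connector accepts should be checked against the SeaTunnel schema documentation.

```hocon
# Hypothetical columns; only the Kudu-to-SeaTunnel type pairs come from the table above.
schema = {
  fields {
    id = bigint             # Kudu INT64
    is_active = boolean     # Kudu BOOL
    age = int               # Kudu INT8 / INT16 / INT32
    score = double          # Kudu DOUBLE
    name = string           # Kudu STRING
    created_at = timestamp  # Kudu UNIXTIME_MICROS
    payload = bytes         # Kudu BINARY
  }
}
```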

Source Options

| Name | Type | Required | Default | Description |
|------|------|----------|---------|-------------|
| kudu_masters | String | Yes | - | Kudu master addresses, separated by ',', such as '192.168.88.110:7051'. |
| table_name | String | Yes | - | The name of the Kudu table. |
| client_worker_count | Int | No | 2 * Runtime.getRuntime().availableProcessors() | Kudu worker count. The default is twice the number of available CPU cores. |
| client_default_operation_timeout_ms | Long | No | 30000 | Timeout for normal Kudu operations, in milliseconds. |
| client_default_admin_operation_timeout_ms | Long | No | 30000 | Timeout for Kudu admin operations, in milliseconds. |
| enable_kerberos | Bool | No | false | Whether to enable Kerberos authentication. |
| kerberos_principal | String | No | - | Kerberos principal. |
| kerberos_keytab | String | No | - | Kerberos keytab. Note that all Zeta nodes must have this file. |
| kerberos_krb5conf | String | No | - | Kerberos krb5 conf. Note that all Zeta nodes must have this file. |
| scan_token_query_timeout | Long | No | 30000 | Timeout for the scan token connection. If not set, it is the same as operationTimeout. |
| scan_token_batch_size_bytes | Int | No | 1024 * 1024 | Kudu scan bytes. The maximum number of bytes read at a time; the default is 1 MB. |
| filter | String | No | - | Kudu scan filter expressions. Not supported yet. |
| schema | Map | No | - | SeaTunnel schema. |
| table_list | Array | No | - | The list of tables to read. You can use this option instead of table_name. Example: table_list = [{ table_name = "kudu_source_table_1"},{ table_name = "kudu_source_table_2"}] |
| common-options | | No | - | Source plugin common parameters; please refer to Source Common Options for details. |
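
The optional client settings can be combined in one source block. The following is a minimal sketch with values chosen only for illustration, not as recommendations. HOCON does not evaluate arithmetic, so a byte size such as 2 MB is written out as 2097152 rather than as a product.

```hocon
source {
  kudu {
    kudu_masters = "kudu-master:7051"
    table_name = "kudu_source_table"
    result_table_name = "kudu"

    # Illustrative values only; tune them for your own cluster.
    client_worker_count = 8
    client_default_operation_timeout_ms = 60000
    client_default_admin_operation_timeout_ms = 60000
    scan_token_query_timeout = 60000
    # 2 MB, written as a literal because expressions are not evaluated
    scan_token_batch_size_bytes = 2097152
  }
}
```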

Task Example

Simple:

The following example reads a Kudu table named "kudu_source_table", prints its data to the console, and writes it to the Kudu table "kudu_sink_table".

    # Defining the runtime environment
    env {
      parallelism = 2
      job.mode = "BATCH"
    }

    source {
      # This is an example source plugin, only for testing and demonstrating the source plugin feature
      kudu {
        kudu_masters = "kudu-master:7051"
        table_name = "kudu_source_table"
        result_table_name = "kudu"
        enable_kerberos = true
        kerberos_principal = "xx@xx.COM"
        kerberos_keytab = "xx.keytab"
      }
    }

    transform {
    }

    sink {
      console {
        source_table_name = "kudu"
      }

      kudu {
        source_table_name = "kudu"
        kudu_masters = "kudu-master:7051"
        table_name = "kudu_sink_table"
        enable_kerberos = true
        kerberos_principal = "xx@xx.COM"
        kerberos_keytab = "xx.keytab"
      }
    }
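
If the Kudu cluster does not use Kerberos, the Kerberos options can presumably be left out, since enable_kerberos defaults to false. A minimal sketch of such a source block, reusing the host and table names from the simple example:

```hocon
source {
  kudu {
    # Only the two required options plus the result table name.
    # enable_kerberos is omitted and falls back to its default (false).
    kudu_masters = "kudu-master:7051"
    table_name = "kudu_source_table"
    result_table_name = "kudu"
  }
}
```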

Multiple Table

    env {
      # You can set engine configuration here
      parallelism = 1
      job.mode = "STREAMING"
      checkpoint.interval = 5000
    }

    source {
      # This is an example source plugin, only for testing and demonstrating the source plugin feature
      kudu {
        kudu_masters = "kudu-master:7051"
        table_list = [
          {
            table_name = "kudu_source_table_1"
          },
          {
            table_name = "kudu_source_table_2"
          }
        ]
        result_table_name = "kudu"
      }
    }

    transform {
    }

    sink {
      Assert {
        rules {
          table-names = ["kudu_source_table_1", "kudu_source_table_2"]
        }
      }
    }

Changelog

2.2.0-beta 2022-09-26

  • Add Kudu Source Connector

Next Version

  • Change plugin name from KuduSource to Kudu (#3432)