Hudi source connector

Supported Engines

Spark
Flink
SeaTunnel Zeta

Key Features

Description

Used to read data from Hudi. Currently, it supports only Hudi COW (copy-on-write) tables and snapshot queries in batch mode.

To use this connector, you must ensure that your Spark/Flink cluster is already integrated with Hive. The tested Hive version is 2.3.9.

Supported DataSource Info

  • Currently, only Hudi COW tables and snapshot queries in batch mode are supported.

Data Type Mapping

| Hudi Data Type | SeaTunnel Data Type |
|----------------|---------------------|
| ALL TYPE       | STRING              |
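
Because every Hudi column arrives as STRING, a downstream step usually has to restore the intended types. The fragment below is a rough sketch of one way to do that with the SQL transform. It assumes the Hudi source block sets the common option result_table_name = "hudi_source", and that the table has columns named id, amount and name. All three column names and both table names are hypothetical, and the common option names may differ between SeaTunnel versions.

```hocon
transform {
  Sql {
    # Assumption: the Hudi source declares result_table_name = "hudi_source"
    source_table_name = "hudi_source"
    # Hypothetical name for the typed output table
    result_table_name = "hudi_typed"
    # id, amount and name are hypothetical columns; each one is read as STRING
    query = "select cast(id as bigint) as id, cast(amount as double) as amount, name from hudi_source"
  }
}
```

Columns that should stay as text, such as name here, can be selected without a cast.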

Source Options

| Name                    | Type    | Required                      | Default | Description |
|-------------------------|---------|-------------------------------|---------|-------------|
| table.path              | String  | Yes                           | -       | The HDFS root path of the Hudi table, such as 'hdfs://nameserivce/data/hudi/hudi_table/'. |
| table.type              | String  | Yes                           | -       | The type of the Hudi table. Only 'cow' is supported; 'mor' is not supported yet. |
| conf.files              | String  | Yes                           | -       | A semicolon-separated list of local paths to the environment configuration files used to initialize the HDFS client that reads the Hudi table files, for example '/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml'. |
| use.kerberos            | Boolean | No                            | false   | Whether to enable Kerberos. |
| kerberos.principal      | String  | Yes, when use.kerberos = true | -       | The Kerberos principal to use when Kerberos is enabled, such as 'test_user@xxx'. |
| kerberos.principal.file | String  | Yes, when use.kerberos = true | -       | The keytab file for the Kerberos principal when Kerberos is enabled, such as '/home/test/test_user.keytab'. |
| common-options          | Config  | No                            | -       | Source plugin common parameters. Please refer to Source Common Options for details. |
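
As a minimal sketch of how the required options fit together, a source block for a cluster without Kerberos might look like the following. The path and file locations are the placeholder values from the option descriptions, not real endpoints, and use.kerberos is left out because it defaults to false.

```hocon
source {
  Hudi {
    # Required: HDFS root path of the Hudi table (placeholder value)
    table.path = "hdfs://nameserivce/data/hudi/hudi_table/"
    # Required: only "cow" is supported
    table.type = "cow"
    # Required: semicolon-separated list of local Hadoop configuration files
    conf.files = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml"
    # use.kerberos is omitted, so it keeps its default of false and
    # kerberos.principal / kerberos.principal.file are not needed
  }
}
```

When use.kerberos is set to true, both kerberos.principal and kerberos.principal.file become required.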

Task Example

Simple:

This example reads from a Hudi COW table with Kerberos authentication enabled and prints the rows to the console.

    # Defining the runtime environment
    env {
      parallelism = 2
      job.mode = "BATCH"
    }
    source {
      Hudi {
        table.path = "hdfs://nameserivce/data/hudi/hudi_table/"
        table.type = "cow"
        conf.files = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml"
        use.kerberos = true
        kerberos.principal = "test_user@xxx"
        kerberos.principal.file = "/home/test/test_user.keytab"
      }
    }
    transform {
      # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
      # please go to https://seatunnel.apache.org/docs/transform-v2/sql/
    }
    sink {
      Console {}
    }

Changelog

2.2.0-beta 2022-09-26

  • Add Hudi Source Connector