Spark Configuration Catalogs Spark adds an API to plug in table catalogs that are used to load, create, and manage Iceberg tables. Spark catalogs are configured by setting Spark ...
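A brief illustration of how such a catalog might be wired up, assuming a Hive-Metastore-backed catalog named hive_prod and a Hadoop-warehouse catalog named hadoop_prod (catalog names, metastore URI, and warehouse path are placeholders):

```properties
# spark-defaults.conf (or equivalent --conf flags): register an Iceberg catalog named "hive_prod"
spark.sql.catalog.hive_prod              = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hive_prod.type         = hive
spark.sql.catalog.hive_prod.uri          = thrift://metastore-host:9083

# a Hadoop-warehouse-backed catalog is configured the same way
spark.sql.catalog.hadoop_prod            = org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hadoop_prod.type       = hadoop
spark.sql.catalog.hadoop_prod.warehouse  = hdfs://nn:8020/warehouse/path
```

Once registered, the catalog name becomes the first part of a table identifier, e.g. `SELECT * FROM hive_prod.db.table`.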
Gobblin General Questions What is Gobblin? What programming languages does Gobblin support? Does Gobblin require any external software to be installed? What Hadoop versions can ...
Java API Quickstart Create a table Tables are created using either a Catalog or an implementation of the Tables interface. Using a Hive catalog The Hive catalog connects to...
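A minimal sketch of that flow, assuming a reachable Hive Metastore and an HDFS warehouse path (both URIs below are placeholders):

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.PartitionSpec;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.hive.HiveCatalog;
import org.apache.iceberg.types.Types;

public class CreateTableExample {
  public static void main(String[] args) {
    // The Hive catalog keeps track of Iceberg tables through a Hive Metastore.
    HiveCatalog catalog = new HiveCatalog();
    catalog.setConf(new Configuration());

    Map<String, String> properties = new HashMap<>();
    properties.put("uri", "thrift://metastore-host:9083");    // placeholder metastore URI
    properties.put("warehouse", "hdfs://nn:8020/warehouse");  // placeholder warehouse path
    catalog.initialize("hive", properties);

    // Define the table schema and a partition spec.
    Schema schema = new Schema(
        Types.NestedField.required(1, "event_time", Types.TimestampType.withZone()),
        Types.NestedField.required(2, "level", Types.StringType.get()),
        Types.NestedField.optional(3, "message", Types.StringType.get()));
    PartitionSpec spec = PartitionSpec.builderFor(schema)
        .hour("event_time")
        .build();

    // Create the table in the "logging" database.
    Table table = catalog.createTable(TableIdentifier.of("logging", "logs"), schema, spec);
    System.out.println("Created table at " + table.location());
  }
}
```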
Flink Queries Iceberg supports streaming and batch reads with Apache Flink's DataStream API and Table API. Reading with SQL Iceberg supports both streaming and batch reads in Flink...
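For example, a batch scan and a streaming scan of the same table might look like the following in the Flink SQL client (the catalog, database, and table names are placeholders; the hint options follow the Flink connector's documented settings):

```sql
-- Batch read: scan the table's current snapshot and finish.
SET execution.runtime-mode = batch;
SELECT * FROM hive_catalog.db.sample;

-- Streaming read: start from the current snapshot and keep tailing new snapshots.
SET execution.runtime-mode = streaming;
SELECT * FROM hive_catalog.db.sample
  /*+ OPTIONS('streaming'='true', 'monitor-interval'='30s') */;
```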
Spark Structured Streaming Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support...
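A rough sketch of a streaming append into an Iceberg table, assuming the built-in rate source for input and a table named db.events in the session catalog (the source, table name, and checkpoint path are placeholders):

```java
import java.util.concurrent.TimeoutException;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.StreamingQueryException;
import org.apache.spark.sql.streaming.Trigger;

public class IcebergStreamingWrite {
  public static void main(String[] args) throws TimeoutException, StreamingQueryException {
    SparkSession spark = SparkSession.builder()
        .appName("iceberg-streaming-write")
        .getOrCreate();

    // Placeholder input: the rate source emits (timestamp, value) rows.
    Dataset<Row> events = spark.readStream()
        .format("rate")
        .option("rowsPerSecond", "10")
        .load();

    // Append each micro-batch to an existing Iceberg table.
    StreamingQuery query = events.writeStream()
        .format("iceberg")
        .outputMode("append")
        .trigger(Trigger.ProcessingTime("1 minute"))
        .option("checkpointLocation", "/tmp/checkpoints/events")  // placeholder path
        .toTable("db.events");                                    // placeholder table name

    query.awaitTermination();
  }
}
```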
Aliyun OSS configs Aliyun OSS Credentials Aliyun OSS Libs On this page, we explain how to get your Hudi Spark job to store data in Aliyun OSS. Aliyun OSS configs There are two c...
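As a hedged sketch, the filesystem and credential settings usually land in core-site.xml along the lines below; the bucket, endpoint, and keys are placeholders, and the property names come from the Hadoop Aliyun OSS connector:

```xml
<!-- core-site.xml: point the default filesystem at an OSS bucket (placeholder values) -->
<property>
  <name>fs.defaultFS</name>
  <value>oss://my-hudi-bucket/</value>
</property>
<property>
  <name>fs.oss.endpoint</name>
  <value>oss-cn-hangzhou.aliyuncs.com</value>
</property>
<property>
  <name>fs.oss.impl</name>
  <value>org.apache.hadoop.fs.aliyun.oss.AliyunOSSFileSystem</value>
</property>
<property>
  <name>fs.oss.accessKeyId</name>
  <value>ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.oss.accessKeySecret</name>
  <value>ACCESS_KEY_SECRET</value>
</property>
```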
Managing Watermarks in a Job Basics Task Failures Multi-Dataset Jobs Gobblin State Deep Dive State class hierarchy How States are Used in a Gobblin Job This page has two p...
Introduction Quartz Azkaban Oozie Launching Gobblin in Local Mode Example Config Files Uploading Files to HDFS Adding Gobblin jar Dependencies Launching the Job Launching ...
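As a loose sketch of what an example job config file might look like, the snippet below follows the property names used by the Wikipedia example that ships with Gobblin; the class names and values are illustrative and may differ across Gobblin versions:

```properties
# wikipedia.pull: a minimal standalone job definition (illustrative values)
job.name=PullFromWikipedia
job.group=Wikipedia
job.description=Pull revision data for a few pages from Wikipedia

source.class=org.apache.gobblin.example.wikipedia.WikipediaSource
converter.classes=org.apache.gobblin.example.wikipedia.WikipediaConverter
extract.namespace=org.apache.gobblin.example.wikipedia

writer.destination.type=HDFS
writer.output.format=AVRO

data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
```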
GCS Configs GCS Credentials GCS Libs For Hudi storage on GCS, regional buckets provide a DFS API with strong consistency. GCS Configs There are two configurations required ...
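A hedged example of the corresponding core-site.xml entries for the GCS connector is shown below; the bucket, project ID, and key file path are placeholders:

```xml
<!-- core-site.xml: GCS connector settings (placeholder values) -->
<property>
  <name>fs.defaultFS</name>
  <value>gs://my-hudi-bucket</value>
</property>
<property>
  <name>fs.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.gs.impl</name>
  <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
</property>
<property>
  <name>fs.gs.project.id</name>
  <value>my-gcp-project</value>
</property>
<property>
  <name>google.cloud.auth.service.account.enable</name>
  <value>true</value>
</property>
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/path/to/service-account.json</value>
</property>
```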