Managing Watermarks in a Job Basics Task Failures Multi-Dataset Jobs Gobblin State Deep Dive State class hierarchy How States are Used in a Gobblin Job This page has two p...
Flink Configuration Catalog Configuration A catalog is created and named by executing the following query (replace <catalog_name> with your catalog name and <config_key> =<confi...
Introduction Quartz Azkaban Oozie Launching Gobblin in Local Mode Example Config Files Uploading Files to HDFS Adding Gobblin jar Dependencies Launching the Job Launching ...
Spark Structured Streaming Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. Spark DSv2 is an evolving API with different levels of support...
Features and Limitations Features Apache XTable™ (Incubating) provides users with the ability to translate metadata from one table format to another. Apache XTable™ (Incubatin...
Overview Information Recorded Job Execution Information Task Execution Information Default Implementation Rest Query API Example Queries Job Execution History Server Over...
Overview Information Recorded Job Execution Information Task Execution Information Default Implementation Rest Query API Example Queries Job Execution History Server Over...
Metrics Reporting As of 1.1.0 Iceberg supports the MetricsReporter and the MetricsReport APIs. These two APIs allow expressing different metrics reports while supporting a plu...
Partitioning What is partitioning? Partitioning is a way to make queries faster by grouping similar rows together when writing. For example, queries for log entries from a logs ...