Introduction Quartz Azkaban Oozie Launching Gobblin in Local Mode Example Config Files Uploading Files to HDFS Adding Gobblin jar Dependencies Launching the Job Launching ...
Getting Started The latest version of Iceberg is 1.8.1. Spark is currently the most feature-rich compute engine for Iceberg operations. We recommend getting started with Spar...
Spark DDL To use Iceberg in Spark, first configure Spark catalogs. Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. CREATE TABLE Spark...
Introduction The Google Search Console data ingestion project is to download query and analytics data f...
Partitioning What is partitioning? Partitioning is a way to make queries faster by grouping similar rows together when writing. For example, queries for log entries from a logs ...
Advantages of Migrating to Gobblin Kafka Ingestion Related Job Config Properties Config properties for pulling Kafka topics Config properties for compaction Deployment and Chec...
Features and Limitations Features Apache XTable™ (Incubating) provides users with the ability to translate metadata from one table format to another. Apache XTable™ (Incubatin...
Spark Queries To use Iceberg in Spark, first configure Spark catalogs. Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. Querying with S...