Partitioning What is partitioning? Partitioning is a way to make queries faster by grouping similar rows together when writing. For example, queries for log entries from a logs ...
Branching and Tagging Overview Iceberg table metadata maintains a snapshot log, which represents the changes applied to a table. Snapshots are fundamental in Iceberg as they are ...
Spark Queries To use Iceberg in Spark, first configure Spark catalogs . Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. Querying with S...
Evolution Iceberg supports in-place table evolution . You can evolve a table schema just like SQL — even in nested structures — or change partition layout when data volume chang...
Introduction Hadoop and S3 The s3a File System The s3 File System Getting Gobblin to Publish to S3 Signing Up For AWS Setting Up EC2 Launching an EC2 Instance EC2 Package I...
Querying from Google BigQuery Iceberg tables To read an Apache XTable™ (Incubating) synced Iceberg table from BigQuery , you have two options: Using Iceberg JSON metadata file ...
Source schema Converters Converters available in Gobblin Schema specification Supported data types by different converters Primitive types Complex types Array Map Record En...
Gobblin Execution Modes Overview One important feature of Gobblin is that it can be run on different platforms. Currently, Gobblin can run in standalone mode (which runs on a sing...