Introduction Hadoop and S3 The s3a File System The s3 File System Getting Gobblin to Publish to S3 Signing Up For AWS Setting Up EC2 Launching an EC2 Instance EC2 Package I...
Getting Started The latest version of Iceberg is 1.8.1. Spark is currently the most feature-rich compute engine for Iceberg operations. We recommend getting started with Spar...
Overview Information Recorded Job Execution Information Task Execution Information Default Implementation Rest Query API Example Queries Job Execution History Server Over...
Spark DDL To use Iceberg in Spark, first configure Spark catalogs. Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. CREATE TABLE Spark...
Introduction Pre-requisites Steps Configuration Details What Next? Introduction The Kafka writer allows users to create pipelines that ingest data from Gobblin sources into ...
Over the years, LinkedIn’s data infrastructure team built custom solutions for ingesting diverse data entities into our Hadoop ecosystem. At one point, we were running 15 t...
Syncing to Glue Data Catalog This document walks through the steps to register an Apache XTable™ (Incubating) synced table in Glue Data Catalog on AWS. Pre-requisites Source ta...