Syncing to Glue Data Catalog This document walks through the steps to register an Apache XTable™ (Incubating) synced table in Glue Data Catalog on AWS. Pre-requisites Source ta...
Introduction Docker Docker Repositories Run the docker image with simple wikipedia jobs Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion Run Gobblin as a Service ...
Flink Connector Apache Flink supports creating Iceberg table directly without creating the explicit Flink catalog in Flink SQL. That means we can just create an iceberg table by s...
RisingWave RisingWave is a Postgres-compatible SQL database designed for real-time event streaming data processing, analysis, and management. It can ingest millions of events per...
Over the years, LinkedIn’s data infrastructure team built custom solutions for ingesting diverse data entities into our Hadoop eco-system. At one point, we were running 15 t...
Gobblin Execution Modes Overview One important feature of Gobblin is that it can be run on different platforms. Currently, Gobblin can run in standalone mode (which runs on a sing...
Querying from Apache Spark To read an Apache XTable™ (Incubating) synced target table (regardless of the table format) in Apache Spark locally or on services like Amazon EMR, Goog...
Using Gobblin as a Library Creating an Embedded Gobblin instance Configuring Embedded Gobblin Running Embedded Gobblin Extending Embedded Gobblin Using Gobblin as a Library ...