Syncing to Glue Data Catalog This document walks through the steps to register an Apache XTable™ (Incubating) synced table in Glue Data Catalog on AWS. Pre-requisites Source ta...
Flink Connector Apache Flink supports creating Iceberg table directly without creating the explicit Flink catalog in Flink SQL. That means we can just create an iceberg table by s...
Topic Java RTopic object implements Publish / Subscribe mechanism based on Redis Pub/Sub or Valkey Pub/Sub . It allows to subscribe on events published with multiple instances o...
RisingWave RisingWave is a Postgres-compatible SQL database designed for real-time event streaming data processing, analysis, and management. It can ingest millions of events per...
Gobblin Execution Modes Overview One important feature of Gobblin is that it can be run on different platforms. Currently, Gobblin can run in standalone mode (which runs on a sing...
Introduction Docker Docker Repositories Run the docker image with simple wikipedia jobs Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion Run Gobblin as a Service ...
Introduction Hive SerDe Integration Writing to an ORC File Data Flow Extending Gobblin’s SerDe Integration Introduction Gobblin is capable of writing data to ORC files by le...
Over the years, LinkedIn’s data infrastructure team built custom solutions for ingesting diverse data entities into our Hadoop eco-system. At one point, we were running 15 t...
Using Gobblin as a Library Creating an Embedded Gobblin instance Configuring Embedded Gobblin Running Embedded Gobblin Extending Embedded Gobblin Using Gobblin as a Library ...