Documentation Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to compute engines including Spark, Trino, PrestoDB, Flink, Hive and Impala u...
Gobblin General Questions What is Gobblin? What programming languages does Gobblin support? Does Gobblin require any external software to be installed? What Hadoop versions can ...
Introduction Quartz Azkaban Oozie Launching Gobblin in Local Mode Example Config Files Uploading Files to HDFS Adding Gobblin jar Dependencies Launching the Job Launching ...
Spark DDL To use Iceberg in Spark, first configure Spark catalogs . Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. CREATE TABLE Spark...
Lock Redis or Valkey based distributed reentrant Lock object for Java and implements Lock interface. Uses pub/sub channel to notify other threads across all Redisson instances w...
Id generator Redis or Valkey based Java Id generator RIdGenerator generates unique numbers but not monotonically increased. At first request, batch of id numbers is allocated and...
Using Gobblin as a Library Creating an Embedded Gobblin instance Configuring Embedded Gobblin Running Embedded Gobblin Extending Embedded Gobblin Using Gobblin as a Library ...
Topic Java RTopic object implements Publish / Subscribe mechanism based on Redis Pub/Sub or Valkey Pub/Sub . It allows to subscribe on events published with multiple instances o...
Daft Daft is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry. It exposes its flavor of t...