Introduction Docker Docker Repositories Run the docker image with simple wikipedia jobs Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion Run Gobblin as a Service ...
Introduction Record format Configuration General configuration values Authentication No credentials Using certificates Using bucket password Document level expiration 1 - Ex...
Daft Daft is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry. It exposes its flavor of t...
Native implementation Client-side caching is implemented using a client tracking listener through the RESP3 protocol available in Redis or Valkey. It’s used to speed up read operation...
How Hive Registration Works in Gobblin HiveSpec HiveRegistrationPolicy HiveSerDeManager Predicate and Activity How to Use Hive Registration in Your Gobblin Job Hive Regist...
Topic Java RTopic object implements a Publish/Subscribe mechanism based on Redis Pub/Sub or Valkey Pub/Sub. It allows subscribing to events published with multiple instances o...
Querying from Google BigQuery Iceberg tables To read an Apache XTable™ (Incubating) synced Iceberg table from BigQuery, you have two options: Using Iceberg JSON metadata file ...
Querying from StarRocks StarRocks allows you to query tables in formats like Hudi, Delta, and Iceberg using our external catalog feature. Users do not need additional configura...