Introduction Docker Docker Repositories Run the docker image with simple wikipedia jobs Use Gobblin Standalone on Docker for Kafka and HDFS Ingestion Run Gobblin as a Service ...
Using Gobblin as a Library Creating an Embedded Gobblin instance Configuring Embedded Gobblin Running Embedded Gobblin Extending Embedded Gobblin Using Gobblin as a Library ...
How Hive Registration Works in Gobblin HiveSpec HiveRegistrationPolicy HiveSerDeManager Predicate and Activity How to Use Hive Registration in Your Gobblin Job Hive Regist...
Introduction How it works gobblin-modules/ Gobblin flavor Current flavors and modules What’s next Introduction Gobblin-modules is a way to support customization of the gobb...
Daft Daft is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry. It exposes its flavor of t...
Spark Queries To use Iceberg in Spark, first configure Spark catalogs . Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. Querying with S...
Building interoperable tables using Apache XTable™ (Incubating) This demo walks you through a fictional use case and the steps to add interoperability between table formats using ...
Reliability Iceberg was designed to solve correctness problems that affect Hive tables running in S3. Hive tables track data files using both a central metastore for partitions a...