Introduction Dataset Config Management Requirement Data Model Versioning Client library Config Store Current Dataset Config Management Implementation Data model Client appli...
Gobblin General Questions What is Gobblin? What programming languages does Gobblin support? Does Gobblin require any external software to be installed? What Hadoop versions can ...
Daft Daft is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry. It exposes its flavor of t...
Using Gobblin as a Library Creating an Embedded Gobblin instance Configuring Embedded Gobblin Running Embedded Gobblin Extending Embedded Gobblin Using Gobblin as a Library ...
Introduction How it works gobblin-modules/ Gobblin flavor Current flavors and modules What’s next Introduction Gobblin-modules is a way to support customization of the gobb...
Building interoperable tables using Apache XTable™ (Incubating) This demo walks you through a fictional use case and the steps to add interoperability between table formats using ...
Reliability Iceberg was designed to solve correctness problems that affect Hive tables running in S3. Hive tables track data files using both a central metastore for partitions a...