Building interoperable tables using Apache XTable™ (Incubating) This demo walks you through a fictional use case and the steps to add interoperability between table formats using ...
Daft Daft is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry. It exposes its flavor of t...
Using Gobblin as a Library Creating an Embedded Gobblin instance Configuring Embedded Gobblin Running Embedded Gobblin Extending Embedded Gobblin Using Gobblin as a Library ...
Reliability Iceberg was designed to solve correctness problems that affect Hive tables running in S3. Hive tables track data files using both a central metastore for partitions a...
Introduction Hadoop and S3 The s3a File System The s3 File System Getting Gobblin to Publish to S3 Signing Up For AWS Setting Up EC2 Launching an EC2 Instance EC2 Package I...
Introduction How it works gobblin-modules/ Gobblin flavor Current flavors and modules What’s next Introduction Gobblin-modules is a way to support customization of the gobb...
Overview How to submit .pull file through HDFS Overview Previously, the job configuration files could only be loaded from and monitored in the local file system. Efforts have ...