Documentation Overview. Sections: GitHub Wiki Limitations; MkDocs; ReadTheDocs; Additional Information. The documentation for Gobblin is based on ReadTheDocs and Mk...
Sections: Introduction; Hive SerDe Integration; Writing to an ORC File; Data Flow; Extending Gobblin’s SerDe Integration. Introduction: Gobblin is capable of writing data to ORC files by le...
Using Gobblin as a Library. Sections: Creating an Embedded Gobblin instance; Configuring Embedded Gobblin; Running Embedded Gobblin; Extending Embedded Gobblin. ...
Daft: Daft is a distributed query engine written in Python and Rust, two fast-growing ecosystems in the data engineering and machine learning industry. It exposes its flavor of t...
Querying from Apache Spark: To read an Apache XTable™ (Incubating) synced target table (regardless of the table format) in Apache Spark locally or on services like Amazon EMR, Goog...
Overview: Redisson can run as a standalone node and participate in distributed computing. Such nodes are used to run MapReduce, ExecutorService, ScheduledExecutorServ...
Sections: Introduction; Skip first three rows; Return middle two rows; Using an expression with SKIP to return a subset of the rows. Introduction: SKIP defines from which record to start including the r...
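The three SKIP cases named in this entry can be modeled on a plain Python list. This is only a sketch of the row-selection semantics, not the query engine's own code: the helper names `skip` and `skip_limit`, the sample rows, and the pairing of SKIP with a row limit for the "middle two rows" case are all assumptions made for illustration.

```python
def skip(rows, n):
    """Model SKIP n: drop the first n rows and keep the rest in order."""
    if n < 0:
        raise ValueError("SKIP count must be non-negative")
    return rows[n:]


def skip_limit(rows, n, m):
    """Model SKIP n followed by a limit of m rows (hypothetical pairing)."""
    if m < 0:
        raise ValueError("row limit must be non-negative")
    return skip(rows, n)[:m]


# Hypothetical ordered result set of five rows.
rows = ["A", "B", "C", "D", "E"]

# Skip first three rows: only the last two remain.
after_three = skip(rows, 3)

# Return middle two rows: skip one row, then keep two.
middle_two = skip_limit(rows, 1, 2)

# Use an expression as the SKIP count: here, half the row count.
by_expression = skip(rows, len(rows) // 2)

print(after_three, middle_two, by_expression)
```

Because SKIP only drops a prefix of the rows, the result is meaningful only when the rows arrive in a defined order. The sketch assumes the list is already sorted.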
Users may add custom functions to AGE. When using Cypher functions, all function calls within a Cypher query use the default namespace, ag_catalog. However, if a user wants to us...
Synchronous and Asynchronous API: Redisson instances are fully thread-safe. The synchronous and asynchronous APIs are available through the RedissonClient interface. Most Redisson objects...