Introduction Hive SerDe Integration Writing to an ORC File Data Flow Extending Gobblin’s SerDe Integration Introduction Gobblin is capable of writing data to ORC files by le...
Documentation Overview GitHub Wiki Limitations MkDocs ReadTheDocs Additional Information Documentation Overview The documentation for Gobblin is based on ReadTheDocs and Mk...
How Hive Registration Works in Gobblin HiveSpec HiveRegistrationPolicy HiveSerDeManager Predicate and Activity How to Use Hive Registration in Your Gobblin Job Hive Regist...
Exists(Property) Exists(Path) Predicates are boolean functions that return true or false for a given set of input. They are most commonly used to filter out subgraphs in the WHE...
result_table_name [string] parallelism [int] Example Common parameters of source connectors name type required default value result_table_name string no - ...
Overview Redisson offers ability to run as standalone node and participate in distributed computing. Such Nodes are used to run MapReduce , ExecutorService , ScheduledExecutorServ...
Spark Queries To use Iceberg in Spark, first configure Spark catalogs . Iceberg uses Apache Spark’s DataSourceV2 API for data source and catalog implementations. Querying with S...