Emily Riederer works in data science for the consumer finance industry where she leads a team to build analysis tools in R and cultivate an open science culture in industry. Previ...
Introduction Hive on Spark Differences Between Kyuubi and HiveServer2 Performance References Introduction HiveServer2 is a service that enables clients to execute Hive QL qu...
Introduction Quartz Azkaban Oozie Launching Gobblin in Local Mode Example Config Files Uploading Files to HDFS Adding Gobblin jar Dependencies Launching the Job Launching ...
Unless the target readers are highly interested in the computational details while they read a report, you may not want to show the source code blocks in the report. For this purp...
Managing Watermarks in a Job Basics Task Failures Multi-Dataset Jobs Gobblin State Deep Dive State class hierarchy How States are Used in a Gobblin Job This page has two p...
Gobblin General Questions What is Gobblin? What programming languages does Gobblin support? Does Gobblin require any external software to be installed? What Hadoop versions can ...
A MySQL, Oracle, PostgreSQL, or Amazon RDS database instance must be running and available to be used by Ranger. The Ranger installation will create two new users (default names: ...
Yihui typed out most of the words in this book, which is the only justification for him being the “first” author. Christophe has made substantial contribution to this book by help...