The Kyuubi Hive Connector is a datasource for both reading and writing Hive tables. It is implemented on top of Spark DataSource V2 and supports connecting to multiple Hive metastores at the same time.

This connector can be used to federate queries across multiple Hive warehouses in a single Spark cluster.

Hive Connector Integration

To integrate the Kyuubi Spark SQL engine with the Hive connector through the Apache Spark DataSource V2 and Catalog APIs, you need to set up the following dependencies and configurations.

Dependencies

The classpath of the Kyuubi Spark SQL engine with Hive connector support consists of:

  1. kyuubi-spark-connector-hive_2.12-1.9.1, the Hive connector jar deployed with Kyuubi distributions
  2. a copy of the Spark distribution

To make the Hive connector packages visible on the runtime classpath of the engine, use one of these methods:

  1. Put the Kyuubi Hive connector packages into $SPARK_HOME/jars directly
  2. Set spark.jars=/path/to/kyuubi-hive-connector
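
The choice between the two methods can be sketched as a small helper. This is only an illustration: the function name `connector_conf`, the jar file name pattern, and the `/opt/spark` default are assumptions, not part of Kyuubi. It returns no extra configuration when a connector jar is already present under `$SPARK_HOME/jars`, and otherwise falls back to setting `spark.jars`.

```python
import glob
import os

# Assumed file name pattern for the connector jar (hypothetical; adjust to your build).
CONNECTOR_GLOB = "kyuubi-spark-connector-hive_*.jar"


def connector_conf(spark_home, fallback_jar):
    """Return the extra Spark conf needed to put the connector on the classpath.

    Method 1: a connector jar already sits in <spark_home>/jars, so nothing
    extra is required and an empty dict is returned.
    Method 2: otherwise, point spark.jars at fallback_jar.
    """
    pattern = os.path.join(spark_home, "jars", CONNECTOR_GLOB)
    if glob.glob(pattern):
        return {}
    return {"spark.jars": fallback_jar}


if __name__ == "__main__":
    # /opt/spark is only a placeholder default for SPARK_HOME.
    conf = connector_conf(
        os.environ.get("SPARK_HOME", "/opt/spark"),
        "/path/to/kyuubi-hive-connector",
    )
    for key, value in conf.items():
        print(f"--conf {key}={value}")
```

The printed `--conf` pairs can be passed to whatever launches the engine. If nothing is printed, the jar was found in `$SPARK_HOME/jars`.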

Configurations

To activate the Kyuubi Hive connector, set the following configurations:

  spark.sql.catalog.hive_catalog                                        org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
  spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.version       hive-metastore-version
  spark.sql.catalog.hive_catalog.hive.metastore.uris                    thrift://metastore-host:port
  spark.sql.catalog.hive_catalog.hive.metastore.port                    port
  spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars          path
  spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars.path     file:///opt/hive1/lib/*.jar

For details about multi-version Hive configuration, see the multi-version Hive metastore configurations supported by Apache Spark.
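
Because every setting is scoped by the catalog name, several metastores can be registered side by side by repeating the same keys under different names. The sketch below generates such entries. The helper names `hive_catalog_conf` and `to_properties`, the catalog names `hive_a` and `hive_b`, the host names, the port 9083, and the version 2.3.9 are all illustrative assumptions. Only the key layout and the catalog class come from the configuration above.

```python
CATALOG_CLASS = "org.apache.kyuubi.spark.connector.hive.HiveTableCatalog"


def hive_catalog_conf(catalog, metastore_uris, version=None, jars=None, jars_path=None):
    """Build the Spark conf entries for one Hive catalog.

    Keys follow the spark.sql.catalog.<catalog>.* layout shown above;
    optional settings are only emitted when a value is supplied.
    """
    prefix = f"spark.sql.catalog.{catalog}"
    conf = {
        prefix: CATALOG_CLASS,
        f"{prefix}.hive.metastore.uris": metastore_uris,
    }
    if version:
        conf[f"{prefix}.spark.sql.hive.metastore.version"] = version
    if jars:
        conf[f"{prefix}.spark.sql.hive.metastore.jars"] = jars
    if jars_path:
        conf[f"{prefix}.spark.sql.hive.metastore.jars.path"] = jars_path
    return conf


def to_properties(conf):
    """Render a conf dict as 'key value' lines, one entry per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    conf = {}
    # Two hypothetical metastores registered under two catalog names.
    conf.update(hive_catalog_conf("hive_a", "thrift://metastore-a:9083"))
    conf.update(
        hive_catalog_conf(
            "hive_b",
            "thrift://metastore-b:9083",
            version="2.3.9",  # assumed version, for illustration only
            jars="path",
            jars_path="file:///opt/hive1/lib/*.jar",
        )
    )
    print(to_properties(conf))
```

With both catalogs configured, tables are addressed as `hive_a.<namespace>.<table>` and `hive_b.<namespace>.<table>` within the same session.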

Hive Connector Operations

Taking CREATE NAMESPACE as an example,

  CREATE NAMESPACE hive_catalog.ns;

Taking CREATE TABLE as an example,

  CREATE TABLE hive_catalog.ns.foo (
    id bigint COMMENT 'unique id',
    data string)
  USING parquet;

Taking SELECT as an example,

  SELECT * FROM hive_catalog.ns.foo;

Taking INSERT as an example,

  INSERT INTO hive_catalog.ns.foo VALUES (1, 'a'), (2, 'b'), (3, 'c');

Taking DROP TABLE as an example,

  DROP TABLE hive_catalog.ns.foo;

Taking DROP NAMESPACE as an example,

  DROP NAMESPACE hive_catalog.ns;
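
The operations above can also be driven from a program. The following is a minimal sketch, assuming a PySpark session that already has the `hive_catalog` catalog configured. The helper names `lifecycle_statements` and `run_all` are hypothetical. The statements are built with fully qualified names so they do not depend on the session's current catalog.

```python
def lifecycle_statements(catalog="hive_catalog", namespace="ns", table="foo"):
    """Return the example statements in execution order, fully qualified."""
    ns = f"{catalog}.{namespace}"
    tbl = f"{ns}.{table}"
    return [
        f"CREATE NAMESPACE {ns}",
        f"CREATE TABLE {tbl} (id bigint COMMENT 'unique id', data string) USING parquet",
        f"INSERT INTO {tbl} VALUES (1, 'a'), (2, 'b'), (3, 'c')",
        f"SELECT * FROM {tbl}",
        f"DROP TABLE {tbl}",
        f"DROP NAMESPACE {ns}",
    ]


def run_all(spark, statements):
    """Pass each statement to spark.sql and return the results in order.

    `spark` is any object exposing a sql(str) method, such as a
    pyspark.sql.SparkSession.
    """
    return [spark.sql(statement) for statement in statements]


# Usage with a real session (requires pyspark and a reachable metastore):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.getOrCreate()
#   results = run_all(spark, lifecycle_statements())
```

Note that `spark.sql` returns a DataFrame, so the rows of the SELECT statement are only materialized when an action such as `show()` or `collect()` is called on its result, which must happen before the table is dropped.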