Connectors for Spark SQL Query Engine - Hive - 《Apache Kyuubi 1.9.1》

The Kyuubi Hive Connector is a data source for both reading and writing Hive tables. It is implemented on top of Spark DataSource V2 and supports connecting to multiple Hive metastores at the same time.

This connector can be used to federate queries across multiple Hive warehouses in a single Spark cluster.

Hive Connector Integration

To enable the integration of the Kyuubi Spark SQL engine and the Hive connector through the Apache Spark DataSource V2 and Catalog APIs, you need to:

Reference the Hive connector dependencies
Set the Spark extension and catalog configurations

Dependencies

The classpath of the Kyuubi Spark SQL engine with Hive connector support consists of:

kyuubi-spark-connector-hive_2.12-1.9.1, the Hive connector jar deployed with Kyuubi distributions
a copy of the Spark distribution

To make the Hive connector packages visible on the runtime classpath of engines, use one of these methods:

Put the Kyuubi Hive connector packages into $SPARK_HOME/jars directly
Set spark.jars=/path/to/kyuubi-hive-connector

Configurations

To activate the Kyuubi Hive connector, set the following configurations:

spark.sql.catalog.hive_catalog     org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.version     hive-metastore-version
spark.sql.catalog.hive_catalog.hive.metastore.uris     thrift://metastore-host:port
spark.sql.catalog.hive_catalog.hive.metastore.port     port
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars     path
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars.path     file:///opt/hive1/lib/*.jar

For details on configuring multiple Hive versions, see the multi-version Hive metastore configurations supported by Apache Spark.
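
Because each catalog carries its own metastore settings, connecting to several metastores can be sketched by registering one catalog per metastore. The fragment below is an illustration only: the catalog names hive_catalog_a and hive_catalog_b and the host names metastore-host-a and metastore-host-b are hypothetical placeholders, and the port is left unspecified as in the configuration above.

```properties
# Hypothetical catalog bound to the first metastore
spark.sql.catalog.hive_catalog_a     org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog_a.hive.metastore.uris     thrift://metastore-host-a:port

# Hypothetical catalog bound to the second metastore
spark.sql.catalog.hive_catalog_b     org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog_b.hive.metastore.uris     thrift://metastore-host-b:port
```

Tables are then addressed by catalog name, so the same namespace and table names can exist in both metastores without clashing.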

Hive Connector Operations

Taking CREATE NAMESPACE as an example,

CREATE NAMESPACE hive_catalog.ns;
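
To check that the namespace is visible through the connector, the standard Spark SQL listing statements can be pointed at the catalog. This is a minimal sketch that assumes the catalog is registered under the name hive_catalog and that the namespace ns was created in it.

```sql
-- List the namespaces known to the metastore behind hive_catalog
SHOW NAMESPACES IN hive_catalog;

-- List the tables in the namespace (empty until a table is created)
SHOW TABLES IN hive_catalog.ns;
```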

Taking CREATE TABLE as an example,

CREATE TABLE hive_catalog.ns.foo (
  id bigint COMMENT 'unique id',
  data string)
USING parquet;

Taking SELECT as an example,

SELECT * FROM hive_catalog.ns.foo;

Taking INSERT as an example,

INSERT INTO hive_catalog.ns.foo VALUES (1, 'a'), (2, 'b'), (3, 'c');

Taking DROP TABLE as an example,

DROP TABLE hive_catalog.ns.foo;

Taking DROP NAMESPACE as an example,

DROP NAMESPACE hive_catalog.ns;
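
Federating queries across warehouses, as mentioned in the introduction, could look like the sketch below. It assumes two catalogs with the hypothetical names hive_catalog_a and hive_catalog_b, each configured against a different metastore, and that both contain a table ns.foo with the columns used in the CREATE TABLE example.

```sql
-- Join rows from two Hive warehouses in a single query.
-- hive_catalog_a and hive_catalog_b are hypothetical catalog names.
SELECT a.id, a.data, b.data AS other_data
FROM hive_catalog_a.ns.foo AS a
JOIN hive_catalog_b.ns.foo AS b
  ON a.id = b.id;
```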