The Kyuubi Hive Connector is a datasource for both reading and writing Hive tables. It is implemented on top of Spark DataSource V2 and supports connecting to multiple Hive metastores at the same time.
This connector can be used to federate queries across multiple Hive warehouses in a single Spark cluster.
Hive Connector Integration
To enable the integration of the Kyuubi Spark SQL engine and the Hive connector through the Apache Spark DataSource V2 and Catalog APIs, you need to:
- Reference the Hive connector dependencies
- Set the Spark extension and catalog configurations
Dependencies
The classpath of the Kyuubi Spark SQL engine with Hive connector support consists of:
- kyuubi-spark-connector-hive_2.12-1.9.1, the Hive connector jar deployed with Kyuubi distributions
- a copy of the Spark distribution
To make the Hive connector packages visible on the runtime classpath of engines, use one of these methods:
- Put the Kyuubi Hive connector packages into $SPARK_HOME/jars directly
- Set spark.jars=/path/to/kyuubi-hive-connector
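As a quick sanity check for the first method, the following minimal Python sketch lists connector jars found under $SPARK_HOME/jars. The helper name find_connector_jars is hypothetical, and the file name pattern is an assumption based on the jar name listed in the Dependencies section.

```python
import os
from pathlib import Path


def find_connector_jars(spark_home):
    """Return the connector jar file names under <spark_home>/jars, sorted."""
    jars_dir = Path(spark_home) / "jars"
    # Assumed naming pattern:
    # kyuubi-spark-connector-hive_<scala-version>-<kyuubi-version>.jar
    pattern = "kyuubi-spark-connector-hive_*.jar"
    return sorted(p.name for p in jars_dir.glob(pattern))


if __name__ == "__main__":
    spark_home = os.environ.get("SPARK_HOME")
    if spark_home is None:
        print("SPARK_HOME is not set")
    else:
        found = find_connector_jars(spark_home)
        print(found if found else "no Hive connector jar under $SPARK_HOME/jars")
```

An empty result only means the jar is not in that directory; it may still be supplied through spark.jars.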
Configurations
To activate the Kyuubi Hive connector, set the following configurations:
spark.sql.catalog.hive_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.version hive-metastore-version
spark.sql.catalog.hive_catalog.hive.metastore.uris thrift://metastore-host:port
spark.sql.catalog.hive_catalog.hive.metastore.port port
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars path
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars.path file:///opt/hive1/lib/*.jar
For details, see the multi-version Hive metastore configurations supported by Apache Spark.
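Because each catalog is configured under its own spark.sql.catalog.<name> prefix, several metastores can be registered side by side. The Python sketch below generates such entries for two catalogs. The helper names, the catalog names hive_a and hive_b, the metastore hosts, and the version value are all illustrative assumptions, not settings taken from a real deployment.

```python
def hive_catalog_conf(name, metastore_uris, metastore_version=None):
    """Build the spark.sql.catalog.<name>.* entries for one Hive metastore."""
    prefix = f"spark.sql.catalog.{name}"
    conf = {
        # Catalog implementation class, as given in the configuration list.
        prefix: "org.apache.kyuubi.spark.connector.hive.HiveTableCatalog",
        f"{prefix}.hive.metastore.uris": metastore_uris,
    }
    if metastore_version is not None:
        conf[f"{prefix}.spark.sql.hive.metastore.version"] = metastore_version
    return conf


def to_spark_defaults(conf):
    """Render a conf dict as 'key value' lines, one entry per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


# Hypothetical catalog names and metastore hosts, for illustration only.
conf = {}
conf.update(hive_catalog_conf("hive_a", "thrift://metastore-a:9083", "2.3.9"))
conf.update(hive_catalog_conf("hive_b", "thrift://metastore-b:9083"))
print(to_spark_defaults(conf))
```

With both catalogs registered, tables would be addressed as hive_a.<namespace>.<table> and hive_b.<namespace>.<table> in the same query.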
Hive Connector Operations
Taking CREATE NAMESPACE as an example,
CREATE NAMESPACE hive_catalog.ns;
Taking CREATE TABLE as an example,
CREATE TABLE hive_catalog.ns.foo (
id bigint COMMENT 'unique id',
data string)
USING parquet;
Taking SELECT as an example,
SELECT * FROM hive_catalog.ns.foo;
Taking INSERT as an example,
INSERT INTO hive_catalog.ns.foo VALUES (1, 'a'), (2, 'b'), (3, 'c');
Taking DROP TABLE as an example,
DROP TABLE hive_catalog.ns.foo;
Taking DROP NAMESPACE as an example,
DROP NAMESPACE hive_catalog.ns;
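The operations in this section can be chained into one lifecycle. The Python sketch below assembles the statements in execution order and passes each one to a caller-supplied function. The helper names lifecycle_statements and run_lifecycle are hypothetical, and every object name is qualified with the catalog so the statements do not depend on which catalog is current.

```python
def lifecycle_statements(catalog="hive_catalog", namespace="ns", table="foo"):
    """Return the statements from this section, in execution order."""
    ns = f"{catalog}.{namespace}"
    tbl = f"{ns}.{table}"
    return [
        f"CREATE NAMESPACE {ns}",
        f"CREATE TABLE {tbl} (id bigint COMMENT 'unique id', data string) USING parquet",
        f"INSERT INTO {tbl} VALUES (1, 'a'), (2, 'b'), (3, 'c')",
        f"SELECT * FROM {tbl}",
        f"DROP TABLE {tbl}",
        f"DROP NAMESPACE {ns}",
    ]


def run_lifecycle(run_sql, **names):
    """Pass each statement to run_sql, a callable that takes one SQL string."""
    for statement in lifecycle_statements(**names):
        run_sql(statement)


# Dry run: print the statements instead of executing them.
run_lifecycle(print)
```

With a live PySpark session, run_lifecycle(spark.sql) would submit the same statements, assuming a catalog named hive_catalog is configured; note that the SELECT then returns a DataFrame that still needs an action such as show() to display rows.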