The Kyuubi Hive Connector is a data source for both reading and writing Hive tables. It is implemented on top of Spark DataSource V2 and supports connecting to multiple Hive metastores at the same time.
This connector can be used to federate queries across multiple Hive warehouses in a single Spark cluster.
Hive Connector Integration
To enable the integration of the Kyuubi Spark SQL engine and the Hive connector through the Apache Spark DataSource V2 and Catalog APIs, you need to:
- Reference the Hive connector dependencies
- Set the Spark extension and catalog configurations
Dependencies
The classpath of the Kyuubi Spark SQL engine with Hive connector support consists of:
- kyuubi-spark-connector-hive_2.12-1.9.1, the Hive connector jar deployed with Kyuubi distributions
- a copy of the Spark distribution
To make the Hive connector packages visible on the runtime classpath of engines, use one of these methods:
- Put the Kyuubi Hive connector packages into $SPARK_HOME/jars directly
- Set spark.jars=/path/to/kyuubi-hive-connector
Configurations
To activate the functionality of the Kyuubi Hive connector, set the following configurations:
spark.sql.catalog.hive_catalog org.apache.kyuubi.spark.connector.hive.HiveTableCatalog
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.version hive-metastore-version
spark.sql.catalog.hive_catalog.hive.metastore.uris thrift://metastore-host:port
spark.sql.catalog.hive_catalog.hive.metastore.port port
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars path
spark.sql.catalog.hive_catalog.spark.sql.hive.metastore.jars.path file:///opt/hive1/lib/*.jar
For details about these settings, see the multi-version Hive metastore configurations supported by Apache Spark.
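Since the connector supports multiple Hive metastores at the same time, a second catalog can be registered under a different name. The following is a minimal sketch using Spark SQL SET statements, assuming the catalog has not yet been referenced in the session (Spark loads catalog plugins lazily on first use); the same keys can equally be placed in the Spark configuration file. The catalog name hive_catalog2 and the host metastore-host-2 are hypothetical placeholders, and the remaining hive_catalog settings listed above would be repeated under the new prefix as needed.

```sql
-- Register a second, hypothetical catalog backed by another Hive metastore.
SET spark.sql.catalog.hive_catalog2=org.apache.kyuubi.spark.connector.hive.HiveTableCatalog;
-- metastore-host-2:port is a placeholder for the second metastore's Thrift endpoint.
SET spark.sql.catalog.hive_catalog2.hive.metastore.uris=thrift://metastore-host-2:port;
-- List the namespaces visible through the new catalog to check that it resolves.
SHOW NAMESPACES IN hive_catalog2;
```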
Hive Connector Operations
Taking CREATE NAMESPACE as an example,
CREATE NAMESPACE hive_catalog.ns;
Taking CREATE TABLE as an example,
CREATE TABLE hive_catalog.ns.foo (id bigint COMMENT 'unique id', data string) USING parquet;
Taking SELECT as an example,
SELECT * FROM hive_catalog.ns.foo;
Taking INSERT as an example,
INSERT INTO hive_catalog.ns.foo VALUES (1, 'a'), (2, 'b'), (3, 'c');
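Because each catalog is addressed by its own name, tables from different metastores can be combined in a single statement. A minimal sketch of such a federated query, assuming a second catalog named hive_catalog2 has been configured in the same way as hive_catalog and already contains a table ns.bar with columns id and label (the catalog, table, and column names are all hypothetical):

```sql
-- Join a table from the first metastore with a table from the second one.
SELECT f.id, f.data, b.label
FROM hive_catalog.ns.foo AS f
JOIN hive_catalog2.ns.bar AS b
  ON f.id = b.id;
```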
Taking DROP TABLE as an example,
DROP TABLE hive_catalog.ns.foo;
Taking DROP NAMESPACE as an example,
DROP NAMESPACE hive_catalog.ns;
