New in version 1.4.0: The Kyuubi community maintains a forked Hive JDBC driver module and provides both shaded and non-shaded packages.
This package aims to support some functionality missing from the original Hive JDBC driver. For Kyuubi engines that support multiple catalogs, it provides metadata APIs for better support. The behaviors of the original Hive JDBC driver are preserved.
To access a Hive data warehouse or new Lakehouse formats such as Apache Iceberg, Apache Hudi, or Delta Lake using the Kyuubi JDBC driver for Apache Kyuubi, you need to configure the following:
- The list of driver library files - Referencing the JDBC Driver Libraries.
- The Driver or DataSource class - Registering the Driver Class.
- The connection URL for the driver - Building the Connection URL.
Referencing the JDBC Driver Libraries
Before you use the JDBC driver for Apache Kyuubi, the JDBC application or Java code that you use to connect to your data must be able to access the driver JAR files.
Using the Driver in Java Code
In the code, specify the artifact kyuubi-hive-jdbc-shaded from Maven Central according to the build tool you use.
Maven
<dependency>
    <groupId>org.apache.kyuubi</groupId>
    <artifactId>kyuubi-hive-jdbc-shaded</artifactId>
    <version>1.9.1</version>
</dependency>
sbt
libraryDependencies += "org.apache.kyuubi" % "kyuubi-hive-jdbc-shaded" % "1.9.1"
Gradle
implementation group: 'org.apache.kyuubi', name: 'kyuubi-hive-jdbc-shaded', version: '1.9.1'
Using the Driver in a JDBC Application
For JDBC applications, such as BI tools and SQL IDEs, please check the specific guide for detailed information.
Is your favorite tool missing? Report a feature request or help us document it.
Registering the Driver Class
Before connecting to your data, you must register the JDBC Driver class for your application.
- org.apache.kyuubi.jdbc.KyuubiHiveDriver
- org.apache.kyuubi.jdbc.KyuubiDriver (Deprecated)
The following sample code shows how to use the java.sql.DriverManager class to establish a connection for JDBC:
private static Connection newKyuubiConnection() throws Exception {
    // CONNECTION_URL is a connection URL built as described in "Building the Connection URL" below
    Connection connection = DriverManager.getConnection(CONNECTION_URL);
    return connection;
}
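For reference, the sketch below shows the full flow with a connection URL and a simple query; the URL, port, and query are illustrative assumptions rather than values prescribed by this guide.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class KyuubiJdbcExample {
    // Hypothetical connection URL; adjust the host, port, and schema for your deployment.
    private static final String CONNECTION_URL = "jdbc:kyuubi://localhost:10009/default";

    public static void main(String[] args) throws Exception {
        // If the driver is not picked up automatically from the classpath,
        // register it explicitly before connecting:
        // Class.forName("org.apache.kyuubi.jdbc.KyuubiHiveDriver");
        try (Connection connection = DriverManager.getConnection(CONNECTION_URL);
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SHOW DATABASES")) {
            while (resultSet.next()) {
                System.out.println(resultSet.getString(1));
            }
        }
    }
}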
Building the Connection URL
Basic Connection URL format
Use the connection URL to supply connection information to the Kyuubi server or cluster that you are accessing. The following is the format of the connection URL for the Kyuubi Hive JDBC Driver:
jdbc:subprotocol://host:port[/catalog]/[schema];<clientProperties;><[#|?]sessionProperties>
- subprotocol: kyuubi or hive2
- host: DNS name or IP address of the Kyuubi server
- port: The number of the TCP port that the server uses to listen for client requests
- catalog: Optional catalog name to set the current catalog to run the query against.
- schema: Optional database name to set the current database to run the query against, use default if absent.
- clientProperties: Optional semicolon(;) separated key=value parameters recognized by the driver that affect the client behavior locally, e.g., user=foo;password=bar.
- sessionProperties: Optional semicolon(;) separated key=value parameters used to configure the session, operation, or background engines. For instance, kyuubi.engine.share.level=CONNECTION specifies that the background engine instance is used only by the current connection, while spark.ui.enabled=false disables the Spark UI of the engine.
Important
- The sessionProperties MUST come after a leading number sign (#) or question mark (?).
- Properties are case-sensitive
- Do not duplicate properties in the connection URL
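Putting these pieces together, a complete connection URL might look like the sketch below; the host, catalog, schema, and property values are illustrative assumptions, not required settings.

import java.sql.Connection;
import java.sql.DriverManager;

public class ConnectionUrlExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical URL: catalog "spark_catalog", schema "sales", user/password as
        // client properties, and session properties after the "#" separator.
        String url = "jdbc:kyuubi://kyuubi-host:10009/spark_catalog/sales;"
            + "user=foo;password=bar"
            + "#kyuubi.engine.share.level=CONNECTION;spark.ui.enabled=false";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected: " + !conn.isClosed());
        }
    }
}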
Connection URL over HTTP
New in version 1.6.0.
jdbc:subprotocol://host:port/schema;transportMode=http;httpPath=<http_endpoint>
- http_endpoint is the corresponding HTTP endpoint configured by kyuubi.frontend.thrift.http.path at the server side.
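As a sketch, assuming an HTTP frontend on port 10010 with httpPath cliservice (both placeholder values; use whatever your server is configured with):

import java.sql.Connection;
import java.sql.DriverManager;

public class HttpTransportExample {
    public static void main(String[] args) throws Exception {
        // The port and httpPath below are placeholders; httpPath must match
        // kyuubi.frontend.thrift.http.path on the server side.
        String url = "jdbc:kyuubi://kyuubi-host:10010/default;"
            + "transportMode=http;httpPath=cliservice";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected over HTTP transport");
        }
    }
}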
Connection URL over Service Discovery
jdbc:subprotocol://<zookeeper quorum>/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi
- zookeeper quorum is the corresponding ZooKeeper cluster configured by kyuubi.ha.addresses at the server side.
- zooKeeperNamespace is the corresponding namespace configured by kyuubi.ha.namespace at the server side.
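For illustration, a sketch with a hypothetical three-node ZooKeeper quorum and the default kyuubi namespace:

import java.sql.Connection;
import java.sql.DriverManager;

public class ServiceDiscoveryExample {
    public static void main(String[] args) throws Exception {
        // The ZooKeeper quorum below is a placeholder; it must match
        // kyuubi.ha.addresses on the server side.
        String url = "jdbc:kyuubi://zk1:2181,zk2:2181,zk3:2181/;"
            + "serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=kyuubi";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected via ZooKeeper service discovery");
        }
    }
}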
HiveServer2 Compatibility
New in version 1.8.0.
JDBC Drivers need to negotiate a protocol version with Kyuubi Server/HiveServer2 when connecting.
The Kyuubi Hive JDBC Driver offers protocol version V10 (clientProtocolVersion=9, supported since Hive 2.3.0) to the server by default.
If you need to connect to HiveServer2 before 2.3.0, please set client property clientProtocolVersion to a lower number.
jdbc:subprotocol://host:port[/catalog]/[schema];clientProtocolVersion=9;
All supported protocol versions and corresponding Hive versions can be found in TProtocolVersion.java and its git commits.
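As a sketch, forcing a lower protocol version from Java might look like the following; the value 8 and the host/port are illustrative assumptions, so pick the version that matches your server from TProtocolVersion.java.

import java.sql.Connection;
import java.sql.DriverManager;

public class ProtocolVersionExample {
    public static void main(String[] args) throws Exception {
        // clientProtocolVersion=8 is a hypothetical choice for an older HiveServer2;
        // consult TProtocolVersion.java for the value your server actually supports.
        String url = "jdbc:hive2://hs2-host:10000/default;clientProtocolVersion=8";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected with a lower client protocol version");
        }
    }
}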
Kerberos Authentication
Since 1.6.0, the Kyuubi JDBC driver implements Kerberos authentication based on the JAAS framework instead of Hadoop UserGroupInformation, which means it does not forcibly rely on Hadoop dependencies to connect to a kerberized Kyuubi Server.
The Kyuubi JDBC driver supports different approaches to connect to a kerberized Kyuubi Server. First of all, please follow the krb5.conf instruction to set up krb5.conf properly.
Authentication by Principal and Keytab
New in version 1.6.0.
It’s the simplest way with minimal setup requirements for Kerberos authentication: simply configure the principal and keytab in the JDBC URL.
jdbc:kyuubi://host:port/schema;kyuubiClientPrincipal=<clientPrincipal>;kyuubiClientKeytab=<clientKeytab>;kyuubiServerPrincipal=<serverPrincipal>
- kyuubiClientPrincipal: Kerberos principal for client authentication
- kyuubiClientKeytab: path of the Kerberos keytab file for client authentication
- kyuubiClientTicketCache: path of the Kerberos ticket cache file for client authentication, available since 1.8.0
- kyuubiServerPrincipal: Kerberos principal configured by kyuubi.kinit.principal at the server side. kyuubiServerPrincipal is available as an alias of principal since 1.7.0; use principal for previous versions.
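A minimal sketch, assuming hypothetical client and server principals and a hypothetical keytab path:

import java.sql.Connection;
import java.sql.DriverManager;

public class KeytabAuthExample {
    public static void main(String[] args) throws Exception {
        // The principals and keytab path below are placeholders for illustration.
        String url = "jdbc:kyuubi://kyuubi-host:10009/default;"
            + "kyuubiClientPrincipal=bob@EXAMPLE.COM;"
            + "kyuubiClientKeytab=/etc/security/keytabs/bob.keytab;"
            + "kyuubiServerPrincipal=kyuubi/kyuubi-host@EXAMPLE.COM";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Authenticated with principal and keytab");
        }
    }
}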
Authentication by Principal and TGT Cache
Another typical usage of Kerberos authentication is to run kinit to generate the TGT cache first; the application then performs Kerberos authentication through the TGT cache.
jdbc:kyuubi://host:port/schema;kyuubiServerPrincipal=<serverPrincipal>
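A sketch of this flow, assuming kinit has already been run for the client user and a hypothetical server principal:

import java.sql.Connection;
import java.sql.DriverManager;

public class TgtCacheAuthExample {
    public static void main(String[] args) throws Exception {
        // Run `kinit <clientPrincipal>` beforehand so a valid TGT cache exists;
        // the server principal below is a placeholder.
        String url = "jdbc:kyuubi://kyuubi-host:10009/default;"
            + "kyuubiServerPrincipal=kyuubi/kyuubi-host@EXAMPLE.COM";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Authenticated via the Kerberos TGT cache");
        }
    }
}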
Authentication by Hadoop UserGroupInformation doAs (programming only)
This approach allows projects that already use Hadoop UserGroupInformation for Kerberos authentication to easily connect to the kerberized Kyuubi Server. This approach does not work between [1.6.0, 1.7.0], and was fixed in 1.7.1.
String jdbcUrl = "jdbc:kyuubi://host:port/schema;kyuubiServerPrincipal=<serverPrincipal>";
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(clientPrincipal, clientKeytab);
ugi.doAs((PrivilegedExceptionAction<String>) () -> {
    Connection conn = DriverManager.getConnection(jdbcUrl);
    ...
});
Authentication by Subject (programming only)
String jdbcUrl = "jdbc:kyuubi://host:port/schema;kyuubiServerPrincipal=<serverPrincipal>;kerberosAuthType=fromSubject";
Subject kerberizedSubject = ...;
Subject.doAs(kerberizedSubject, (PrivilegedExceptionAction<String>) () -> {
    Connection conn = DriverManager.getConnection(jdbcUrl);
    ...
});