If you are deploying Kyuubi with a kerberized Hadoop cluster, it is strongly recommended that kyuubi.authentication should be set to KERBEROS too.

Kerberos Overview

Kerberos is a network authentication protocol that provides the tools of authentication and strong cryptography over the network. The Kerberos protocol uses strong cryptography so that a client or a server can prove its identity to its server or client across an insecure network connection. After a client and server have used Kerberos to prove their identity, they can also encrypt all of their communications to assure privacy and data integrity as they go about their business.

The Kerberos architecture is centered around a trusted authentication service called the key distribution center, or KDC. Users and services in a Kerberos environment are referred to as principals; each principal shares a secret, such as a password, with the KDC.

Enable Kerberos Authentication

To enable the Kerberos authentication method, we need to

Create a Kerberos principal and keytab

You can use the following commands in a Linux-based Kerberos environment to set up the identity and update the keytab file:

The kyuubi.keytab file must be owned and readable by the Linux login user.

  1. # kadmin
  2. : addprinc -randkey superuser/FQDN@REALM
  3. : ktadd -k ./kyuubi.keytab superuser/FQDN@REALM

A widespread use case of kyuubi is to replace HiveServer2/Hive QL with Kyuubi/Spark SQL. If an existing HiveServer2 environment is already there, copying the environment and reusing the keytab and principal of HiveServer2 is a convenient way.

Enable Hadoop Impersonation#

If background cluster is also an kerberized Hadoop cluster, we need to enable the impersonation capability of the superuser we use to start kyuubi server.

You can configure proxy user using properties hadoop.proxyuser.$superuser.hosts along with either or both of hadoop.proxyuser.$superuser.groups and hadoop.proxyuser.$superuser.users.

For instance, by specifying as below in core-site.xml, the superuser named admin can connect only from host1 and host2 to impersonate a user belonging to group1 and group2.

  1. <property>
  2. <name>hadoop.proxyuser.admin.hosts</name>
  3. <value>host1,host2</value>
  4. </property>
  5. <property>
  6. <name>hadoop.proxyuser.admin.groups</name>
  7. <value>group1,group2</value>
  8. </property>

Here,

  • admin is the principal(short name) used to start kyuubi servers
  • host1 and host2 are node addresses of kyuubi servers
  • group1 and group2 are groups of client users

These configurations need to be configured in the Hadoop cluster and refreshed to take effect.

If you are using the keytab of existing HiveServer2, this step can also be omitted

Configure the authentication properties

Configure the following properties to $KYUUBI_HOME/conf/kyuubi-defaults.conf on each node where kyuubi server is installed.

  1. kyuubi.authentication=KERBEROS
  2. kyuubi.kinit.principal=superuser/FQDN@REALM
  3. kyuubi.kinit.keytab=/path/to/kyuubi.keytab

These configurations also need to be set to enable KERBEROS authentication.

Refresh all the kyuubi server instances

Restart all the kyuubi server instances or Refresh Configurations to activate the settings.