CREATE Catalog

Hive catalog

This creates an Iceberg catalog named hive_catalog that can be configured using 'catalog-type'='hive', which loads tables from a Hive metastore:

  CREATE CATALOG hive_catalog WITH (
    'type'='iceberg',
    'catalog-type'='hive',
    'uri'='thrift://localhost:9083',
    'clients'='5',
    'property-version'='1',
    'warehouse'='hdfs://nn:8020/warehouse/path'
  );

The following properties can be set if using the Hive catalog:

  • uri: The Hive metastore’s thrift URI. (Required)
  • clients: The Hive metastore client pool size; the default is 2. (Optional)
  • warehouse: The Hive warehouse location. Specify this path if you neither set hive-conf-dir to a location containing a hive-site.xml configuration file nor add a correct hive-site.xml to the classpath.
  • hive-conf-dir: Path to a directory containing a hive-site.xml configuration file, used to provide custom Hive configuration values. If both hive-conf-dir and warehouse are set when creating the Iceberg catalog, the warehouse value overrides hive.metastore.warehouse.dir from <hive-conf-dir>/hive-site.xml (or from a hive-site.xml on the classpath), as shown in the sketch after this list.
  • hadoop-conf-dir: Path to a directory containing core-site.xml and hdfs-site.xml configuration files, used to provide custom Hadoop configuration values.
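
A minimal sketch of a catalog that combines hive-conf-dir with an explicit warehouse; the /etc/hive/conf path is a placeholder for wherever your hive-site.xml lives:

  -- The warehouse value below overrides hive.metastore.warehouse.dir
  -- from the hive-site.xml found under the (placeholder) conf directory.
  CREATE CATALOG hive_catalog_with_conf WITH (
    'type'='iceberg',
    'catalog-type'='hive',
    'uri'='thrift://localhost:9083',
    'hive-conf-dir'='/etc/hive/conf',
    'warehouse'='hdfs://nn:8020/warehouse/path'
  );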

Hadoop catalog

Iceberg also supports a directory-based catalog in HDFS that can be configured using 'catalog-type'='hadoop':

  CREATE CATALOG hadoop_catalog WITH (
    'type'='iceberg',
    'catalog-type'='hadoop',
    'warehouse'='hdfs://nn:8020/warehouse/path',
    'property-version'='1'
  );

The following properties can be set if using the Hadoop catalog:

  • warehouse: The HDFS directory to store metadata files and data files. (Required)

Execute the SQL command USE CATALOG hadoop_catalog to set the current catalog:
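
  USE CATALOG hadoop_catalog;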

REST catalog

This creates an Iceberg catalog named rest_catalog that can be configured using 'catalog-type'='rest', which loads tables from a REST catalog:

  CREATE CATALOG rest_catalog WITH (
    'type'='iceberg',
    'catalog-type'='rest',
    'uri'='https://localhost/'
  );

The following properties can be set if using the REST catalog:

  • uri: The URL to the REST catalog. (Required)
  • credential: A credential to exchange for a token in the OAuth2 client credentials flow. (Optional)
  • token: A token which will be used to interact with the server. (Optional)
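
For example, a sketch of a REST catalog that supplies a pre-issued token; the token value is a placeholder:

  -- The token below is a placeholder for a real bearer token.
  CREATE CATALOG rest_catalog_auth WITH (
    'type'='iceberg',
    'catalog-type'='rest',
    'uri'='https://localhost/',
    'token'='<bearer-token>'
  );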

Custom catalog

Flink also supports loading a custom Iceberg Catalog implementation by specifying the catalog-impl property:

  CREATE CATALOG my_catalog WITH (
    'type'='iceberg',
    'catalog-impl'='com.my.custom.CatalogImpl',
    'my-additional-catalog-config'='my-value'
  );

Create through YAML config

Catalogs can be registered in sql-client-defaults.yaml before starting the SQL client.

  catalogs:
    - name: my_catalog
      type: iceberg
      catalog-type: hadoop
      warehouse: hdfs://nn:8020/warehouse/path

Create through SQL Files

The Flink SQL Client supports the -i startup option, which executes an initialization SQL file to set up the environment when the SQL Client starts.

  -- define available catalogs
  CREATE CATALOG hive_catalog WITH (
    'type'='iceberg',
    'catalog-type'='hive',
    'uri'='thrift://localhost:9083',
    'warehouse'='hdfs://nn:8020/warehouse/path'
  );

  USE CATALOG hive_catalog;

Use the -i <init.sql> option to initialize the SQL Client session:

  /path/to/bin/sql-client.sh -i /path/to/init.sql

CREATE DATABASE

By default, Iceberg uses the default database in Flink. Use the following example to create a separate database to avoid creating tables under the default database:

  CREATE DATABASE iceberg_db;
  USE iceberg_db;

CREATE TABLE

  CREATE TABLE `hive_catalog`.`default`.`sample` (
    id BIGINT COMMENT 'unique id',
    data STRING NOT NULL
  ) WITH ('format-version'='2');

Table create commands support the commonly used Flink create clauses including:

  • PARTITIONED BY (column1, column2, ...) to configure partitioning; Flink does not yet support hidden partitioning.
  • COMMENT 'table document' to set a table description.
  • WITH ('key'='value', ...) to set table configuration that will be stored in Iceberg table properties.

Currently, it does not support computed columns, watermark definitions, etc. A sketch combining the supported clauses follows.
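
A sketch of a table using all three clauses together; the table name and the write.format.default value are illustrative:

  CREATE TABLE `hive_catalog`.`default`.`sample_full` (
    id BIGINT COMMENT 'unique id',
    data STRING
  )
  COMMENT 'table document'
  PARTITIONED BY (data)
  WITH ('format-version'='2', 'write.format.default'='parquet');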

PRIMARY KEY

A primary key constraint can be declared on a column or a set of columns, which must be unique and must not contain nulls. It is required for UPSERT mode.

  CREATE TABLE `hive_catalog`.`default`.`sample` (
    id BIGINT COMMENT 'unique id',
    data STRING NOT NULL,
    PRIMARY KEY(`id`) NOT ENFORCED
  ) WITH ('format-version'='2');
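
With a primary key declared, upsert writes can be enabled per statement. A sketch using the upsert-enabled write option as a SQL hint, assuming a format-version 2 table like the one above:

  -- Rows whose id already exists are upserted rather than appended.
  INSERT INTO `hive_catalog`.`default`.`sample`
  /*+ OPTIONS('upsert-enabled'='true') */
  VALUES (1, 'a'), (2, 'b');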

PARTITIONED BY

To create a partitioned table, use PARTITIONED BY:

  CREATE TABLE `hive_catalog`.`default`.`sample` (
    id BIGINT COMMENT 'unique id',
    data STRING NOT NULL
  )
  PARTITIONED BY (data)
  WITH ('format-version'='2');

Iceberg supports hidden partitioning, but Flink doesn't support partitioning by a function on columns, so there is currently no way to define hidden partitions through the Flink DDL.

CREATE TABLE LIKE

To create a table with the same schema, partitioning, and table properties as another table, use CREATE TABLE LIKE.

  CREATE TABLE `hive_catalog`.`default`.`sample` (
    id BIGINT COMMENT 'unique id',
    data STRING
  );

  CREATE TABLE `hive_catalog`.`default`.`sample_like` LIKE `hive_catalog`.`default`.`sample`;
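
Flink's CREATE TABLE ... LIKE also accepts a WITH clause to override or add properties on the derived table. A minimal sketch; the table name and property value are illustrative:

  CREATE TABLE `hive_catalog`.`default`.`sample_like_avro`
  WITH ('write.format.default'='avro')
  LIKE `hive_catalog`.`default`.`sample`;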

For more details, refer to the Flink CREATE TABLE documentation.

ALTER TABLE

Iceberg only supports altering table properties:

  ALTER TABLE `hive_catalog`.`default`.`sample` SET ('write.format.default'='avro');

ALTER TABLE .. RENAME TO

  ALTER TABLE `hive_catalog`.`default`.`sample` RENAME TO `hive_catalog`.`default`.`new_sample`;

DROP TABLE

To delete a table, run:

  DROP TABLE `hive_catalog`.`default`.`sample`;