To read a OneTable synced target table (regardless of the table format) in Amazon Redshift, users have to create an external schema and refer to the external data catalog that contains the table. Redshift infers the table’s schema and format from the external catalog/database directly. For more information on creating external schemas, refer to Redshift docs.
Hudi and Iceberg tables
The following query creates an external schema onetable_synced_schema
using the Glue database glue_database_name
CREATE EXTERNAL SCHEMA onetable_synced_schema
FROM DATA CATALOG
DATABASE <glue_database_name>
IAM_ROLE 'arn:aws:iam::<accountId>:role/<roleName>'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
The IAM role needs to have minimum access to Amazon S3 and AWS Glue Data Catalog. For more information refer to AWS docs.
Redshift can infer the tables present in the Glue database automatically. You can then query the tables using:
SELECT *
FROM onetable_synced_schema.<table_name>;
Delta Lake table
For Delta Lake, steps slightly vary because Redshift Spectrum relies on Delta Lake’s manifest file - a text file containing the list of data files to read for querying a Delta table.
You have two options to create and query Delta tables in Redshift Spectrum:
- Follow the steps in this article to set up a Redshift Spectrum to Delta Lake integration and query Delta tables directly from Amazon S3.
- While creating the Glue Crawler to crawl the OneTable synced Delta table, choose the
Create Symlink tables
option inAdd data source
pop-up window. This will add_symlink_format_manifest
folder with manifest files in the table root path.
You can then use a similar approach to query the Hudi and Iceberg tables mentioned above.
CREATE EXTERNAL SCHEMA onetable_synced_schema_delta
FROM DATA CATALOG
DATABASE <delta_glue_database_name>
IAM_ROLE 'arn:aws:iam::<accountId>:role/<roleName>'
CREATE EXTERNAL DATABASE IF NOT EXISTS;
SELECT *
FROM onetable_synced_schema_delta.<table_name>;