Features and Limitations - 《Apache XTable 0.1.0-beta1》

Features
Limitations and Compatibility Notes

Features

OneTable provides users with the ability to translate metadata from one table format to another.

OneTable provides two sync modes, “incremental” and “full.” The incremental mode is more lightweight and has better performance, especially on large tables. If there is anything that prevents the incremental mode from working properly, the tool will fall back to the full sync mode.

This sync provides users with the following:

Syncing of data files along with their column level statistics and partition metadata
Schema updates in the source are reflected in the target table metadata
Metadata maintenance for the target table formats.
- For Hudi, unreferenced files will be marked as cleaned to control the size of the metadata table.
- For Iceberg, snapshots will be expired after a configured amount of time.
- For Delta, the transaction log will be retained for a configured amount of time.

Limitations and Compatibility Notes

General

Only Copy-on-Write or Read-Optimized views of tables are currently supported. This means that only the underlying parquet files are synced but log files from Hudi and delete vectors from Delta and Iceberg are not captured by the sync.

Hudi

Hudi 0.14.0 is required when reading a Hudi target table. Users will also need to enable the metadata table (hoodie.metadata.enable=true) when reading the data.
Be sure to enable parquet.avro.write-old-list-structure=false for proper compatibility with lists when syncing from Hudi to Iceberg.
When using Hudi as the source for an Iceberg target, you may require field IDs set in the parquet schema. To enable that, follow the instructions here.

Delta

When using Delta as the source for an Iceberg target, you may require field IDs set in the parquet schema. To enable that, follow the instructions for enabling column mapping here.
When Delta is the source, Generated Columns are not synced to the target schema. For tables that are partitioned on Generated Columns, there is limited support. For example, we support date functions like transforming a timestamp to yyyy-MM-dd format. Please file a GitHub issue or pull-request for any cases that you think should be supported.