Architecture Supported table formats Supported engines Iceberg format Paimon format Mixed format User cases Self-managed streaming Lakehouse Stream-and-batch-fused data pipe...
Deployment models with supported concurrency controls Model A: Single writer with inline table services Single Writer Guarantees Model B: Single writer with async table services ...
When is Hudi useful for me or my organization? What are some non-goals for Hudi? What is incremental processing? Why does Hudi docs/talks keep talking about it? How is Hudi opti...
Writing Tables org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field ‘col1’ not found java.lang.UnsupportedOperationException: org.apache.parquet....
Background How is compaction different from clustering? Clustering Architecture Overall, there are 2 steps to clustering Schedule clustering Execute clustering Clustering Use...
Preparation when using Flink SQL Client Flink’s Python API Adding catalogs. Catalog Configuration Hive catalog Creating a table Writing Branch Writes Reading Type conversi...
http asynchronous write Write with Apache StreamPark™ http asynchronous write support type Configuration list of HTTP asynchronous write HTTP writes data asynchronously Other ...
Indexing Multi-modal Indexing Index Types in Hudi Global and Non-Global Indexes Configs Spark based configs Flink based configs Indexing Strategies Workload 1: Late arriving...