Spark DataSource API
The hudi-spark module offers the DataSource API to read a Hudi table into a Spark DataFrame.
A time-travel query example:
val tripsDF = spark.read.
  option("as.of.instant", "2021-07-28 14:11:08.000").
  format("hudi").
  load(basePath)
// Select columns with tripsDF("fare"); a Scala DataFrame has no .fare member.
tripsDF.where(tripsDF("fare") > 20.0).show()
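The same time-travel read can be sketched from PySpark. The snippet below is a minimal illustration, not part of the Hudi API: `format_as_of_instant` and `read_as_of` are hypothetical helper names, and the sketch assumes an existing `SparkSession` with the Hudi bundle on the classpath. It only shows how a Python `datetime` might be rendered in the `yyyy-MM-dd HH:mm:ss.SSS` form used in the example above; time zone handling is not addressed.

```python
from datetime import datetime


def format_as_of_instant(ts: datetime) -> str:
    """Render a datetime as 'yyyy-MM-dd HH:mm:ss.SSS' (millisecond precision)."""
    # %f yields six microsecond digits; drop the last three to keep milliseconds.
    return ts.strftime("%Y-%m-%d %H:%M:%S.%f")[:-3]


def read_as_of(spark, base_path: str, ts: datetime):
    """Hypothetical helper: load the Hudi table at base_path as of ts.

    `spark` is assumed to be an existing SparkSession with Hudi available.
    """
    return (
        spark.read.format("hudi")
        .option("as.of.instant", format_as_of_instant(ts))
        .load(base_path)
    )


# Formatting alone needs no Spark session.
instant = format_as_of_instant(datetime(2021, 7, 28, 14, 11, 8))
print(instant)

# With a live session (assumed, not created here):
# trips_df = read_as_of(spark, base_path, datetime(2021, 7, 28, 14, 11, 8))
# trips_df.where(trips_df.fare > 20.0).show()
```

Keeping the formatting step separate from the read makes the instant string easy to check before it is passed to Spark.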
Daft
Daft supports reading Hudi tables using the daft.read_hudi() function.
# Read Apache Hudi table into a Daft DataFrame.
import daft

df = daft.read_hudi("some-table-uri")
df = df.where(df["foo"] > 5)
df.show()
Check out the Daft docs for Hudi integration.
