Querying with SQL

Querying Mixed-Format table by merge on read

Using Select statement to query on Mixed-Format tables.

  1. SELECT * FROM arctic_catalog.db.sample

The Mixed-Format connector will merge the data from BaseStore and ChangeStore.

Query on change store

For a Mixed-Format table with primary keys. you can query on ChangeStore by .change.

  1. SELECT * FROM arctic_catalog.db.sample.change
  2. +---+----+----+---------------+------------+--------------+
  3. | id|name|data|_transaction_id|_file_offset|_change_action|
  4. +---+----+----+---------------+------------+--------------+
  5. | 1|dddd|abcd| 3| 1| INSERT|
  6. | 1|dddd|abcd| 3| 2| DELETE|
  7. +---+----+----+---------------+------------+--------------+

The addition columns are:

  • _transaction_id: The transaction ID allocated by AMS during data write is assigned per SQL execution in batch mode and per checkpoint in streaming mode.
  • _file_offset:The order of data written with the same _transaction_id.
  • _change_action:The type of change record, INSERT or DELETE.

Querying with DataFrames

You can read the Mixed-Format table by Spark DataFrames:

  1. val df = spark.read.table("arctic_catalog.db.sample")
  2. df.count

And visit the ChangeStore by .change.

  1. val df = spark.read.table("arctic_catalog.db.sample.change")
  2. df.count