For Hudi storage on GCS, regional buckets provide a DFS API with strong consistency.

GCS Configs

There are two configurations required for Hudi GCS compatibility:

  • Adding GCS Credentials for Hudi
  • Adding required jars to classpath

GCS Credentials

Add the required configs to your core-site.xml, where Hudi can fetch them. Replace the fs.defaultFS value with your GCS bucket name, and Hudi should be able to read from and write to the bucket.

  <property>
    <name>fs.defaultFS</name>
    <value>gs://hudi-bucket</value>
  </property>
  <property>
    <name>fs.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem</value>
    <description>The FileSystem for gs: (GCS) uris.</description>
  </property>
  <property>
    <name>fs.AbstractFileSystem.gs.impl</name>
    <value>com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS</value>
    <description>The AbstractFileSystem for gs: (GCS) uris.</description>
  </property>
  <property>
    <name>fs.gs.project.id</name>
    <value>GCS_PROJECT_ID</value>
  </property>
  <property>
    <name>google.cloud.auth.service.account.enable</name>
    <value>true</value>
  </property>
  <property>
    <name>google.cloud.auth.service.account.email</name>
    <value>GCS_SERVICE_ACCOUNT_EMAIL</value>
  </property>
  <property>
    <name>google.cloud.auth.service.account.keyfile</name>
    <value>GCS_SERVICE_ACCOUNT_KEYFILE</value>
  </property>
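
A quick way to catch a missing or empty entry before running a job is to parse the file and compare it against the property names listed above. The following is a minimal sketch, not part of Hudi: the helper names read_properties and missing_gcs_keys are hypothetical, and it assumes the properties sit inside the usual Hadoop <configuration> root element.

```python
import xml.etree.ElementTree as ET

# Property names taken from the core-site.xml listing above.
REQUIRED_KEYS = [
    "fs.defaultFS",
    "fs.gs.impl",
    "fs.AbstractFileSystem.gs.impl",
    "fs.gs.project.id",
    "google.cloud.auth.service.account.enable",
    "google.cloud.auth.service.account.email",
    "google.cloud.auth.service.account.keyfile",
]


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config.

    Assumes the text has a single root element (normally <configuration>).
    """
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_gcs_keys(xml_text):
    """List the required GCS keys that are absent or have an empty value."""
    props = read_properties(xml_text)
    return [key for key in REQUIRED_KEYS if not props.get(key)]
```

Calling missing_gcs_keys on the contents of core-site.xml returns an empty list when every property above is present with a non-empty value. It does not check that the credentials themselves are valid.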

GCS Libs

Add the following GCS Hadoop library to the classpath:

  • com.google.cloud.bigdataoss:gcs-connector:1.6.0-hadoop2
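
One possible way to get this jar onto the classpath, assuming the Hudi job is launched with spark-submit (which this page does not state), is to pass the Maven coordinate through the --packages option. The sketch below only builds the command line. The function name spark_submit_command and the script name hudi_job.py are hypothetical placeholders.

```python
import shlex

# Maven coordinate from the list above.
GCS_CONNECTOR = "com.google.cloud.bigdataoss:gcs-connector:1.6.0-hadoop2"


def spark_submit_command(app_path, packages=(GCS_CONNECTOR,)):
    """Build a spark-submit argument list that pulls the given Maven packages."""
    return ["spark-submit", "--packages", ",".join(packages), app_path]


# hudi_job.py is a placeholder for your own application.
command = spark_submit_command("hudi_job.py")
print(shlex.join(command))
```

Returning a list rather than a single string means the arguments can be handed to subprocess.run without shell quoting concerns.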