Step 1: Deploy SeaTunnel And Connectors

Before starting, make sure you have downloaded and deployed SeaTunnel as described in the deployment guide.
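If you have not done that yet, a minimal sketch of the download-and-install flow follows, assuming 2.3.3 as an example release, the standard Apache archive layout, and the install-plugin.sh helper from the deployment guide:

  # Example only: substitute a real release number for ${version}
  export version="2.3.3"
  wget "https://archive.apache.org/dist/seatunnel/${version}/apache-seatunnel-${version}-bin.tar.gz"
  tar -xzvf "apache-seatunnel-${version}-bin.tar.gz"
  cd "apache-seatunnel-${version}"
  # Fetch the connector plugins used by the job config
  sh bin/install-plugin.sh "${version}"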

Step 2: Deploy And Configure Spark

Please download Spark first (required version >= 2.4.0). For more information, see Getting Started: standalone.
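As a sketch, the download looks like the following; the exact archive name depends on the Spark and Hadoop versions you choose (3.3.0 with Hadoop 3 is used here only as an example):

  # Example only: adjust the Spark/Hadoop versions and install path as needed
  wget "https://archive.apache.org/dist/spark/spark-3.3.0/spark-3.3.0-bin-hadoop3.tgz"
  tar -xzvf "spark-3.3.0-bin-hadoop3.tgz" -C /opt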

Configure SeaTunnel: Change the settings in config/seatunnel-env.sh; they are based on the path your engine was installed to during deployment. Set SPARK_HOME to the Spark deployment directory.
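For example, assuming Spark was extracted to /opt/spark-3.3.0-bin-hadoop3 (an example path from the sketch above), the relevant line in config/seatunnel-env.sh would look roughly like:

  # config/seatunnel-env.sh -- example path, adjust to your installation
  SPARK_HOME=${SPARK_HOME:-/opt/spark-3.3.0-bin-hadoop3}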

Step 3: Add a Job Config File to Define a Job

Edit config/v2.streaming.conf.template, which determines the way and logic of data input, processing, and output after SeaTunnel is started. The following is an example of the configuration file, which is the same as the example application mentioned above.

  env {
    parallelism = 1
    job.mode = "BATCH"
  }
  source {
    FakeSource {
      result_table_name = "fake"
      row.num = 16
      schema = {
        fields {
          name = "string"
          age = "int"
        }
      }
    }
  }
  transform {
    FieldMapper {
      source_table_name = "fake"
      result_table_name = "fake1"
      field_mapper = {
        age = age
        name = new_name
      }
    }
  }
  sink {
    Console {
      source_table_name = "fake1"
    }
  }

For more information about the config file, please check the config concept documentation.

Step 4: Run SeaTunnel Application

You can start the application with the following commands:

Spark 2.4.x

  cd "apache-seatunnel-${version}"
  ./bin/start-seatunnel-spark-2-connector-v2.sh \
    --master local[4] \
    --deploy-mode client \
    --config ./config/v2.streaming.conf.template

Spark 3.x.x

  cd "apache-seatunnel-${version}"
  ./bin/start-seatunnel-spark-3-connector-v2.sh \
    --master local[4] \
    --deploy-mode client \
    --config ./config/v2.streaming.conf.template
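The launcher scripts hand the master and deploy-mode options on to Spark, so if you have a cluster available you are not limited to a local master. A sketch, assuming a configured Hadoop/YARN environment (not something this quickstart requires):

  # Sketch: submit the same job to YARN instead of running locally;
  # cluster deploy mode ships the driver to the cluster
  ./bin/start-seatunnel-spark-3-connector-v2.sh \
    --master yarn \
    --deploy-mode cluster \
    --config ./config/v2.streaming.conf.template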

See the output: When you run the command, you can see its output in your console, which indicates whether the command ran successfully.

The SeaTunnel console will print some logs as below:

  fields : name, age
  types : STRING, INT
  row=1 : elWaB, 1984352560
  row=2 : uAtnp, 762961563
  row=3 : TQEIB, 2042675010
  row=4 : DcFjo, 593971283
  row=5 : SenEb, 2099913608
  row=6 : DHjkg, 1928005856
  row=7 : eScCM, 526029657
  row=8 : sgOeE, 600878991
  row=9 : gwdvw, 1951126920
  row=10 : nSiKE, 488708928
  row=11 : xubpl, 1420202810
  row=12 : rHZqb, 331185742
  row=13 : rciGD, 1112878259
  row=14 : qLhdI, 1457046294
  row=15 : ZTkRx, 1240668386
  row=16 : SGZCr, 94186144

What’s More

You have now taken a quick look at SeaTunnel with Spark. See connector to find all the sources and sinks SeaTunnel supports, or see SeaTunnel With Spark if you want to know more about running SeaTunnel with Spark.

SeaTunnel has its own engine named Zeta, and Zeta is the default engine of SeaTunnel. You can follow the Quick Start to configure and run a data synchronization job.
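As a rough sketch, running the same job on Zeta in local mode looks like the following; the flags are taken from the Zeta quickstart and have changed between releases, so verify them against the docs for your version:

  # Sketch: run the job on the built-in Zeta engine in local mode
  # (-e local is the local-execution flag in recent releases; confirm for yours)
  cd "apache-seatunnel-${version}"
  ./bin/seatunnel.sh --config ./config/v2.streaming.conf.template -e local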