Platform Installation Introduction

Purpose and Scope

This document aims to assist users in installing and initially configuring Apache StreamPark.

Target Audience

This document is intended for system developers and operations engineers who need to deploy Apache StreamPark.

System Requirements

Reference: https://streampark.apache.org/docs/user-guide/deployment#environmental-requirements

Hardware Requirements

  • This document uses Linux: 3.10.0-957.el7.x86_64

1_hardware_requirement

Software Requirements

  • JDK : 1.8+
  • MySQL : 5.6+
  • Flink : 1.12.0+
  • Hadoop : 2.7.0+
  • StreamPark : 2.0.0+

Notes:

  1. If installing StreamPark alone, Hadoop can be ignored.
  2. If Flink jobs are executed in YARN application mode, Hadoop is required.

Software versions used in this document:

  • JDK: 1.8.0_181
  • MySQL: 5.7.26
  • Flink: 1.14.3-scala_2.12
  • Hadoop: 3.2.1
  • StreamPark: 2.0.0 (incubating)

Main component dependencies: 2_main_components_dep

Pre-installation Preparation

JDK, MySQL, and Hadoop need to be installed by users themselves.
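To quickly confirm that these prerequisites are available on the host, their versions can be checked (a minimal check; the hadoop command only applies if Hadoop is installed):

  1. java -version
  2. mysql --version
  3. hadoop version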

Download Flink

  1. cd /usr/local
  2. wget https://archive.apache.org/dist/flink/flink-1.14.3/flink-1.14.3-bin-scala_2.12.tgz

Unzip

  1. tar -zxvf flink-1.14.3-bin-scala_2.12.tgz

Rename

  1. mv flink-1.14.3 flink

Configure Flink environment variables

  1. # Set environment variables (vim ~/.bashrc), add the following content
  2. export FLINK_HOME=/usr/local/flink
  3. export PATH=$FLINK_HOME/bin:$PATH
  4. # Apply environment variable configuration
  5. source ~/.bashrc
  6. # Test (If it shows: 'Version: 1.14.3, Commit ID: 98997ea', it means configuration is successful)
  7. flink -v

3_flink_home

Introduce MySQL Dependency Package

Reason: Because the Apache 2.0 license is incompatible with the MySQL JDBC driver's license, users need to download the driver JAR themselves and place it in $STREAMPARK_HOME/lib; an 8.x version is recommended. Driver package used in this document: mysql-connector-java-8.0.28.jar

  1. cp mysql-connector-java-8.0.28.jar /usr/local/streampark/lib

4_mysql_dep

Download Apache StreamPark™

Download URL: https://dlcdn.apache.org/incubator/streampark/2.0.0/apache-streampark_2.12-2.0.0-incubating-bin.tar.gz

Upload apache-streampark_2.12-2.0.0-incubating-bin.tar.gz to /usr/local on the server.

5_streampark_install_pkg

Unzip

  1. tar -zxvf apache-streampark_2.12-2.0.0-incubating-bin.tar.gz

6_unpkg
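Note: later steps reference both /usr/local/apache-streampark_2.12-2.0.0-incubating-bin (the SQL script paths) and /usr/local/streampark (the driver, configuration, and startup paths). It is assumed here that a symlink (or a rename once the SQL scripts have been executed) bridges the two names; a minimal sketch using a symlink:

  1. ln -s /usr/local/apache-streampark_2.12-2.0.0-incubating-bin /usr/local/streampark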

Installation

Initialize System Data

Purpose: Create the database and tables that the StreamPark deployment depends on, and pre-load the data required for it to run (e.g., web page menus and user information), so that subsequent operations can proceed.

Locate the StreamPark Metadata SQL Files

Explanation:

  • StreamPark supports MySQL, PostgreSQL, H2
  • This document uses MySQL as an example; the PostgreSQL process is basically the same

Schema creation script: /usr/local/apache-streampark_2.12-2.0.0-incubating-bin/script/schema/mysql-schema.sql

7_mysql_schema_file

Data initialization script: /usr/local/apache-streampark_2.12-2.0.0-incubating-bin/script/data/mysql-data.sql

8_mysql_data_file

Connect to MySQL Database & Execute Initialization Script
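The scripts below are executed from inside the mysql client. A minimal way to connect and, if needed, create the target database first (a sketch assuming root access; whether the database must be created manually can depend on the script version):

  1. # Connect to MySQL (enter the password when prompted)
  2. mysql -u root -p
  3. # Inside the mysql client, create the target database if it does not already exist
  4. CREATE DATABASE IF NOT EXISTS streampark DEFAULT CHARACTER SET utf8mb4;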

  1. source /usr/local/apache-streampark_2.12-2.0.0-incubating-bin/script/schema/mysql-schema.sql

9_import_streampark_schema_file

  1. source /usr/local/apache-streampark_2.12-2.0.0-incubating-bin/script/data/mysql-data.sql

10_import_streampark_data_file

View Execution Results

  1. show databases;

11_show_streampark_database

  1. use streampark;

12_use_streampark_db

  1. show tables;

13_show_streampark_db_tables

Apache StreamPark™ Configuration

Purpose: Configure the data sources needed for startup. Configuration file location: /usr/local/streampark/conf

14_streampark_conf_files

Configure MySQL Data Source

  1. vim application-mysql.yml

The username, password, and the database IP address/port in the url must be changed to match the user's own environment.

  1. spring:
  2.   datasource:
  3.     username: Database username
  4.     password: Database user password
  5.     driver-class-name: com.mysql.cj.jdbc.Driver
  6.     url: jdbc:mysql://Database IP address:Database port number/streampark?useSSL=false&useUnicode=true&characterEncoding=UTF-8&allowPublicKeyRetrieval=false&useJDBCCompliantTimezoneShift=true&useLegacyDatetimeCode=false&serverTimezone=GMT%2B8

Configure Application Port, HDFS Storage, Application Access Password, etc.

  1. vim application.yml

Key configuration items:

  1. server.port # 【Important】Default web access port is 10000; change it if there is a conflict (e.g., with the Hive service)
  2. knife4j.basic.enable # true allows access to the Swagger API page
  3. knife4j.basic.password # Password required to access the Swagger API page, improving interface security
  4. spring.profiles.active # 【Important】Indicates which data source the system uses; set to mysql in this document
  5. workspace.remote # Configures workspace information
  6. hadoop-user-name # If Hadoop is used, this user needs permission to operate HDFS; otherwise an “org.apache.hadoop.security.AccessControlException: Permission denied” exception is reported
  7. ldap.password # The system login page offers two login modes: user/password and LDAP. The LDAP password can be configured here

Main configuration example:

15_application_yml_server_port

If a Flink job JAR is too large, uploading it may fail; in that case consider increasing max-file-size and max-request-size. Other limits in the actual environment (e.g., Nginx restrictions) should also be taken into account.
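For orientation, a minimal sketch of the relevant application.yml fragment (the port matches the default mentioned above; the multipart limits are illustrative values, not shipped defaults):

  1. server:
  2.   port: 10000
  3. spring:
  4.   servlet:
  5.     multipart:
  6.       # Illustrative upload limits; raise them if large job JARs fail to upload
  7.       max-file-size: 500MB
  8.       max-request-size: 500MB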

16_application_yml_spring_profile_active

Knox configuration is also supported, since some users' privately deployed Hadoop environments are accessed through Knox. Workspace: configure workspace information (e.g., savepoint and checkpoint storage paths); see the sketch below.
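A minimal sketch of what the workspace fragment might look like (assuming the keys sit under a streampark.workspace section; both paths below are placeholders to be replaced with your own local directory and HDFS location):

  1. streampark:
  2.   workspace:
  3.     # Local working directory (placeholder path)
  4.     local: /opt/streampark_workspace
  5.     # Remote workspace, e.g. on HDFS, for savepoint/checkpoint storage and uploaded resources (placeholder path)
  6.     remote: hdfs:///streampark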

17_application_yml_streampark_workspace

LDAP configuration:

18_application_yml_ldap

【Optional】Configuring Kerberos

Background: Enterprise Hadoop cluster environments typically enforce security mechanisms such as Kerberos. StreamPark can also be configured with Kerberos, allowing Flink to authenticate through Kerberos and submit jobs to the Hadoop cluster.

Modifications are as follows:

  1. security.kerberos.login.enable=true
  2. security.kerberos.login.principal=Actual principal
  3. security.kerberos.login.krb5=/etc/krb5.conf
  4. security.kerberos.login.keytab=Actual keytab file
  5. java.security.krb5.conf=/etc/krb5.conf

19_kerberos_yml_config
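Before starting the service, the principal/keytab pair can be validated with the standard MIT Kerberos tools (a hedged check; the keytab path and principal below are placeholders for your actual values):

  1. kinit -kt /path/to/actual.keytab actual_principal
  2. klist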

Starting Apache StreamPark™

Enter the Apache StreamPark™ Installation Path on the Server

  1. cd /usr/local/streampark/

20_enter_streampark_dir

Start the Apache StreamPark™ Service

  1. ./bin/startup.sh

21_start_streampark_service

Check the Startup Logs

Purpose: to confirm there are no error messages.

  1. tail -100f log/streampark.out

22_check_service_starting_log

Verifying the Installation

  1. # Open the following address in a browser; if the page loads normally, the deployment succeeded
  2. http://Deployed StreamPark service IP or domain:10000/
  3. # Default account / password: admin / streampark
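If no browser is available on the server itself, reachability can also be checked from the command line (a minimal check; replace the host and port with your own):

  1. curl -I http://127.0.0.1:10000/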

Normal Access to the Page

23_visit_streampark_web

System Logs in Normally

24_streampark_web_index_page

Common Issues

Cannot load driver class: com.mysql.cj.jdbc.Driver

Reason: the MySQL driver package is missing; refer to the “Introduce MySQL Dependency Package” section above.

25_lack_mysql_driver_err
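To confirm the driver has been placed correctly, the lib directory can be listed (assuming the installation path used in this document):

  1. ls /usr/local/streampark/lib | grep -i mysql-connector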

Reference Resources

https://streampark.apache.org/docs/user-guide/deployment/