Installing on Kubernetes

Running Superset on Kubernetes is supported with the provided Helm chart found in the official Superset helm repository.

Prerequisites

  • A Kubernetes cluster
  • Helm installed

Running

1. Add the Superset helm repository:

   ```sh
   helm repo add superset https://apache.github.io/superset
   "superset" has been added to your repositories
   ```

2. View charts in the repo:

   ```sh
   helm search repo superset
   NAME                CHART VERSION   APP VERSION   DESCRIPTION
   superset/superset   0.1.1           1.0           Apache Superset is a modern, enterprise-ready b...
   ```

3. Configure your setting overrides

Just like with any typical Helm chart, you’ll need to craft a values.yaml file that defines/overrides any of the values exposed in the chart’s default values.yaml, or in any of the charts it depends on:

More info down below on some important overrides you might need.
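For example, a minimal my-values.yaml might look like the following. All keys come from the chart's default values.yaml; the replica counts and the secret are purely illustrative:

```yaml
# Illustrative my-values.yaml -- adjust keys to your chart version's values.yaml
supersetNode:
  replicaCount: 2        # number of Superset web pods
supersetWorker:
  replicaCount: 2        # number of Celery worker pods
configOverrides:
  secret: |
    SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
```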

4. Install and run:

   ```sh
   helm upgrade --install --values my-values.yaml superset superset/superset
   ```

You should see various pods popping up, such as:

```sh
kubectl get pods
NAME                                    READY   STATUS      RESTARTS   AGE
superset-celerybeat-7cdcc9575f-k6xmc    1/1     Running     0          119s
superset-f5c9c667-dw9lp                 1/1     Running     0          4m7s
superset-f5c9c667-fk8bk                 1/1     Running     0          4m11s
superset-init-db-zlm9z                  0/1     Completed   0          111s
superset-postgresql-0                   1/1     Running     0          6d20h
superset-redis-master-0                 1/1     Running     0          6d20h
superset-worker-75b48bbcc-jmmjr         1/1     Running     0          4m8s
superset-worker-75b48bbcc-qrq49         1/1     Running     0          4m12s
```

The exact list will depend on your specific configuration overrides, but you should generally expect:

  • N superset-xxxx-yyyy and superset-worker-xxxx-yyyy pods (depending on your supersetNode.replicaCount and supersetWorker.replicaCount values)
  • 1 superset-postgresql-0 depending on your postgres settings
  • 1 superset-redis-master-0 depending on your redis settings
  • 1 superset-celerybeat-xxxx-yyyy pod if you have supersetCeleryBeat.enabled = true in your values overrides
5. Access it

The chart will publish appropriate services to expose the Superset UI internally within your k8s cluster. To access it externally you will have to either:

  • Configure the Service as a LoadBalancer or NodePort
  • Set up an Ingress for it - the chart includes a definition, but it will need to be tuned to your needs (hostname, TLS, annotations, etc.)
  • Run kubectl port-forward superset-xxxx-yyyy :8088 to directly tunnel one pod’s port into your localhost
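For example, switching the Service type is a one-line override. This assumes the chart's standard service block; NodePort works the same way:

```yaml
service:
  type: LoadBalancer   # or NodePort
```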

Depending on how you configured external access, the URL will vary. Once you’ve identified the appropriate URL you can log in with:

  • user: admin
  • password: admin

Important settings

Security settings

Default security settings and passwords are included but you SHOULD override those with your own, in particular:

```yaml
postgresql:
  postgresqlPassword: superset
```

Make sure you set your own SECRET_KEY to a unique, strong, complex alphanumeric string, and use a tool to help you generate a sufficiently random sequence.

  • To generate a good key you can run: `openssl rand -base64 42`

```yaml
configOverrides:
  secret: |
    SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
```
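If you prefer generating the key in Python rather than with the openssl command above, the standard library works just as well. This is a sketch, not a chart requirement:

```python
# Generate a strong SECRET_KEY: 42 bytes of randomness, URL-safe
# base64-encoded -- comparable to `openssl rand -base64 42`.
import secrets

secret_key = secrets.token_urlsafe(42)
print(secret_key)
```

Paste the printed value into your configOverrides secret as shown above.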

If you want to change the previous secret key, you should rotate the keys. The default secret key for the Kubernetes deployment is `thisISaSECRET_1234`.

```yaml
configOverrides:
  my_override: |
    PREVIOUS_SECRET_KEY = 'YOUR_PREVIOUS_SECRET_KEY'
    SECRET_KEY = 'YOUR_OWN_RANDOM_GENERATED_SECRET_KEY'
init:
  command:
    - /bin/sh
    - -c
    - |
      . {{ .Values.configMountPath }}/superset_bootstrap.sh
      superset re-encrypt-secrets
      . {{ .Values.configMountPath }}/superset_init.sh
```

Superset uses Scarf Gateway to collect telemetry data. Knowing the installation counts for different Superset versions informs the project’s decisions about patching and long-term support. Scarf purges personally identifiable information (PII) and provides only aggregated statistics.

To opt-out of this data collection in your Helm-based installation, edit the repository: line in your helm/superset/values.yaml file, replacing apachesuperset.docker.scarf.sh/apache/superset with apache/superset to pull the image directly from Docker Hub.

Dependencies

Install additional packages and perform any other bootstrap configuration in the bootstrap script. For production clusters, it’s recommended to build your own image with this step done in CI.
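A sketch of what baking these dependencies into a custom image in CI might look like. The base image tag, packages, and user name are illustrative, not prescribed by the chart:

```dockerfile
# Illustrative CI Dockerfile -- pin the tag and packages you actually need
FROM apache/superset:latest
USER root
# Install drivers at build time instead of in bootstrapScript
RUN pip install --no-cache-dir psycopg2-binary elasticsearch-dbapi
USER superset
```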

Superset requires a Python DB-API database driver and a SQLAlchemy dialect to be installed for each datastore you want to connect to.

See Install Database Drivers for more information.

The following example installs the BigQuery and Elasticsearch database drivers so that you can connect to those data sources in your Superset installation:

```yaml
bootstrapScript: |
  #!/bin/bash
  pip install psycopg2==2.9.6 \
    sqlalchemy-bigquery==1.6.1 \
    elasticsearch-dbapi==0.2.5 &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
```

superset_config.py

The default superset_config.py is fairly minimal and you will very likely need to extend it. This is done by specifying one or more key/value entries in configOverrides, e.g.:

```yaml
configOverrides:
  my_override: |
    # This will make sure the redirect_uri is properly computed, even with SSL offloading
    ENABLE_PROXY_FIX = True
    FEATURE_FLAGS = {
        "DYNAMIC_PLUGINS": True
    }
```

Those will be evaluated as Helm templates and therefore will be able to reference other values.yaml variables e.g. {{ .Values.ingress.hosts[0] }} will resolve to your ingress external domain.

The entire superset_config.py will be installed as a secret, so it is safe to pass sensitive parameters directly. However, it might be more readable to use secret environment variables for that.

Full python files can be provided by running:

```sh
helm upgrade --install --values my-values.yaml --set-file configOverrides.oauth=set_oauth.py
```

Environment Variables

These can be passed as key/value pairs with extraEnv, or with extraSecretEnv if they are sensitive. They can then be referenced from superset_config.py using e.g. os.environ.get("VAR").

```yaml
extraEnv:
  SMTP_HOST: smtp.gmail.com
  SMTP_USER: user@gmail.com
  SMTP_PORT: "587"
  SMTP_MAIL_FROM: user@gmail.com

extraSecretEnv:
  SMTP_PASSWORD: xxxx

configOverrides:
  smtp: |
    import ast
    SMTP_HOST = os.getenv("SMTP_HOST", "localhost")
    SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
    SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
    SMTP_USER = os.getenv("SMTP_USER", "superset")
    SMTP_PORT = os.getenv("SMTP_PORT", 25)
    SMTP_PASSWORD = os.getenv("SMTP_PASSWORD", "superset")
```
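A standalone sketch of the ast.literal_eval pattern used above: environment variables are always strings, so `ast.literal_eval` converts "True"/"False" into real Python booleans, while `os.getenv` supplies the default when the variable is unset:

```python
import ast
import os

# Pretend this value arrived via extraEnv; clear SMTP_SSL to show the default path.
os.environ["SMTP_STARTTLS"] = "False"
os.environ.pop("SMTP_SSL", None)

smtp_starttls = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))  # set -> False
smtp_ssl = ast.literal_eval(os.getenv("SMTP_SSL", "False"))           # unset -> default False

print(smtp_starttls, smtp_ssl)  # -> False False
```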

System packages

If new system packages are required, they can be installed before application startup by overriding the container’s command, e.g.:

```yaml
supersetWorker:
  command:
    - /bin/sh
    - -c
    - |
      apt update
      apt install -y somepackage
      apt autoremove -yqq --purge
      apt clean

      # Run celery worker
      . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
```

Data sources

Data source definitions can be automatically declared by providing key/value yaml definitions in extraConfigs:

```yaml
extraConfigs:
  import_datasources.yaml: |
    databases:
    - allow_file_upload: true
      allow_ctas: true
      allow_cvas: true
      database_name: example-db
      extra: "{\r\n    \"metadata_params\": {},\r\n    \"engine_params\": {},\r\n    \"\
        metadata_cache_timeout\": {},\r\n    \"schemas_allowed_for_file_upload\": []\r\n\
        }"
      sqlalchemy_uri: example://example-db.local
      tables: []
```

Those will also be mounted as secrets and can include sensitive parameters.
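The escaped `extra` value above is just a JSON string. Decoding it with `json.loads` shows the structure Superset stores for the database (a sketch for illustration only):

```python
import json

# The same JSON string as in the extra field above, with its \r\n escapes.
extra = (
    "{\r\n    \"metadata_params\": {},\r\n    \"engine_params\": {},\r\n    "
    "\"metadata_cache_timeout\": {},\r\n    \"schemas_allowed_for_file_upload\": []\r\n}"
)
parsed = json.loads(extra)
print(sorted(parsed))
# -> ['engine_params', 'metadata_cache_timeout', 'metadata_params', 'schemas_allowed_for_file_upload']
```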

Configuration Examples

Setting up OAuth

OAuth setup requires that the authlib Python library is installed. This can be done using pip by updating the bootstrapScript. See the Dependencies section for more information.
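For example, authlib can be added through the chart's bootstrapScript. A minimal sketch; in practice you would pin a version:

```yaml
bootstrapScript: |
  #!/bin/bash
  pip install authlib
```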

```yaml
extraEnv:
  AUTH_DOMAIN: example.com

extraSecretEnv:
  GOOGLE_KEY: xxxxxxxxxxxx-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.apps.googleusercontent.com
  GOOGLE_SECRET: xxxxxxxxxxxxxxxxxxxxxxxx

configOverrides:
  enable_oauth: |
    # This will make sure the redirect_uri is properly computed, even with SSL offloading
    ENABLE_PROXY_FIX = True

    from flask_appbuilder.security.manager import AUTH_OAUTH
    AUTH_TYPE = AUTH_OAUTH
    OAUTH_PROVIDERS = [
        {
            "name": "google",
            "icon": "fa-google",
            "token_key": "access_token",
            "remote_app": {
                "client_id": os.getenv("GOOGLE_KEY"),
                "client_secret": os.getenv("GOOGLE_SECRET"),
                "api_base_url": "https://www.googleapis.com/oauth2/v2/",
                "client_kwargs": {"scope": "email profile"},
                "request_token_url": None,
                "access_token_url": "https://accounts.google.com/o/oauth2/token",
                "authorize_url": "https://accounts.google.com/o/oauth2/auth",
                "authorize_params": {"hd": os.getenv("AUTH_DOMAIN", "")}
            },
        }
    ]

    # Map Authlib roles to superset roles
    AUTH_ROLE_ADMIN = 'Admin'
    AUTH_ROLE_PUBLIC = 'Public'

    # Will allow user self registration, allowing to create Flask users from Authorized User
    AUTH_USER_REGISTRATION = True

    # The default user self registration role
    AUTH_USER_REGISTRATION_ROLE = "Admin"
```

Enable Alerts and Reports

For this, as per the Alerts and Reports doc, you will need to:

Install a supported webdriver in the Celery worker

This is done either by using a custom image that has the webdriver pre-installed, or installing at startup time by overriding the command. Here’s a working example for chromedriver:

```yaml
supersetWorker:
  command:
    - /bin/sh
    - -c
    - |
      # Install chrome webdriver
      # See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx
      apt-get update
      apt-get install -y wget
      wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
      apt-get install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb
      wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip
      apt-get install -y zip
      unzip chromedriver_linux64.zip
      chmod +x chromedriver
      mv chromedriver /usr/bin
      apt-get autoremove -yqq --purge
      apt-get clean
      rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip

      # Run
      . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker
```
Run the Celery beat

This pod will trigger the scheduled tasks configured in the alerts and reports UI section:

```yaml
supersetCeleryBeat:
  enabled: true
```
Configure the appropriate Celery jobs and SMTP/Slack settings
```yaml
extraEnv:
  SMTP_HOST: smtp.gmail.com
  SMTP_USER: user@gmail.com
  SMTP_PORT: "587"
  SMTP_MAIL_FROM: user@gmail.com

extraSecretEnv:
  SLACK_API_TOKEN: xoxb-xxxx-yyyy
  SMTP_PASSWORD: xxxx-yyyy

configOverrides:
  feature_flags: |
    import ast

    FEATURE_FLAGS = {
        "ALERT_REPORTS": True
    }

    SMTP_HOST = os.getenv("SMTP_HOST", "localhost")
    SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
    SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
    SMTP_USER = os.getenv("SMTP_USER", "superset")
    SMTP_PORT = os.getenv("SMTP_PORT", 25)
    SMTP_PASSWORD = os.getenv("SMTP_PASSWORD", "superset")
    SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM", "superset@superset.com")

    SLACK_API_TOKEN = os.getenv("SLACK_API_TOKEN", None)
  celery_conf: |
    from celery.schedules import crontab

    class CeleryConfig:
        broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
        imports = (
            "superset.sql_lab",
            "superset.tasks.cache",
            "superset.tasks.scheduler",
        )
        result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
        task_annotations = {
            "sql_lab.get_sql_results": {
                "rate_limit": "100/s",
            },
        }
        beat_schedule = {
            "reports.scheduler": {
                "task": "reports.scheduler",
                "schedule": crontab(minute="*", hour="*"),
            },
            "reports.prune_log": {
                "task": "reports.prune_log",
                "schedule": crontab(minute=0, hour=0),
            },
            "cache-warmup-hourly": {
                "task": "cache-warmup",
                "schedule": crontab(minute="*/30", hour="*"),
                "kwargs": {
                    "strategy_name": "top_n_dashboards",
                    "top_n": 10,
                    "since": "7 days ago",
                },
            },
        }

    CELERY_CONFIG = CeleryConfig
  reports: |
    EMAIL_PAGE_RENDER_WAIT = 60
    WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/"
    WEBDRIVER_BASEURL_USER_FRIENDLY = "https://www.example.com/"
    WEBDRIVER_TYPE = "chrome"
    WEBDRIVER_OPTION_ARGS = [
        "--force-device-scale-factor=2.0",
        "--high-dpi-support=2.0",
        "--headless",
        "--disable-gpu",
        "--disable-dev-shm-usage",
        # This is required because our process runs as root (in order to install pip packages)
        "--no-sandbox",
        "--disable-setuid-sandbox",
        "--disable-extensions",
    ]
```

Load the Examples data and dashboards

If you are trying Superset out and want some data and dashboards to explore, you can load some examples by creating a my_values.yaml and deploying it as described above in the Configure your setting overrides step of the Running section. To load the examples, add the following to the my_values.yaml file:

```yaml
init:
  loadExamples: true
```