Setting up a Development Environment
The documentation in this section is a patchwork of knowledge covering the many ways to run Superset (docker-compose, plain “docker”, on “metal”, or using a Makefile).
We now recommend and more actively support docker-compose as the main way to run Superset for development and preserve your sanity. Most people should stick to the first few sections (“Fork and Clone”, “docker-compose” and “Installing Development Tools”).
Fork and Clone
First, fork the repository on GitHub, then clone it.
Alternatively, you can clone the main repository directly, but you won’t be able to send pull requests.
git clone git@github.com:your-username/superset.git
cd superset
docker-compose (recommended!)
Setting things up to squeeze a “hello world” into any part of Superset should be as simple as
docker-compose up
Note that:
- This will pull/build docker images and run a cluster of services, including:
  - A Superset Flask web server, mounting the local python repo/code
  - A Superset Celery worker, also mounting the local python repo/code
  - A Superset Node service, mounting, compiling and bundling the JS/TS assets
  - A Superset Node websocket service to power the async backend
  - Postgres as the metadata database and to store example datasets, charts and dashboards, which should be populated upon startup
  - Redis as the message queue for our async backend and caching backend
- It’ll load up examples into the database upon first startup
- All other details and pointers are available in docker-compose.yml
- The local repository is mounted within the services, meaning that updating the code on the host will be reflected in the running containers
- Superset is served at localhost:8088/
- You can login with admin/admin
Since docker-compose is primarily designed to run a set of containers on a single host and can’t credibly support high availability as a result, we neither support nor recommend using our docker-compose constructs for production-type use cases. For single host environments, we recommend using minikube along with our installing on k8s documentation.
Installing Development Tools
While docker-compose simplifies a lot of the setup, there are still many things you’ll want to set up locally to power your IDE, and things like commit hooks, linters, and test-runners. Note that you can do these things inside docker images with commands like
docker-compose exec superset_app bash
for instance, but many people like to run that tooling from their host.
Python environment
Assuming you already have a way to set up your python environments, like pyenv, virtualenv or something else, all you should have to do is install our pinned python dev requirements bundle:
pip install -r requirements/development.txt
Git Hooks
Superset uses Git pre-commit hooks courtesy of pre-commit. To install run the following:
pre-commit install
A series of checks will now run when you make a git commit.
Alternatives to docker-compose
This part of the documentation is a patchwork of information related to setting up development environments without docker-compose. These methods are documented/supported to varying degrees. It’s been difficult to maintain this wide array of methods and ensure they’re functioning across environments.
Flask server
OS Dependencies
Make sure your machine meets the OS dependencies before following these steps. You also need to install MySQL or MariaDB.
Ensure that you are using Python version 3.9, 3.10 or 3.11, then proceed with:
# Create a virtual environment and activate it (recommended)
python3 -m venv venv # setup a python3 virtualenv
source venv/bin/activate
# Install external dependencies
pip install -r requirements/development.txt
# Install Superset in editable (development) mode
pip install -e .
# Initialize the database
superset db upgrade
# Create an admin user in your metadata database (use `admin` as username to be able to load the examples)
superset fab create-admin
# Create default roles and permissions
superset init
# Load some data to play with.
# Note: you MUST have previously created an admin user with the username `admin` for this command to work.
superset load-examples
# Start the Flask dev web server from inside your virtualenv.
# Note that your page may not have CSS at this point.
# See instructions below how to build the front-end assets.
superset run -p 8088 --with-threads --reload --debugger --debug
Or you can install via our Makefile
# Create a virtual environment and activate it (recommended)
$ python3 -m venv venv # setup a python3 virtualenv
$ source venv/bin/activate
# install pip packages + pre-commit
$ make install
# Install superset pip packages and setup env only
$ make superset
# Setup pre-commit only
$ make pre-commit
Note: the FLASK_APP env var should not need to be set, as it’s currently controlled via .flaskenv. However, if needed, it should be set to superset.app:create_app()
If you have made changes to the FAB-managed templates, which are not built the same way as the newer, React-powered front-end assets, you need to start the app without the --with-threads argument like so:
superset run -p 8088 --reload --debugger --debug
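The Python version requirement stated at the start of this section (3.9, 3.10 or 3.11) can be checked with a short script before creating the virtualenv. This is only a minimal sketch: the names is_supported_python and SUPPORTED_MINORS are illustrative and are not part of Superset.

```python
import sys

# Minor versions of Python 3 listed as supported in this section.
# (Illustrative constant, not defined anywhere in Superset.)
SUPPORTED_MINORS = {9, 10, 11}


def is_supported_python(version_info=sys.version_info):
    """Return True if the given version tuple is Python 3.9, 3.10 or 3.11."""
    major, minor = version_info[0], version_info[1]
    return major == 3 and minor in SUPPORTED_MINORS
```

Calling is_supported_python() with no arguments checks the running interpreter; passing a tuple such as (3, 10, 0) checks an arbitrary version.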
Dependencies
If you add a new requirement or update an existing requirement (per the install_requires section in setup.py), you must recompile (freeze) the Python dependencies to ensure that the build is deterministic for CI, testing, etc. This can be achieved via:
$ python3 -m venv venv
$ source venv/bin/activate
$ python3 -m pip install -r requirements/development.txt
$ pip-compile-multi --no-upgrade
When upgrading the version number of a single package, you should run pip-compile-multi with the -P flag:
$ pip-compile-multi -P my-package
To bring all dependencies up to date as per the restrictions defined in setup.py and requirements/*.in, run pip-compile-multi without any flags:
$ pip-compile-multi
This should be done periodically, but it is recommended to do thorough manual testing of the application to ensure no breaking changes have been introduced that aren’t caught by the unit and integration tests.
Logging to the browser console
This feature is only available on Python 3. When debugging your application, you can have the server logs sent directly to the browser console using the ConsoleLog package. You need to mutate the app by adding the following to your config.py or superset_config.py:
from console_log import ConsoleLog

def FLASK_APP_MUTATOR(app):
    app.wsgi_app = ConsoleLog(app.wsgi_app, app.logger)
Then make sure you run your WSGI server using the right worker type:
gunicorn "superset.app:create_app()" -k "geventwebsocket.gunicorn.workers.GeventWebSocketWorker" -b 127.0.0.1:8088 --reload
You can log anything to the browser console, including objects:
from superset import app
app.logger.error('An exception occurred!')
app.logger.info(form_data)
Frontend
Frontend assets (TypeScript, JavaScript, CSS, and images) must be compiled in order to properly display the web UI. The superset-frontend directory contains all NPM-managed frontend assets. Note that for some legacy pages there are additional frontend assets bundled with Flask-Appbuilder (e.g. jQuery and bootstrap). These are not managed by NPM and may be phased out in the future.
Prerequisite
nvm and node
First, be sure you are using the following versions of Node.js and npm:
- Node.js: Version 18
- npm: Version 10
We recommend using nvm to manage your node environment:
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.37.0/install.sh | bash
In case it shows '-bash: nvm: command not found', run:
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh" # This loads nvm
[ -s "$NVM_DIR/bash_completion" ] && \. "$NVM_DIR/bash_completion" # This loads nvm bash_completion
cd superset-frontend
nvm install --lts
nvm use --lts
Or if you use zsh, the default shell on macOS starting with Catalina, try:
sh -c "$(curl -fsSL https://raw.githubusercontent.com/nvm-sh/nvm/v0.37.0/install.sh)"
For those interested, you may also try out avn to automatically switch to the node version that is required to run Superset frontend.
Install dependencies
Install third-party dependencies listed in package.json via:
# From the root of the repository
cd superset-frontend
# Install dependencies from `package-lock.json`
npm ci
Note that Superset uses Scarf to capture telemetry/analytics about versions being installed, including the scarf-js npm package and an analytics pixel. As noted elsewhere in this documentation, Scarf gathers aggregated stats for the sake of security/release strategy, and does not capture/retain PII. The scarf-js documentation describes the package and the various means to opt out of it. You can opt out of both the npm package and the pixel by setting the SCARF_ANALYTICS environment variable to false, or opt out of the pixel by adding this setting in superset-frontend/package.json:
// your-package/package.json
{
// ...
"scarfSettings": {
"enabled": false
}
// ...
}
Build assets
There are three types of assets you can build:
- npm run build: the production assets, CSS/JS minified and optimized
- npm run dev-server: local development assets, with sourcemaps and hot refresh support
- npm run build-instrumented: instrumented application code for collecting code coverage from Cypress tests
If while using the above commands you encounter an error related to the limit of file watchers:
Error: ENOSPC: System limit for number of file watchers reached
The error is thrown because the number of files monitored by the system has reached the limit. You can address this error by increasing the number of inotify watchers.
The current value of max watches can be checked with:
cat /proc/sys/fs/inotify/max_user_watches
Edit the file /etc/sysctl.conf to increase this value. The value should be chosen based on the system memory (see this StackOverflow answer for more context).
Open the file in an editor and add a line at the bottom specifying the max watches value.
fs.inotify.max_user_watches=524288
Save the file and exit the editor. To apply the change, run the following command, which loads the updated value of max_user_watches from sysctl.conf:
sudo sysctl -p
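The check described above can also be scripted. The following is a minimal sketch that assumes a Linux host where /proc/sys/fs/inotify/max_user_watches exists; the function names and the RECOMMENDED_WATCHES constant are illustrative, with the threshold taken from the value used in this section.

```python
from pathlib import Path

# Value suggested in this section for fs.inotify.max_user_watches.
RECOMMENDED_WATCHES = 524288


def watch_limit_is_sufficient(raw_value, minimum=RECOMMENDED_WATCHES):
    """Return True if the text read from max_user_watches meets the minimum."""
    return int(raw_value.strip()) >= minimum


def read_watch_limit(path="/proc/sys/fs/inotify/max_user_watches"):
    """Read the current inotify watch limit as text (Linux only)."""
    return Path(path).read_text()
```

On a Linux machine, watch_limit_is_sufficient(read_watch_limit()) reports whether the current limit is at least the value suggested here.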
Webpack dev server
The dev server by default starts at http://localhost:9000 and proxies the backend requests to http://localhost:8088.
So a typical development workflow is the following:
- run Superset locally using Flask, on port 8088, but don’t access it directly:
# Install Superset and dependencies, plus load your virtual environment first, as detailed above.
superset run -p 8088 --with-threads --reload --debugger --debug
- in parallel, run the Webpack dev server locally on port 9000:
npm run dev-server
- access http://localhost:9000 (the Webpack server, not Flask) in your web browser. This will use the hot-reloading front-end assets from the Webpack development server while redirecting back-end queries to Flask/Superset: your changes to the Superset codebase, whether front-end or back-end, will then be reflected live in the browser.
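Because this workflow depends on two servers running at once, it can help to confirm that both ports are accepting connections before opening the browser. The following is a minimal sketch using only the Python standard library; is_port_open is a hypothetical helper, and the ports are the defaults mentioned above.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Default ports from this section: Flask on 8088, Webpack dev server on 9000.
    for name, port in (("Flask", 8088), ("Webpack dev server", 9000)):
        state = "up" if is_port_open("localhost", port) else "down"
        print(f"{name} on port {port}: {state}")
```

This only tests that something is listening on each port; it does not verify that the listener is actually Superset or Webpack.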
It’s possible to change the Webpack server settings:
# Start the dev server at http://localhost:9000
npm run dev-server
# Run the dev server on a non-default port
npm run dev-server -- --port=9001
# Proxy backend requests to a Flask server running on a non-default port
npm run dev-server -- --env=--supersetPort=8081
# Proxy to a remote backend but serve local assets
npm run dev-server -- --env=--superset=https://superset-dev.example.com
The --superset= option is useful if you want to debug a production issue or have to set up Superset behind a firewall. It allows you to run the Flask server in another environment while still building assets locally for the best developer experience.
Other npm commands
There are other npm commands you may find useful:
- npm run build-dev: build assets in development mode.
- npm run dev: build dev assets in watch mode, automatically rebuilding when a file changes.
Docker (docker compose)
See docs here
Updating NPM packages
Use npm in the prescribed way, making sure that superset-frontend/package-lock.json is updated according to npm-prescribed best practices.
Feature flags
Superset supports a server-wide feature flag system, which eases the incremental development of features. To add a new feature flag, simply modify superset_config.py with something like the following:
FEATURE_FLAGS = {
    'SCOPED_FILTER': True,
}
If you want to use the same flag in the client code, also add it to the FeatureFlag TypeScript enum in @superset-ui/core. For example,
export enum FeatureFlag {
SCOPED_FILTER = "SCOPED_FILTER",
}
superset/config.py contains DEFAULT_FEATURE_FLAGS, which will be overwritten by those specified under FEATURE_FLAGS in superset_config.py. For example, DEFAULT_FEATURE_FLAGS = { 'FOO': True, 'BAR': False } in superset/config.py and FEATURE_FLAGS = { 'BAR': True, 'BAZ': True } in superset_config.py will result in combined feature flags of { 'FOO': True, 'BAR': True, 'BAZ': True }.
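The override behaviour in this example can be reproduced with a plain dictionary merge. This is only an illustration of the resulting semantics, not Superset's actual implementation; combine_feature_flags is a hypothetical helper.

```python
# Values taken from the example in this section.
DEFAULT_FEATURE_FLAGS = {"FOO": True, "BAR": False}
FEATURE_FLAGS = {"BAR": True, "BAZ": True}


def combine_feature_flags(defaults, overrides):
    """Return a new dict in which overrides take precedence over defaults."""
    combined = dict(defaults)  # copy so the defaults are left untouched
    combined.update(overrides)
    return combined


combined = combine_feature_flags(DEFAULT_FEATURE_FLAGS, FEATURE_FLAGS)
print(combined)  # {'FOO': True, 'BAR': True, 'BAZ': True}
```

Flags present only in the defaults are kept, flags present in both take the overriding value, and flags present only in the overrides are added.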
The current status of the usability of each flag (stable vs testing, etc) can be found in RESOURCES/FEATURE_FLAGS.md.
Git Hooks
Superset uses Git pre-commit hooks courtesy of pre-commit. To install run the following:
pip3 install -r requirements/development.txt
pre-commit install
A series of checks will now run when you make a git commit.
Alternatively it is possible to run pre-commit via tox:
tox -e pre-commit
Or by running pre-commit manually:
pre-commit run --all-files
Linting
Python
We use Pylint for linting which can be invoked via:
# for python
tox -e pylint
In terms of best practices, please avoid blanket disabling of Pylint messages globally (via .pylintrc) or at the top level of the file header, though there are a few exceptions. Disabling should occur inline, as this prevents masking issues and provides context as to why the message is disabled.
Additionally, the Python code is auto-formatted using Black, which is configured as a pre-commit hook. There are also numerous editor integrations.
TypeScript
cd superset-frontend
npm ci
# run eslint checks
npm run eslint -- .
# run tsc (typescript) checks
npm run type
If using the eslint extension with vscode, put the following in your workspace settings.json file:
"eslint.workingDirectories": [
"superset-frontend"
]
Testing
Python Testing
All python tests are carried out in tox, a standardized testing framework. They can be run with any of the tox environments via:
tox -e <environment>
For example,
tox -e py38
Alternatively, you can run all tests in a single file via,
tox -e <environment> -- tests/test_file.py
or for a specific test via,
tox -e <environment> -- tests/test_file.py::TestClassName::test_method_name
Note that the test environment uses a temporary directory for defining the SQLite databases, which will be cleared each time before the group of test commands is invoked.
There is also a utility script included in the Superset codebase to run python integration tests. The readme can be found here
To run all integration tests for example, run this script from the root directory:
scripts/tests/run.sh
You can run the unit tests found in ‘./tests/unit_tests’ with pytest, for example. It is a simple way to run an isolated test that doesn’t need any database setup:
pytest ./link_to_test.py
Frontend Testing
We use Jest and Enzyme to test TypeScript/JavaScript. Tests can be run with:
cd superset-frontend
npm run test
To run a single test file:
npm run test -- path/to/file.js
Integration Testing
We use Cypress for integration tests. Tests can be run with tox -e cypress. To open Cypress and explore tests, first set up and run the test server:
export SUPERSET_CONFIG=tests.integration_tests.superset_test_config
export SUPERSET_TESTENV=true
export CYPRESS_BASE_URL="http://localhost:8081"
superset db upgrade
superset load_test_users
superset load-examples --load-test-data
superset init
superset run --port 8081
Run Cypress tests:
cd superset-frontend
npm run build-instrumented
cd cypress-base
npm install
# run tests via headless Chrome browser (requires Chrome 64+)
npm run cypress-run-chrome
# run tests from a specific file
npm run cypress-run-chrome -- --spec cypress/e2e/explore/link.test.ts
# run specific file with video capture
npm run cypress-run-chrome -- --spec cypress/e2e/dashboard/index.test.js --config video=true
# to open the cypress ui
npm run cypress-debug
# to point cypress to a url other than the default (http://localhost:8088) set the environment variable before running the script
# e.g., CYPRESS_BASE_URL="http://localhost:9000"
CYPRESS_BASE_URL=<your url> npm run cypress open
See superset-frontend/cypress_build.sh.
As an alternative, you can use the docker compose environment for testing.
Make sure you have added the line below to your /etc/hosts file:
127.0.0.1 db
If you have already launched the Docker environment, use the following command to ensure a fresh database instance:
docker compose down -v
Launch environment:
CYPRESS_CONFIG=true docker compose up
It will serve backend and frontend on port 8088.
Run Cypress tests:
cd cypress-base
npm install
npm run cypress open
Debugging Server App
Local
For debugging locally using VSCode, you can configure a launch configuration file .vscode/launch.json such as
{
"version": "0.2.0",
"configurations": [
{
"name": "Python: Flask",
"type": "python",
"request": "launch",
"module": "flask",
"env": {
"FLASK_APP": "superset",
"SUPERSET_ENV": "development"
},
"args": ["run", "-p 8088", "--with-threads", "--reload", "--debugger"],
"jinja": true,
"justMyCode": true
}
]
}
Raw Docker (without docker-compose)
Follow these instructions to debug the Flask app running inside a docker container. Note that this will run a barebones Superset web server.
First, add the following to the ./docker-compose.yaml file:
  superset:
    env_file: docker/.env
    image: *superset-image
    container_name: superset_app
    command: ["/app/docker/docker-bootstrap.sh", "app"]
    restart: unless-stopped
+   cap_add:
+     - SYS_PTRACE
    ports:
      - 8088:8088
+     - 5678:5678
    user: "root"
    depends_on: *superset-depends-on
    volumes: *superset-volumes
    environment:
      CYPRESS_CONFIG: "${CYPRESS_CONFIG}"
Start Superset as usual
docker compose up
Install the required libraries and packages to the docker container
Enter the superset_app container
docker exec -it superset_app /bin/bash
root@39ce8cf9d6ab:/app#
Run the following commands inside the container
apt update
apt install -y gdb
apt install -y net-tools
pip install debugpy
Find the PID for the Flask process. Make sure to use the first PID. The Flask app will re-spawn a sub-process every time you change any of the python code. So it’s important to use the first PID.
ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 14:09 ? 00:00:00 bash /app/docker/docker-bootstrap.sh app
root 6 1 4 14:09 ? 00:00:04 /usr/local/bin/python /usr/bin/flask run -p 8088 --with-threads --reload --debugger --host=0.0.0.0
root 10 6 7 14:09 ? 00:00:07 /usr/local/bin/python /usr/bin/flask run -p 8088 --with-threads --reload --debugger --host=0.0.0.0
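Picking the right PID can be automated by parsing the ps output. The following is a minimal sketch, assuming the ps -ef column layout shown above (UID PID PPID C STIME TTY TIME CMD); parent_flask_pid is a hypothetical helper that selects the Flask process whose parent is not itself a Flask process.

```python
def parent_flask_pid(ps_output):
    """Return the PID of the top-level `flask run` process, or None."""
    flask_procs = {}  # pid -> ppid for every process running `flask run`
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 7)  # UID PID PPID C STIME TTY TIME CMD
        if len(fields) == 8 and "flask run" in fields[7]:
            flask_procs[int(fields[1])] = int(fields[2])
    # The reloader's sub-process has another Flask process as its parent.
    roots = [pid for pid, ppid in flask_procs.items() if ppid not in flask_procs]
    return min(roots) if roots else None


sample = """UID PID PPID C STIME TTY TIME CMD
root 1 0 0 14:09 ? 00:00:00 bash /app/docker/docker-bootstrap.sh app
root 6 1 4 14:09 ? 00:00:04 /usr/local/bin/python /usr/bin/flask run -p 8088 --with-threads --reload --debugger --host=0.0.0.0
root 10 6 7 14:09 ? 00:00:07 /usr/local/bin/python /usr/bin/flask run -p 8088 --with-threads --reload --debugger --host=0.0.0.0
"""
print(parent_flask_pid(sample))  # 6
```

For the sample output in this section, the helper returns 6, the process that spawned the reloader sub-process with PID 10.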
Inject debugpy into the running Flask process. In this case PID 6.
python3 -m debugpy --listen 0.0.0.0:5678 --pid 6
Verify that debugpy is listening on port 5678
netstat -tunap
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:5678 0.0.0.0:* LISTEN 462/python
tcp 0 0 0.0.0.0:8088 0.0.0.0:* LISTEN 6/python
You are now ready to attach a debugger to the process. Using VSCode you can configure a launch configuration file .vscode/launch.json like so.
{
"version": "0.2.0",
"configurations": [
{
"name": "Attach to Superset App in Docker Container",
"type": "python",
"request": "attach",
"connect": {
"host": "127.0.0.1",
"port": 5678
},
"pathMappings": [
{
"localRoot": "${workspaceFolder}",
"remoteRoot": "/app"
}
]
}
]
}
VSCode will not stop on breakpoints right away. We’ve attached to PID 6; however, it does not yet know of any sub-processes. In order to “wake up” the debugger, you need to modify a python file. This will trigger Flask to reload the code and create a new sub-process. This new sub-process will be detected by VSCode and breakpoints will be activated.
Debugging Server App in Kubernetes Environment
To debug Flask running in a pod inside a Kubernetes cluster, you’ll need to make sure the pod runs as root and is granted the SYS_PTRACE capability. These settings should not be used in production environments.
securityContext:
  capabilities:
    add: ["SYS_PTRACE"]
See “set capabilities for a container” (https://kubernetes.io/docs/tasks/configure-pod-container/security-context/#set-capabilities-for-a-container) for more details.
Once the pod is running as root and has the SYS_PTRACE capability it will be able to debug the Flask app.
You can follow the same instructions as in the docker-compose section above. Enter the pod and install the required libraries and packages: gdb, netstat and debugpy.
Often in a Kubernetes environment nodes are not addressable from outside the cluster. VSCode will thus be unable to remotely connect to port 5678 on a Kubernetes node. In order to do this you need to create a tunnel that port forwards 5678 to your local machine.
kubectl port-forward pod/superset-<some random id> 5678:5678
You can now launch your VSCode debugger with the same config as above. VSCode will connect to 127.0.0.1:5678, which is forwarded by kubectl to your remote Kubernetes pod.
Storybook
Superset includes a Storybook to preview the layout/styling of various Superset components, and variations thereof. To open and view the Storybook:
cd superset-frontend
npm run storybook
When contributing new React components to Superset, please try to add a Story alongside the component’s jsx/tsx file.
Tips
Adding a new datasource
Create Models and Views for the datasource and add them under the superset folder, for example a new my_models.py with models for cluster, datasources, columns and metrics, and a my_views.py with clustermodelview and datasourcemodelview.
Create DB migration files for the new models.
In config.py, specify the following variable to register the datasource models and the module they come from. For example:
ADDITIONAL_MODULE_DS_MAP = {'superset.my_models': ['MyDatasource', 'MyOtherDatasource']}
This means it’ll register MyDatasource and MyOtherDatasource from the superset.my_models module in the source registry.
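To make the shape of this mapping concrete, the sketch below shows one way a module-to-class-names map can be resolved into classes with the standard library. It is an illustration only and does not reproduce Superset's registry code; resolve_module_class_map is a hypothetical helper, and the demonstration uses the standard collections module because superset.my_models is itself just an example name.

```python
import importlib


def resolve_module_class_map(module_class_map):
    """Import each module and look up the named classes on it."""
    registry = {}
    for module_name, class_names in module_class_map.items():
        module = importlib.import_module(module_name)
        for class_name in class_names:
            # getattr raises AttributeError if the class is missing.
            registry[class_name] = getattr(module, class_name)
    return registry


# Same shape as ADDITIONAL_MODULE_DS_MAP, using a stdlib module for the demo.
demo_registry = resolve_module_class_map({"collections": ["OrderedDict", "Counter"]})
```

A misspelled module name fails with ImportError and a misspelled class name with AttributeError, which is a quick way to validate such a map.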
Visualization Plugins
The topic of authoring new plugins, whether you’d like to contribute them back or not, has been well documented in the documentation and in this blog post.
To contribute a plugin to Superset, your plugin must meet the following criteria:
- The plugin should be applicable to the community at large, not a particularly specialized use case
- The plugin should be written with TypeScript
- The plugin should contain sufficient unit/e2e tests
- The plugin should use appropriate namespacing, e.g. a folder name of plugin-chart-whatever and a package name of @superset-ui/plugin-chart-whatever
- The plugin should use theme variables via Emotion, as passed in by the ThemeProvider
- The plugin should provide adequate error handling (no data returned, malformed data, invalid controls, etc.)
- The plugin should contain documentation in the form of a populated README.md file
- The plugin should have a meaningful and unique icon
- Above all else, the plugin should come with a commitment to maintenance from the original author(s)
Submissions will be considered for inclusion (or removal) on a case-by-case basis.
Adding a DB migration
Alter the model you want to change. This example will add a Column to the Annotations model.
Generate the migration file
superset db migrate -m 'add_metadata_column_to_annotation_model'
This will generate a file in migrations/version/{SHA}_this_will_be_in_the_migration_filename.py.
Upgrade the DB
superset db upgrade
The output should look like this:
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade 1a1d627ebd8e -> 40a0a483dd12, add_metadata_column_to_annotation_model.py
Add column to view
Since there is a new column, we need to add it to the AppBuilder Model view.
Test the migration’s down method
superset db downgrade
The output should look like this:
INFO [alembic.runtime.migration] Context impl SQLiteImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running downgrade 40a0a483dd12 -> 1a1d627ebd8e, add_metadata_column_to_annotation_model.py
Merging DB migrations
When two DB migrations collide, you’ll get an error message like this one:
alembic.util.exc.CommandError: Multiple head revisions are present for
given argument 'head'; please specify a specific target
revision, '<branchname>@head' to narrow to a specific head,
or 'heads' for all heads
To fix it:
Get the migration heads
superset db heads
This should list two or more migration hashes. E.g.
1412ec1e5a7b (head)
67da9ef1ef9c (head)
Pick one of them as the parent revision, open the script for the other revision and update Revises and down_revision to the new parent revision. E.g.:
--- a/67da9ef1ef9c_add_hide_left_bar_to_tabstate.py
+++ b/67da9ef1ef9c_add_hide_left_bar_to_tabstate.py
@@ -17,14 +17,14 @@
"""add hide_left_bar to tabstate
Revision ID: 67da9ef1ef9c
-Revises: c501b7c653a3
+Revises: 1412ec1e5a7b
Create Date: 2021-02-22 11:22:10.156942
"""
# revision identifiers, used by Alembic.
revision = "67da9ef1ef9c"
-down_revision = "c501b7c653a3"
+down_revision = "1412ec1e5a7b"
import sqlalchemy as sa
from alembic import op
Alternatively you may also run superset db merge to create a migration script just for merging the heads.
superset db merge {HASH1} {HASH2}
Upgrade the DB to the new checkpoint
superset db upgrade
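To see why re-pointing down_revision resolves the collision, it helps to think of a head as a revision that no other revision names as its parent. The sketch below models that idea with a plain dictionary; it is not Alembic's implementation, find_heads is a hypothetical helper, and the parent of 1412ec1e5a7b is assumed to be c501b7c653a3 purely for illustration.

```python
def find_heads(down_revisions):
    """Return revisions that no other revision lists as its down_revision."""
    parents = {parent for parent in down_revisions.values() if parent is not None}
    return sorted(rev for rev in down_revisions if rev not in parents)


# revision -> down_revision; both new migrations branch off the same parent.
# The parent of 1412ec1e5a7b is an assumption made for this example.
history = {
    "c501b7c653a3": None,
    "1412ec1e5a7b": "c501b7c653a3",
    "67da9ef1ef9c": "c501b7c653a3",
}
print(find_heads(history))  # ['1412ec1e5a7b', '67da9ef1ef9c']

# Re-pointing one script at the other, as in the diff above, leaves one head.
history["67da9ef1ef9c"] = "1412ec1e5a7b"
print(find_heads(history))  # ['67da9ef1ef9c']
```

After the edit the revisions form a single chain, so there is only one head left for superset db upgrade to target.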