Gluten is a Spark plugin developed by Intel, designed to accelerate Apache Spark with native libraries. Currently, only CentOS 7/8 and Ubuntu 20.04/22.04, along with Spark 3.2/3.3/3.4, are supported. The steps below describe how to use Gluten with the Velox native libraries.
Building (with Velox Backend)
Build the Gluten Velox backend package
Clone the Gluten project and run the Gluten build script buildbundle-veloxbe.sh. The target package is written to /path/to/gluten/package/target/.
git clone https://github.com/oap-project/gluten.git
cd /path/to/gluten
## The script builds jars for Spark 3.2.x, 3.3.x, and 3.4.x.
./dev/buildbundle-veloxbe.sh
Usage
You can use Gluten to accelerate Spark by following the steps below.
Installing
Add the Gluten jar: copy /path/to/gluten/package/target/gluten-velox-bundle-spark3.x_2.12-*.jar to $SPARK_HOME/jars/, or specify it in the spark.jars configuration.
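The copy step can be scripted. The following is a minimal sketch, not part of Gluten itself: the `install_gluten_jars` helper and its arguments are hypothetical, and it assumes the bundle jar for your Spark version follows the naming pattern shown above. It copies only the jar matching the given Spark minor version, since the build produces one bundle per supported Spark release.

```shell
# Sketch: copy the Gluten bundle jar for one Spark version into Spark's
# jars directory. install_gluten_jars is a hypothetical helper.

install_gluten_jars() {
  local target_dir="$1"    # e.g. /path/to/gluten/package/target
  local spark_home="$2"    # e.g. $SPARK_HOME
  local spark_minor="$3"   # e.g. 3.3 (must match the Spark installation)
  local found=0
  local jar
  for jar in "$target_dir"/gluten-velox-bundle-spark"$spark_minor"_2.12-*.jar; do
    # An unmatched glob stays literal, so check that the file really exists.
    [ -e "$jar" ] || continue
    cp "$jar" "$spark_home/jars/"
    echo "installed $(basename "$jar")"
    found=1
  done
  if [ "$found" -eq 0 ]; then
    echo "no Gluten bundle jar for Spark $spark_minor in $target_dir" >&2
    return 1
  fi
}

# Usage (uncomment and adjust the placeholder path and version):
# install_gluten_jars /path/to/gluten/package/target "$SPARK_HOME" 3.3
```

If you prefer not to modify $SPARK_HOME/jars/, the same jar path can instead be given as the value of spark.jars.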
Configure
Add the following minimal configuration to spark-defaults.conf:
spark.plugins=io.glutenproject.GlutenPlugin
spark.memory.offHeap.size=20g
spark.memory.offHeap.enabled=true
spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager
More configuration options can be found in the Configuration documentation.
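For a quick trial, the same minimal settings can be passed on the command line rather than written to spark-defaults.conf. The sketch below assumes spark-shell is on your PATH; the `gluten_conf_flags` helper is hypothetical and simply prints the four settings listed above as `--conf` arguments, with the off-heap size as an optional parameter.

```shell
# Sketch: emit the minimal Gluten settings as --conf flags.
# gluten_conf_flags is a hypothetical helper, not part of Gluten or Spark.

gluten_conf_flags() {
  local offheap_size="${1:-20g}"   # defaults to the 20g used above
  printf '%s\n' \
    "--conf" "spark.plugins=io.glutenproject.GlutenPlugin" \
    "--conf" "spark.memory.offHeap.enabled=true" \
    "--conf" "spark.memory.offHeap.size=${offheap_size}" \
    "--conf" "spark.shuffle.manager=org.apache.spark.shuffle.sort.ColumnarShuffleManager"
}

# Usage (the values contain no spaces, so word-splitting the output is safe):
# spark-shell $(gluten_conf_flags 20g)
```

Settings passed with `--conf` apply only to that session, which makes it easy to compare a run with and without the plugin.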