With the release of version 3.0.0, Hadoop introduced the use of Erasure Coding (EC) in HDFS. By default HDFS achieves durability via block replication. Usually the replication count is 3, resulting in a storage overhead of 200%. Hadoop 3 introduced EC as a better way to achieve durability. EC behaves much like RAID 5 or 6…for k blocks of data, m blocks of parity data are generated, from which the original data can be recovered in the event of disk or node failures (erasures, in EC parlance). A typical EC scheme is Reed-Solomon 6-3, where 6 data blocks produce 3 parity blocks, an overhead of only 50%. In addition to doubling the available disk space, RS-6-3 is also more fault tolerant…a loss of 3 data blocks can be tolerated, whereas triple replication can only sustain a loss of two.
To use EC with Accumulo, it is highly recommended that you first rebuild Hadoop with support for Intel’s ISA-L library. Instructions for doing this can be found here
Important Warning
As noted here, the current EC implementation does not support hflush()
and hsync()
. These functions are no-ops, which means that EC coded files are not guaranteed to be written to disk after a sync or flush. For this reason, EC should never be used for the Accumulo write-ahead logs. Data loss may, and most likely will, occur. It is also recommended that tables in the accumulo
namespace (root
and metadata
for example) continue to use replication.
EC and Threads
Due to the striped nature of an EC encoded file, an EC enabled HDFS client is threaded. This becomes an issue when an Accumulo client or service is configured to use multiple threads to read or write to HDFS, and becomes especially problematic when doing bulk imports. By default, Accumulo will use eight times the number of cores on the client machine to scan the files to be imported and map them to tablet files. Each thread created to scan the input files will create on the order of k threads to perform parallel I/O. RS-10-4 on a 16 core machine, for instance, will spawn over a thousand threads to perform this operation. If sufficient memory is not available, this operation will fail without providing a meaningful error message to the user. This particular problem can be ameliorated by setting the bulk.threads
client property to 1C
(i.e. one thread per core), down from the default of 8C
. Similar care should be taken when setting other thread limits.
HDFS ec Command
Encoding policy in HDFS is set at the directory level, with children inheriting policies from their parents if not explicitly set. The encoding policy for a directory can be manipulated via the hdfs ec
command, documented here.
The first step is to determine which policies are configured for your HDFS instance. This is done via the -listPolicies
command. The following listing shows that there are 5 configured policies, of which only 3 (RS-10-4-1024k, RS-6-3-1024k, and RS-6-3-64k) are enabled for use.
$ hdfs ec -listPolicies
Erasure Coding Policies:
ErasureCodingPolicy=[Name=RS-10-4-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=10, numParityUnits=4]], CellSize=1048576, Id=5], State=ENABLED
ErasureCodingPolicy=[Name=RS-6-3-1024k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=1], State=ENABLED
ErasureCodingPolicy=[Name=RS-6-3-64k, Schema=[ECSchema=[Codec=rs, numDataUnits=6, numParityUnits=3, options=]], CellSize=65536, Id=65], State=ENABLED
ErasureCodingPolicy=[Name=RS-LEGACY-6-3-1024k, Schema=[ECSchema=[Codec=rs-legacy, numDataUnits=6, numParityUnits=3]], CellSize=1048576, Id=3], State=DISABLED
ErasureCodingPolicy=[Name=XOR-2-1-1024k, Schema=[ECSchema=[Codec=xor, numDataUnits=2, numParityUnits=1]], CellSize=1048576, Id=4], State=DISABLED
To set the encoding policy for a directory, use the -setPolicy
command.
$ hadoop fs -mkdir foo
$ hdfs ec -setPolicy -policy RS-6-3-64k -path foo
Set RS-6-3-64k erasure coding policy on foo
To get the encoding policy for a directory, use the -getPolicy
command.
$ hdfs ec -getPolicy -path foo
RS-6-3-64k
New directories created under foo
will inherit the EC policy.
$ hadoop fs -mkdir foo/bar
$ hdfs ec -getPolicy -path foo/bar
RS-6-3-64k
And changing the policy for a parent will also change its children. The -setPolicy
command here issues a warning that existing files will not be converted. To switch the policy for an existing file, you must create a new file (through a copy, for instance). For Accumulo, if you change the encoding policy for a table’s directories, you would then have to perform a major compaction on the table to convert the table’s RFiles to the desired encoding.
$ hdfs ec -setPolicy -policy RS-6-3-1024k -path foo
Set RS-6-3-1024k erasure coding policy on foo
Warning: setting erasure coding policy on a non-empty directory will not automatically convert existing files to RS-6-3-1024k erasure coding policy
$ hdfs ec -getPolicy -path foo
RS-6-3-1024k
$ hdfs ec -getPolicy -path foo/bar
RS-6-3-1024k
Configuring EC for a New Instance
If you wish to create a new instance with a single encoding policy for all tables, you simply need to change the encoding policy on the tables
directory after running accumulo init
(see Quick Start guide). To keep the tables in the accumulo
namespace using replication, you would then need to manually change them back to using replication. Assuming Accumulo is configured to use /accumulo
as its root, you would do the following:
$ hdfs ec -setPolicy -policy RS-6-3-64k -path /accumulo/tables
Set RS-6-3-64k erasure coding policy on /accumulo/tables
$ hdfs ec -setPolicy -replicate -path /accumulo/tables/\!0
Set replication erasure coding policy on /accumulo/tables/!0
$ hdfs ec -setPolicy -replicate -path /accumulo/tables/+r
Set replication erasure coding policy on /accumulo/tables/+r
$ hdfs ec -setPolicy -replicate -path /accumulo/tables/+rep
Set replication erasure coding policy on /accumulo/tables/+rep
Check that the policies are set correctly:
$ hdfs ec -getPolicy -path /accumulo/tables
RS-6-3-64k
$ hdfs ec -getPolicy -path /accumulo/tables/\!0
The erasure coding policy of /accumulo/tables/!0 is unspecified
Any directories subsequently created under /accumulo/tables
will be erasure coded.
Configuring EC for an Existing Instance
For an existing installation, the instructions are the same, but with the caveat that changing the encoding policy for an existing directory will not change the files within the directory. Converting existing tables to EC requires a major compaction to complete the process. For instance, to convert test.table1
to RS-6-3-64k, you would first find the table ID via the accumulo shell, use hdfs ec
to change the encoding for the directory /accumulo/tables/<tableID>
, and then compact the table.
$ accumulo shell
user@instance> tables -l
accumulo.metadata => !0
accumulo.replication => +rep
accumulo.root => +r
test.table1 => 3
test.table2 => 4
test.table3 => 5
trace => 1
user@instance> quit
$ hdfs ec -setPolicy -policy RS-6-3-64k -path /accumulo/tables/3
Set RS-6-3-64k erasure coding policy on /accumulo/tables/3
$ accumulo shell
user@instance> compact -t test.table1
Defining Custom EC Policies
Hadoop by default will enable only a single EC policy, which is determined by the value of the dfs.namenode.ec.system.default.policy
configuration setting. To enable an existing policy, use the hdfs ec -enablePolicy
command. To define custom policies, you must first edit the user_ec_policies.xml
file found in the Hadoop configuration directory, and then run the hdfs ec -addPolicies
command. For example, to add RS-6-3-64k as a policy, you first edit user_ec_policies.xml
and add the following:
<configuration>
<layoutversion>1</layoutversion>
<schemas>
<!-- schema id is only used to reference internally in this document -->
<schema id="RSk6m3">
<codec>rs</codec>
<k>6</k>
<m>3</m>
<options> </options>
</schema>
</schemas>
<policies>
<policy>
<schema>RSk6m3</schema>
<cellsize>65536</cellsize>
</policy>
</policies>
</configuration>
Here the schema “RSk6m3” defines a Reed-Solomon encoding with k\=6 data blocks and m\=3 parity blocks. This schema is then used to define a policy that uses RS-6-3 encoding with a stripe size of 64k. To add this policy:
$ hdfs ec -addPolicies -policyFile /hadoop/etc/hadoop/user_ec_policies.xml
2019-11-19 15:35:23,703 INFO util.ECPolicyLoader: Loading EC policy file /hadoop/etc/hadoop/user_ec_policies.xml
Add ErasureCodingPolicy RS-6-3-64k succeed.
To enable the policy:
$ hdfs ec -enablePolicy -policy RS-6-3-64k
Erasure coding policy RS-6-3-64k is enabled