It can be difficult to determine why some operations are taking longer than expected. For example, you may be looking up items with very low latency, but sometimes the lookups take much longer. Determining the cause of the delay is difficult because the system is distributed, and the typical lookup is fast.

Accumulo has been instrumented to record the time that various operations take when tracing is turned on. The fact that tracing is enabled follows all the requests made on behalf of the user throughout the distributed infrastructure of accumulo, and across all threads of execution.

Starting with version 2.1.0, Accumulo uses OpenTelemetry to collect and transport trace information to a back-end server that can display the trace information. The Accumulo Trace server process and trace table are no longer used. Also, the old trace configuration properties are now deprecated and not used.

Configuring Tracing

To collect traces, Accumulo needs two items:

One way to set the globalOpenTelemetry member variable is to use the OpenTelemetry Java Agent. Simply putting the agent jar on the classpath and configuring that agent.

Hadoop is also working on using OpenTelemetry. This is being tracked at https://issues.apache.org/jira/browse/HADOOP-15566.

Instrumenting a Client

Accumulo client operations will be traced as part of a client application operation if the client application is also instrumented using OpenTelemetry and invokes the Accumulo client operation in a Span. Client application developers can use the OpenTelemetry documentation to instrument the application. To collect traces in the client, Accumulo needs the io.opentelemetry.api.GlobalOpenTelemetry.globalOpenTelemetry member variable set to an OpenTelemetry instance.

Tracing from the Shell

You can enable tracing for operations run from the shell by using the trace on and trace off commands.