Metrics Reporting
As of 1.1.0, Iceberg supports the MetricsReporter and MetricsReport APIs. Together they allow expressing different kinds of metrics reports and provide a pluggable way of reporting them.
Types of Reports
ScanReport
A ScanReport carries metrics collected during scan planning against a given table. Besides general information about the table, such as the snapshot ID or the table name, it includes metrics like:
- total scan planning duration
- number of data/delete files included in the result
- number of data/delete manifests scanned/skipped
- number of data/delete files scanned/skipped
- number of equality/positional delete files scanned
CommitReport
A CommitReport carries metrics collected after committing changes to a table (that is, after producing a snapshot). Besides general information about the table, such as the snapshot ID or the table name, it includes metrics like:
- total duration
- number of attempts required for the commit to succeed
- number of added/removed data/delete files
- number of added/removed equality/positional delete files
- number of added/removed equality/positional deletes
Available Metrics Reporters
LoggingMetricsReporter
This is the default metrics reporter when nothing else is configured. It writes each report to the log file. Example output looks like this:
INFO org.apache.iceberg.metrics.LoggingMetricsReporter - Received metrics report:
ScanReport{
    tableName=scan-planning-with-eq-and-pos-delete-files,
    snapshotId=2,
    filter=ref(name="data") == "(hash-27fa7cc0)",
    schemaId=0,
    projectedFieldIds=[1, 2],
    projectedFieldNames=[id, data],
    scanMetrics=ScanMetricsResult{
        totalPlanningDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.026569404S, count=1},
        resultDataFiles=CounterResult{unit=COUNT, value=1},
        resultDeleteFiles=CounterResult{unit=COUNT, value=2},
        totalDataManifests=CounterResult{unit=COUNT, value=1},
        totalDeleteManifests=CounterResult{unit=COUNT, value=1},
        scannedDataManifests=CounterResult{unit=COUNT, value=1},
        skippedDataManifests=CounterResult{unit=COUNT, value=0},
        totalFileSizeInBytes=CounterResult{unit=BYTES, value=10},
        totalDeleteFileSizeInBytes=CounterResult{unit=BYTES, value=20},
        skippedDataFiles=CounterResult{unit=COUNT, value=0},
        skippedDeleteFiles=CounterResult{unit=COUNT, value=0},
        scannedDeleteManifests=CounterResult{unit=COUNT, value=1},
        skippedDeleteManifests=CounterResult{unit=COUNT, value=0},
        indexedDeleteFiles=CounterResult{unit=COUNT, value=2},
        equalityDeleteFiles=CounterResult{unit=COUNT, value=1},
        positionalDeleteFiles=CounterResult{unit=COUNT, value=1}},
    metadata={
        iceberg-version=Apache Iceberg 1.4.0-SNAPSHOT (commit 4868d2823004c8c256a50ea7c25cff94314cc135)}}
INFO org.apache.iceberg.metrics.LoggingMetricsReporter - Received metrics report:
CommitReport{
    tableName=scan-planning-with-eq-and-pos-delete-files,
    snapshotId=1,
    sequenceNumber=1,
    operation=append,
    commitMetrics=CommitMetricsResult{
        totalDuration=TimerResult{timeUnit=NANOSECONDS, totalDuration=PT0.098429626S, count=1},
        attempts=CounterResult{unit=COUNT, value=1},
        addedDataFiles=CounterResult{unit=COUNT, value=1},
        removedDataFiles=null,
        totalDataFiles=CounterResult{unit=COUNT, value=1},
        addedDeleteFiles=null,
        addedEqualityDeleteFiles=null,
        addedPositionalDeleteFiles=null,
        removedDeleteFiles=null,
        removedEqualityDeleteFiles=null,
        removedPositionalDeleteFiles=null,
        totalDeleteFiles=CounterResult{unit=COUNT, value=0},
        addedRecords=CounterResult{unit=COUNT, value=1},
        removedRecords=null,
        totalRecords=CounterResult{unit=COUNT, value=1},
        addedFilesSizeInBytes=CounterResult{unit=BYTES, value=10},
        removedFilesSizeInBytes=null,
        totalFilesSizeInBytes=CounterResult{unit=BYTES, value=10},
        addedPositionalDeletes=null,
        removedPositionalDeletes=null,
        totalPositionalDeletes=CounterResult{unit=COUNT, value=0},
        addedEqualityDeletes=null,
        removedEqualityDeletes=null,
        totalEqualityDeletes=CounterResult{unit=COUNT, value=0}},
    metadata={
        iceberg-version=Apache Iceberg 1.4.0-SNAPSHOT (commit 4868d2823004c8c256a50ea7c25cff94314cc135)}}
RESTMetricsReporter
This is the default when using the RESTCatalog. It sends metrics to a REST server at the /v1/{prefix}/namespaces/{namespace}/tables/{table}/metrics endpoint, as defined in the REST OpenAPI spec.
Sending metrics via REST can be controlled with the rest-metrics-reporting-enabled property (defaults to true).
Implementing a custom Metrics Reporter
Implementing the MetricsReporter API gives full flexibility in dealing with incoming MetricsReport instances. For example, results could be sent to a Prometheus endpoint or to any other observability framework or system.
Below is a short example illustrating an InMemoryMetricsReporter that stores reports in a list and makes them available:
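import java.util.List;
import org.apache.iceberg.metrics.MetricsReport;
import org.apache.iceberg.metrics.MetricsReporter;
import org.apache.iceberg.relocated.com.google.common.collect.Lists;

public class InMemoryMetricsReporter implements MetricsReporter {
  // Reports in the order they were received.
  private final List<MetricsReport> metricsReports = Lists.newArrayList();

  @Override
  public void report(MetricsReport report) {
    metricsReports.add(report);
  }

  public List<MetricsReport> reports() {
    return metricsReports;
  }
}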
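The InMemoryMetricsReporter above keeps its reports in a plain list. If reports may arrive from more than one thread (an assumption; the text above does not say either way), a variant along the following lines could be used. This is only a sketch: ConcurrentInMemoryMetricsReporter is a hypothetical name, and the two small interfaces are stand-ins for org.apache.iceberg.metrics.MetricsReport and MetricsReporter so that the example compiles without Iceberg on the classpath.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Stand-ins for org.apache.iceberg.metrics.MetricsReport and MetricsReporter so this
// sketch compiles on its own; with Iceberg on the classpath, import those instead.
interface MetricsReport {}

interface MetricsReporter {
  void report(MetricsReport report);
}

// Hypothetical variant of InMemoryMetricsReporter that tolerates concurrent callers.
final class ConcurrentInMemoryMetricsReporter implements MetricsReporter {
  // CopyOnWriteArrayList makes add() safe without external locking.
  private final List<MetricsReport> metricsReports = new CopyOnWriteArrayList<>();

  @Override
  public void report(MetricsReport report) {
    metricsReports.add(report);
  }

  // Returns an immutable snapshot so callers cannot modify the internal list.
  public List<MetricsReport> reports() {
    return List.copyOf(metricsReports);
  }
}
```

Returning a copy trades a little allocation for isolation: a caller iterating over reports() is not affected by reports that arrive in the meantime.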
Registering a custom Metrics Reporter
Via Catalog Configuration
The catalog property metrics-reporter-impl registers a given MetricsReporter by its fully-qualified class name, e.g. metrics-reporter-impl=org.apache.iceberg.metrics.InMemoryMetricsReporter.
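When catalog properties are assembled in code rather than in a configuration file, the same key can be added to the properties map. The helper below is a minimal sketch: ReporterCatalogProperties and withReporter are hypothetical names, and only the metrics-reporter-impl key comes from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; only the property key is taken from the documentation above.
final class ReporterCatalogProperties {
  static final String METRICS_REPORTER_IMPL = "metrics-reporter-impl";

  private ReporterCatalogProperties() {}

  // Returns a copy of the given catalog properties with the reporter class added,
  // leaving the caller's map untouched.
  static Map<String, String> withReporter(Map<String, String> base, String reporterClassName) {
    Map<String, String> props = new HashMap<>(base);
    props.put(METRICS_REPORTER_IMPL, reporterClassName);
    return props;
  }
}
```

The resulting map would then be passed along with the other catalog properties when the catalog is initialized. The reporter class has to be available on the classpath of the process that loads the catalog.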
Via the Java API during Scan planning
Independently of the MetricsReporter registered at the catalog level via the metrics-reporter-impl property, it is also possible to supply additional reporters during scan planning, as shown below:
TableScan tableScan =
    table
        .newScan()
        .metricsReporter(customReporterOne)
        .metricsReporter(customReporterTwo);

try (CloseableIterable<FileScanTask> fileScanTasks = tableScan.planFiles()) {
  // ...
}