4. Metrics
4.1. BookKeeper Server Metrics
These metrics are available on the BookKeeper server.
Health Metrics
Metrics relating to daemon & service health.
Metric | Description | Abnormalities |
---|---|---|
rubix.bookkeeper.gauge.live_workers | The number of workers currently reporting to the master node. | Mismatch with the number of workers reported by the engine (Presto, Spark, etc.) |
rubix.bookkeeper.gauge.caching_validated_workers | The number of workers reporting caching validation success. | Mismatch with the live worker count (one or more workers failed validation) |
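The abnormalities column above comes down to two comparisons. The Python sketch below expresses them as a small check function. It assumes you have already read the two gauge values and obtained a worker count from the engine. The function name and the engine_workers parameter are hypothetical and are not part of RubiX.

```python
def check_worker_health(live_workers, caching_validated_workers, engine_workers):
    """Return warnings for the abnormalities listed in the health metrics table.

    engine_workers is the worker count reported by the query engine
    (Presto, Spark, etc.); how it is obtained is outside this sketch.
    """
    warnings = []
    if live_workers != engine_workers:
        warnings.append(f"live_workers={live_workers} but the engine reports "
                        f"{engine_workers} workers")
    if caching_validated_workers != live_workers:
        warnings.append(f"caching_validated_workers={caching_validated_workers} "
                        f"does not match live_workers={live_workers}")
    return warnings


# Example: the engine sees 10 workers, 9 report to the master, 8 pass validation.
for warning in check_worker_health(9, 8, 10):
    print(warning)
```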
Cache Metrics
Metrics relating to cache interactions.
Metric | Description | Abnormalities |
---|---|---|
rubix.bookkeeper.gauge.cache_size_mb | The current size of the local cache in MB. | Cache size exceeds the configured capacity |
rubix.bookkeeper.gauge.available_cache_size_mb | The current disk space available for the cache in MB. | |
rubix.bookkeeper.count.cache_eviction | The number of files removed from the local cache due to size constraints. | No cache evictions even though the cache has exceeded its configured capacity |
rubix.bookkeeper.count.cache_invalidation | The number of files invalidated from the local cache when the source file has been modified. | |
rubix.bookkeeper.count.cache_expiry | The number of files removed from the local cache after they expire. | |
rubix.bookkeeper.gauge.cache_hit_rate | The percentage of cache hits for the local cache. | Cache hit rate near 0% |
rubix.bookkeeper.gauge.cache_miss_rate | The percentage of cache misses for the local cache. | Cache miss rate near 100% |
rubix.bookkeeper.count.total_request | The total number of requests made to read data. | |
rubix.bookkeeper.count.cache_request | The number of requests made to read data cached locally. | No cache requests made |
rubix.bookkeeper.count.nonlocal_request | The number of requests made to read data from another node. | No non-local requests made |
rubix.bookkeeper.count.remote_request | The number of requests made to download data from the data store. | No remote requests made |
rubix.bookkeeper.count.total_async_request | The total number of requests made to download data asynchronously. | |
rubix.bookkeeper.count.processed_async_request | The total number of asynchronous download requests that have already been processed. | |
rubix.bookkeeper.gauge.async_queue_size | The current number of queued asynchronous download requests. | High queue size (requests not being processed) |
rubix.bookkeeper.count.async_downloaded_mb | The amount of data asynchronously downloaded, in MB. (If there are no cache evictions, this should match cache_size_mb.) | |
rubix.bookkeeper.count.async_download_time | The total time spent downloading data, in seconds. | |
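Several rows in this table describe abnormal states that can be checked mechanically. The following Python sketch shows one way to do that. It assumes the metric values have already been collected into a dict keyed by the last component of each metric name (for example cache_size_mb). The function names and all thresholds (capacity_mb, hit_rate_floor_pct, queue_ceiling, match_tolerance_mb) are hypothetical illustration choices, because RubiX does not define "near 0%" or "high queue size" numerically. The sketch also derives an average asynchronous download throughput from async_downloaded_mb and async_download_time.

```python
def check_cache_metrics(m, capacity_mb, hit_rate_floor_pct=5.0,
                        queue_ceiling=1000, match_tolerance_mb=1.0):
    """Return warnings for the abnormalities listed in the cache metrics table.

    m maps short metric names (e.g. "cache_size_mb") to values.
    capacity_mb, hit_rate_floor_pct, queue_ceiling and match_tolerance_mb
    are hypothetical thresholds chosen for illustration.
    """
    warnings = []
    if m["cache_size_mb"] > capacity_mb:
        warnings.append("cache size exceeds configured capacity")
        if m["cache_eviction"] == 0:
            warnings.append("cache is over capacity but no evictions recorded")
    # cache_hit_rate is a percentage (0-100), not a fraction.
    if m["cache_hit_rate"] < hit_rate_floor_pct:
        warnings.append("cache hit rate near 0%")
    if m["async_queue_size"] > queue_ceiling:
        warnings.append("async download queue is large")
    for name in ("cache_request", "nonlocal_request", "remote_request"):
        if m[name] == 0:
            # e.g. "cache_request" -> "no cache requests made"
            warnings.append(f"no {name.replace('_', ' ')}s made")
    # Per the table: with no evictions, async_downloaded_mb should match cache_size_mb.
    if (m["cache_eviction"] == 0
            and abs(m["async_downloaded_mb"] - m["cache_size_mb"]) > match_tolerance_mb):
        warnings.append("async_downloaded_mb does not match cache_size_mb")
    return warnings


def async_throughput_mb_per_s(m):
    """Average asynchronous download throughput in MB/s (0.0 if no time recorded)."""
    seconds = m["async_download_time"]
    return m["async_downloaded_mb"] / seconds if seconds > 0 else 0.0


sample = {
    "cache_size_mb": 1200, "cache_eviction": 0, "cache_hit_rate": 2.5,
    "async_queue_size": 3, "cache_request": 10, "nonlocal_request": 0,
    "remote_request": 40, "async_downloaded_mb": 1200, "async_download_time": 60,
}
for warning in check_cache_metrics(sample, capacity_mb=1000):
    print(warning)
print(async_throughput_mb_per_s(sample))  # 20.0
```

With the sample values, the check reports four warnings: the cache is over capacity, no evictions were recorded, the hit rate is near 0%, and no non-local requests were made.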
JVM Metrics
Metrics relating to JVM statistics, supplied by the Dropwizard Metrics metrics-jvm module (https://metrics.dropwizard.io/3.1.0/manual/jvm/).
Metric | Description | Abnormalities |
---|---|---|
rubix.bookkeeper.jvm.gc.*, rubix.ldts.jvm.gc.* | Metrics relating to garbage collection (GarbageCollectorMetricSet) | |
rubix.bookkeeper.jvm.memory.*, rubix.ldts.jvm.memory.* | Metrics relating to memory usage (MemoryUsageGaugeSet) | |
rubix.bookkeeper.jvm.threads.*, rubix.ldts.jvm.threads.* | Metrics relating to thread states (CachedThreadStatesGaugeSet) | |
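Because each daemon's JVM metrics share a common prefix, the wildcard names above can be used directly as filters, for example when building a per-daemon dashboard. This is a minimal Python sketch using the standard-library fnmatch module. The function name is hypothetical, and the sample metric names are illustrative: the exact suffixes after each prefix come from the Dropwizard metric sets.

```python
from fnmatch import fnmatchcase

# Wildcard patterns from the JVM metrics table, grouped by daemon.
JVM_PATTERNS = {
    "bookkeeper": ["rubix.bookkeeper.jvm.gc.*",
                   "rubix.bookkeeper.jvm.memory.*",
                   "rubix.bookkeeper.jvm.threads.*"],
    "ldts": ["rubix.ldts.jvm.gc.*",
             "rubix.ldts.jvm.memory.*",
             "rubix.ldts.jvm.threads.*"],
}


def jvm_metrics_for(daemon, metric_names):
    """Return the sorted metric names that match one of the daemon's JVM patterns."""
    patterns = JVM_PATTERNS[daemon]
    return sorted(name for name in metric_names
                  if any(fnmatchcase(name, p) for p in patterns))


names = [
    "rubix.bookkeeper.jvm.memory.heap.used",
    "rubix.ldts.jvm.threads.count",
    "rubix.bookkeeper.gauge.cache_size_mb",
]
print(jvm_metrics_for("bookkeeper", names))  # ['rubix.bookkeeper.jvm.memory.heap.used']
```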
4.2. Client-Side Metrics
These metrics are available on the client side, i.e., in the engine (such as Presto or Spark) where the jobs that read data run.
Client-side metrics are divided into two groups:
- Basic stats: available under the name rubix:name=stats
- Detailed stats: available under the name rubix:name=stats,type=detailed
If RubiX is used in embedded mode, an engine-specific suffix is added to these names; for example, Presto adds a catalog=<catalog_name> suffix.
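The naming scheme above can be expressed as a small helper. This Python sketch is illustrative only. The function and its engine_suffix parameter are hypothetical. Joining the suffix with a comma is an assumption: the names appear to follow JMX ObjectName key=value syntax, as the form rubix:name=stats,type=detailed suggests. "hive" is a made-up catalog name.

```python
def client_stats_name(detailed=False, engine_suffix=None):
    """Build a client-side stats name (hypothetical helper).

    engine_suffix is an assumed parameter for embedded mode, e.g.
    "catalog=hive" for Presto; comma-joining it assumes JMX-style
    key=value properties.
    """
    name = "rubix:name=stats"
    if detailed:
        name += ",type=detailed"
    if engine_suffix:
        name += "," + engine_suffix
    return name


print(client_stats_name())                              # rubix:name=stats
print(client_stats_name(detailed=True))                 # rubix:name=stats,type=detailed
print(client_stats_name(engine_suffix="catalog=hive"))  # rubix:name=stats,catalog=hive
```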
The following sections describe the metrics available in each group in detail.
Basic stats
Metric | Description |
---|---|
mb_read_from_cache | Data read from the cache by client jobs |
mb_read_from_source | Data read from the source by client jobs |
cache_hit | Cache hit ratio, between 0 and 1 |
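You can sanity-check cache_hit against the two volume metrics by computing the fraction of data served from the cache. The Python sketch below assumes this volume-weighted definition. RubiX may compute cache_hit differently (for example per request), so treat the result as an approximation. The function name is hypothetical.

```python
def hit_ratio(mb_read_from_cache, mb_read_from_source):
    """Fraction of data served from the cache, between 0 and 1.

    Assumes a volume-weighted definition; returns 0.0 when nothing was read.
    """
    total = mb_read_from_cache + mb_read_from_source
    return mb_read_from_cache / total if total > 0 else 0.0


print(hit_ratio(750.0, 250.0))  # 0.75
```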
Detailed stats
All data amounts in the metrics above are in MB.