4.1. BookKeeper Server Metrics
These metrics are available on the BookKeeper server.
Metrics relating to daemon & service health.
|Metric||Description||Abnormalities|
|rubix.bookkeeper.gauge.live_workers||The number of workers currently reporting to the master node.||Mismatch with number reported by engine (Presto, Spark, etc.)|
|rubix.bookkeeper.gauge.caching_validated_workers||The number of workers reporting caching validation success.||Mismatch with live worker count (one or more workers failed validation)|
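The abnormal conditions in the table above come down to two comparisons. The following is a minimal Python sketch, assuming you have already fetched both gauge values from whatever metrics backend you use. `WorkerHealth`, `worker_health_warnings`, and the `engine_worker_count` input are hypothetical names, not part of RubiX.

```python
from dataclasses import dataclass


@dataclass
class WorkerHealth:
    """Snapshot of the two worker gauges, as read from your metrics backend."""
    live_workers: int               # rubix.bookkeeper.gauge.live_workers
    caching_validated_workers: int  # rubix.bookkeeper.gauge.caching_validated_workers


def worker_health_warnings(health: WorkerHealth, engine_worker_count: int) -> list[str]:
    """Return the abnormal conditions from the table that this snapshot shows.

    engine_worker_count is a hypothetical input: the number of workers your
    engine (Presto, Spark, etc.) reports, obtained however your setup exposes it.
    """
    warnings = []
    # Live workers should match what the engine itself sees.
    if health.live_workers != engine_worker_count:
        warnings.append(
            f"live_workers={health.live_workers} does not match "
            f"engine worker count={engine_worker_count}"
        )
    # Every live worker should also have passed caching validation.
    if health.caching_validated_workers != health.live_workers:
        warnings.append(
            f"caching_validated_workers={health.caching_validated_workers} does not "
            f"match live_workers={health.live_workers} (validation failures?)"
        )
    return warnings
```

For example, `worker_health_warnings(WorkerHealth(4, 3), engine_worker_count=4)` returns a single warning about caching validation.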
Metrics relating to cache interactions.
|Metric||Description||Abnormalities|
|rubix.bookkeeper.gauge.cache_size_mb||The current size of the local cache in MB.||Cache size is bigger than configured capacity|
|rubix.bookkeeper.gauge.available_cache_size_mb||The current disk space available for cache in MB.|
|rubix.bookkeeper.count.cache_eviction||The number of files removed from the local cache due to size constraints.||No cache evictions & cache has exceeded configured capacity|
|rubix.bookkeeper.count.cache_invalidation||The number of files invalidated from the local cache when the source file has been modified.|
|rubix.bookkeeper.count.cache_expiry||The number of files removed from the local cache once expired.|
|rubix.bookkeeper.gauge.cache_hit_rate||The percentage of cache hits for the local cache.||Cache hit rate near 0%|
|rubix.bookkeeper.gauge.cache_miss_rate||The percentage of cache misses for the local cache.||Cache miss rate near 100%|
|rubix.bookkeeper.count.total_request||The total number of requests made to read data.|
|rubix.bookkeeper.count.cache_request||The number of requests made to read data cached locally.||No cache requests made|
|rubix.bookkeeper.count.nonlocal_request||The number of requests made to read data from another node.||No non-local requests made|
|rubix.bookkeeper.count.remote_request||The number of requests made to download data from the data store.||No remote requests made|
|rubix.bookkeeper.count.total_async_request||The total number of requests made to download data asynchronously.|
|rubix.bookkeeper.count.processed_async_request||The total number of asynchronous download requests that have already been processed.|
|rubix.bookkeeper.gauge.async_queue_size||The current number of queued asynchronous download requests.||High queue size (requests not being processed)|
|rubix.bookkeeper.count.async_downloaded_mb||The amount of data asynchronously downloaded, in MB. (If there are no cache evictions, this …)|
|rubix.bookkeeper.count.async_download_time||The total time spent downloading data asynchronously, in seconds.|
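Several of the abnormal conditions above can be checked mechanically against one metrics snapshot. The Python sketch below is one way to do that, not a RubiX API. It assumes the snapshot is a dict keyed by the full metric names, that the rate gauges are on a 0 to 100 scale (as "percentage" in their descriptions suggests), and that `async_download_time` is in seconds. The threshold constants, the `configured_capacity_mb` parameter, and the function names are hypothetical; pick thresholds that suit your cluster.

```python
# Hypothetical thresholds for "near 0%", "near 100%" and "high queue size";
# RubiX does not define these values, so tune them for your cluster.
HIT_RATE_FLOOR_PCT = 5.0
MISS_RATE_CEILING_PCT = 95.0
ASYNC_QUEUE_CEILING = 1000

P = "rubix.bookkeeper."  # common prefix of the BookKeeper metric names


def cache_warnings(m: dict[str, float], configured_capacity_mb: float) -> list[str]:
    """Flag the table's abnormal conditions in one snapshot.

    m maps full metric names to current values; configured_capacity_mb is a
    hypothetical input holding the cache capacity you configured.
    """
    warnings = []
    size = m[P + "gauge.cache_size_mb"]
    if size > configured_capacity_mb:
        warnings.append(f"cache size {size} MB exceeds capacity {configured_capacity_mb} MB")
        # Over capacity with zero evictions suggests eviction is not running.
        if m[P + "count.cache_eviction"] == 0:
            warnings.append("cache exceeded capacity but no evictions occurred")
    # Assumes the rate gauges are percentages (0-100).
    if m[P + "gauge.cache_hit_rate"] < HIT_RATE_FLOOR_PCT:
        warnings.append("cache hit rate near 0%")
    if m[P + "gauge.cache_miss_rate"] > MISS_RATE_CEILING_PCT:
        warnings.append("cache miss rate near 100%")
    # The table lists zero cache, non-local and remote requests as abnormal.
    for name in ("cache_request", "nonlocal_request", "remote_request"):
        if m[P + "count." + name] == 0:
            warnings.append(f"no {name.replace('_', ' ')}s made")
    if m[P + "gauge.async_queue_size"] > ASYNC_QUEUE_CEILING:
        warnings.append("async queue is large; requests may not be processed")
    return warnings


def async_throughput_mb_per_s(m: dict[str, float]) -> float:
    """Average async download throughput, assuming async_download_time is in seconds."""
    seconds = m[P + "count.async_download_time"]
    return m[P + "count.async_downloaded_mb"] / seconds if seconds > 0 else 0.0
```

For example, a snapshot whose `gauge.cache_size_mb` exceeds `configured_capacity_mb` while `count.cache_eviction` is still 0 produces both capacity warnings.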
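Metrics relating to JVM statistics, supplied by the Dropwizard Metrics metrics-jvm module (https://metrics.dropwizard.io/3.1.0/manual/jvm/). Each metric set is reported for both the BookKeeper daemon (rubix.bookkeeper.*) and the Local Data Transfer Server (rubix.ldts.*).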
|Metric||Description|
|rubix.bookkeeper.jvm.gc.*, rubix.ldts.jvm.gc.*||Metrics relating to garbage collection (GarbageCollectorMetricSet)|
|rubix.bookkeeper.jvm.memory.*, rubix.ldts.jvm.memory.*||Metrics relating to memory usage (MemoryUsageGaugeSet)|
|rubix.bookkeeper.jvm.threads.*, rubix.ldts.jvm.threads.*||Metrics relating to thread states (CachedThreadStatesGaugeSet)|
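Because the JVM metrics are registered under wildcard prefixes, it can help to group a flat snapshot by prefix before charting it. This Python sketch matches names against the patterns from the table using `fnmatch.fnmatchcase`, where `*` also matches dots. The function name is hypothetical, and the sample metric names after each prefix (such as `PS-MarkSweep.count` or `heap.used`) are illustrative assumptions, since the exact names depend on the JVM and garbage collector in use.

```python
from fnmatch import fnmatchcase

# Wildcard patterns from the table, grouped by metric set.
JVM_PATTERNS = {
    "gc": ("rubix.bookkeeper.jvm.gc.*", "rubix.ldts.jvm.gc.*"),
    "memory": ("rubix.bookkeeper.jvm.memory.*", "rubix.ldts.jvm.memory.*"),
    "threads": ("rubix.bookkeeper.jvm.threads.*", "rubix.ldts.jvm.threads.*"),
}


def group_jvm_metrics(snapshot: dict[str, float]) -> dict[str, dict[str, float]]:
    """Split a flat name -> value snapshot into gc/memory/threads groups.

    Names that match none of the JVM patterns are left out.
    """
    groups: dict[str, dict[str, float]] = {group: {} for group in JVM_PATTERNS}
    for name, value in snapshot.items():
        for group, patterns in JVM_PATTERNS.items():
            if any(fnmatchcase(name, pattern) for pattern in patterns):
                groups[group][name] = value
    return groups
```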
4.2. Client-side Metrics
These metrics are available on the client side, i.e., in the engine (such as Presto or Spark) where the jobs that read data run.
Client-side metrics are divided into two groups:
- Basic stats: available under the name rubix:name=stats
- Detailed stats: available under the name rubix:name=stats,type=detailed
If RubiX is used in embedded mode, an engine-specific suffix is added to these names; e.g., Presto adds a catalog=<catalog_name> suffix.
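The names above have the shape of JMX ObjectNames (`domain:key=value,key=value`), so a monitoring script can tell basic from detailed stats, and pick out an embedded-mode suffix, by parsing the key properties. The sketch below is a simplified parser that ignores ObjectName quoting rules; the function names are hypothetical, and it assumes Presto's suffix appears as an extra `catalog=<catalog_name>` property. In Java, `javax.management.ObjectName` performs this parsing properly.

```python
from typing import Dict, Optional, Tuple


def parse_object_name(name: str) -> Tuple[str, Dict[str, str]]:
    """Split 'domain:key=value,key=value' into the domain and its properties.

    Simplified: ignores quoted values and ObjectName escaping rules.
    """
    domain, _, props = name.partition(":")
    properties = dict(pair.split("=", 1) for pair in props.split(",") if pair)
    return domain, properties


def classify_client_stats(name: str) -> Tuple[str, Optional[str]]:
    """Return ('basic' or 'detailed', catalog or None) for a client stats name."""
    domain, props = parse_object_name(name)
    if domain != "rubix" or props.get("name") != "stats":
        raise ValueError(f"not a RubiX client stats name: {name}")
    kind = "detailed" if props.get("type") == "detailed" else "basic"
    # The catalog property is only present when an engine adds a suffix.
    return kind, props.get("catalog")
```

For example, with a hypothetical catalog named `hive`, `classify_client_stats("rubix:name=stats,type=detailed,catalog=hive")` returns `("detailed", "hive")`.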
The following sections cover the metrics available under both types in detail.
|Metric||Description|
|mb_read_from_cache||Data read from the cache by client jobs|
|mb_read_from_source||Data read from the source by client jobs|
|cache_hit||Cache hit ratio, between 0 and 1|
The data unit for mb_read_from_cache and mb_read_from_source is MB; cache_hit is a unitless ratio.
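If cache_hit is measured by data volume, it can be cross-checked against the two MB metrics. The sketch below assumes that definition, cache_hit = mb_read_from_cache / (mb_read_from_cache + mb_read_from_source); RubiX may compute the ratio differently (for example, per request), so treat a mismatch as a prompt to investigate rather than a bug. The function name is hypothetical.

```python
def cache_hit_ratio(mb_read_from_cache: float, mb_read_from_source: float) -> float:
    """Fraction of data (by volume) served from cache, in [0, 1].

    Assumption: cache_hit is the share of MB read from cache out of all MB
    read; returns 0.0 when nothing has been read yet.
    """
    total = mb_read_from_cache + mb_read_from_source
    return mb_read_from_cache / total if total > 0 else 0.0
```

With 300 MB read from cache and 100 MB from the source, the sketch returns 0.75.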