4. Metrics

4.1. BookKeeper Server Metrics

These metrics are available on the BookKeeper server.

Health Metrics

Metrics relating to daemon and service health.

Metric | Description | Abnormalities
rubix.bookkeeper.gauge.live_workers | The number of workers currently reporting to the master node. | Mismatch with the number of workers reported by the engine (Presto, Spark, etc.)
rubix.bookkeeper.gauge.caching_validated_workers | The number of workers reporting caching validation success. | Mismatch with the live worker count (one or more workers failed validation)
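The abnormality checks in this table lend themselves to automation. The sketch below is a minimal illustration, assuming the two gauge values have already been scraped from the BookKeeper server and that the engine's own worker count is obtained separately (for example, from Presto or Spark). The function check_worker_health is hypothetical and not part of Rubix.

```python
from typing import List


def check_worker_health(live_workers: int,
                        caching_validated_workers: int,
                        engine_worker_count: int) -> List[str]:
    """Return the abnormalities described in the Health Metrics table.

    live_workers and caching_validated_workers are the values of the
    rubix.bookkeeper.gauge.* gauges; engine_worker_count is the worker
    count reported by the engine (an input assumed to be fetched elsewhere).
    """
    problems = []
    # Abnormality 1: BookKeeper and the engine disagree on the worker count.
    if live_workers != engine_worker_count:
        problems.append(
            f"live_workers={live_workers} but engine reports {engine_worker_count}")
    # Abnormality 2: not every live worker passed caching validation.
    if caching_validated_workers != live_workers:
        problems.append(
            f"caching_validated_workers={caching_validated_workers} "
            f"but live_workers={live_workers}")
    return problems


print(check_worker_health(5, 5, 5))  # healthy cluster: empty list
print(check_worker_health(4, 3, 5))  # both abnormalities reported
```

An empty list means neither abnormality from the table was detected.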

Cache Metrics

Metrics relating to cache interactions.

Metric | Description | Abnormalities
rubix.bookkeeper.gauge.cache_size_mb | The current size of the local cache, in MB. | Cache size is larger than the configured capacity
rubix.bookkeeper.gauge.available_cache_size_mb | The disk space currently available for the cache, in MB. | -
rubix.bookkeeper.count.cache_eviction | The number of files removed from the local cache due to size constraints. | No cache evictions although the cache has exceeded its configured capacity
rubix.bookkeeper.count.cache_invalidation | The number of files invalidated in the local cache because the source file was modified. | -
rubix.bookkeeper.count.cache_expiry | The number of files removed from the local cache after they expired. | -
rubix.bookkeeper.gauge.cache_hit_rate | The percentage of cache hits for the local cache. | Cache hit rate near 0%
rubix.bookkeeper.gauge.cache_miss_rate | The percentage of cache misses for the local cache. | Cache miss rate near 100%
rubix.bookkeeper.count.total_request | The total number of requests made to read data. | -
rubix.bookkeeper.count.cache_request | The number of requests made to read data cached locally. | No cache requests made
rubix.bookkeeper.count.nonlocal_request | The number of requests made to read data from another node. | No non-local requests made
rubix.bookkeeper.count.remote_request | The number of requests made to download data from the data store. | No remote requests made
rubix.bookkeeper.count.total_async_request | The total number of requests made to download data asynchronously. | -
rubix.bookkeeper.count.processed_async_request | The total number of asynchronous download requests that have already been processed. | -
rubix.bookkeeper.gauge.async_queue_size | The current number of queued asynchronous download requests. | High queue size (requests are not being processed)
rubix.bookkeeper.count.async_downloaded_mb | The amount of data downloaded asynchronously, in MB. If there have been no cache evictions, this should match cache_size_mb. | -
rubix.bookkeeper.count.async_download_time | The total time spent downloading data asynchronously, in seconds. | -
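The same approach can be applied to the cache metrics. This is a minimal sketch, assuming the values are collected into a dictionary keyed by the metric name without its rubix.bookkeeper.gauge. or rubix.bookkeeper.count. prefix. The configured capacity, the 5% hit-rate floor and the queue-size limit are illustrative assumptions, as is reading the hit and miss rates as percentages between 0 and 100; the helper names are hypothetical. Because async_downloaded_mb is in MB and async_download_time is in seconds, dividing one by the other gives an average asynchronous download throughput in MB/s.

```python
from typing import Dict, List

# Metric values keyed by short name, e.g. "cache_size_mb" (prefix stripped).
Metrics = Dict[str, float]


def check_cache_health(m: Metrics, capacity_mb: float,
                       hit_rate_floor: float = 5.0,
                       max_queue_size: float = 1000) -> List[str]:
    """Flag the abnormalities listed in the Cache Metrics table.

    capacity_mb, hit_rate_floor and max_queue_size are hypothetical inputs
    chosen for illustration; rates are assumed to be percentages (0-100).
    """
    problems = []
    if m["cache_size_mb"] > capacity_mb:
        problems.append("cache_size_mb is larger than the configured capacity")
        if m["cache_eviction"] == 0:
            problems.append("cache is over capacity but no evictions occurred")
    # A hit rate near 0% and a miss rate near 100% describe the same symptom.
    if (m["cache_hit_rate"] < hit_rate_floor
            or m["cache_miss_rate"] > 100 - hit_rate_floor):
        problems.append("cache hit rate is near 0%")
    for name in ("cache_request", "nonlocal_request", "remote_request"):
        if m[name] == 0:
            problems.append(f"{name} is 0")
    if m["async_queue_size"] > max_queue_size:
        problems.append("async_queue_size is high")
    return problems


def async_throughput_mb_per_s(m: Metrics) -> float:
    """Average asynchronous download throughput in MB per second."""
    seconds = m["async_download_time"]
    return m["async_downloaded_mb"] / seconds if seconds > 0 else 0.0


sample = {
    "cache_size_mb": 900, "cache_eviction": 3,
    "cache_hit_rate": 72.0, "cache_miss_rate": 28.0,
    "cache_request": 720, "nonlocal_request": 80, "remote_request": 200,
    "async_queue_size": 12,
    "async_downloaded_mb": 950, "async_download_time": 190,
}
print(check_cache_health(sample, capacity_mb=1000))  # []
print(async_throughput_mb_per_s(sample))  # 5.0
```

With the made-up sample values above, no abnormality is flagged and the average asynchronous download rate is 950 MB / 190 s = 5 MB/s.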

JVM Metrics

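Metrics relating to JVM statistics, supplied by the Dropwizard Metrics metrics-jvm module (https://metrics.dropwizard.io/3.1.0/manual/jvm/). Each metric family is reported under two prefixes: rubix.bookkeeper.jvm.* for the BookKeeper daemon and rubix.ldts.jvm.* for the Local Data Transfer Server (LDTS).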

Metric | Description | Abnormalities
rubix.bookkeeper.jvm.gc.*, rubix.ldts.jvm.gc.* | Metrics relating to garbage collection (GarbageCollectorMetricSet). | -
rubix.bookkeeper.jvm.memory.*, rubix.ldts.jvm.memory.* | Metrics relating to memory usage (MemoryUsageGaugeSet). | -
rubix.bookkeeper.jvm.threads.*, rubix.ldts.jvm.threads.* | Metrics relating to thread states (CachedThreadStatesGaugeSet). | -

4.2. Client-side Metrics

These metrics are available on the client side, i.e., in the engine (Presto or Spark) where the jobs that read data are run.

Client-side metrics are divided into two groups:

  1. Basic stats: available under the JMX object name rubix:name=stats
  2. Detailed stats: available under the JMX object name rubix:name=stats,type=detailed

If Rubix is used in embedded mode, an engine-specific suffix is appended to these names; e.g., Presto appends catalog=<catalog_name>.
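For illustration, the names above can be assembled as follows. This sketch assumes the engine suffix is appended as an extra comma-separated key=value property, following JMX ObjectName syntax; the helper stats_object_name and the catalog name hive are hypothetical.

```python
from typing import Optional

BASIC_STATS = "rubix:name=stats"
DETAILED_STATS = "rubix:name=stats,type=detailed"


def stats_object_name(detailed: bool = False,
                      catalog: Optional[str] = None) -> str:
    """Build the name under which client-side stats are exposed.

    catalog models the suffix Presto adds in embedded mode; joining it
    with a comma is an assumption based on JMX ObjectName syntax.
    """
    name = DETAILED_STATS if detailed else BASIC_STATS
    if catalog is not None:
        name += f",catalog={catalog}"
    return name


print(stats_object_name())                              # rubix:name=stats
print(stats_object_name(detailed=True, catalog="hive"))  # rubix:name=stats,type=detailed,catalog=hive
```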

The following sections cover the metrics available under each of these types in detail.

Basic stats

Metric | Description
mb_read_from_cache | Amount of data read from the cache by the client jobs
mb_read_from_source | Amount of data read from the source by the client jobs
cache_hit | Cache hit ratio, between 0 and 1
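The relationship between the three basic stats can be sketched as below, assuming cache_hit is the byte-weighted share of data served from the cache. This is an assumption for illustration only; Rubix may compute the ratio differently (for example, from request counts). The function cache_hit_ratio is hypothetical.

```python
def cache_hit_ratio(mb_read_from_cache: float,
                    mb_read_from_source: float) -> float:
    """Fraction of data served from the cache, between 0 and 1.

    Assumes a byte-weighted ratio; returns 0.0 when nothing has been read.
    """
    total = mb_read_from_cache + mb_read_from_source
    return mb_read_from_cache / total if total > 0 else 0.0


print(cache_hit_ratio(750, 250))  # 0.75
```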

Detailed Stats

The data unit for all metrics above is MB.