Understanding metrics

This topic explains how metrics are defined and structured in Prometheus: metric names, metric types (such as counters and gauges), and the labels that add context to the data. It includes practical examples relevant to the Genero Application Server (GAS).

What are metrics?

Metrics are quantitative measurements that provide insights into the performance and behavior of applications running on the GAS. By gathering these metrics, you can monitor various aspects such as response times, error rates, and resource utilization, enabling you to maintain optimal performance and troubleshoot issues.

Supported metrics

The GAS provides a range of built-in, supported metrics that are automatically collected and exposed through the Prometheus integration. These metrics cover key operational aspects of the GAS, including request handling, resource utilization, and performance monitoring. For a comprehensive list of all supported metrics and their descriptions, refer to the Supported metrics section.

Defining metrics

The BDL Prometheus API allows you to define your own metrics in your application code. For more information on using the API, refer to The prometheus package in the Genero Business Development Language User Guide. When you define metrics, use clear and descriptive names, choose the appropriate metric type, and use labels effectively to provide context.

This structured approach enables better monitoring, analysis, and troubleshooting of applications running on the GAS. When defining a metric in Prometheus, there are several key components to consider: the metric name, the metric type, and the labels. Each of these elements plays a crucial role in how metrics are collected, stored, and queried.

Metric name:
The metric name is a unique identifier for the metric being collected. It should be descriptive enough to convey what the metric measures. The built-in metrics listed in the Supported metrics section follow a naming convention: a prefix indicating the source or application (for example, "gas", "gip", "flm", "oidc", and "saml"), followed by a description of the metric.
For example, fourjs_gas_request_count_total tracks the total count of requests handled by the GAS. The suffix _total is commonly used in Prometheus to denote a cumulative counter metric.
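As a rough illustration of the naming points above, the following Python sketch checks a name against the classic Prometheus metric-name character rule and flags the _total suffix. The helper names is_valid_metric_name and looks_like_counter are hypothetical and are not part of the GAS or of any Prometheus library. The suffix check is only a heuristic based on the convention described here.

```python
import re

# Classic Prometheus rule for metric names: letters, digits, underscores and
# colons, and the first character must not be a digit.
METRIC_NAME_RE = re.compile(r"[a-zA-Z_:][a-zA-Z0-9_:]*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if the whole name matches the character rule above."""
    return METRIC_NAME_RE.fullmatch(name) is not None


def looks_like_counter(name: str) -> bool:
    """Heuristic only: counters conventionally end with the _total suffix."""
    return name.endswith("_total")


if __name__ == "__main__":
    for name in ("fourjs_gas_request_count_total", "fourjs_gas_memory_heap_bytes"):
        print(name, is_valid_metric_name(name), looks_like_counter(name))
```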
Metric types:
Prometheus supports several metric types. We use the following:
- Counter: A cumulative metric that represents a single numerical value that only increases (for example, total requests, timeout errors).
- Gauge: A metric that can go up and down (for example, current memory usage, number of DVMs running).
- Histogram: A metric that samples observations (for example, request duration) and counts them in configurable "buckets". Each bucket counts the observations that are less than or equal to its upper bound, such as response times at or below 100 ms, 500 ms, and 1 second.
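To make the difference between the three types concrete, here is a minimal Python sketch of their behavior. It is a simplified model for illustration only, not the GAS implementation, the BDL API, or a Prometheus client library; the class and method names are assumptions chosen for readability.

```python
class Counter:
    """Cumulative value that can only increase."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("a counter can only increase")
        self.value += amount


class Gauge:
    """Value that can go up and down."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations in cumulative buckets keyed by upper bound."""

    def __init__(self, upper_bounds):
        # A final +Inf bucket catches every observation.
        self.upper_bounds = sorted(upper_bounds) + [float("inf")]
        self.bucket_counts = [0] * len(self.upper_bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        self.sum += value
        self.count += 1
        # Buckets are cumulative: increment every bucket whose upper bound
        # is greater than or equal to the observed value.
        for i, bound in enumerate(self.upper_bounds):
            if value <= bound:
                self.bucket_counts[i] += 1


if __name__ == "__main__":
    requests = Counter()
    requests.inc()

    running_dvms = Gauge()
    running_dvms.set(3)
    running_dvms.dec()

    duration = Histogram([0.1, 0.5, 1.0])  # upper bounds in seconds
    for seconds in (0.05, 0.3, 0.7, 2.0):
        duration.observe(seconds)
    print(requests.value, running_dvms.value, duration.bucket_counts)
```

Because the buckets are cumulative, the count in a larger bucket always includes the observations counted in the smaller ones.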
Labels:
Labels are key-value pairs that provide additional context to the metric. They allow for more granular filtering and aggregation of metrics. Labels can represent various dimensions, such as application or service name, endpoint, HTTP method, or response status. Label names are all in lowercase, and are case sensitive.
- Example of a counter metric:
```
fourjs_gas_url_count_total{application="demo",url="/ua/r",status="200",verb="GET"} 4
```
  - Metric Name: fourjs_gas_url_count_total indicates that this is a counter metric measuring the total number of requests to a specific URL.
  - Labels:
    - application="demo": Specifies the application name, which helps differentiate metrics from different applications.
    - url="/ua/r": Indicates the specific URL endpoint being measured, allowing for analysis of request counts per endpoint.
    - status="200": Represents the HTTP status code returned by the request, providing insight into the success of the requests.
    - verb="GET": Indicates the HTTP method used for the request, which can help in analyzing traffic patterns based on request types.
  - Value: 4 represents the total count of requests that have been made to the specified URL with the given labels. As a counter, this value will only increase over time as more requests are made.
- Example of a gauge metric:
```
fourjs_gas_memory_heap_bytes{name="dispatcher"} 256000
```
  - Metric Name: fourjs_gas_memory_heap_bytes indicates that this is a gauge metric measuring the heap memory usage in bytes.
  - Label: name="dispatcher": Specifies the name of the component or service being measured, in this case, the "dispatcher." This label allows for differentiation between various components within the GAS.
  - Value: 256000 represents the current heap memory usage in bytes at the time of the measurement. Since gauges can increase or decrease, this value can change over time based on the GAS's memory usage.
- Example of a histogram metric:
```
fourjs_gas_application_request_duration_seconds_bucket{application="demo",url="/ua/r",le="0.005"}
```
  - Metric name: fourjs_gas_application_request_duration_seconds_bucket indicates that this is a histogram metric for application request duration.
  - Labels:
    - application="demo": Specifies the application name, which helps differentiate metrics from different applications.
    - url="/ua/r": Indicates the URL endpoint of an application request being measured, allowing for analysis of performance per endpoint.
    - le="0.005": Represents the upper bound of the bucket (in seconds) for the histogram, indicating that this bucket counts requests that took less than or equal to 0.005 seconds, which is equivalent to 5 milliseconds.
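The sample lines in the examples above share one shape: a metric name, an optional set of labels in braces, and a value. The following Python sketch splits a sample line that carries a value into those three parts and sums a counter by one label. It is a simplified illustration, not a full parser for the Prometheus text format (it does not unescape label values or handle timestamps). The function names parse_sample and sum_by are hypothetical, and the sample with status="404" is made up to show aggregation.

```python
import re
from collections import defaultdict

# name, optional {labels}, whitespace, value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)$'
)
# key="value" pairs; escaped characters are matched but not unescaped
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Split one sample line into (name, labels dict, float value)."""
    match = SAMPLE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line with a value: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def sum_by(samples, metric, label):
    """Sum the values of one metric, grouped by the given label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric:
            totals[labels.get(label, "")] += value
    return dict(totals)


lines = [
    'fourjs_gas_url_count_total{application="demo",url="/ua/r",status="200",verb="GET"} 4',
    # Hypothetical sample, added only to illustrate grouping by label.
    'fourjs_gas_url_count_total{application="demo",url="/ua/r",status="404",verb="GET"} 1',
    'fourjs_gas_memory_heap_bytes{name="dispatcher"} 256000',
]
samples = [parse_sample(line) for line in lines]

if __name__ == "__main__":
    print(sum_by(samples, "fourjs_gas_url_count_total", "status"))
```

Grouping by a label in this way is the kind of filtering and aggregation that labels are meant to support; in practice this is typically done with queries in Prometheus rather than in application code.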