Operations Basics / Version 2207
The following terms are used within this chapter:
- Alert
Automated alerts draw human attention to a particular system if a problem has been identified which requires human interaction. Alerts are often reported via email or messaging systems.
Typical alerts are configured based on states, thresholds or trends.
Ideally, monitoring does not raise false-positive alerts, because important alerts may be overlooked if the noise is too high. For this reason, many alerts are configured with a grace period between the detection of a problem and the triggering of the alert.
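The grace period described above can be sketched as a small state holder that fires only when a problem persists. This is a minimal illustration, not the behavior of any particular monitoring product; the class `GracePeriodAlert` and its `observe` method are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GracePeriodAlert:
    """Hypothetical helper: fires only if a problem persists for `grace_seconds`."""

    grace_seconds: float
    problem_since: Optional[float] = None  # time the current problem was first seen

    def observe(self, problem_detected: bool, now: float) -> bool:
        """Record one check result; return True if the alert should fire."""
        if not problem_detected:
            # The problem vanished (or never existed): reset the timer.
            self.problem_since = None
            return False
        if self.problem_since is None:
            self.problem_since = now
        return now - self.problem_since >= self.grace_seconds


alert = GracePeriodAlert(grace_seconds=60)
print(alert.observe(True, now=0))    # False: problem first seen
print(alert.observe(True, now=30))   # False: still within the grace period
print(alert.observe(False, now=45))  # False: recovered, timer resets
print(alert.observe(True, now=50))   # False: new problem, timer restarts
print(alert.observe(True, now=110))  # True: problem persisted for 60 seconds
```

A short-lived problem that recovers within the grace period never reaches the alert channel, which is how the grace period reduces noise.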
- Attribute
Typically, attributes refer either to configuration or to a service state. Examples of attributes are configured JDBC URLs, feature flags or the current runlevel of a server.
Alerts on configuration attributes typically signal a misconfiguration of the system. Alerts on state attributes are triggered, for example, if a monitored service does not reach a desired state after start.
All Boolean values are attributes, as they either represent a configuration or a state.
- Counter
A specific metric with a value which may increase over time. Examples of counters are the uptime in seconds or the number of received events.
Alerts on counters typically signal an imminent overflow and will usually not vanish without administrative intervention. Such alerts are typically configured so that they raise an alarm some time before the actual overflow happens.
Other possible alerts monitor a given time span and raise an alarm when either nothing has happened for a long time or the counter has suddenly increased drastically.
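The three kinds of counter alerts mentioned above (imminent overflow, no activity, sudden increase) can be sketched by comparing two samples of the same counter. All function names, parameter names and alert labels here are assumptions made for illustration; the overflow estimate simply extrapolates the growth rate observed between the two samples.

```python
from typing import List, Optional


def seconds_until_overflow(previous: int, current: int,
                           interval_seconds: float, limit: int) -> Optional[float]:
    """Estimate the time until `limit` is reached at the observed growth rate.

    Returns None if the counter did not grow during the interval.
    """
    rate = (current - previous) / interval_seconds
    if rate <= 0:
        return None
    return (limit - current) / rate


def counter_alerts(previous: int, current: int, interval_seconds: float,
                   limit: int, warn_before_seconds: float,
                   max_increase: int) -> List[str]:
    """Return the labels of all alerts that apply to two counter samples."""
    alerts = []
    remaining = seconds_until_overflow(previous, current, interval_seconds, limit)
    if remaining is not None and remaining <= warn_before_seconds:
        # Warn some time before the overflow actually happens.
        alerts.append("imminent-overflow")
    increase = current - previous
    if increase == 0:
        # Nothing happened during the monitored time span.
        alerts.append("no-activity")
    elif increase > max_increase:
        alerts.append("sudden-increase")
    return alerts


# A counter at 950 of 1000, growing by 50 per minute, overflows in about a minute.
print(counter_alerts(900, 950, interval_seconds=60, limit=1000,
                     warn_before_seconds=3600, max_increase=100))
```

In this sketch `interval_seconds` plays the role of the monitored time span, so the "no-activity" check is only as meaningful as the interval chosen by the caller.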
- Gauge
A specific metric with a value which may go up and down over time. Examples of gauges are memory usage, the number of pending events or the current cache size.
Alerts on gauges typically signal an overload of the system or the absence of expected load. They may vanish without administrative intervention. Typical alerts on gauges add a grace period before an alarm is raised.
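A gauge alert of this kind can be sketched as a range check that must hold for several consecutive samples before an alarm is raised. This is one possible reading of "grace period", expressed as a number of samples rather than a duration; the function names, the `low`/`high` bounds and the labels "overload" and "missing-load" are assumptions for illustration.

```python
from typing import Iterable, Optional


def classify_gauge(value: float, low: float, high: float) -> Optional[str]:
    """Return the problem indicated by a single gauge sample, if any."""
    if value > high:
        return "overload"
    if value < low:
        return "missing-load"
    return None


def gauge_alert(samples: Iterable[float], low: float, high: float,
                grace_samples: int) -> Optional[str]:
    """Return an alert label if the latest `grace_samples` samples agree on a problem."""
    if grace_samples < 1:
        raise ValueError("grace_samples must be at least 1")
    recent = list(samples)[-grace_samples:]
    if len(recent) < grace_samples:
        return None  # not enough data yet
    problems = {classify_gauge(value, low, high) for value in recent}
    if len(problems) == 1:
        # All recent samples agree; the result is None if all are in range.
        return problems.pop()
    return None


print(gauge_alert([40, 95, 97, 99], low=5, high=90, grace_samples=3))  # overload
print(gauge_alert([40, 95, 60, 99], low=5, high=90, grace_samples=3))  # None
print(gauge_alert([3, 2, 1], low=5, high=90, grace_samples=3))         # missing-load
```

Because a gauge can return to its normal range on its own, the second call raises no alarm: the dip to 60 interrupts the run of out-of-range samples.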
- Metric
A metric is a measurable value which may change over time. It is either a counter which will increase over time or a gauge which may go up and down over time. Typical examples are event counters or resource consumption.
A metric is typically expressed in a given unit, and a distance can be defined between any two of its values.
Similar definitions apply to Boolean values: a system configuration is an attribute, while a Boolean value signaling some state which may change back and forth over time is a gauge.