Skip to main content

ADR-009 Log structure

Date: 14-02-2024

Status

🤔 Proposed

Context

Regardless of platform or language, our services write logging information to the output stream. However, this logging information is inconsitently structured. This leads to a few problems:

  1. It is hard to interpret logs under standard operation, as logs from different containers with different structures are interspersed
  2. During an incident, it is hard to find related logs as the inconsistent structures require separate queries
  3. Our log-based alarms are complex to write and update as they have to query several different structures

Decision

We will define a standard logging structure that all teams will use. Teams are welcome to expand on the structure by adding extra fields, but must include all required fields in each log message and must ensure not to use reserved fields for anything other than their defined usage.

All logs must be formatted as JSON objects.

An example of a minimal log message:

{"time": "2024-02-14T12:34:23Z", "level": "CRITICAL", "msg": "Null pointer exception", "service_name": "opg-example"}

An example of a more complex log message:

{
  "time": "2024-02-14T13:39:01Z",
  "level": "INFO",
  "msg": "User permissions updated",
  "service_name": "opg-example",
  "trace_id": "1-581cf771-a006649127e371903a2de979",
  "request": { "method": "PUT", "path": "/user/133/permissions" },
  "location": { "file": "pages/user/edit_permission.go", "line": 156 },
  "actor_id": 48
}

As well as the required and reserved fields below, log messages can include any other fields (such as “location” and “actor_id” above).

Required fields

time must be an RFC3339-formatted date/time.

level must be an RFC5424-defined log level in uppercase: EMERGENCY, ALERT, CRITICAL, ERROR, WARNING, NOTICE, INFO, DEBUG. Services don’t need to use every log level, but alarms and stored queries should account for them all. Whilst all log levels can be used in code, the minimum logging level should be INFO in non-dev environments.

msg is the free form message.

service_name is the name of the container or lambda outputting the log.

Reserved fields

request, if provided, must be an object containing a method property of the request’s HTTP method, and a path property of the requested path. Additional properties are permitted.

trace_id, if provided, must be a string containing the AWS X-Ray trace ID of the orgininating request.

Consequences

Reading logs interspersed between different services and containers will be easier as they all follow the same structure. We will be able to write common queries and alarms that work across all of our services.

Each team will need to review their current logging instrumentation to ensure it matches this structure.

This page was last reviewed on 9 February 2024. It needs to be reviewed again on 9 February 2025 by the page owner #opg-webops-community .
This page was set to be reviewed before 9 February 2025 by the page owner #opg-webops-community. This might mean the content is out of date.