ADR-009 Log structure
Date: 14-02-2024
Status
🤔 Proposed
Context
Regardless of platform or language, our services write logging information to the output stream. However, this logging information is inconsistently structured. This leads to a few problems:
- It is hard to interpret logs under standard operation, as logs from different containers with different structures are interspersed
- During an incident, it is hard to find related logs as the inconsistent structures require separate queries
- Our log-based alarms are complex to write and update as they have to query several different structures
Decision
We will define a standard logging structure that all teams will use. Teams are welcome to extend the structure with extra fields, but every log message must include all required fields, and reserved fields must not be used for anything other than their defined purpose.
All logs must be formatted as JSON objects.
An example of a minimal log message:
{"time": "2024-02-14T12:34:23Z", "level": "CRITICAL", "msg": "Null pointer exception", "service_name": "opg-example"}
An example of a more complex log message:
{
  "time": "2024-02-14T13:39:01Z",
  "level": "INFO",
  "msg": "User permissions updated",
  "service_name": "opg-example",
  "trace_id": "1-581cf771-a006649127e371903a2de979",
  "request": { "method": "PUT", "path": "/user/133/permissions" },
  "location": { "file": "pages/user/edit_permission.go", "line": 156 },
  "actor_id": 48
}
As well as the required and reserved fields below, log messages can include any other fields (such as “location” and “actor_id” above).
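As an illustration of how a service might emit this structure, here is a minimal sketch using Python's standard logging module. It is not a prescribed implementation: the SERVICE_NAME constant, the JsonFormatter class, and the convention of passing extra fields through a "fields" attribute are all assumptions made for this example.

```python
import json
import logging
import sys
from datetime import datetime, timezone

SERVICE_NAME = "opg-example"  # assumption: the service name used in the examples above


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        required = {
            # RFC 3339 timestamp in UTC, e.g. 2024-02-14T12:34:23Z
            "time": datetime.fromtimestamp(record.created, tz=timezone.utc).strftime(
                "%Y-%m-%dT%H:%M:%SZ"
            ),
            # Python's built-in level names (DEBUG, INFO, WARNING, ERROR, CRITICAL)
            # are all in the permitted list; custom levels would need mapping.
            "level": record.levelname,
            "msg": record.getMessage(),
            "service_name": SERVICE_NAME,
        }
        # Hypothetical convention: callers pass extra fields via extra={"fields": {...}}.
        extras = getattr(record, "fields", {})
        # Required fields win if a caller accidentally reuses one of their names.
        entry = {**required, **{k: v for k, v in extras.items() if k not in required}}
        return json.dumps(entry)


def build_logger(stream=sys.stdout) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream at INFO and above."""
    logger = logging.getLogger("adr009_example")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

With this sketch, build_logger().info("User permissions updated", extra={"fields": {"actor_id": 48}}) would write one JSON object containing the four required fields plus actor_id.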
Required fields
- time must be an RFC3339-formatted date/time.
- level must be an RFC5424-defined log level in uppercase: EMERGENCY, ALERT, CRITICAL, ERROR, WARNING, NOTICE, INFO, DEBUG. Services don’t need to use every log level, but alarms and stored queries should account for them all. Whilst all log levels can be used in code, the minimum logging level should be INFO in non-dev environments.
- msg is the free form message.
- service_name is the name of the container or lambda outputting the log.
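The required-field rules could be checked mechanically, for instance in a test suite. The following is a rough sketch only: the function name required_field_errors and the error messages are invented for illustration, and the regular expression checks only the shape of an RFC 3339 timestamp, not whether the date itself is valid.

```python
import json
import re

LEVELS = ("EMERGENCY", "ALERT", "CRITICAL", "ERROR", "WARNING", "NOTICE", "INFO", "DEBUG")

# Shape of an RFC 3339 date-time: date, "T", time with optional fraction,
# then "Z" or a numeric offset. Value ranges (e.g. month 13) are not checked.
RFC3339 = re.compile(r"\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})")


def required_field_errors(line: str) -> list[str]:
    """Return problems with the required fields; an empty list means the line conforms."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(entry, dict):
        return ["not a JSON object"]

    errors = []
    time = entry.get("time")
    if not isinstance(time, str) or not RFC3339.fullmatch(time):
        errors.append("time must be an RFC 3339 date/time")
    if entry.get("level") not in LEVELS:
        errors.append("level must be an uppercase RFC 5424 log level")
    if not isinstance(entry.get("msg"), str):
        errors.append("msg must be a string")
    if not isinstance(entry.get("service_name"), str):
        errors.append("service_name must be a string")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a team see every problem with a log line in one pass while reviewing their instrumentation.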
Reserved fields
- request, if provided, must be an object containing a method property holding the request’s HTTP method and a path property holding the requested path. Additional properties are permitted.
- trace_id, if provided, must be a string containing the AWS X-Ray trace ID of the originating request.
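The reserved-field rules can be checked in a similar way. This sketch makes two assumptions beyond the text above: that method and path are strings, and that trace_id should match the X-Ray trace ID layout of a version digit, 8 hexadecimal digits, and 24 hexadecimal digits (as in the example message). The function name reserved_field_errors is hypothetical.

```python
import re

# AWS X-Ray trace IDs have the form 1-<8 hex digits>-<24 hex digits>.
# Checking this layout is stricter than the "must be a string" rule (assumption).
XRAY_TRACE_ID = re.compile(r"1-[0-9a-fA-F]{8}-[0-9a-fA-F]{24}")


def reserved_field_errors(entry: dict) -> list[str]:
    """Return problems with the reserved fields of an already-parsed log entry."""
    errors = []

    # Both reserved fields are optional, so they are only checked when present.
    if "request" in entry:
        request = entry["request"]
        if not isinstance(request, dict):
            errors.append("request must be an object")
        else:
            # Additional properties are permitted, so only these two are inspected.
            for key in ("method", "path"):
                if not isinstance(request.get(key), str):
                    errors.append(f"request.{key} must be a string")

    if "trace_id" in entry:
        trace_id = entry["trace_id"]
        if not isinstance(trace_id, str) or not XRAY_TRACE_ID.fullmatch(trace_id):
            errors.append("trace_id must be an AWS X-Ray trace ID string")

    return errors
```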
Consequences
Reading logs interspersed between different services and containers will be easier as they all follow the same structure. We will be able to write common queries and alarms that work across all of our services.
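To give a sense of what a common query could look like once every service follows the structure, here is a small sketch that selects entries at or above a given severity, regardless of which service wrote them. The function name at_or_above is made up for this example, and a real alarm would more likely be expressed in the query language of the log platform in use.

```python
import json

# RFC 5424 severities, most severe first (numeric values 0 to 7).
SEVERITY = {
    name: number
    for number, name in enumerate(
        ["EMERGENCY", "ALERT", "CRITICAL", "ERROR", "WARNING", "NOTICE", "INFO", "DEBUG"]
    )
}


def at_or_above(lines, threshold="ERROR"):
    """Yield parsed log entries whose level is at least as severe as threshold."""
    limit = SEVERITY[threshold]
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip output that is not JSON
        if not isinstance(entry, dict):
            continue
        level = entry.get("level")
        # Lower numbers are more severe, so "at or above" means <= limit.
        if isinstance(level, str) and level in SEVERITY and SEVERITY[level] <= limit:
            yield entry
```

Because the level field has the same name and vocabulary everywhere, the same filter applies to every service without per-service special cases.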
Each team will need to review their current logging instrumentation to ensure it matches this structure.