Baseline Logging Standards
This baseline documents the types of logging that teams must, should and could have in place to cover common security scenarios.
Principles
This list is generalised. You can and should adjust it to match the needs of your service, its data and its threat model.
All containers should log in JSON, following the format set out in our ADR. This includes startup messages.
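As an illustration of the JSON logging principle, here is a minimal Python sketch that emits every record, including startup messages, as one JSON object per line. The field names (`timestamp`, `level`, `logger`, `message`) and the helper names `JsonFormatter` and `configure_json_logging` are assumptions for the example; the ADR remains the authority on the actual format.

```python
import json
import logging
import sys
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single-line JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # Illustrative field names only; use the ones your ADR specifies.
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def configure_json_logging(level: int = logging.INFO) -> None:
    """Replace the root handlers so that startup messages are JSON too."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers = [handler]
    root.setLevel(level)


if __name__ == "__main__":
    # Configure logging before anything else writes to stdout.
    configure_json_logging()
    logging.getLogger("startup").info("service starting")
```

Calling `configure_json_logging()` as the first step of the entry point is what keeps startup output in the same format as the rest of the logs.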
Logging Baseline
You MUST have
These apply to all services.
| Log Type | Sub Types | Metrics enabled by logs | Covered Scenarios |
|---|---|---|---|
| CloudTrail Logs / CloudTrail Events | | | |
| Container HTTP Traffic | | | |
| ALB logging (with Athena setup) | Standard ALB metrics | | |
| Authentication Audit | | | |
| Network related | | | |
| Container CPU or memory (CloudWatch Container Insights log group) | Baseline tier | | |
| Application data activity event logs | | Special interest data/case access alerts | |
| WAF traffic blocking anomalies | Blocked requests | Percentage above service specific baseline | |
| DNS Queries (Route 53 query logs) | Requested DNS | | |
You SHOULD have
These will apply to most services.
| Log Type | Sub Types | Metrics enabled by logs | Covered Scenarios |
|---|---|---|---|
| External auth system telemetry (from a provider like Entra/OneLogin if you use them) | | | |
| High DB volume changes alerts | | | |
| App-level user audit changes | | | |
| Activity for app-level users | | Unusual number of logins from abnormal location or IP; access outside normal usage hours | |
| Cost anomalies | Alert on increased resource based costs | | |
You COULD decide to have
These may not apply to all services.
| Log Type | Sub Types | Metrics enabled by logs | Covered Scenarios |
|---|---|---|---|
| | I.e. Data Read and Operator access for audit, not app level | | |
| User/Account searches | People doing user search | High volume of user searches | A compromised user account is trying to get access to all other users' information to compromise beyond the original account (i.e. via user admin area in a back office app) |
Full List of scenarios
These are the scenarios the initial log baseline was created to cover.
Scanning and reconnaissance
- A hostile actor is scanning your service for common vulnerabilities
- A hostile actor is trying to brute force your login page with a million passwords (spray attack)
- A hostile actor is using a list of passwords from a known breach against your service
- A hostile actor is sending malformed payloads in an attempt to break processing rules
Exfiltration
- A container in your application is compromised and a remote shell installed
- Data in your account is being exfiltrated via HTTP from a running container
- Data is being exfiltrated from your AWS account via DNS tunnelling from a container
- Data is being exfiltrated via NTP tunnelling from a container
- A data logging tool is attempting to phone home after a dependency was compromised
- A user is downloading a larger number of records than normal or records they would not normally access
- A leaked infra admin account is downloading a database backup
Persistence
- An admin user's account is compromised and the threat actor creates a number of new accounts to hide this
- A compromised dependency has installed a bitcoin miner
- External software is being downloaded and installed on a container in your app
- A compromised user account is trying to get access to all other users' information to compromise beyond the original account (i.e. via user admin area)
- An EC2 instance is spun up in an unusual location
- An admin user accesses the application at an unusual time
Ransomware
- An infra admin account is compromised and is trying to encrypt your data
- Your S3 bucket KMS key is being deleted
- An S3 bucket is deleted outside a pipeline
Other Notes
AWS provides log anomaly detection, but this only really works on application logs. See AWS Log Anomaly Detection.
We considered container process anomaly detection, that is, checking that only known processes are running in containers. However, that may be better handled by other tools such as fixed file systems. It is not included here for now, but we may revisit it.
When changing logging or adding new types of logging, check what the change actually logs in a development environment before you roll it out. For example, check that PII or credentials are not leaking into logs because of a configuration change. This matters particularly when you are changing logging to meet a security checklist.
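One way to support that check is to scan a sample of development logs for obviously sensitive content. The sketch below is a rough first pass only, and it will miss many kinds of PII. The pattern set and the names `SUSPECT_PATTERNS` and `find_suspect_lines` are assumptions made for this example, not part of the baseline.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical starter patterns; extend them for the data your service handles.
SUSPECT_PATTERNS = {
    "email_address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    # AWS access key IDs are 20 characters, starting with AKIA or ASIA.
    "aws_access_key_id": re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b"),
    # Field names that suggest a secret is being logged as a key/value pair.
    "credential_field": re.compile(
        r"(?i)(password|passwd|secret|api[_-]?key)[\"']?\s*[:=]"
    ),
}


def find_suspect_lines(lines: Iterable[str]) -> Dict[str, List[int]]:
    """Map each pattern name to the 1-based line numbers where it matched."""
    findings: Dict[str, List[int]] = {}
    for number, line in enumerate(lines, start=1):
        for name, pattern in SUSPECT_PATTERNS.items():
            if pattern.search(line):
                findings.setdefault(name, []).append(number)
    return findings
```

Running this over log output captured before and after a configuration change gives a quick comparison of what the change started logging.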
Baseline for WAF anomalies
When you are alerting on unusual WAF activity, the following is a good starting point (taken from Sirius and previous discussions with security colleagues):
- ~20 blocked requests in ten minutes: not a significant threat.
- ~100 blocked requests in ten minutes: investigate whether it is actually breaching and whether it comes from 1-2 IPs.
- 500+ blocked requests in ten minutes from multiple IPs: investigate and escalate.
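These tiers can be sketched as a small classifier over one ten-minute window. This is only an illustration under stated assumptions: counts between the listed figures fall into the lower tier, "multiple IPs" means three or more (since 1-2 IPs is named in the middle tier), and 500+ requests from only 1-2 IPs is treated as "investigate". The names `WafAction` and `classify_waf_window` are hypothetical.

```python
from enum import Enum


class WafAction(Enum):
    IGNORE = "not a significant threat"
    INVESTIGATE = "investigate"
    ESCALATE = "investigate and escalate"


# Figures from the guidance, per ten-minute window.
INVESTIGATE_THRESHOLD = 100
ESCALATE_THRESHOLD = 500
# Assumption: "multiple IPs" means more than the 1-2 IPs of the middle tier.
MULTIPLE_IPS_MIN = 3


def classify_waf_window(blocked_requests: int, distinct_ips: int) -> WafAction:
    """Classify one ten-minute window of WAF blocked-request counts."""
    if blocked_requests >= ESCALATE_THRESHOLD and distinct_ips >= MULTIPLE_IPS_MIN:
        return WafAction.ESCALATE
    if blocked_requests >= INVESTIGATE_THRESHOLD:
        return WafAction.INVESTIGATE
    return WafAction.IGNORE
```

Keeping the thresholds as named constants makes it straightforward to tune them to a service-specific baseline.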
Baseline for failed Login Attempt alerting
Dependent on service. Base the time frame on your user cohort, i.e. X% of users within a window based on your known user behaviour.
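A minimal sketch of such a check follows. It assumes "X% of users" means the share of distinct users in the cohort with at least one failed login inside the window. The function name, parameters and event shape are hypothetical, and both the window and the percentage need choosing per service.

```python
from datetime import datetime, timedelta
from typing import Iterable, Tuple


def failed_login_alert(
    events: Iterable[Tuple[datetime, str]],
    now: datetime,
    window: timedelta,
    cohort_size: int,
    threshold_percent: float,
) -> bool:
    """Return True when enough distinct users failed to log in within the window.

    `events` holds (timestamp, user_id) pairs for failed login attempts.
    """
    if cohort_size <= 0:
        return False
    cutoff = now - window
    # Count each user once, however many failures they had in the window.
    affected = {user for timestamp, user in events if cutoff <= timestamp <= now}
    return 100.0 * len(affected) / cohort_size >= threshold_percent
```

For a small back office cohort, a short window and a low percentage may be appropriate. A public-facing service with many occasional users will likely need different values.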