3. Mutual Transport Layer Security between services

Date: 2025-03-13

Status

Accepted

Context

We are preparing for a DORA audit and working towards SOC 2. DORA mandates "the encryption of data at rest and in transit" (Article 6(2), point (a)).

We must also manage cryptographic keys carefully. DORA states: "Financial entities shall include in the cryptographic key management policy referred to in Article 6(2), point (d), requirements for managing cryptographic keys through their whole lifecycle, including generating, renewing, storing, backing up, archiving, retrieving, transmitting, retiring, revoking, and destroying those cryptographic keys." (Article 7(1)).

Whilst DORA only requires encryption, SOC 2 requires both encryption and authentication, as set out in CC6.1 (Logical Access Security).

We use HTTPS for all of our external-facing services, with certificates managed by AWS Certificate Manager (ACM). Between services, however, we mostly communicate in plain text over TCP sockets. We need to encrypt this traffic to comply with DORA and SOC 2.

There are multiple ways to do this technically.

Simple shared passphrase between services

Services could authenticate each other with a shared passphrase. However, this would not fulfil SOC 2, and rotating the passphrase would require manual intervention.

mTLS with self-signed certificates

We could use self-signed certificates between services, but we would then have to run our own certificate authority, rotate certificates manually and securely manage the CA private key, which would significantly expand our DORA key-management obligations.

mTLS with AWS ACM Private CA and cert-manager

AWS ACM Private CA (ACM PCA) holds the CA private key and signs our certificates, so we do not have to run or secure a CA ourselves. cert-manager runs in the cluster, generates each workload's key, requests a signed certificate from ACM PCA (through the AWS Private CA issuer plugin) and renews it before it expires, so it handles the certificate lifecycle. This would be the most secure option and the easiest to manage, but ACM PCA is a paid service at $400 PCM per CA (one per account).

mTLS deployment options

We assume a separate CA per environment.

Service mesh

We could use a service mesh such as Istio, but it would be overkill for our current needs: it is complex to configure and operate, and we do not have the headcount to support it. Istio issues and rotates workload certificates automatically, but we would still need to supply it with a CA to sign them.

AWS ACM PCA + cert-manager

We could instead use AWS ACM Private CA and cert-manager directly. This is simpler to operate, but it moves the complexity into application and configuration code. Since we would need ACM PCA anyway to move CA key management out of our own DORA scope, this is likely the most logical choice.
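As a sketch of what cert-manager would be asked for in this option, the Python below builds a cert-manager `Certificate` resource as a dict and prints it as JSON, which `kubectl apply -f` accepts. It assumes the AWS Private CA issuer plugin is installed with a cluster-wide `AWSPCAClusterIssuer`; the issuer name `acm-pca-env`, the resource and Secret naming scheme, and the default `cluster.local` cluster domain are all hypothetical.

```python
import json

# Hypothetical cluster-wide issuer provided by the aws-privateca-issuer plugin.
ISSUER_REF = {
    "group": "awspca.cert-manager.io",
    "kind": "AWSPCAClusterIssuer",
    "name": "acm-pca-env",  # hypothetical name, one issuer per environment
}


def certificate_manifest(service: str, namespace: str, role: str) -> dict:
    """Build a cert-manager Certificate for one service's client or server identity."""
    if role not in ("client", "server"):
        raise ValueError(f"role must be 'client' or 'server', got {role!r}")
    # Default in-cluster DNS name for a Service; assumes the cluster.local domain.
    dns_name = f"{service}.{namespace}.svc.cluster.local"
    return {
        "apiVersion": "cert-manager.io/v1",
        "kind": "Certificate",
        "metadata": {"name": f"{service}-{role}", "namespace": namespace},
        "spec": {
            # cert-manager writes the key and signed cert into this Secret.
            "secretName": f"{service}-{role}-tls",
            "commonName": dns_name,
            "dnsNames": [dns_name],
            # 180 days; cert-manager renews ahead of expiry according to renewBefore.
            "duration": "4320h",
            "usages": ["digital signature", "key encipherment", f"{role} auth"],
            "issuerRef": ISSUER_REF,
        },
    }


if __name__ == "__main__":
    print(json.dumps(certificate_manifest("payments", "prod", "client"), indent=2))
```

Generating one resource per role keeps the client and server identities in separate Secrets, each restricted to its own extended key usage.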

mTLS certificate options

Client and server certificate per service pair

Each service pair gets its own client certificate and server certificate. The server presents its server certificate, and the calling service connects with the pair's client certificate. The server application code then checks that the client certificate's subject alternative name (SAN) matches the server's SAN, as a strict 1:1 relationship.

Each pod may therefore need several certificates, one per RPC client, but who can talk to whom is easy to define in configuration. Matching SAN to SAN also ensures that the wrong services cannot talk to each other.
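A minimal Python sketch of this option, reading the peer certificate in the dict format returned by `ssl.SSLSocket.getpeercert()`. It assumes one reading of the 1:1 rule: the client certificate issued for a pair carries the destination server's DNS name as its only SAN. The `PAIR_CERT_DIRS` mapping, its paths and the example service names are hypothetical.

```python
from __future__ import annotations

# Hypothetical per-pair layout: one mounted client cert directory per RPC destination.
PAIR_CERT_DIRS = {
    "ledger.prod.svc.cluster.local": "/etc/mtls/to-ledger",
    "reports.prod.svc.cluster.local": "/etc/mtls/to-reports",
}


def client_cert_dir_for(destination: str) -> str:
    """Pick the pair's client certificate for a destination; unknown pairs fail loudly."""
    try:
        return PAIR_CERT_DIRS[destination]
    except KeyError:
        raise LookupError(f"no client certificate configured for {destination}") from None


def dns_sans(peer_cert: dict) -> set[str]:
    """Collect DNS SANs from getpeercert() output: {'subjectAltName': (('DNS', name), ...)}."""
    return {value for kind, value in peer_cert.get("subjectAltName", ()) if kind == "DNS"}


def pair_san_matches(peer_cert: dict, server_san: str) -> bool:
    """1:1 check: the pair's client cert must name exactly this server and nothing else."""
    return dns_sans(peer_cert) == {server_san}
```

Because the mapping is plain configuration, adding a caller means issuing one more pair certificate and adding one more entry.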

Server certificates with client certificate SAN Whitelisting

Each service gets a client certificate, and each service that acts as a server also gets a server certificate. A service uses its single client certificate to connect to every other service. The server then checks each client certificate's SAN against a whitelist of clients allowed to connect.

The whitelist is enforced in the server's code and populated from a config flag. There are no start-up race conditions between services, because the client and server certificates are created in the Helm chart, so it should just work. However, a client could still call the wrong destination service: the server does not know which hostname the client intended to reach, so it would not reject the call.
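A minimal Python sketch of the whitelist check for this option. It assumes the whitelist arrives as the comma-separated value of a hypothetical `--allowed-client-sans` flag, and that the check runs only after the TLS handshake has verified the client certificate against the environment's CA, so `getpeercert()` returns a populated dict.

```python
from __future__ import annotations

import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="RPC server (hypothetical flags)")
    parser.add_argument(
        "--allowed-client-sans",
        default="",
        help="comma-separated client certificate SANs allowed to connect",
    )
    return parser


def parse_allowlist(raw: str) -> frozenset[str]:
    """Turn 'a.svc, b.svc' into {'a.svc', 'b.svc'}; blank entries are ignored."""
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def dns_sans(peer_cert: dict) -> set[str]:
    """Extract DNS SANs from the dict returned by ssl.SSLSocket.getpeercert()."""
    return {value for kind, value in peer_cert.get("subjectAltName", ()) if kind == "DNS"}


def is_client_allowed(peer_cert: dict, allowlist: frozenset[str]) -> bool:
    """Accept the caller if any DNS SAN on its verified certificate is whitelisted."""
    return not dns_sans(peer_cert).isdisjoint(allowlist)
```

An empty flag yields an empty whitelist, so the server rejects every client; failing closed seems the safer default for a misconfigured deployment.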

One shared certificate per env

All services share one certificate. This is the easiest option to implement but also the least secure: if one service is compromised, every service is compromised.

Decision

Use mTLS between services, with AWS ACM Private CA and cert-manager managing the certificates and private keys. We will use server certificates with client certificate SAN whitelisting. AWS ACM PCA will sign the certificates, and cert-manager will manage their lifecycle, rotating them automatically every 180 days. When a service starts, it must configure its RPC TLS client with the correct client certificate for the service it plans to contact. Issuing certificates per workload and pushing the access decision (which authenticated clients may connect) to the application layer is a common approach.
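The mTLS side of this decision can be sketched with Python's standard `ssl` module; our services' actual RPC framework will have its own equivalents. The server context requires a client certificate signed by the environment CA, and the client context verifies the server's certificate and that it matches the hostname being dialled. The function names and certificate file arguments are hypothetical; the files would come from the Secrets cert-manager creates.

```python
import socket
import ssl

MIN_TLS = ssl.TLSVersion.TLSv1_2


def new_server_context() -> ssl.SSLContext:
    """Server side: present our server cert and require a CA-signed client cert."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = MIN_TLS
    ctx.verify_mode = ssl.CERT_REQUIRED  # handshake fails if the client sends no valid cert
    return ctx


def new_client_context() -> ssl.SSLContext:
    """Client side: PROTOCOL_TLS_CLIENT already requires a valid server cert and hostname."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.minimum_version = MIN_TLS
    return ctx


def load_identity(ctx: ssl.SSLContext, cert: str, key: str, ca: str) -> ssl.SSLContext:
    """Load this workload's cert and key, and trust only the environment's private CA."""
    ctx.load_cert_chain(certfile=cert, keyfile=key)
    ctx.load_verify_locations(cafile=ca)
    return ctx


def dial(host: str, port: int, ctx: ssl.SSLContext) -> ssl.SSLSocket:
    """Connect to another service; host must be the DNS name in its server certificate."""
    sock = socket.create_connection((host, port))
    try:
        return ctx.wrap_socket(sock, server_hostname=host)
    except Exception:
        sock.close()  # do not leak the TCP socket if the handshake fails
        raise
```

A client would build `load_identity(new_client_context(), cert, key, ca)` once at start-up and pass it to `dial` for each destination; a server would do the same with `new_server_context()` and wrap its accepted sockets with `server_side=True`.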

Consequences

  • spend $400 PCM per CA on AWS ACM PCA
  • all services will need to be updated to use mTLS
  • deploy cert-manager
  • add a cert-manager Certificate resource to each service to request a certificate for the service's DNS name
  • mount the Secrets created by cert-manager into the services
  • update the services to use the client and server certificates
  • configure every service's mTLS client with the correct certificates
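Because cert-manager rotates certificates in place, each service has to notice when its mounted Secret changes. A minimal Python sketch, assuming the Secret is mounted at a hypothetical `/etc/mtls` and uses the standard `kubernetes.io/tls` keys `tls.crt` and `tls.key`, plus `ca.crt` when the issuer provides it. It compares a hash of `tls.crt` rather than file timestamps, since Kubernetes updates Secret volumes by swapping a symlink.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from pathlib import Path

MOUNT_DIR = Path("/etc/mtls")  # hypothetical volumeMount path for the cert-manager Secret


@dataclass(frozen=True)
class IdentityFiles:
    cert: Path  # tls.crt: this workload's certificate
    key: Path   # tls.key: this workload's private key
    ca: Path    # ca.crt: the environment CA, if the issuer populates it

    @classmethod
    def from_mount(cls, mount_dir: Path = MOUNT_DIR) -> IdentityFiles:
        return cls(mount_dir / "tls.crt", mount_dir / "tls.key", mount_dir / "ca.crt")


class RotationWatcher:
    """Reports when the mounted certificate differs from the one last seen."""

    def __init__(self, files: IdentityFiles) -> None:
        self._files = files
        self._digest: str | None = None  # None until the first check

    def changed(self) -> bool:
        """True on the first call and after every rotation; False otherwise."""
        digest = hashlib.sha256(self._files.cert.read_bytes()).hexdigest()
        if digest != self._digest:
            self._digest = digest
            return True
        return False
```

When `changed()` returns true, the service rebuilds its TLS contexts from the new files; new connections then use the rotated certificate, while established ones keep the certificate they negotiated. Secrets mounted with `subPath` are not updated by Kubernetes, so the whole Secret volume should be mounted.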