Last updated 2026-05-03

ECS Fargate deploy lifecycle

AI-generated content

This document was generated by an AI assistant. Verify accuracy before relying on the details.

The new infra/ Pulumi project deploys Spring Boot services to AWS ECS Fargate. Each microservice runs as one ECS service and autoscales between minTasks and maxTasks, which are defined per environment in infra/src/catalog/services.js. A push to main triggers pulumi up -s staging; a push to the production branch triggers the production deploy, which waits for manual approval. PR previews are deployed to a dedicated adb-preview AWS account when someone comments /deploy on the PR.

At a glance

Fact	Value
Compute	AWS ECS Fargate, capacity providers `FARGATE` + `FARGATE_SPOT` (Spot for non-prod)
Networking	VPC per env (3 AZs), private subnets for tasks, public ALB with HTTPS only, optional WAFv2 (prod)
Image registry	ECR in `adb-shared` AWS account
Image pull	Cross-account via repository policy scoped to the org
Image tag pattern	prod/staging: `<env>-<sha>`; preview: `pr-<N>-<sha>`
Health check	ALB target group checks `/actuator/health`; the ECS task-level health check uses the same path
Deploy strategy	rolling, `deploymentMinimumHealthyPercent=50`, `deploymentMaximumPercent=200`
ECS Exec	Enabled (gated by IAM `Debug` permission set)
Logs	CloudWatch log group per service (`/ecs/adb-<env>/<service>`), retention 14 d non-prod / 90 d prod
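
To make the deploy strategy row concrete, here is a minimal sketch of the task-count bounds ECS applies during a rolling deployment with these percentages. ECS rounds the minimum-healthy bound up and the maximum bound down. The function name and the desired counts are illustrative and not taken from services.js.

```javascript
// Sketch: bounds on running tasks during a rolling deployment.
// Defaults mirror the percentages in the table above.
function rollingDeployBounds(desiredCount, minHealthyPercent = 50, maxPercent = 200) {
  return {
    // Lower bound is rounded up to the nearest integer.
    minRunning: Math.ceil((desiredCount * minHealthyPercent) / 100),
    // Upper bound is rounded down to the nearest integer.
    maxRunning: Math.floor((desiredCount * maxPercent) / 100),
  };
}

// Illustrative desired counts (assumptions, not real sizing values).
for (const desired of [1, 2, 3]) {
  console.log(desired, rollingDeployBounds(desired));
}
```

With a single desired task the lower bound is still 1, so ECS has to start the replacement task before it can stop the old one.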

Details

Stack layout

flowchart TB
    push[Push to main / production]
    gha[GitHub Actions: deploy-staging.yml or deploy-production.yml]
    pulumi["pulumi up -s staging|production"]
    ecr[ECR: adb-shared]
    network[VPC + subnets + endpoints]
    cluster[ECS Cluster]
    services[ECS Services x9]
    alb[ALB + listener rules]
    sqs[SQS queues from catalog]

    push --> gha --> pulumi
    pulumi --> network
    pulumi --> cluster
    pulumi --> services
    pulumi --> alb
    pulumi --> sqs
    services -.pulls images.-> ecr
    services -.consumes.-> sqs
    alb -->|/api/<service>/*| services

Deploy paths

Trigger	Workflow	Stack	Approval
Push to `main`	`.github/workflows/deploy-staging.yml`	`staging`	none
Push to `production` branch	`.github/workflows/deploy-production.yml`	`production`	GitHub Environment manual approval
`/deploy` PR comment	`.github/workflows/pr-deploy.yml`	`pr-<N>` (created on demand)	none
`/destroy` PR comment	`.github/workflows/pr-destroy.yml`	`pr-<N>` (destroyed)	none
PR closed	`.github/workflows/pr-destroy.yml`	`pr-<N>` (destroyed)	none
Nightly schedule	`.github/workflows/nightly-cleanup.yml`	`pr-*` stacks whose PR closed more than 24 h ago (destroyed)	none
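
The selection step of the nightly sweep could look roughly like the sketch below. It assumes the workflow already has a list of stack names and a lookup of PR close times; `stacksToSweep`, `closedAtByPr`, and the sample data are hypothetical names introduced for illustration, not taken from nightly-cleanup.yml.

```javascript
// Sketch of the sweep's selection logic (hypothetical helper, not the real workflow code).
const DAY_MS = 24 * 60 * 60 * 1000;

// stackNames: all stack names in the project.
// closedAtByPr: Map of PR number -> Date the PR was closed (open PRs are absent).
function stacksToSweep(stackNames, closedAtByPr, now = new Date()) {
  return stackNames.filter((name) => {
    const match = /^pr-(\d+)$/.exec(name);
    if (!match) return false; // never select staging or production
    const closedAt = closedAtByPr.get(Number(match[1]));
    // Select only stacks whose PR has been closed for more than 24 hours.
    return closedAt !== undefined && now - closedAt > DAY_MS;
  });
}

// Illustrative data: PR 101 closed long ago, PR 102 closed 6 hours ago, PR 103 still open.
const sweepTime = new Date("2026-05-03T02:00:00Z");
const closedPrs = new Map([
  [101, new Date("2026-05-01T10:00:00Z")],
  [102, new Date("2026-05-02T20:00:00Z")],
]);
console.log(
  stacksToSweep(["staging", "production", "pr-101", "pr-102", "pr-103"], closedPrs, sweepTime)
); // logs only pr-101
```

Matching on the full `pr-<N>` pattern keeps the long-lived stacks out of the sweep even if the close-time lookup is wrong.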

Per-service plumbing

For every entry in SERVICES (infra/src/catalog/services.js), the environment stack creates:

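A task definition whose image is pulled from ECR in the adb-shared account: <adb-shared account ID>.dkr.ecr.eu-west-3.amazonaws.com/<image>:<env>-<sha>. ECR registry hostnames use the 12-digit AWS account ID, not the account name.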
A task IAM role with read access to its own queues, buckets, and secrets only (least privilege).
An ALB listener rule routing /api/<service>/* to a target group with the service's port.
Autoscaling toward a 60% CPU utilization target, bounded by sizing[<env>].minTasks and sizing[<env>].maxTasks.
A CloudWatch log group with the per-env retention.

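The whole topology is reproducible from code: running pulumi destroy followed by pulumi up rebuilds the same set of resources. Runtime data such as queued messages and existing logs is not restored.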
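
The per-service values listed above can be sketched as a pure function over one catalog entry. The entry shape (`name`, `image`, `port`, `sizing`), the function name `servicePlan`, and the sample values are assumptions for illustration; the real services.js schema may differ. The registry host is written with a placeholder 12-digit account ID because ECR hostnames are keyed by account ID.

```javascript
// Hypothetical catalog entry; field names are assumptions, not the real services.js schema.
const exampleService = {
  name: "orders",
  image: "orders-service",
  port: 8080,
  sizing: {
    staging: { minTasks: 1, maxTasks: 2 },
    production: { minTasks: 2, maxTasks: 6 },
  },
};

const REGION = "eu-west-3";

// Derives the values the stack would feed into the ECS, ALB, autoscaling, and log resources.
function servicePlan(service, env, sha, registryAccountId) {
  const { minTasks, maxTasks } = service.sizing[env];
  return {
    image: `${registryAccountId}.dkr.ecr.${REGION}.amazonaws.com/${service.image}:${env}-${sha}`,
    listenerPathPattern: `/api/${service.name}/*`,
    targetPort: service.port,
    autoscaling: { minTasks, maxTasks, cpuTargetPercent: 60 },
    logGroup: `/ecs/adb-${env}/${service.name}`,
    // 90 days in production, 14 days everywhere else.
    logRetentionDays: env === "production" ? 90 : 14,
  };
}

// "123456789012" is a placeholder account ID, "abc1234" a placeholder commit SHA.
console.log(servicePlan(exampleService, "staging", "abc1234", "123456789012"));
```

Keeping this derivation free of cloud calls makes the naming rules easy to unit test separately from the Pulumi resources that consume them.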

Local mirror

infra/docker/compose.yaml runs the same 9 services + MongoDB replica set + Keycloak + LocalStack (SQS/SNS/S3/Secrets Manager) + Mailpit. The LocalStack init script provisions queues using the same names the catalog produces, so messaging code works locally without changes.
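
One way to keep local and cloud queue names aligned is to have both the stack and the LocalStack init script call the same naming helper. The sketch below shows that idea only; the naming scheme `adb-<env>-<service>-<queue>`, the `queues` field, and both function names are hypothetical, since the catalog's real scheme is not shown in this document.

```javascript
// Hypothetical naming scheme; the real catalog's scheme is not documented here.
function queueName(env, service, queue) {
  return `adb-${env}-${service}-${queue}`;
}

// Hypothetical catalog slice: each service lists the queues it consumes.
const catalogSlice = [
  { name: "orders", queues: ["created", "cancelled"] },
  { name: "billing", queues: ["invoice-requested"] },
];

// Shared by the cloud stack and the local init script, each passing its own env.
function allQueueNames(env, services) {
  return services.flatMap((s) => s.queues.map((q) => queueName(env, s.name, q)));
}

console.log(allQueueNames("local", catalogSlice));
```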

Open questions

Container Insights is enabled but no dashboard is provisioned. Worth adding a per-env CloudWatch dashboard with p95 latency, 5xx rate, queue depth, and DLQ count.
We deploy from images tagged <env>-<sha> but don't currently restrict which SHAs can deploy. Possible future hardening: signed images, plus gating deploys on ECR scan-on-push results.
Spring Boot startup is slow (~30–60 s); healthCheckGracePeriodSeconds on the ECS service may need tuning before the first prod cutover.
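
As a starting point for that tuning, the grace period needs to cover the worst-case startup plus the time the load balancer takes to report the target healthy. The sketch below is a rule of thumb, not an AWS-documented formula; the function name and the interval and threshold values are assumptions to be replaced with the real target group settings.

```javascript
// Rule-of-thumb sketch: startup time plus the window the ALB needs to see
// enough consecutive successful checks. Not an AWS-documented formula.
function minGracePeriodSeconds(worstStartupSeconds, checkIntervalSeconds, healthyThreshold) {
  return worstStartupSeconds + checkIntervalSeconds * healthyThreshold;
}

// Illustrative inputs: 60 s worst-case startup (upper end of the estimate above),
// a 30 s check interval and 5 consecutive successes (assumed settings).
console.log(minGracePeriodSeconds(60, 30, 5)); // 210
```

Shortening the health check interval or lowering the healthy threshold reduces the required grace period without touching application startup.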