You want to build AI-powered features for your European customers. You need to use LLMs, vector databases, and ML pipelines. But your data can’t leave the EU, and your compliance team needs documentation proving it.
Here’s how to architect GDPR-compliant AI pipelines on AWS, with everything running in the EU – specifically AWS Frankfurt (eu-central-1).
Why eu-central-1 (Frankfurt)
AWS Frankfurt is the most mature EU region for AI workloads:
- Located in Germany – one of the strictest GDPR enforcement jurisdictions
- Full service availability – SageMaker, Bedrock, Lambda, ECS, RDS, OpenSearch all available
- AWS Bedrock – access to Claude, Llama, and other models with data processing agreements that cover EU data
- Dedicated infrastructure – data physically resides in Frankfurt data centres
Alternative EU regions: eu-west-1 (Ireland), eu-west-2 (London), eu-south-1 (Milan), eu-north-1 (Stockholm). Frankfurt is a common choice because German data protection authorities are among the most active enforcers in the EU, so an architecture that meets their expectations is generally well placed with other EU DPAs.
Architecture Overview
A typical GDPR-compliant AI pipeline on AWS eu-central-1:
User Request
→ API Gateway (eu-central-1)
→ Lambda / ECS (PII detection & anonymisation)
→ Bedrock / SageMaker (LLM inference)
→ Response de-anonymisation
→ DynamoDB / RDS (audit logging)
→ Response to user
Every component runs in eu-central-1. No data crosses regional boundaries.
Step 1: Lock Down the Region
Before writing any application code, enforce region restrictions at the AWS account level.
Service Control Policies (SCPs)
Create an SCP that prevents any service from launching outside EU regions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyNonEURegions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "eu-central-1",
            "eu-west-1",
            "eu-west-2",
            "eu-north-1",
            "eu-south-1"
          ]
        }
      }
    }
  ]
}
```
This ensures that even if someone accidentally configures a service in us-east-1, the request is denied. This is your first line of defence for data residency.
VPC Configuration
- Create your VPC in eu-central-1
- Use VPC endpoints for AWS services (S3, Bedrock, DynamoDB) to keep traffic on the AWS private network
- Enable VPC Flow Logs for audit purposes
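The endpoint setup above can be sketched with boto3. This is illustrative: the VPC, route table, subnet, and security group IDs are placeholders, and you should confirm the endpoint service names available in your account.

```python
# Sketch: create VPC endpoints so traffic to S3, DynamoDB, and Bedrock
# stays on the AWS private network. All resource IDs below are placeholders.
REGION = "eu-central-1"

def endpoint_service_names(region: str) -> dict:
    """Map each service to its VPC endpoint service name and endpoint type."""
    return {
        "s3": (f"com.amazonaws.{region}.s3", "Gateway"),
        "dynamodb": (f"com.amazonaws.{region}.dynamodb", "Gateway"),
        "bedrock-runtime": (f"com.amazonaws.{region}.bedrock-runtime", "Interface"),
    }

def create_endpoints(vpc_id: str, route_table_id: str, subnet_ids: list, sg_id: str):
    import boto3  # imported here so the helper above has no AWS dependency
    ec2 = boto3.client("ec2", region_name=REGION)
    for service, ep_type in endpoint_service_names(REGION).values():
        kwargs = {"VpcId": vpc_id, "ServiceName": service, "VpcEndpointType": ep_type}
        if ep_type == "Gateway":
            kwargs["RouteTableIds"] = [route_table_id]  # Gateway endpoints attach to route tables
        else:
            kwargs["SubnetIds"] = subnet_ids            # Interface endpoints live in subnets
            kwargs["SecurityGroupIds"] = [sg_id]
            kwargs["PrivateDnsEnabled"] = True
        ec2.create_vpc_endpoint(**kwargs)
```

Gateway endpoints (S3, DynamoDB) are free; interface endpoints carry an hourly charge, so create them only for the services your pipeline actually calls.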
Step 2: PII Detection and Anonymisation
Before any data reaches an LLM, strip personally identifiable information.
Using Amazon Comprehend
Amazon Comprehend’s PII detection is available in eu-central-1 and can identify:
- Names, addresses, phone numbers
- Email addresses, credit card numbers
- Dates of birth, SSNs, passport numbers
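A minimal sketch of the detect-then-redact step, assuming Comprehend's `detect_pii_entities` response shape (entities with `Type`, `Score`, `BeginOffset`, `EndOffset`); the 0.8 confidence threshold is illustrative:

```python
# Sketch: redact PII found by Amazon Comprehend before the text reaches an LLM.
def redact(text: str, entities: list, min_score: float = 0.8) -> str:
    """Replace each detected entity span with a [TYPE] placeholder.
    `entities` follows Comprehend's shape: Type, Score, BeginOffset, EndOffset."""
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if e["Score"] >= min_score:
            text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text

def detect_and_redact(text: str, region: str = "eu-central-1") -> str:
    import boto3  # imported here so `redact` stays a pure, testable function
    comprehend = boto3.client("comprehend", region_name=region)
    resp = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    return redact(text, resp["Entities"])
```

For example, `redact("Email jane@example.com", [{"Type": "EMAIL", "Score": 0.99, "BeginOffset": 6, "EndOffset": 22}])` returns `"Email [EMAIL]"`.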
Custom Anonymisation Layer
For domain-specific PII (patient IDs, internal employee codes, custom identifiers), build a custom anonymisation service:
- Detect – Use Comprehend + custom regex patterns to identify PII
- Tokenise – Replace each PII entity with a unique token (e.g., [PERSON_001])
- Store the mapping – Keep the token-to-PII mapping in an encrypted DynamoDB table (also in eu-central-1)
- Send anonymised text to the LLM
- De-tokenise – Replace tokens with original PII in the response
The mapping table should have a TTL (time-to-live) aligned with your data retention policy – typically 30-90 days for processing purposes.
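The tokenise/de-tokenise steps can be sketched as follows. The regex patterns and the in-memory mapping are illustrative; in production the mapping lives in the encrypted DynamoDB table described above, with a TTL.

```python
# Sketch of the tokenise/de-tokenise steps with an in-memory mapping.
import re
from collections import defaultdict

class Anonymiser:
    # Illustrative domain-specific patterns; combine with Comprehend's results.
    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "PATIENT_ID": re.compile(r"\bPAT-\d{6}\b"),
    }

    def __init__(self):
        self.mapping = {}               # token -> original PII value
        self.counters = defaultdict(int)

    def tokenise(self, text: str) -> str:
        for label, pattern in self.PATTERNS.items():
            def _sub(m, label=label):
                self.counters[label] += 1
                token = f"[{label}_{self.counters[label]:03d}]"
                self.mapping[token] = m.group(0)
                return token
            text = pattern.sub(_sub, text)
        return text

    def detokenise(self, text: str) -> str:
        for token, original in self.mapping.items():
            text = text.replace(token, original)
        return text
```

So `Anonymiser().tokenise("Contact jane@example.com about PAT-123456")` yields `"Contact [EMAIL_001] about [PATIENT_ID_001]"`, and `detokenise` on the LLM's response restores the originals.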
Step 3: LLM Inference in the EU
Option A: AWS Bedrock (Recommended)
AWS Bedrock runs in eu-central-1 and provides access to:
- Anthropic Claude – Sonnet, Haiku
- Meta Llama – 3.1, 3.2
- Amazon Titan – Text, embeddings
Key compliance features:
- Data stays in-region – inference happens in Frankfurt
- No model training on your data – Bedrock does not use customer data to train models
- AWS BAA available – a Business Associate Agreement (a HIPAA instrument) for healthcare workloads that also touch US patient data
- DPA included – AWS’s Data Processing Addendum covers GDPR requirements
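Calling a Bedrock model pinned to eu-central-1 via the Converse API looks roughly like this. The model ID and inference settings are illustrative; check which model IDs are enabled in your account.

```python
# Sketch: invoke Claude via Bedrock's Converse API, pinned to eu-central-1.
REGION = "eu-central-1"
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # example ID only

def build_request(anonymised_text: str) -> dict:
    """Converse API request body; only anonymised text should reach this point."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": anonymised_text}]}],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }

def invoke(anonymised_text: str) -> str:
    import boto3  # client creation kept here so build_request stays pure
    client = boto3.client("bedrock-runtime", region_name=REGION)
    resp = client.converse(**build_request(anonymised_text))
    return resp["output"]["message"]["content"][0]["text"]
```

Because the client is constructed with an explicit `region_name`, a misconfigured default region cannot silently route inference outside the EU; the SCP from Step 1 backstops this.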
Option B: Self-Hosted Models on SageMaker
For maximum control, deploy open-source models on SageMaker endpoints in eu-central-1:
- Llama 3.1 70B – Strong general-purpose model
- Mistral Large – European-built, strong multilingual capabilities
- Domain fine-tuned models – Your own models trained on your data
Benefits: complete data isolation, no third-party processing, full control over model versions and updates.
Trade-off: higher cost (GPU instances), operational overhead, potentially lower capabilities than frontier APIs.
Option C: Zero-Retention API Configuration
If you must use external LLM APIs (OpenAI, Anthropic direct), configure zero-retention:
- Anthropic: Enterprise plan with zero-retention DPA
- OpenAI: API data usage policy (opt-out of training) + DPA
Ensure your API calls route through your EU VPC – use a proxy Lambda to log requests and enforce anonymisation before data leaves your infrastructure.
Step 4: Vector Databases for RAG
If you’re building retrieval-augmented generation pipelines, your vector database must also reside in the EU.
Amazon OpenSearch Serverless (eu-central-1)
- Native vector search support
- Serverless – no infrastructure management
- Encryption at rest and in transit by default
- Fine-grained access control with IAM
Amazon RDS for PostgreSQL + pgvector
- Run PostgreSQL with the pgvector extension in eu-central-1
- Full SQL capabilities alongside vector search
- Familiar tooling for most engineering teams
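A minimal pgvector sketch, assuming a `documents` table and a 1024-dimension embedding model (both placeholders to adapt):

```python
# Sketch: pgvector schema and nearest-neighbour query for RDS PostgreSQL
# in eu-central-1. Table name, dimension, and connection are placeholders.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1024)   -- must match your embedding model's dimension
);
"""

# <=> is pgvector's cosine-distance operator; lower distance = more similar.
KNN_SQL = "SELECT id, content FROM documents ORDER BY embedding <=> %s::vector LIMIT %s;"

def search(conn, query_embedding: list, k: int = 5):
    """Run a k-NN query over an open psycopg2/psycopg connection."""
    with conn.cursor() as cur:
        cur.execute(KNN_SQL, (str(query_embedding), k))
        return cur.fetchall()
```

For larger corpora, add an HNSW or IVFFlat index on the embedding column so queries don't fall back to a sequential scan.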
Self-Hosted Options
- Weaviate on EKS in eu-central-1
- Qdrant on ECS in eu-central-1
- Pinecone – check their EU region availability (currently limited)
Step 5: Encryption Everywhere
GDPR doesn’t explicitly require encryption, but it’s listed as an appropriate technical measure under Article 32. For AI pipelines handling personal data, implement:
At Rest
- S3: SSE-S3 or SSE-KMS with customer-managed keys
- DynamoDB: Encryption enabled by default, use KMS for customer-managed keys
- RDS: Encrypted storage volumes + encrypted snapshots
- SageMaker: Encrypted model artefacts and training data
In Transit
- TLS 1.2+ on all API endpoints
- VPC endpoints to keep traffic off the public internet
- Certificate pinning for service-to-service communication
Key Management
- Use AWS KMS in eu-central-1 for all encryption keys
- Keys never leave the region
- Enable key rotation
- Audit all key usage via CloudTrail
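In code, enforcing SSE-KMS and key rotation is a few calls; key IDs and bucket names below are placeholders:

```python
# Sketch: customer-managed KMS key usage in eu-central-1 — SSE-KMS on every
# S3 write, plus annual key rotation. Key and bucket names are placeholders.
REGION = "eu-central-1"

def sse_kms_args(kms_key_id: str) -> dict:
    """Extra arguments forcing SSE-KMS on an S3 put_object call."""
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}

def put_encrypted(bucket: str, key: str, body: bytes, kms_key_id: str):
    import boto3
    s3 = boto3.client("s3", region_name=REGION)
    s3.put_object(Bucket=bucket, Key=key, Body=body, **sse_kms_args(kms_key_id))

def enable_rotation(kms_key_id: str):
    import boto3
    kms = boto3.client("kms", region_name=REGION)
    kms.enable_key_rotation(KeyId=kms_key_id)  # automatic annual rotation
```

Pair this with a bucket policy that denies `s3:PutObject` without the `aws:kms` encryption header, so unencrypted writes are impossible rather than merely unlikely.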
Step 6: Audit Logging
GDPR’s accountability principle requires you to demonstrate compliance. Build comprehensive logging:
What to Log
- Every AI inference request (timestamp, user ID, anonymised input, model used)
- PII detection results (what was found and anonymised)
- Access to personal data (who accessed what, when)
- Data deletion events (right-to-erasure fulfilment)
- Model version changes
Where to Log
- CloudTrail – API-level audit trail for all AWS actions
- CloudWatch Logs – Application-level logging in eu-central-1
- DynamoDB – Structured audit records with TTL for retention management
- S3 – Long-term audit archive with lifecycle policies
Retention
Align log retention with your GDPR data retention policy. Typically:
- Operational logs: 30-90 days
- Audit logs: 12-24 months
- Compliance documentation: duration of processing + 3 years
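A structured audit record with a TTL attribute ties the logging and retention points together. The field names here are our own convention, not a DynamoDB requirement; note that only a hash of the (already anonymised) input is stored:

```python
# Sketch of a structured audit record for DynamoDB. The `ttl` attribute lets
# DynamoDB expire the item in line with the retention policy automatically.
import time
import hashlib

def audit_record(user_id: str, anonymised_input: str, model_id: str,
                 retention_days: int = 90) -> dict:
    now = int(time.time())
    return {
        "pk": f"user#{user_id}",
        "sk": f"inference#{now}",
        "model_id": model_id,  # log the exact model version for traceability
        "input_hash": hashlib.sha256(anonymised_input.encode()).hexdigest(),
        "timestamp": now,
        "ttl": now + retention_days * 86400,  # epoch seconds, DynamoDB TTL format
    }
```

Remember to actually enable TTL on the table (it is off by default) and point it at the `ttl` attribute.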
Step 7: Right to Erasure (Article 17)
Your AI pipeline must support data deletion requests. This means:
- Index all personal data – Know exactly where each person’s data lives across your pipeline
- Purge from vector databases – Delete embeddings derived from the individual’s data
- Clear anonymisation mappings – Delete token-to-PII mappings
- Remove from logs – Redact or delete personal data from audit logs (keep anonymised records)
- Confirm deletion – Document what was deleted and when
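The five steps above can be orchestrated along these lines. The store interfaces (`delete_by_subject`, `redact_subject`) are hypothetical names to wire up to OpenSearch/pgvector, DynamoDB, and your log archive:

```python
# Sketch of an Article 17 erasure workflow across the pipeline's stores.
# The store objects and their method names are illustrative placeholders.
from datetime import datetime, timezone

def erase_subject(subject_id: str, vector_store, mapping_table, log_store) -> dict:
    deleted = {
        "embeddings": vector_store.delete_by_subject(subject_id),
        "pii_mappings": mapping_table.delete_by_subject(subject_id),
        "log_entries_redacted": log_store.redact_subject(subject_id),
    }
    # Keep an anonymised confirmation record for the accountability principle —
    # it must not itself contain the subject's identifier.
    return {
        "subject": f"erased#{hash(subject_id) % 10**8}",
        "completed_at": datetime.now(timezone.utc).isoformat(),
        "deleted": deleted,
    }
```

The return value is the "confirm deletion" artefact: it documents what was removed and when, without reintroducing the personal data you just erased.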
Cost Considerations
Running AI pipelines in eu-central-1 typically costs 5-10% more than us-east-1 due to regional pricing. For a typical enterprise workload:
| Component | Monthly Estimate |
|---|---|
| Bedrock (Claude Sonnet, 1M tokens/day) | €2,000-4,000 |
| SageMaker endpoint (Llama 70B, ml.g5.12xlarge) | €5,000-8,000 |
| OpenSearch Serverless (vector store) | €500-1,500 |
| Lambda + API Gateway | €200-500 |
| DynamoDB (audit logs) | €100-300 |
| Total | €3,000-14,000/mo |

The low end of the total assumes a Bedrock-only pipeline; the high end assumes running both Bedrock and a dedicated SageMaker endpoint.
The cost premium for EU hosting is negligible compared to the cost of GDPR non-compliance (up to 4% of global annual turnover).
Common Pitfalls
1. Using a global CDN that caches personal data outside the EU
CloudFront has no EU-only edge setting (even the cheapest price class includes North American edges), so either don't cache responses containing personal data, or serve them directly from an in-region endpoint instead of through the CDN.
2. Forgetting about CloudWatch cross-region replication
Disable any cross-region log replication that might copy personal data outside EU regions.
3. Using third-party AI APIs without DPAs
Every external service that touches personal data needs a Data Processing Agreement. This includes vector database SaaS providers, embedding APIs, and evaluation tools.
4. Not accounting for model updates
When AWS updates Bedrock models, your system’s behaviour changes. Log model versions with every inference for audit traceability.
Next Steps
Building GDPR-compliant AI pipelines requires careful architecture from day one – retrofitting compliance into an existing pipeline is significantly more expensive and error-prone.
For a deeper dive into GDPR and AI, read our comprehensive guide to GenAI and GDPR compliance. To understand the broader regulatory landscape including the EU AI Act, see our EU AI Act compliance checklist.
See how we apply this in specific industries:
- AI pipelines for healthcare — Patient data pipelines with GDPR special category handling
- AI pipelines for SaaS — EU-hosted infrastructure for multi-tenant SaaS products
At HASORIX, we build compliant AI systems for European enterprises – from architecture to deployment to documentation. Talk to us about your AI pipeline.