You want to build AI-powered features for your European customers. You need to use LLMs, vector databases, and ML pipelines. But your data can’t leave the EU, and your compliance team needs documentation proving it.
Here’s how to architect GDPR-compliant AI pipelines on AWS, with everything running in the EU – specifically AWS Frankfurt (eu-central-1).
Why eu-central-1 (Frankfurt)
AWS Frankfurt is the most mature EU region for AI workloads:
- Located in Germany – one of the strictest GDPR enforcement jurisdictions
- Full service availability – SageMaker, Bedrock, Lambda, ECS, RDS, OpenSearch all available
- AWS Bedrock – access to Claude, Llama, and other models with data processing agreements that cover EU data
- Dedicated infrastructure – data physically resides in Frankfurt data centres
Alternative EU regions: eu-west-1 (Ireland), eu-west-2 (London), eu-south-1 (Milan), eu-north-1 (Stockholm). Frankfurt is a common choice because German data protection authorities are among the most active enforcers in the EU, so an architecture that meets their expectations is generally well placed with other EU DPAs.
Architecture Overview
A typical GDPR-compliant AI pipeline on AWS eu-central-1:
User Request
→ API Gateway (eu-central-1)
→ Lambda / ECS (PII detection & anonymisation)
→ Bedrock / SageMaker (LLM inference)
→ Response de-anonymisation
→ DynamoDB / RDS (audit logging)
→ Response to user
Every component runs in eu-central-1. No data crosses regional boundaries.
Step 1: Lock Down the Region
Before writing any application code, enforce region restrictions at the AWS account level.
Service Control Policies (SCPs)
Create an SCP that prevents any service from launching outside EU regions:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyNonEURegions",
      "Effect": "Deny",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": [
            "eu-central-1",
            "eu-west-1",
            "eu-west-2",
            "eu-north-1",
            "eu-south-1"
          ]
        }
      }
    }
  ]
}
```
This ensures that even if someone accidentally configures a service in us-east-1, the request is denied. This is your first line of defence for data residency.
VPC Configuration
- Create your VPC in eu-central-1
- Use VPC endpoints for AWS services (S3, Bedrock, DynamoDB) to keep traffic on the AWS private network
- Enable VPC Flow Logs for audit purposes
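The endpoint setup above can be sketched with boto3. This is illustrative: the VPC, route table, subnet, and security group IDs are placeholders, and you should confirm the endpoint service names available in your account.

```python
# Sketch: create VPC endpoints so traffic to S3, DynamoDB, and Bedrock
# stays on the AWS private network. All resource IDs below are placeholders.
REGION = "eu-central-1"

def endpoint_service_names(region: str) -> dict:
    """Map each service to its VPC endpoint service name and endpoint type."""
    return {
        "s3": (f"com.amazonaws.{region}.s3", "Gateway"),
        "dynamodb": (f"com.amazonaws.{region}.dynamodb", "Gateway"),
        "bedrock-runtime": (f"com.amazonaws.{region}.bedrock-runtime", "Interface"),
    }

def create_endpoints(vpc_id: str, route_table_id: str, subnet_ids: list, sg_id: str):
    import boto3  # imported here so the helper above has no AWS dependency
    ec2 = boto3.client("ec2", region_name=REGION)
    for service, ep_type in endpoint_service_names(REGION).values():
        kwargs = {"VpcId": vpc_id, "ServiceName": service, "VpcEndpointType": ep_type}
        if ep_type == "Gateway":
            kwargs["RouteTableIds"] = [route_table_id]  # Gateway endpoints attach to route tables
        else:
            kwargs["SubnetIds"] = subnet_ids            # Interface endpoints live in subnets
            kwargs["SecurityGroupIds"] = [sg_id]
            kwargs["PrivateDnsEnabled"] = True
        ec2.create_vpc_endpoint(**kwargs)
```

Gateway endpoints (S3, DynamoDB) are free; interface endpoints carry an hourly charge, so create them only for the services your pipeline actually calls.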
Step 2: PII Detection and Anonymisation
Before any data reaches an LLM, strip personally identifiable information.
Using Amazon Comprehend
Amazon Comprehend’s PII detection is available in eu-central-1 and can identify:
- Names, addresses, phone numbers
- Email addresses, credit card numbers
- Dates of birth, SSNs, passport numbers
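A minimal sketch of the detect-then-redact step, assuming Comprehend's `detect_pii_entities` response shape (entities with `Type`, `Score`, `BeginOffset`, `EndOffset`); the 0.8 confidence threshold is illustrative:

```python
# Sketch: redact PII found by Amazon Comprehend before the text reaches an LLM.
def redact(text: str, entities: list, min_score: float = 0.8) -> str:
    """Replace each detected entity span with a [TYPE] placeholder.
    `entities` follows Comprehend's shape: Type, Score, BeginOffset, EndOffset."""
    # Work right-to-left so earlier offsets stay valid after each replacement.
    for e in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        if e["Score"] >= min_score:
            text = text[: e["BeginOffset"]] + f"[{e['Type']}]" + text[e["EndOffset"]:]
    return text

def detect_and_redact(text: str, region: str = "eu-central-1") -> str:
    import boto3  # imported here so `redact` stays a pure, testable function
    comprehend = boto3.client("comprehend", region_name=region)
    resp = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
    return redact(text, resp["Entities"])
```

For example, `redact("Email jane@example.com", [{"Type": "EMAIL", "Score": 0.99, "BeginOffset": 6, "EndOffset": 22}])` returns `"Email [EMAIL]"`.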
Custom Anonymisation Layer
For domain-specific PII (patient IDs, internal employee codes, custom identifiers), build a custom anonymisation service:
- Detect – Use Comprehend + custom regex patterns to identify PII
- Tokenise – Replace each PII entity with a unique token (e.g., [PERSON_001])
- Store the mapping – Keep the token-to-PII mapping in an encrypted DynamoDB table (also in eu-central-1)
- Send anonymised text to the LLM
- De-tokenise – Replace tokens with original PII in the response
The mapping table should have a TTL (time-to-live) aligned with your data retention policy – typically 30-90 days for processing purposes.
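The tokenise/de-tokenise steps can be sketched as follows. The regex patterns and the in-memory mapping are illustrative; in production the mapping lives in the encrypted DynamoDB table described above, with a TTL.

```python
# Sketch of the tokenise/de-tokenise steps with an in-memory mapping.
import re
from collections import defaultdict

class Anonymiser:
    # Illustrative domain-specific patterns; combine with Comprehend's results.
    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
        "PATIENT_ID": re.compile(r"\bPAT-\d{6}\b"),
    }

    def __init__(self):
        self.mapping = {}               # token -> original PII value
        self.counters = defaultdict(int)

    def tokenise(self, text: str) -> str:
        for label, pattern in self.PATTERNS.items():
            def _sub(m, label=label):
                self.counters[label] += 1
                token = f"[{label}_{self.counters[label]:03d}]"
                self.mapping[token] = m.group(0)
                return token
            text = pattern.sub(_sub, text)
        return text

    def detokenise(self, text: str) -> str:
        for token, original in self.mapping.items():
            text = text.replace(token, original)
        return text
```

So `Anonymiser().tokenise("Contact jane@example.com about PAT-123456")` yields `"Contact [EMAIL_001] about [PATIENT_ID_001]"`, and `detokenise` on the LLM's response restores the originals.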
Step 3: LLM Inference in the EU
Option A: AWS Bedrock (Recommended)
AWS Bedrock runs in eu-central-1 and provides access to:
- Anthropic Claude – Sonnet, Haiku
- Meta Llama – 3.1, 3.2
- Amazon Titan – Text, embeddings
Key compliance features:
- Data stays in-region – inference happens in Frankfurt
- No model training on your data – Bedrock does not use customer data to train models
- AWS BAA available – a Business Associate Agreement (a HIPAA instrument) for healthcare workloads that also touch US patient data
- DPA included – AWS’s Data Processing Addendum covers GDPR requirements
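Calling a Bedrock model pinned to eu-central-1 via the Converse API looks roughly like this. The model ID and inference settings are illustrative; check which model IDs are enabled in your account.

```python
# Sketch: invoke Claude via Bedrock's Converse API, pinned to eu-central-1.
REGION = "eu-central-1"
MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"  # example ID only

def build_request(anonymised_text: str) -> dict:
    """Converse API request body; only anonymised text should reach this point."""
    return {
        "modelId": MODEL_ID,
        "messages": [{"role": "user", "content": [{"text": anonymised_text}]}],
        "inferenceConfig": {"maxTokens": 1024, "temperature": 0.2},
    }

def invoke(anonymised_text: str) -> str:
    import boto3  # client creation kept here so build_request stays pure
    client = boto3.client("bedrock-runtime", region_name=REGION)
    resp = client.converse(**build_request(anonymised_text))
    return resp["output"]["message"]["content"][0]["text"]
```

Because the client is constructed with an explicit `region_name`, a misconfigured default region cannot silently route inference outside the EU; the SCP from Step 1 backstops this.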
Option B: Self-Hosted Models on SageMaker
For maximum control, deploy open-source models on SageMaker endpoints in eu-central-1:
- Llama 3.1 70B – Strong general-purpose model
- Mistral Large – European-built, strong multilingual capabilities
- Domain fine-tuned models – Your own models trained on your data
Benefits: complete data isolation, no third-party processing, full control over model versions and updates.
Trade-off: higher cost (GPU instances), operational overhead, potentially lower capabilities than frontier APIs.
Option C: Zero-Retention API Configuration
If you must use external LLM APIs (OpenAI, Anthropic direct), configure zero-retention:
- Anthropic: Enterprise plan with zero-retention DPA
- OpenAI: API data usage policy (opt-out of training) + DPA
Ensure your API calls route through your EU VPC – use a proxy Lambda to log requests and enforce anonymisation before data leaves your infrastructure.
Step 4: Vector Databases for RAG
If you’re building retrieval-augmented generation pipelines, your vector database must also reside in the EU.
Amazon OpenSearch Serverless (eu-central-1)
- Native vector search support
- Serverless – no infrastructure management
- Encryption at rest and in transit by default
- Fine-grained access control with IAM
Amazon RDS for PostgreSQL + pgvector
- Run PostgreSQL with the pgvector extension in eu-central-1
- Full SQL capabilities alongside vector search
- Familiar tooling for most engineering teams
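A minimal pgvector sketch, assuming a `documents` table and a 1024-dimension embedding model (both placeholders to adapt):

```python
# Sketch: pgvector schema and nearest-neighbour query for RDS PostgreSQL
# in eu-central-1. Table name, dimension, and connection are placeholders.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(1024)   -- must match your embedding model's dimension
);
"""

# <=> is pgvector's cosine-distance operator; lower distance = more similar.
KNN_SQL = "SELECT id, content FROM documents ORDER BY embedding <=> %s::vector LIMIT %s;"

def search(conn, query_embedding: list, k: int = 5):
    """Run a k-NN query over an open psycopg2/psycopg connection."""
    with conn.cursor() as cur:
        cur.execute(KNN_SQL, (str(query_embedding), k))
        return cur.fetchall()
```

For larger corpora, add an HNSW or IVFFlat index on the embedding column so queries don't fall back to a sequential scan.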
Self-Hosted Options
- Weaviate on EKS in eu-central-1
- Qdrant on ECS in eu-central-1
- Pinecone – check their EU region availability (currently limited)
Step 5: Encryption Everywhere
GDPR doesn’t explicitly require encryption, but it’s listed as an appropriate technical measure under Article 32. For AI pipelines handling personal data, implement:
At Rest
- S3: SSE-S3 or SSE-KMS with customer-managed keys
- DynamoDB: Encryption enabled by default, use KMS for customer-managed keys
- RDS: Encrypted storage volumes + encrypted snapshots
- SageMaker: Encrypted model artefacts and training data
In Transit
- TLS 1.2+ on all API endpoints
- VPC endpoints to keep traffic off the public internet
- Certificate pinning for service-to-service communication
Key Management
- Use AWS KMS in eu-central-1 for all encryption keys
- Keys never leave the region
- Enable key rotation
- Audit all key usage via CloudTrail
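In code, enforcing SSE-KMS and key rotation is a few calls; key IDs and bucket names below are placeholders:

```python
# Sketch: customer-managed KMS key usage in eu-central-1 — SSE-KMS on every
# S3 write, plus annual key rotation. Key and bucket names are placeholders.
REGION = "eu-central-1"

def sse_kms_args(kms_key_id: str) -> dict:
    """Extra arguments forcing SSE-KMS on an S3 put_object call."""
    return {"ServerSideEncryption": "aws:kms", "SSEKMSKeyId": kms_key_id}

def put_encrypted(bucket: str, key: str, body: bytes, kms_key_id: str):
    import boto3
    s3 = boto3.client("s3", region_name=REGION)
    s3.put_object(Bucket=bucket, Key=key, Body=body, **sse_kms_args(kms_key_id))

def enable_rotation(kms_key_id: str):
    import boto3
    kms = boto3.client("kms", region_name=REGION)
    kms.enable_key_rotation(KeyId=kms_key_id)  # automatic annual rotation
```

Pair this with a bucket policy that denies `s3:PutObject` without the `aws:kms` encryption header, so unencrypted writes are impossible rather than merely unlikely.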
Step 6: Audit Logging
GDPR’s accountability principle requires you to demonstrate compliance. Build comprehensive logging:
What to Log
- Every AI inference request (timestamp, user ID, anonymised input, model used)
- PII detection results (what was found and anonymised)
- Access to personal data (who accessed what, when)
- Data deletion events (right-to-erasure fulfilment)
- Model version changes
Where to Log
- CloudTrail – API-level audit trail for all AWS actions
- CloudWatch Logs – Application-level logging in eu-central-1
- DynamoDB – Structured audit records with TTL for retention management
- S3 – Long-term audit archive with lifecycle policies
Retention
Align log retention with your GDPR data retention policy. Typically:
- Operational logs: 30-90 days
- Audit logs: 12-24 months
- Compliance documentation: duration of processing + 3 years
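A structured audit record with a TTL attribute ties the logging and retention points together. The field names here are our own convention, not a DynamoDB requirement; note that only a hash of the (already anonymised) input is stored:

```python
# Sketch of a structured audit record for DynamoDB. The `ttl` attribute lets
# DynamoDB expire the item in line with the retention policy automatically.
import time
import hashlib

def audit_record(user_id: str, anonymised_input: str, model_id: str,
                 retention_days: int = 90) -> dict:
    now = int(time.time())
    return {
        "pk": f"user#{user_id}",
        "sk": f"inference#{now}",
        "model_id": model_id,  # log the exact model version for traceability
        "input_hash": hashlib.sha256(anonymised_input.encode()).hexdigest(),
        "timestamp": now,
        "ttl": now + retention_days * 86400,  # epoch seconds, DynamoDB TTL format
    }
```

Remember to actually enable TTL on the table (it is off by default) and point it at the `ttl` attribute.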
Step 7: Right to Erasure (Article 17)
Your AI pipeline must support data deletion requests. This means:
- Index all personal data – Know exactly where each person’s data lives across your pipeline
- Purge from vector databases – Delete embeddings derived from the individual’s data
- Clear anonymisation mappings – Delete token-to-PII mappings
- Remove from logs – Redact or delete personal data from audit logs (keep anonymised records)
- Confirm deletion – Document what was deleted and when
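The five steps above can be orchestrated along these lines. The store interfaces (`delete_by_subject`, `redact_subject`) are hypothetical names to wire up to OpenSearch/pgvector, DynamoDB, and your log archive:

```python
# Sketch of an Article 17 erasure workflow across the pipeline's stores.
# The store objects and their method names are illustrative placeholders.
from datetime import datetime, timezone

def erase_subject(subject_id: str, vector_store, mapping_table, log_store) -> dict:
    deleted = {
        "embeddings": vector_store.delete_by_subject(subject_id),
        "pii_mappings": mapping_table.delete_by_subject(subject_id),
        "log_entries_redacted": log_store.redact_subject(subject_id),
    }
    # Keep an anonymised confirmation record for the accountability principle —
    # it must not itself contain the subject's identifier.
    return {
        "subject": f"erased#{hash(subject_id) % 10**8}",
        "completed_at": datetime.now(timezone.utc).isoformat(),
        "deleted": deleted,
    }
```

The return value is the "confirm deletion" artefact: it documents what was removed and when, without reintroducing the personal data you just erased.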
Cost Considerations
Running AI pipelines in eu-central-1 typically costs 5-10% more than us-east-1 due to regional pricing. For a typical enterprise workload:
| Component | Monthly Estimate |
|---|---|
| Bedrock (Claude Sonnet, 1M tokens/day) | €2,000-4,000 |
| SageMaker endpoint (Llama 70B, ml.g5.12xlarge) | €5,000-8,000 |
| OpenSearch Serverless (vector store) | €500-1,500 |
| Lambda + API Gateway | €200-500 |
| DynamoDB (audit logs) | €100-300 |
| Total | €3,000-14,000/mo |

The low end of the total assumes a Bedrock-only pipeline; the high end assumes running both Bedrock and a dedicated SageMaker endpoint.
The cost premium for EU hosting is negligible compared to the cost of GDPR non-compliance (up to 4% of global annual turnover).
Common Pitfalls
1. Using a global CDN that caches personal data outside the EU
CloudFront has no EU-only edge setting (even the cheapest price class includes North American edges), so either don't cache responses containing personal data, or serve them directly from an in-region endpoint instead of through the CDN.
2. Forgetting about CloudWatch cross-region replication
Disable any cross-region log replication that might copy personal data outside EU regions.
3. Using third-party AI APIs without DPAs
Every external service that touches personal data needs a Data Processing Agreement. This includes vector database SaaS providers, embedding APIs, and evaluation tools.
4. Not accounting for model updates
When AWS updates Bedrock models, your system’s behaviour changes. Log model versions with every inference for audit traceability.
Next Steps
Building GDPR-compliant AI pipelines requires careful architecture from day one – retrofitting compliance into an existing pipeline is significantly more expensive and error-prone.
For a deeper dive into GDPR and AI, read our comprehensive guide to GenAI and GDPR compliance. To understand the broader regulatory landscape including the EU AI Act, see our EU AI Act compliance checklist.
See how we apply this in specific industries:
- AI pipelines for healthcare — Patient data pipelines with GDPR special category handling
- AI pipelines for SaaS — EU-hosted infrastructure for multi-tenant SaaS products
At HASORIX, we build compliant AI systems for European enterprises – from architecture to deployment to documentation. Talk to us about your AI pipeline.