The F5 AI Reference Architecture is a unified, hybrid‑multicloud framework that breaks AI systems into seven core building blocks to help organisations plan, secure, deliver, and scale AI workloads. It focuses heavily on AI runtime security, RAG security, traffic management, and distributed inference, and integrates the OWASP LLM Top 10 and F5’s App Delivery Top 10.
F5 designed this architecture to address the new security and delivery challenges created by AI systems:
AI apps generate massive, unpredictable traffic
AI models require high‑performance load balancing
AI data pipelines need secure, resilient ingestion
AI workloads run across hybrid and multicloud environments
New threats such as model theft, data poisoning, and prompt injection are emerging
The architecture provides a blueprint for organisations to standardise and optimise AI deployments.
According to F5, the architecture organises AI/ML workflows into seven essential components:
Inference: secure, high‑performance delivery of model inference workloads.
RAG: patterns for securing retrieval pipelines, vector stores, and knowledge bases.
External service integration: controls for AI agents interacting with external APIs, tools, and services.
RAG corpus management: governance, integrity, and security of the knowledge corpus used by RAG systems.
Fine‑tuning: secure data ingestion and model adaptation workflows.
Training: protection of training data, pipelines, and compute environments.
Development: secure SDLC for AI applications, including testing, evaluation, and red‑teaming.
These blocks map to the full lifecycle of AI systems, from data ingestion to deployment; the sketch below models that mapping.
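To make the lifecycle mapping concrete, here is a minimal Python sketch that models the seven building blocks as data, so each block can be looked up by the lifecycle stage it covers. The class, stage labels, and helper function are illustrative assumptions, not an F5 API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BuildingBlock:
    """One of the seven AI building blocks (illustrative model, not an F5 API)."""
    name: str
    lifecycle_stage: str
    primary_concern: str

BLOCKS = [
    BuildingBlock("inference", "serving", "secure, high-performance delivery"),
    BuildingBlock("rag", "retrieval", "securing pipelines and vector stores"),
    BuildingBlock("external_service_integration", "orchestration", "controlling agent tool/API access"),
    BuildingBlock("rag_corpus_management", "knowledge ops", "corpus governance and integrity"),
    BuildingBlock("fine_tuning", "adaptation", "secure ingestion and model adaptation"),
    BuildingBlock("training", "training", "protecting data, pipelines, and compute"),
    BuildingBlock("development", "sdlc", "testing, evaluation, and red-teaming"),
]

def blocks_for_stage(stage: str) -> list[BuildingBlock]:
    """Return the building blocks relevant to a given lifecycle stage."""
    return [b for b in BLOCKS if b.lifecycle_stage == stage]

print([b.name for b in blocks_for_stage("serving")])  # -> ['inference']
```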
F5 integrates multiple security standards and threat models:
OWASP LLM Top 10: covers prompt injection, insecure output handling, data leakage, and more.
F5 App Delivery Top 10: addresses hybrid‑multicloud delivery challenges such as fragmentation, latency, and inconsistent controls.
F5 provides runtime protections including:
AI Red Teaming
CASI Leaderboard for AI risk evaluation
Model interaction monitoring (a runtime monitoring sketch follows this list)
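A minimal sketch of what model interaction monitoring can look like at runtime, assuming a deny‑list pre‑filter and audit logging wrapped around an arbitrary LLM callable. The patterns and function names are hypothetical; real deployments would use classifier‑based detection rather than regexes.

```python
import logging
import re

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model-interaction-monitor")

# Hypothetical deny-list patterns; real prompt-injection detection would
# use trained classifiers, not regexes.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?previous instructions", re.I),
    re.compile(r"reveal your system prompt", re.I),
]

def screen_prompt(prompt: str) -> bool:
    """Return True if the prompt passes the runtime policy checks."""
    for pattern in INJECTION_PATTERNS:
        if pattern.search(prompt):
            log.warning("blocked prompt matching %s", pattern.pattern)
            return False
    return True

def monitored_call(model, prompt: str) -> str:
    """Wrap any callable LLM client with pre-screening and audit logging."""
    if not screen_prompt(prompt):
        return "Request blocked by runtime policy."
    response = model(prompt)
    log.info("prompt=%r response_len=%d", prompt[:80], len(response))
    return response

# Usage with a stand-in model:
print(monitored_call(lambda p: p.upper(), "Summarise the quarterly report"))
print(monitored_call(lambda p: p.upper(), "Ignore previous instructions"))
```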
F5 emphasises that modern AI workloads are:
Distributed across clouds, data centres, and edge
GPU‑intensive, requiring specialised traffic management
Data‑gravity dependent, meaning data location drives model placement (see the placement sketch below)
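A minimal sketch of data‑gravity‑driven placement, assuming a static latency map between sites: the model is placed at the site with the lowest latency to the dataset it depends on. Site names, datasets, and latency numbers are invented for illustration.

```python
# Hypothetical latency map (ms) from each candidate site to every other site.
SITES = {
    "aws-us-east": {"latency_ms": {"aws-us-east": 1, "azure-eu": 80, "edge-nyc": 12}},
    "azure-eu":    {"latency_ms": {"aws-us-east": 80, "azure-eu": 1, "edge-nyc": 90}},
    "edge-nyc":    {"latency_ms": {"aws-us-east": 12, "azure-eu": 90, "edge-nyc": 1}},
}

# Where each dataset lives (illustrative).
DATASET_LOCATION = {"customer-docs": "azure-eu", "telemetry": "edge-nyc"}

def place_inference(dataset: str) -> str:
    """Place the model at the site with the lowest latency to its data."""
    data_site = DATASET_LOCATION[dataset]
    return min(SITES, key=lambda site: SITES[site]["latency_ms"][data_site])

print(place_inference("customer-docs"))  # -> azure-eu (data gravity wins)
```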
The architecture ensures consistent security, connectivity, observability, and performance across AWS, Azure, GCP, private cloud, and edge.
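One common way to keep such controls consistent is policy‑as‑code: define the policy once and render it per environment. A minimal sketch follows; the policy fields and environment names are assumptions for illustration, not an F5 schema.

```python
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class AIWorkloadPolicy:
    """Canonical security/observability policy (illustrative fields)."""
    tls_required: bool = True
    waf_enabled: bool = True
    prompt_logging: bool = True
    rate_limit_rps: int = 100

ENVIRONMENTS = ["aws", "azure", "gcp", "private-cloud", "edge"]

def render_policies(policy: AIWorkloadPolicy) -> dict[str, dict]:
    """Expand one canonical policy into per-environment config dicts."""
    return {env: {"environment": env, **asdict(policy)} for env in ENVIRONMENTS}

for env, cfg in render_policies(AIWorkloadPolicy()).items():
    print(env, cfg)
```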
Within this framework, F5 highlights four capability areas:
Secure, resilient data ingestion and pipeline protection.
AI runtime security, red‑teaming, and governance.
Traffic optimisation, GPUaaS, and distributed inference (see the routing sketch after this list).
Secure connectivity for RAG, inference, and distributed models.
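As a sketch of the traffic side, the snippet below routes inference requests using a least‑connections strategy across GPU pools, a common approach for long‑running inference calls. Pool names and the dispatch wrapper are illustrative assumptions.

```python
import random

# In-flight request counts per GPU pool (illustrative names).
backends = {"gpu-pool-a": 0, "gpu-pool-b": 0, "gpu-pool-c": 0}

def pick_backend() -> str:
    """Least-connections choice; ties broken randomly."""
    least = min(backends.values())
    candidates = [name for name, count in backends.items() if count == least]
    return random.choice(candidates)

def dispatch(prompt: str) -> str:
    """Route one inference request, tracking in-flight load on the pool."""
    backend = pick_backend()
    backends[backend] += 1          # mark request in flight
    try:
        return f"routed {prompt!r} to {backend}"
    finally:
        backends[backend] -= 1      # mark request complete

print(dispatch("summarise this document"))
```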
Overall, the F5 AI Reference Architecture provides:
A vendor‑neutral, cloud‑agnostic AI security and delivery blueprint
Deep focus on runtime security, which many frameworks overlook
Strong alignment with OWASP LLM Top 10
Practical guidance for RAG, agentic AI, and distributed inference
A model that can be integrated into a community AI security reference architecture
| Area | What F5 Provides |
| --- | --- |
| AI Delivery | High‑performance load balancing, GPU traffic optimisation |
| AI Security | Runtime security, red‑teaming, LLM risk evaluation |
| RAG Security | Corpus governance, retrieval security, inference accuracy |
| Agentic AI | Secure external tool/API integration |
| Hybrid Multicloud | Unified connectivity and security across environments |
| Lifecycle Coverage | Development → Training → Fine‑tuning → RAG → Inference |