Airweave
Inside Airweave's Architecture
Context retrieval for AI agents is fundamentally an infrastructure problem. To build systems that can reliably search across dozens of data sources, scale to millions of records per user, and maintain low latency, you need architectural patterns designed specifically for these challenges.
This article explores how Airweave's distributed architecture solves the core problems of continuous data synchronization and fast semantic retrieval at scale.
Challenges
AI agents depend on context, but enterprise data lives in fragments. A single query like "What are my team's blockers this week?" requires information from Linear, Slack, Google Calendar, and email, each with its own API, authentication, rate limits, and data formats.
The common approach of embedding static documents and storing them in a vector database works for demos, but it fails in production. Real systems need:
Continuous sync: Data must stay fresh as sources change in real time
Permission awareness: Users should only see what they're authorized to access
Massive scale: Handle millions of records per user without degrading
Predictable latency: Return results in milliseconds, not seconds
Fault tolerance: Gracefully handle API failures and rate limits
Building this kind of system requires thinking beyond simple RAG pipelines. You need distributed architecture designed for durability, scalability, and separation of concerns.
Design Principles
Airweave's architecture follows three core principles that shape every component:
Separation of read and write paths: Search operations (reads) and data synchronization (writes) run independently. A surge in queries never slows down background syncs. An external API failure during sync never impacts search performance.
Horizontal scalability by default: Every component scales independently. Need to process more sources? Add sync workers. Need to handle more queries? Add API instances. Scale one dimension without affecting others.
Durability over speed for writes: Background syncs prioritize reliability and resumability. If a sync fails halfway through processing a million records, it resumes from the last checkpoint rather than starting over.
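The checkpoint-and-resume idea can be sketched in a few lines. This is a minimal, hypothetical illustration (in Airweave the durability comes from Temporal workflows, not an in-memory dict): process records in batches and record an offset after each completed batch, so a restart picks up where the last batch finished instead of at record zero.

```python
# Minimal sketch of checkpoint-based resumability. The `checkpoint` dict
# stands in for durable workflow state; names here are hypothetical.

def sync_with_checkpoints(records, process, checkpoint, batch_size=100):
    """Process records in batches, recording a checkpoint after each batch.

    On restart, processing resumes at the last recorded offset
    instead of starting over from the beginning.
    """
    start = checkpoint.get("offset", 0)
    for i in range(start, len(records), batch_size):
        batch = records[i:i + batch_size]
        for record in batch:
            process(record)
        # In a real system this write would be durable (e.g. Temporal
        # workflow state or a PostgreSQL row), not an in-memory dict.
        checkpoint["offset"] = i + len(batch)
    return checkpoint
```

If a crash happens mid-batch, the offset still points at the last fully completed batch, so at worst one batch is reprocessed rather than the whole million-record sync.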
The Control Plane and Data Plane
Airweave separates concerns into two distinct layers:
The Control Plane (API) is a lightweight FastAPI service that handles authentication, authorization, and orchestration. It validates users, manages collection access, and schedules sync jobs, but it explicitly avoids heavy processing.
The API never directly contacts external data sources or performs transformations. It delegates all expensive operations to workers. This keeps the API responsive regardless of what's happening in background syncs.
The Data Plane (Sync Workers) consists of stateless, horizontally scalable processes that perform all heavy lifting: pulling data from external APIs, detecting changes, writing to databases, and publishing progress updates.
Workers are designed to be ephemeral. If a worker crashes mid-sync, the workflow engine automatically retries the job on another worker. Because workers are stateless, any worker can pick up any job, enabling efficient horizontal scaling.
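Why statelessness enables this is easiest to see in miniature. The sketch below is a simplified in-process stand-in for what Temporal's task queue actually does: because each job carries all the state it needs, a job that fails on one worker can simply be re-enqueued and picked up by any other worker (all names here are hypothetical).

```python
import itertools
import queue

def run_jobs(jobs, workers, max_attempts=3):
    """Distribute jobs across interchangeable workers, retrying failures.

    Any worker can run any job because jobs are self-contained; a failed
    attempt is re-enqueued for the next available worker.
    """
    tasks = queue.Queue()
    for job in jobs:
        tasks.put((job, 0))
    results = {}
    pick = itertools.cycle(workers)  # round-robin stand-in for a task queue
    while not tasks.empty():
        job, attempts = tasks.get()
        worker = next(pick)
        try:
            results[job] = worker(job)
        except Exception:
            if attempts + 1 < max_attempts:
                tasks.put((job, attempts + 1))  # retry on another worker
    return results
```

In production this loop is what Temporal provides: durable task queues, retry policies, and automatic reassignment when a worker disappears mid-job.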
Storage and Orchestration
Airweave uses different databases optimized for different access patterns:
PostgreSQL (Management Database) stores transactional data: user accounts, organizations, collections, source connection configurations, sync metadata, and system bookkeeping.
Vespa (Vector/Search Database) stores all searchable user data from external sources, optimized for semantic and hybrid retrieval operations.
This separation reflects fundamentally different workloads. PostgreSQL handles configuration changes and metadata updates (low volume, high consistency). Vespa handles semantic search and ranking (high volume, optimized for speed).
Two systems coordinate distributed operations:
Temporal serves as the workflow engine. It schedules sync jobs, distributes them to available workers, tracks progress, and handles retries. Each sync runs as a durable workflow that can resume from checkpoints if interrupted.
Redis Pub/Sub enables real-time status updates. Workers broadcast progress events that the API streams to connected UIs, creating a responsive experience without polling.
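The publish/subscribe pattern itself is small. Below is an in-process stand-in for Redis Pub/Sub that shows the shape of the interaction; with redis-py the worker side would be a call like `r.publish(channel, payload)`, and channel names and event fields here are hypothetical.

```python
import json
from collections import defaultdict

class ProgressBus:
    """In-process stand-in for Redis Pub/Sub: fan out worker progress
    events to every subscriber on a channel (e.g. a per-sync channel
    the API streams to connected UIs)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, event):
        payload = json.dumps(event)  # Redis carries bytes/strings, not objects
        for callback in self.subscribers[channel]:
            callback(payload)
```

The key property is fire-and-forget: the worker publishes and moves on, and if no UI is listening the event is simply dropped, which is why a Redis outage costs only live progress indicators, not correctness.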
The Read Path: How Search Works
When a user searches a collection, the flow is deliberately simple:
UI → API → Vespa → API → UI
User submits a search query with collection ID
API validates user access to that collection
API queries Vespa with search terms and permission filters
Vespa executes its retrieval pipeline (embeddings, ranking, filtering)
Results return to the user
The entire path is synchronous and typically completes in milliseconds. Critically, search never involves Temporal or workers. It's a direct read from the vector database. This isolation keeps latency low and predictable.
Permission filtering happens at query time by encoding access rules into Vespa's metadata layer. Users only receive results they're authorized to see based on organizational role and source-specific permissions.
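One way to picture query-time permission filtering is as YQL construction: the semantic retrieval clause and the access filter are combined into a single Vespa query, so unauthorized documents are excluded inside the engine rather than post-filtered. The sketch below is a hypothetical illustration; the field names (`collection_id`, `access_group`, `embedding`) are invented for this example and are not Airweave's actual schema.

```python
def build_search_yql(collection_id: str, allowed_groups: list[str]) -> str:
    """Build a Vespa YQL query that combines nearest-neighbor retrieval
    with a permission filter, so only documents the caller is authorized
    to see are candidates for ranking."""
    groups = ", ".join(f"'{g}'" for g in allowed_groups)
    return (
        "select * from sources documents where "
        "({targetHits: 100}nearestNeighbor(embedding, query_embedding)) "
        f"and collection_id contains '{collection_id}' "
        f"and access_group in ({groups})"
    )
```

Because the filter is part of the query, Vespa evaluates it against pre-indexed metadata during retrieval, which is what keeps permission checks from adding a separate post-processing pass.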
The Write Path: How Syncs Work
The write path optimizes for durability and scale rather than immediate response:
UI → API → PostgreSQL → Temporal → Worker → Source → Databases
User creates a source connection (e.g., Google Drive)
API stores connection config in PostgreSQL
API schedules a sync workflow in Temporal and returns immediately
Temporal assigns the job to an available sync worker
Worker loads sync context from PostgreSQL
Worker pulls data from the external source
Worker detects changes by comparing against previous state
Worker writes to both databases:
New/updated records → Vespa (searchable immediately)
Sync metadata → PostgreSQL (for resumability)
Worker publishes progress to Redis for real-time UI updates
This asynchronous design means users never wait for syncs to complete. Data becomes searchable incrementally as it's processed.
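Steps 5 through 9 above can be condensed into a single worker loop. The sketch below uses hypothetical names and duck-typed stores (in Airweave this logic runs inside a Temporal workflow with real PostgreSQL and Vespa clients):

```python
def run_sync(source, vespa, postgres, publish):
    """One sync pass: load previous state, pull items, skip unchanged
    ones, write changes to both stores, and broadcast progress."""
    previous = postgres.load_state(source.id)          # step 5: sync context
    synced = 0
    for item in source.fetch_items():                  # step 6: pull data
        if previous.get(item.id) == item.checksum:     # step 7: delta check
            continue                                   # unchanged: no writes
        vespa.upsert(item)                             # step 8: searchable now
        postgres.save_state(source.id, item.id, item.checksum)  # resumability
        synced += 1
        publish({"source": source.id, "synced": synced})  # step 9: progress
```

Note the ordering: the Vespa write happens before the state write, so a crash between the two re-syncs an item rather than silently losing it.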
Workers maintain checksums or modification timestamps for each item. On subsequent syncs:
Unchanged items are skipped entirely (no writes)
Modified items trigger updates to Vespa
Deleted items are tombstoned in Vespa
This delta detection dramatically reduces write volume and keeps incremental syncs fast after the initial full import.
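The classification itself reduces to comparing two checksum maps. This is a hedged sketch of one way to do it (Airweave may equally use modification timestamps, per the text above):

```python
def diff_items(previous: dict, current: dict):
    """Compare checksum maps from two syncs and classify each item.

    Both arguments map item_id -> checksum. Returns (upserts, deletes):
    ids that need writing to Vespa, and ids to tombstone there.
    """
    upserts = [i for i, ck in current.items() if previous.get(i) != ck]
    deletes = [i for i in previous if i not in current]  # tombstone these
    return upserts, deletes
```

Everything not in either list is unchanged and generates zero writes, which is where the reduction in write volume comes from.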
Designing for Scale
The design choices described above solve specific infrastructure problems:
Independent scaling: API instances, sync workers, PostgreSQL, and Vespa all scale horizontally and independently. A surge in one dimension doesn't bottleneck others.
Graceful degradation: When external APIs fail or rate-limit requests, workers retry with exponential backoff. When workers crash, Temporal resumes workflows from checkpoints. The system prioritizes eventual consistency over immediate perfection.
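Exponential backoff with jitter is a few lines of code. In practice Temporal's built-in retry policies handle this, but a standalone sketch makes the shape of the delays concrete (parameter names are illustrative):

```python
import random
import time

def retry_with_backoff(call, max_attempts=5, base=0.5, cap=30.0):
    """Retry `call` with exponentially growing, jittered delays.

    Delays grow as base * 2^attempt, capped at `cap`, with full jitter
    so many failing workers don't hammer a recovering API in lockstep.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure
            delay = min(cap, base * 2 ** attempt)
            time.sleep(random.uniform(0, delay))  # full jitter
```

The jitter matters as much as the exponent: without it, every rate-limited worker retries at the same instant and re-triggers the limit.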
Agent-optimized retrieval: Everything optimizes for fast, reliable search. Data is denormalized during ingestion so queries remain simple. Embeddings are computed at write-time, not read-time. Permission rules are pre-indexed rather than evaluated dynamically.
Operational isolation: Read and write paths run independently. Search failures don't impact syncs. Sync delays don't slow down queries. This separation prevents cascading failures and makes the system easier to reason about.
Understanding what breaks when components fail reveals the robustness built into the architecture:
API failure: Searches fail, but background syncs continue running. When the API recovers, syncs that were already scheduled will have kept running and completed normally.
Worker failure: Searches work perfectly (read path unaffected), but new syncs pause. When workers recover, Temporal resumes workflows from their last checkpoint.
Temporal failure: Both searches and active syncs continue, but new syncs cannot be scheduled. When Temporal recovers, it catches up on missed schedules.
PostgreSQL failure: Searches work (they only need Vespa), but new syncs cannot start and metadata cannot update. Workers retry database writes until PostgreSQL recovers.
Vespa failure: Syncs continue (they write to Vespa with retries), but searches fail. This is the only single point of failure for the read path.
Redis failure: Everything works, but real-time UI updates stop. Users can still trigger syncs and search; they just lose live progress indicators.
The read/write split was a deliberate choice: the read path (search) is kept maximally available, while the write path (sync) can gracefully degrade and recover without data loss.
All in all, Airweave's architecture enables predictable scaling along multiple dimensions:
User growth: Add sync workers to handle more concurrent syncs. The API and databases scale independently of user count.
Data volume per user: Vespa handles billions of records efficiently. Workers process sources in batches with checkpoints, allowing syncs to pause and resume.
Source diversity: Adding new integrations (e.g., a new SaaS tool) only requires implementing a new worker module. Core infrastructure remains unchanged.
Query load: Add API replicas and Vespa read replicas. Searches are stateless and cache-friendly, enabling straightforward horizontal scaling.
Looking Ahead
Building reliable context retrieval for AI agents requires infrastructure designed specifically for continuous synchronization, distributed processing, and semantic search at scale.
The patterns described here (separation of read and write paths, durable workflows, horizontal scalability, and specialized storage) apply broadly beyond Airweave's specific implementation. Whether you're building RAG systems, AI assistants, or autonomous agents, you'll eventually encounter the same fundamental challenges:
How to keep data fresh across dozens of sources
How to scale to millions of records per user
How to serve context in milliseconds
How to handle failures gracefully
As AI agents take on more responsibility in production systems, the infrastructure connecting them to real-world data becomes as critical as the models themselves.
