Airweave

Inside Airweave's Architecture

Context retrieval for AI agents is fundamentally an infrastructure problem. To build systems that can reliably search across dozens of data sources, scale to millions of records per user, and maintain low latency, you need architectural patterns designed specifically for these challenges.

This article explores how Airweave's distributed architecture solves the core problems of continuous data synchronization and fast semantic retrieval at scale.

Challenges

AI agents depend on context, but enterprise data lives in fragments. A single query like "What are my team's blockers this week?" requires information from Linear, Slack, Google Calendar, and email, each with its own API, authentication, rate limits, and data formats.

The common approach of embedding static documents and storing them in a vector database works for demos, but it fails in production. Real systems need:

Continuous sync: Data must stay fresh as sources change in real time
Permission awareness: Users should only see what they're authorized to access
Massive scale: Handle millions of records per user without degrading
Predictable latency: Return results in milliseconds, not seconds
Fault tolerance: Gracefully handle API failures and rate limits

Building this kind of system requires thinking beyond simple RAG pipelines. You need a distributed architecture designed for durability, scalability, and separation of concerns.

Design Principles

Airweave's architecture follows three core principles that shape every component:

Separation of read and write paths: Search operations (reads) and data synchronization (writes) run independently. A surge in queries never slows down background syncs. An external API failure during sync never impacts search performance.

Horizontal scalability by default: Every component scales independently. Need to process more sources? Add sync workers. Need to handle more queries? Add API instances. Scale one dimension without affecting others.

Durability over speed for writes: Background syncs prioritize reliability and resumability. If a sync fails halfway through processing a million records, it resumes from the last checkpoint rather than starting over.
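As a minimal illustration of the checkpointing idea (function names hypothetical, not Airweave's internals): the worker persists a cursor after each batch, so a restarted sync resumes from the last durability point instead of record zero.

```python
# Hypothetical sketch of checkpointed batch processing: after each batch,
# the worker persists a cursor so a crashed sync resumes mid-stream.
def sync_with_checkpoints(records, process_batch, load_cursor, save_cursor, batch_size=1000):
    cursor = load_cursor()  # 0 on a fresh sync, or the last saved position
    while cursor < len(records):
        batch = records[cursor:cursor + batch_size]
        process_batch(batch)
        cursor += len(batch)
        save_cursor(cursor)  # durability point: a restart resumes here
    return cursor
```

If the worker dies between two `save_cursor` calls, at most one batch is reprocessed; nothing is lost and nothing restarts from scratch.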

The Control Plane and Data Plane

Airweave separates concerns into two distinct layers:

The Control Plane (API) is a lightweight FastAPI service that handles authentication, authorization, and orchestration. It validates users, manages collection access, and schedules sync jobs, but it explicitly avoids heavy processing.

The API never directly contacts external data sources or performs transformations. It delegates all expensive operations to workers. This keeps the API responsive regardless of what's happening in background syncs.

The Data Plane (Sync Workers) consists of stateless, horizontally scalable processes that perform all heavy lifting: pulling data from external APIs, detecting changes, writing to databases, and publishing progress updates.

Workers are designed to be ephemeral. If a worker crashes mid-sync, the workflow engine automatically retries the job on another worker. Because workers are stateless, any worker can pick up any job, enabling efficient horizontal scaling.

Storage and Orchestration

Airweave uses different databases optimized for different access patterns:

PostgreSQL (Management Database) stores transactional data: user accounts, organizations, collections, source connection configurations, sync metadata, and system bookkeeping.

Vespa (Vector/Search Database) stores all searchable user data from external sources, optimized for semantic and hybrid retrieval operations.

This separation reflects fundamentally different workloads. PostgreSQL handles configuration changes and metadata updates (low volume, high consistency). Vespa handles semantic search and ranking (high volume, optimized for speed).

Two systems coordinate distributed operations:

Temporal serves as the workflow engine. It schedules sync jobs, distributes them to available workers, tracks progress, and handles retries. Each sync runs as a durable workflow that can resume from checkpoints if interrupted.

Redis Pub/Sub enables real-time status updates. Workers broadcast progress events that the API streams to connected UIs, creating a responsive experience without polling.
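The pattern is simple enough to sketch with an in-memory stand-in (with redis-py, a `publish` call and a `PubSub` subscriber would replace the toy bus below):

```python
# In-memory stand-in for Redis Pub/Sub: workers publish progress events
# on a channel; the API fans them out to subscribed UI streams.
import json
from collections import defaultdict

class ProgressBus:
    def __init__(self):
        self.subscribers = defaultdict(list)  # channel -> list of callbacks

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, event: dict):
        payload = json.dumps(event)
        for callback in self.subscribers[channel]:
            callback(payload)

bus = ProgressBus()
received = []
bus.subscribe("sync:123", received.append)

# A worker broadcasts progress as it processes records
bus.publish("sync:123", {"job": "123", "processed": 500, "total": 2000})
```

Because events are fire-and-forget, a slow or disconnected UI never blocks the worker, which is exactly the property you want on the write path.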

The Read Path: How Search Works

When a user searches a collection, the flow is deliberately simple:

UI → API → Vespa → API → UI

  1. User submits a search query with collection ID

  2. API validates user access to that collection

  3. API queries Vespa with search terms and permission filters

  4. Vespa executes its retrieval pipeline (embeddings, ranking, filtering)

  5. Results return to the user

The entire path is synchronous and typically completes in milliseconds. Critically, search never involves Temporal or workers. It's a direct read from the vector database. This isolation keeps latency low and predictable.

Permission filtering happens at query time by encoding access rules into Vespa's metadata layer. Users only receive results they're authorized to see based on organizational role and source-specific permissions.
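A toy sketch of the idea (the real filtering happens inside Vespa's query layer, not in application code): each indexed document carries the principals allowed to see it, and the query intersects that list with the caller's principals.

```python
# Sketch of query-time permission filtering: each document carries the
# principals allowed to see it; the query ANDs a membership check onto
# the relevance match.
def search(index, query_terms, user_principals):
    def visible(doc):
        return bool(set(doc["allowed_principals"]) & set(user_principals))

    def matches(doc):
        return any(term in doc["text"] for term in query_terms)

    return [doc["id"] for doc in index if matches(doc) and visible(doc)]

index = [
    {"id": "a", "text": "q3 roadmap blockers", "allowed_principals": ["eng", "pm"]},
    {"id": "b", "text": "q3 roadmap draft", "allowed_principals": ["exec"]},
]
```

Because access rules are indexed alongside the data, authorization costs nothing extra at query time; it's just another filter condition.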

The Write Path: How Syncs Work

The write path optimizes for durability and scale rather than immediate response:

UI → API → PostgreSQL → Temporal → Worker → Source → Databases

  1. User creates a source connection (e.g., Google Drive)

  2. API stores connection config in PostgreSQL

  3. API schedules a sync workflow in Temporal and returns immediately

  4. Temporal assigns the job to an available sync worker

  5. Worker loads sync context from PostgreSQL

  6. Worker pulls data from the external source

  7. Worker detects changes by comparing against previous state

  8. Worker writes to both databases:

    • New/updated records → Vespa (searchable immediately)

    • Sync metadata → PostgreSQL (for resumability)

  9. Worker publishes progress to Redis for real-time UI updates

This asynchronous design means users never wait for syncs to complete. Data becomes searchable incrementally as it's processed.

Workers maintain checksums or modification timestamps for each item. On subsequent syncs:

  • Unchanged items are skipped entirely (no writes)

  • Modified items trigger updates to Vespa

  • Deleted items are tombstoned in Vespa

This delta detection dramatically reduces write volume and keeps incremental syncs fast after the initial full import.
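A minimal sketch of checksum-based delta detection (hash choice and state shape illustrative):

```python
# Compare per-item checksums against the previous sync's state to classify
# each item as new, modified, unchanged, or deleted.
import hashlib

def detect_deltas(previous: dict, current_items: dict):
    current = {k: hashlib.sha256(v.encode()).hexdigest()
               for k, v in current_items.items()}
    new = [k for k in current if k not in previous]
    modified = [k for k in current if k in previous and previous[k] != current[k]]
    unchanged = [k for k in current if k in previous and previous[k] == current[k]]
    deleted = [k for k in previous if k not in current]  # tombstone these in Vespa
    return {"new": new, "modified": modified, "unchanged": unchanged,
            "deleted": deleted, "state": current}
```

The returned `state` is what gets persisted as sync metadata, so the next run only pays for items whose checksums actually changed.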

Designing for Scale

The design choices described above solve specific infrastructure problems:

Independent scaling: API instances, sync workers, PostgreSQL, and Vespa all scale horizontally and independently. A surge in one dimension doesn't bottleneck others.

Graceful degradation: When external APIs fail or rate-limit requests, workers retry with exponential backoff. When workers crash, Temporal resumes workflows from checkpoints. The system prioritizes eventual consistency over immediate perfection.

Agent-optimized retrieval: Everything optimizes for fast, reliable search. Data is denormalized during ingestion so queries remain simple. Embeddings are computed at write-time, not read-time. Permission rules are pre-indexed rather than evaluated dynamically.

Operational isolation: Read and write paths run independently. Search failures don't impact syncs. Sync delays don't slow down queries. This separation prevents cascading failures and makes the system easier to reason about.

Understanding what breaks when components fail reveals the robustness built into the architecture:

API failure: Searches fail, but background syncs continue running. When the API recovers, syncs that were already scheduled will have completed normally in the background.

Worker failure: Searches work perfectly (read path unaffected), but new syncs pause. When workers recover, Temporal resumes workflows from their last checkpoint.

Temporal failure: Both searches and active syncs continue, but new syncs cannot be scheduled. When Temporal recovers, it catches up on missed schedules.

PostgreSQL failure: Searches work (they only need Vespa), but new syncs cannot start and metadata cannot update. Workers retry database writes until PostgreSQL recovers.

Vespa failure: Syncs continue (they write to Vespa with retries), but searches fail. This is the only single point of failure for the read path.

Redis failure: Everything works, but real-time UI updates stop. Users can still trigger syncs and search; they just lose live progress indicators.

The read/write split was a deliberate choice: the read path (search) is kept maximally available, while the write path (sync) can gracefully degrade and recover without data loss.

All in all, Airweave's architecture enables predictable scaling along multiple dimensions:

User growth: Add sync workers to handle more concurrent syncs. The API and databases scale independently of user count.

Data volume per user: Vespa handles billions of records efficiently. Workers process sources in batches with checkpoints, allowing syncs to pause and resume.

Source diversity: Adding new integrations (e.g., a new SaaS tool) only requires implementing a new worker module. Core infrastructure remains unchanged.

Query load: Add API replicas and Vespa read replicas. Searches are stateless and cache-friendly, enabling straightforward horizontal scaling.

Looking Ahead

Building reliable context retrieval for AI agents requires infrastructure designed specifically for continuous synchronization, distributed processing, and semantic search at scale.

The patterns described here (separation of read and write paths, durable workflows, horizontal scalability, and specialized storage) apply broadly beyond Airweave's specific implementation. Whether you're building RAG systems, AI assistants, or autonomous agents, you'll eventually encounter the same fundamental challenges:

  • How to keep data fresh across dozens of sources

  • How to scale to millions of records per user

  • How to serve context in milliseconds

  • How to handle failures gracefully

As AI agents take on more responsibility in production systems, the infrastructure connecting them to real-world data becomes as critical as the models themselves.


Getting Started with Airweave

Building AI agents that need access to real-world data requires solving a fundamental problem: how do you give your agent reliable, up-to-date context from dozens of different sources without building custom integrations for each one?

This article walks through the core workflow of using Airweave to turn scattered data sources into a unified retrieval layer that AI agents can query in a single request. In essence, using Airweave follows a straightforward pattern:

  1. Create a collection (your searchable knowledge base)

  2. Add source connections (link your apps and databases)

  3. Wait for sync (Airweave pulls and indexes your data)

  4. Search and retrieve (query from your agent or application)

Each step builds on the last, and once configured, Airweave handles continuous synchronization automatically.

Collections

A Collection is a searchable knowledge base composed of entities from one or more source connections. Collections are what your AI agents actually query.

Think of a collection as a unified index across multiple data sources. You might create a collection called "Engineering Context" that includes:

  • GitHub issues and pull requests

  • Slack messages from your engineering channel

  • Notion documentation

  • Linear tickets

When your agent searches this collection, it retrieves relevant results from all connected sources in a single query, ranked by relevance regardless of where the data originated.

Collections are created through the SDK or API:

collection = airweave.collections.create(
    name="Engineering Context"
)

Once created, a collection has a unique readable_id that you'll use for all subsequent operations.

Source Connections

A Source Connection is a configured, authenticated instance of a connector linked to your specific account or workspace. It represents the actual live connection to your data using your credentials.

While Airweave supports many source types (Slack, GitHub, Notion, Google Drive, databases, and more), each source connection is specific to your account. You might have multiple connections to the same source type, for example three different Slack workspaces or two separate GitHub organizations.

Creating a source connection requires:

  1. Selecting a connector: The source type you want to connect (e.g., "slack", "github", "notion")

  2. Authenticating: Providing credentials via OAuth or API keys

  3. Assigning to a collection: Linking the connection to an existing collection

source_connection = airweave.source_connections.create(
    name="My Stripe Connection",
    short_name="stripe",
    readable_collection_id=collection.readable_id,
    authentication={
        "credentials": {
            "api_key": "your_stripe_api_key"
        }
    }
)

For OAuth-based sources like Slack, Google Drive, or GitHub, Airweave handles the OAuth flow through the UI. For API-key-based sources like Stripe or custom databases, you provide the credentials directly.

Syncing

Once a source connection is created, Airweave immediately triggers an initial sync. This process:

  • Pulls all accessible data from the source

  • Transforms it into searchable entities

  • Chunks long content for better retrieval

  • Generates embeddings for semantic search

  • Indexes everything in Vespa

The initial sync can take time depending on data volume. A Slack workspace with years of messages might take several minutes. A Google Drive with thousands of large documents could take longer.

After the initial sync completes, Airweave continues syncing on a schedule (configurable per connection) or can be triggered programmatically via the API. Incremental syncs are fast because Airweave only processes new or modified data.

You can monitor sync status through the dashboard or by checking the source connection object:

status = airweave.source_connections.get(
    source_connection_id=source_connection.id
)
print(status.status)
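If a script needs to block until the initial sync finishes, a small polling helper over that status check works (terminal state names and intervals are illustrative, not Airweave's documented values):

```python
import time

# Generic polling helper: call get_status until it reports a terminal state.
def wait_for_sync(get_status, terminal=("completed", "failed"),
                  interval=0.01, timeout=5.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in terminal:
            return status
        time.sleep(interval)
    raise TimeoutError("sync did not finish in time")
```

In practice you would pass `lambda: airweave.source_connections.get(...).status` as `get_status`, with a much longer interval and timeout.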

Searching

When an agent searches a collection, the query runs across all entities from all connected sources, returning the most relevant results regardless of where the data originally came from.

Search is where Airweave delivers value. Your agent sends a natural language query, and Airweave returns the most relevant context from across all connected sources.

results = airweave.collections.search(
    readable_id=collection.readable_id,
    query="What are the open bugs related to authentication?",
    limit=10
)

for result in results.results:
    print(f"Source: {result.source_name}")
    print(f"Content: {result.md_content}")
    print(f"Score: {result.score}")

Behind the scenes, Airweave runs a hybrid search combining:

  • Semantic search: Vector similarity using embeddings

  • Keyword search: BM25 for exact term matching

  • Reranking: LLM-based reranking for precision

Results include source attribution, so your agent knows exactly where each piece of information came from. This enables citation-backed responses and helps users verify facts.
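For example, an agent can fold results into a citation-backed prompt using the fields shown above (`source_name`, `md_content`); the helper name and truncation limits are illustrative:

```python
# Turn search results (each carrying source_name and md_content) into a
# numbered context block plus a citation list for an LLM prompt.
def build_context(results, max_chars=2000):
    lines, citations = [], []
    for i, r in enumerate(results, start=1):
        snippet = r["md_content"][:300]
        lines.append(f"[{i}] ({r['source_name']}) {snippet}")
        citations.append({"ref": i, "source": r["source_name"]})
    context = "\n".join(lines)[:max_chars]
    return context, citations
```

The numbered references let the agent cite `[1]`, `[2]` in its answer, and the citation list maps those back to concrete sources for verification.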

Entities

An Entity is a single, searchable item extracted from a source. Entities are the atomic units of data that get indexed and returned in search results.

You don't interact with entities directly in most cases, but understanding them helps explain how Airweave works. When Airweave syncs a source connection, it extracts entities:

  • A Slack message becomes an entity

  • A GitHub code file becomes an entity

  • A Notion page becomes an entity

  • A database row becomes an entity

Each entity carries metadata like timestamps, author information, source type, and links back to the original content. This metadata enables filtering and source attribution in search results.
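To make the shape concrete, here is a minimal sketch of an entity record; the field names are hypothetical, drawn from the search-result fields shown earlier rather than Airweave's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Illustrative entity shape; field names are hypothetical, not
# Airweave's actual schema.
@dataclass
class Entity:
    entity_id: str
    source_name: str                       # e.g. "Slack", "GitHub"
    md_content: str                        # searchable text content
    author: Optional[str] = None
    created_at: Optional[datetime] = None
    url: Optional[str] = None              # link back to the original item
```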

Permission Awareness

One critical aspect of Airweave's design: it respects source-level permissions. When you authenticate a source connection, Airweave only syncs data your credentials can access.

For example:

  • A Slack connection only syncs channels the authenticated user can see

  • A GitHub connection only syncs repositories the token has access to

  • A Google Drive connection only syncs files the user can read

This means different users can have different collections with different source connections, each seeing only the data they're authorized to access.

Integration Patterns

Airweave integrates into AI applications through several interfaces:

SDK (Python/Node.js): Best for custom agents and applications. Full programmatic control over collections, source connections, and search.

REST API: Direct HTTP access for any language or framework. Useful for integrations beyond the SDK languages.

MCP Server: Model Context Protocol integration for tools like Claude Desktop. Enables agents to search Airweave collections as a native capability.

Framework Integrations: Native support for popular agent frameworks like the Vercel AI SDK and LlamaIndex, enabling drop-in retrieval without custom code.

The choice depends on your stack, but all interfaces provide the same core functionality: create collections, add sources, search for context.

Looking Ahead

Airweave handles the infrastructure of context retrieval so you can focus on building capable agents. Once collections are configured and syncing, your agent has reliable access to up-to-date context without worrying about API quirks, rate limits, or keeping data fresh.

The patterns described here (collections, source connections, continuous sync, unified search) form the foundation for building agents that operate on real-world data rather than static snapshots. Whether you're building internal tools, customer-facing assistants, or autonomous agents, Airweave provides the retrieval layer that connects intelligence to information.


Case Study: Error Monitoring Agent

Error monitoring tools send alerts. What engineering teams actually need is context: What code is involved? Has anyone worked on this yet? Is this a new issue or a known regression?

This article walks through building an intelligent error monitoring agent that uses Airweave to transform raw error logs into enriched, actionable alerts. We'll cover the architecture, implementation patterns, and lessons learned from processing 40,000+ queries per month in production.

The full implementation is available at github.com/airweave-ai/error-monitoring-agent.

Problem Setting

Traditional error monitoring follows a simple pattern: error occurs, alert fires, engineer investigates. This breaks down at scale for several reasons:

Alert fatigue: A single underlying issue can generate hundreds of individual alerts. Engineers learn to ignore notifications or spend hours triaging duplicates.

Missing context: Error logs contain stack traces but lack the surrounding context engineers need. Which code is affected? Has this happened before? Is there already a ticket?

Manual correlation: Engineers manually search GitHub for relevant code, check Linear for existing tickets, and scan Slack for related discussions. This takes 10-15 minutes per error.

Reactive posture: By the time an alert reaches someone, customers have often already experienced the issue. There's no opportunity for proactive fixes.

For small teams maintaining complex systems, this overhead becomes unsustainable.

Architecture Overview

The error monitoring agent runs as a scheduled pipeline (every 5 minutes in production) with five core stages:

  1. Fetch and cluster errors from monitoring systems

  2. Search for context using Airweave across GitHub, Linear, and Slack

  3. Analyze severity and determine if this is new, ongoing, or a regression

  4. Determine suppression - should this trigger an alert or be silenced?

  5. Create alerts in Slack and Linear with full context

Each stage feeds into the next, progressively enriching raw errors with the context engineers need to act quickly.

Stage 1: Semantic Error Clustering

Raw error logs are noisy. A database timeout might generate 50 identical stack traces within minutes. The first step is grouping errors by root cause rather than treating each occurrence as distinct.

Multi-Stage Clustering

The agent uses a four-stage clustering approach:

Stage 1: Strict Clustering - Group by exact module + function + line number match. This catches identical stack traces immediately.

Stage 2: Regex Pattern Clustering - Group by error type extracted via regex patterns. For example, "429", "rate limit", and "too many requests" all map to a "RateLimit" error type. Errors matching the same pattern type with 2+ occurrences form a cluster.

Stage 3: LLM Semantic Clustering - (Optional) Use Claude or GPT-4 to identify remaining unclustered errors with similar root causes but different surface presentations. The LLM returns groupings like [[0, 1, 3], [2], [4, 5]] and then a second LLM call generates a human-readable signature (50-150 chars) for each multi-error group.

Stage 4: Cluster Merging - Only runs when there are 3+ clusters. Uses the LLM to decide which clusters to merge. Falls back to merging clusters with the same extracted error type if no LLM is available.

This reduces 500 raw logs to approximately 10-15 distinct clusters worth investigating.
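Stage 2 can be sketched in a few lines (patterns illustrative, not the agent's actual rule set):

```python
import re
from collections import defaultdict

# Illustrative regex patterns mapping raw messages to an error type.
PATTERNS = {
    "RateLimit": re.compile(r"429|rate limit|too many requests", re.I),
    "Timeout": re.compile(r"timed? ?out|deadline exceeded", re.I),
    "ConnectionPool": re.compile(r"pool exhausted|connection refused", re.I),
}

def regex_cluster(messages):
    groups = defaultdict(list)
    for msg in messages:
        for error_type, pattern in PATTERNS.items():
            if pattern.search(msg):
                groups[error_type].append(msg)
                break  # first matching pattern wins
    # Only pattern groups with 2+ occurrences form a cluster
    return {t: msgs for t, msgs in groups.items() if len(msgs) >= 2}
```

Singletons fall through to the later LLM stages, so the cheap regex pass absorbs the bulk of the volume before any model is invoked.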

Implementation Pattern

from pipeline.clustering import ErrorClusterer

# Fetch raw errors from Azure Log Analytics (or Sentry, etc.)
raw_errors = await data_source.fetch_errors(
    window_minutes=30,
    limit=100
)

# Multi-stage clustering (strict → regex → LLM → merge)
clusterer = ErrorClusterer()
clusters = await clusterer.cluster_errors(
    errors=raw_errors
)

# Result: 100 errors → ~8 clusters
for cluster in clusters:
    print(f"Cluster: {cluster['signature']}")
    print(f"Count: {cluster['error_count']}")
    print(f"First seen: {cluster['first_occurrence']}")

The clustering logic maintains state between runs to track whether a cluster is new, ongoing, or a regression of a previously fixed issue.

Stage 2: Context Search with Airweave

Once errors are clustered, the agent needs context. This is where Airweave transforms the workflow.

Multi-Source Search Strategy

For each error cluster, the agent performs three parallel searches:

GitHub search - Find code files and functions related to the error. Returns file paths with line numbers and relevant code snippets.

Linear search - Check for existing tickets about this issue. If found, link to the ticket instead of creating a duplicate.

Slack search - Surface past discussions, incident threads, or solutions from previous occurrences.

GitHub and Linear sync continuously into the Airweave collection. Slack uses federated search, querying the Slack API at search time and merging results via Reciprocal Rank Fusion. All three searches run through the same unified interface.
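Reciprocal Rank Fusion itself is compact: each source contributes `1 / (k + rank)` per document, and the merged ranking sorts by the summed score. A minimal sketch:

```python
# Reciprocal Rank Fusion: merge ranked lists from synced and federated
# sources by summing 1 / (k + rank) per document across all lists.
def rrf_merge(ranked_lists, k=60):
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

A document ranked moderately by two sources beats one ranked highly by only one, which is why RRF works well for merging heterogeneous result lists without comparable raw scores.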

Implementation Pattern

from pipeline.search import ContextSearcher

searcher = ContextSearcher()

# Search three sources per cluster
context_results = await searcher.search_context(clusters)
# Returns github, linear, and docs results per cluster

Under the hood, search_context calls a private method three times per cluster:

async def _search_source(self, query, source_filter=None, limit=5):
    if source_filter:
        # Use advanced search with filter
        response = await self.client.collections.search_advanced(
            readable_id=self.collection_readable_id,
            query=query,
            filter={
                "must": [
                    {"key": "source_name", "match": {"value": source_filter}}
                ]
            },
            limit=limit
        )
    else:
        # Search all sources
        response = await self.client.collections.search(
            readable_id=self.collection_readable_id,
            query=query,
            limit=limit
        )
    return response

For each cluster, the searcher performs three parallel searches:

query = f"{cluster['signature']} {cluster['sample_message']}"[:500]

# GitHub for related code
github_results = await self._search_source(
    query=query, source_filter="GitHub", limit=5
)

# Linear for existing tickets  
linear_results = await self._search_source(
    query=query, source_filter="Linear", limit=3
)

# Slack/docs for past discussions
docs_results = await self._search_source(
    query=query, source_filter=None, limit=3
)

The search results include full metadata: file paths, Linear ticket IDs, Slack thread URLs. This context gets attached to each cluster for the next stage.

Why This Works

Without Airweave, this context gathering would require:

  • Custom GitHub API integration to search code

  • Linear API client to query tickets semantically

  • Slack API wrapper to search message history

  • Manual correlation logic to rank results

Airweave handles all of this through a single unified interface. The agent sends three search queries and receives ranked, relevant results from each source, regardless of whether the data is synced (GitHub, Linear) or federated (Slack).

More importantly, Airweave provides semantic search rather than keyword matching. A keyword search across GitHub and Linear APIs would miss results where the wording differs. Airweave's vector search can match "database pool exhausted" to a Linear ticket titled "DB connection limits under load" - the kind of connection engineers make intuitively but keyword search cannot.

Stage 3: Severity Analysis and Status Determination

With context attached, the agent now determines severity and whether to alert.

Severity Classification

The agent uses Claude to analyze each cluster and assign a severity level:

S1 - Critical: Complete service outage, data loss/corruption, security breach, ALL users affected
S2 - High: Major feature broken, affecting multiple users
S3 - Medium: Minor feature degraded, workaround available
S4 - Low: Cosmetic issue, no user impact

The prompt is explicitly calibrated to be conservative - most errors should land at S3 or S4. Only genuine outages or data loss scenarios warrant S1.

The LLM receives the error details, stack trace, and Airweave context to make this determination.

severity_prompt = f"""
Analyze this error cluster and assign severity (S1-S4):

Error: {cluster['signature']}
Message: {cluster['sample_message']}
Occurrences: {cluster['error_count']} in last 30 min
Stack trace: {cluster['stack_trace']}

Context from GitHub:
{github_results.summary}

Context from Linear:
{linear_results.summary}

Provide severity (S1-S4) and reasoning.
"""

analysis = await llm.complete(severity_prompt)
cluster['severity'] = analysis.severity
cluster['reasoning'] = analysis.reasoning

Status Tracking

The agent maintains state to track error signatures across runs:

NEW - First time this error signature has been seen. Always creates an alert and Linear ticket.

ONGOING - Error signature exists with an open Linear ticket. Suppresses alerts but adds a comment to the existing ticket with updated context.

REGRESSION - Error signature was previously resolved (ticket closed) but has returned. Reopens the ticket and sends a high-priority alert.

This status logic prevents alert spam while ensuring critical issues never get missed.
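The status decision reduces to a small function over persisted state (state shape hypothetical):

```python
# Determine an error cluster's status from persisted state, which maps
# error signature -> {"ticket_open": bool} for previously seen errors.
def determine_status(signature: str, state: dict) -> str:
    if signature not in state:
        return "NEW"          # never seen: alert and create a ticket
    if state[signature]["ticket_open"]:
        return "ONGOING"      # open ticket already tracks it
    return "REGRESSION"       # seen before, ticket closed, error is back
```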

Stage 4: Suppression Logic

With severity and status determined, the agent now decides whether to alert. Not every error cluster triggers a notification.

Smart Suppression

The agent applies suppression rules in priority order (first match wins):

  1. Muted? If the error signature is muted (manually by an engineer), suppress - regardless of severity.

  2. S1/S2 severity? Always alert, overriding all other suppression rules.

  3. NEW status? First occurrence of this error signature - always alert.

  4. REGRESSION? Previously fixed issue has returned - always alert.

  5. ONGOING with open ticket? Suppress to avoid spam. The existing ticket tracks it.

  6. Alerted within 24 hours? Suppress if we already notified about this signature recently.

  7. Default: Alert.

The ordering is deliberate. Mutes are respected first (engineers made an explicit choice), but S1/S2 severity and regressions always punch through everything else. This ensures critical issues are never silently dropped.

Mute matching goes beyond exact strings. The agent uses a SemanticMatcher that compares new error signatures against active mutes using LLM-based semantic comparison. If an engineer mutes "database connection timeout," the agent will also suppress "DB pool exhausted" if the LLM judges them similar enough. The same semantic matching applies to finding existing Linear tickets - the agent can link a new error to a ticket even when the wording differs.
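The priority-ordered rules can be sketched as one decision function, first match wins (field names hypothetical):

```python
from datetime import datetime, timedelta

# Sketch of the suppression decision; rules checked in priority order,
# first match wins. Field names on the cluster dict are hypothetical.
def should_alert(cluster: dict, now: datetime) -> bool:
    if cluster.get("muted"):
        return False                                  # 1. explicit mute wins
    if cluster["severity"] in ("S1", "S2"):
        return True                                   # 2. critical/high always alerts
    if cluster["status"] == "NEW":
        return True                                   # 3. first occurrence
    if cluster["status"] == "REGRESSION":
        return True                                   # 4. previously fixed, now back
    if cluster["status"] == "ONGOING" and cluster.get("ticket_open"):
        return False                                  # 5. existing ticket tracks it
    last = cluster.get("last_alerted")
    if last and now - last < timedelta(hours=24):
        return False                                  # 6. alerted recently
    return True                                       # 7. default: alert
```

Note that the mute check sits above the severity check in this sketch, matching the stated rule order: an engineer's explicit mute suppresses even S1/S2 clusters.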

Stage 5: Enriched Alerts

The final stage creates alerts in Slack and Linear with all context attached.

Slack Notification Format

Each Slack message includes:

  • Error type and message

  • Severity level with color coding

  • Affected organizations (if multi-tenant)

  • Code context with clickable GitHub links

  • Linear ticket status (new, existing, reopened)

  • Mute controls (inline buttons to suppress)

await slack.send_alert(
    channel=SLACK_CHANNEL_ID,
    severity=cluster['severity'],
    error_type=cluster['signature'],
    message=cluster['sample_message'],
    github_context=github_results,
    linear_ticket=linear_ticket,
    mute_signature=cluster['signature']
)

Linear Ticket Creation

For new errors, the agent creates a Linear ticket with:

  • Title: Error type and brief description

  • Description: Full error details, stack trace, affected organizations

  • Priority: Mapped from severity (S1→Urgent, S2→High, S3→Medium, S4→Low)

  • Attachments: Links to relevant GitHub files and Slack threads

For existing tickets, it adds a comment with new occurrences and updated context.

Production Deployment

The agent supports two deployment modes: as a cron-triggered script for simple setups, or as a FastAPI server with REST and WebSocket endpoints for real-time visualization.

Scheduling Pattern

# run_monitoring.py
import asyncio
from main import run_pipeline, PipelineConfig

async def main():
    # Linear/Slack enablement is controlled via environment variables:
    #   LINEAR_ENABLED=true, LINEAR_API_KEY=..., LINEAR_TEAM_ID=...
    #   SLACK_ENABLED=true, SLACK_BOT_TOKEN=..., SLACK_CHANNEL_ID=...
    
    config = PipelineConfig(
        use_sample_data=False  # Use real error source (Azure, Sentry, etc.)
    )
    
    result = await run_pipeline(config)

if __name__ == "__main__":
    asyncio.run(main())

In production, the agent runs as a FastAPI server with REST and WebSocket endpoints. The script above is a simplified standalone entrypoint for cron-based scheduling. The server-based architecture also powers a real-time pipeline visualization UI via WebSocket.

Run via cron every 5 minutes:

*/5 * * * * cd /path/to/agent && source

State Management

The agent maintains JSON-based state files to track:

  • Error signatures and their status (new/ongoing/regression)

  • Last alert timestamps for suppression logic

  • Muted error patterns

  • Linear ticket IDs mapped to error signatures

This state persists between runs, enabling the status tracking described earlier.

Results and Impact

Deploying this agent in production delivered measurable improvements:

Volume: Handles 40,000+ Airweave queries per month across GitHub, Linear, and Slack searches.

Alert reduction: 500 raw errors per day reduced to 15-20 actionable alerts (depending on error distribution), cutting noise by 95%.

Response time: Average time from error occurrence to engineer awareness dropped from hours to minutes.

Proactive fixes: Team often resolves issues before customers report them, then proactively notifies affected users.

Context efficiency: Engineers jump directly to relevant code and existing tickets instead of spending 10-15 minutes searching manually.

Key Implementation Lessons

Use Airweave Source Filtering

When searching for context, filtering by source type dramatically improves relevance:

# Good: Targeted search with filter
github_results = await client.collections.search_advanced(
    readable_id=collection_readable_id,
    query=error_context,
    filter={
        "must": [
            {"key": "source_name", "match": {"value": "GitHub"}}
        ]
    },
    limit=5
)

# Less effective: Search all sources
all_results = await client.collections.search(
    readable_id=collection_readable_id,
    query=error_context,
    limit=15  # Returns mixed results from all sources
)

Cluster Before Searching

Running Airweave searches on individual errors is inefficient. Cluster first, then search once per cluster:

  • ❌ Bad: 500 errors × 3 searches = 1,500 Airweave queries

  • ✅ Good: 500 errors → 10 clusters × 3 searches = 30 Airweave queries

The actual compression ratio depends on your error distribution. Homogeneous failures (e.g., a single endpoint timing out) compress dramatically, while diverse errors across unrelated systems compress less.

LLM Analysis After Context Gathering

Don't use LLMs to determine severity from error logs alone. First gather context via Airweave, then pass everything to the LLM:

# The LLM sees the full picture
analysis = await llm.analyze(
    error=cluster,
    github_context=github_results,
    linear_context=linear_results,
    slack_context=slack_results
)

This produces far more accurate severity assessments than analyzing errors in isolation.

Maintain Clear State

Error monitoring without state creates duplicate tickets and repeated alerts. Track signatures, statuses, and alert timestamps persistently:

# state.py (simplified: in-memory here; the real agent persists to JSON)
import time

class StateManager:
    def __init__(self):
        self._state = {}  # signature -> {"last_alert": ..., "muted": ...}

    def get_signature_status(self, signature: str) -> str:
        """Returns: 'new', 'ongoing', or 'regression'"""
        record = self._state.get(signature)
        if record is None:
            self._state[signature] = {"first_seen": time.time()}
            return "new"
        return "regression" if record.get("resolved") else "ongoing"

    def record_alert(self, signature: str):
        """Track when we last alerted for this signature"""
        self._state.setdefault(signature, {})["last_alert"] = time.time()

    def is_muted(self, signature: str) -> bool:
        """Check if engineers muted this error"""
        return self._state.get(signature, {}).get("muted", False)

Graceful Degradation

The agent works at every configuration level. Without an LLM key, clustering falls back to regex patterns and severity uses rule-based heuristics. Without Airweave, the pipeline still clusters and analyzes errors; it just lacks external context. Without Slack or Linear configured, alerts render as previews. This means teams can adopt the agent incrementally: start with clustering alone, add Airweave when ready, and enable Slack/Linear once the output is trusted.
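The fallback logic can be sketched as a pure function over the configuration; the config keys below are hypothetical stand-ins for the real environment variables:

```python
# Sketch of the degradation logic; config keys are hypothetical stand-ins
# for the real environment variables.
def choose_strategies(config: dict) -> dict:
    return {
        "clustering": "llm" if config.get("llm_api_key") else "regex",
        "severity": "llm" if config.get("llm_api_key") else "rules",
        "context": "airweave" if config.get("airweave_api_key") else "none",
        "alerting": "slack" if config.get("slack_bot_token") else "preview",
    }

bare = choose_strategies({})  # minimal setup: every capability falls back
full = choose_strategies({
    "llm_api_key": "sk-example",
    "airweave_api_key": "aw-example",
    "slack_bot_token": "xoxb-example",
})
```

Because each capability degrades independently, enabling one integration never requires the others.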

Looking Ahead

Building an error monitoring agent demonstrates how Airweave enables a new class of autonomous tools. Rather than building custom integrations for GitHub, Linear, and Slack, the agent queries a single unified interface.

This pattern extends beyond error monitoring. Any workflow that requires context from multiple sources (customer support, incident response, code review, documentation generation) can use the same approach: connect sources to Airweave, then query for context as needed.

The key insight is that context retrieval should be infrastructure, not custom code. When you treat it as infrastructure, building intelligent agents becomes straightforward: focus on the logic (clustering, analysis, alerting) rather than the plumbing (API integrations, authentication, data sync).

The complete error monitoring agent implementation, including all code examples from this article, is available as an open-source project at github.com/airweave-ai/error-monitoring-agent.


Case Study: Slack Knowledge Assistant


Every team has the same problem: information is scattered across tools. The answer to "how does our authentication system work?" lives partly in a Notion doc, partly in a GitHub PR, partly in a Slack thread from three months ago, and partly in a Linear ticket. When someone asks in Slack, the response is either silence or a 10-minute scavenger hunt.

This article walks through building an open-source Slack bot that answers questions by searching across all of your connected tools using Airweave. We cover the architecture, the pipeline design, and the patterns that make it work well in practice.

The full implementation is available at github.com/airweave-ai/slack-knowledge-assistant.

What It Does

The assistant is a Slack bot. Mention it in a channel or send it a DM, and it searches across all of your company's connected sources (GitHub, Notion, Linear, Google Drive, Slack itself, and anything else synced to Airweave), generates an answer grounded in what it finds, and replies with source citations linking back to the original documents.

The bot reacts with a thinking emoji, searches, generates an answer, and posts a rich reply with citations. If a teammate replies in the thread while the bot is still working, it adapts its response so it doesn't repeat what a human already said.

It also handles threaded conversations. Ask a follow-up like "what about the API?" in the same thread, and the assistant rewrites your question into a standalone search query using the conversation history as context. This makes it feel like a real conversation rather than a series of isolated lookups.

Architecture

The assistant is a FastAPI application that receives Slack events via webhook. The pipeline has six stages, each feeding into the next.


The event handler's only job is to acknowledge the Slack event within three seconds (Slack's timeout requirement) and hand off the actual work to a background thread. Everything interesting happens in the pipeline.

Query Contextualization

The most important design decision in the assistant is how it handles follow-up questions. In a thread, users ask things like "what about the API?" or "who built that?" These questions are meaningless without the preceding conversation.

Before searching, the assistant sends the full thread history to a fast model (Claude Haiku) with a simple instruction: rewrite the user's follow-up as a standalone search query. If the thread started with "How does our authentication system work?" and the follow-up is "what about the API?", the rewritten query becomes something like "authentication system API implementation."

This is a small step that has a large impact on search quality. Without it, Airweave would receive "what about the API?" as the search query and return generic API results. With it, the search is grounded in the conversation's actual topic.

# Simplified: contextualize follow-up using thread history
query = await contextualize_query(
    user_message="what about the API?",
    thread_history=thread_messages,
    model="claude-3-5-haiku-latest"
)
# Result: "authentication system API implementation details"

Searching with Airweave

Once the query is contextualized, the assistant searches the Airweave collection. This is a single API call that searches across every connected source:

response = await client.collections.search(
    readable_id=collection_id,
    query=contextualized_query,
    limit=10
)

Behind the scenes, Airweave runs hybrid search (combining semantic similarity with keyword matching) and reranks results for precision. The assistant receives ranked results from GitHub, Notion, Linear, Slack, and any other connected source, each with metadata including the source type, original URL, and relevance score.

This is the step that replaces what would otherwise be a sprawling set of custom API integrations. Without Airweave, you would need a GitHub search client, a Notion search client, a Linear search client, and logic to merge and rank their results. With Airweave, it's one call.

Source-Aware Answer Generation

The assistant doesn't just dump search results into a prompt. It generates an answer using Claude, passing the search results as context, and then polishes the output specifically for Slack.

The polishing step is worth highlighting. The assistant adapts its language based on where information comes from:

  • Notion results get framed as "This is documented in..."

  • GitHub results become "This is implemented in..."

  • Linear or Jira results become "This is tracked in..."

  • Slack results become "This was discussed in..."

This source awareness makes answers feel natural rather than robotic. It also builds trust, because readers can immediately tell whether they're looking at official documentation, actual code, a ticket, or a casual conversation.

Each source citation includes a link back to the original document, so readers can verify or dig deeper.
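The framing above reduces to a mapping from Airweave's source metadata to lead-in phrases; `frame_citation` here is a hypothetical helper, not the assistant's actual API:

```python
# Sketch of source-aware framing; the mapping mirrors the article's examples.
FRAMING = {
    "notion": "This is documented in",
    "github": "This is implemented in",
    "linear": "This is tracked in",
    "jira": "This is tracked in",
    "slack": "This was discussed in",
}

def frame_citation(source_type: str, title: str, url: str) -> str:
    lead = FRAMING.get(source_type.lower(), "Found in")
    return f"{lead} <{url}|{title}>"  # Slack link markup
```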

Handling Concurrent Human Replies

There's a subtle timing problem with any Slack bot: what happens when a teammate answers the question while the bot is still processing? Without handling this, the bot posts a response that repeats what a human already said, which feels redundant and annoying.

The assistant solves this by checking the thread for new replies just before posting its answer. If a human has responded in the meantime, the assistant revises its answer to acknowledge the human reply and add only the additional context that the human didn't cover.

This is a small detail that significantly improves the experience in active channels where humans and the bot are both responding.
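The core of the check can be sketched as a pure function over Slack's message timestamps; this is a simplification of whatever the assistant actually does before posting:

```python
# Simplified version of the check: find human replies that landed after
# the bot started drafting, using Slack's message timestamps.
def needs_revision(thread_messages: list[dict], bot_draft_ts: float) -> list[dict]:
    """Return human replies that arrived while the bot was still working."""
    return [
        m for m in thread_messages
        if float(m["ts"]) > bot_draft_ts and not m.get("bot_id")
    ]

thread = [
    {"ts": "100.0", "text": "How does our auth work?"},
    {"ts": "130.0", "text": "It's JWT-based, see the gateway.", "user": "U1"},
]
late_replies = needs_revision(thread, bot_draft_ts=120.0)
# One human reply arrived mid-processing, so the draft gets revised
```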

Confidence Grading

Not all search results are equally reliable. A well-maintained Notion doc is more authoritative than a casual Slack message from six months ago. The assistant grades its confidence based on the quality, recency, and source type of the results it found.

When confidence is low (for example, when the only results are tangentially related Slack messages), the assistant signals this in its response rather than presenting uncertain information with false confidence. This is important for building team trust in the assistant over time.
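One plausible way to implement such grading, with made-up authority weights and a linear recency decay; the assistant's real heuristics may differ:

```python
# Hypothetical grading sketch: weight source authority and recency,
# then bucket the best result into a label the bot can surface.
from datetime import datetime, timedelta, timezone

AUTHORITY = {"notion": 1.0, "github": 0.9, "linear": 0.7, "slack": 0.4}

def grade(results: list[dict]) -> str:
    if not results:
        return "low"
    now = datetime.now(timezone.utc)
    best = 0.0
    for r in results:
        age_days = (now - r["updated_at"]).days
        recency = max(0.0, 1.0 - age_days / 365)  # linear decay over a year
        best = max(best, AUTHORITY.get(r["source"], 0.5) * r["score"] * recency)
    return "high" if best > 0.6 else "medium" if best > 0.3 else "low"
```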

Deployment

The assistant is a standard FastAPI application. It can be deployed anywhere that runs Python: Railway, Render, Fly.io, or a simple Docker container.

FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]

Configuration is handled entirely through environment variables:

  • SLACK_BOT_TOKEN: Slack bot token (xoxb-...)

  • SLACK_SIGNING_SECRET: Slack app signing secret

  • AIRWEAVE_API_KEY: Airweave API key

  • AIRWEAVE_COLLECTION_ID: Collection readable ID

  • ANTHROPIC_API_KEY: Anthropic API key

For local development, run the server with uvicorn and expose it via ngrok so Slack can reach your event handler.

Key Patterns

Several patterns from this implementation apply broadly to any agent built on Airweave:

Rewrite before searching. If your agent handles multi-turn conversations, always contextualize the user's message before sending it to Airweave. The difference in search quality between "what about the API?" and "authentication system API implementation" is the difference between useless and useful results.

Let Airweave handle the source complexity. The assistant's codebase contains zero source-specific search logic. No GitHub client, no Notion client, no Linear client. Adding a new source (say, Confluence) means connecting it to the Airweave collection. The assistant code doesn't change at all.

Adapt output to source type. When presenting results to users, use the source metadata that Airweave provides. Framing information differently based on whether it came from documentation, code, a ticket, or a conversation makes answers more credible and easier to act on.

Handle the real-world timing issues. Bots that ignore what happens while they're processing feel broken. Check for concurrent activity before posting, and adapt accordingly.

Looking Ahead

The Slack Knowledge Assistant demonstrates a pattern that extends beyond Slack. Any messaging or collaboration interface (Discord, Teams, a custom chat UI) can use the same pipeline: receive a question, contextualize it, search Airweave, generate a grounded answer, and present it with source citations.

The assistant's entire value comes from the quality of the context it retrieves. The pipeline logic (query rewriting, answer generation, formatting) is straightforward. What makes it useful is having a single, continuously updated retrieval layer across all of the tools a team actually uses.

The complete implementation is available at github.com/airweave-ai/slack-knowledge-assistant.


Webhooks

The previous articles in this series covered a pull-based workflow: create a collection, add source connections, let Airweave sync your data, then search. This works well for getting started, but production systems rarely operate in request-response mode. Your agent pipeline, your dashboard, and your alerting system all need to know when fresh context is available, not just that it exists somewhere.

This article covers how Airweave's webhook system enables event-driven context pipelines that react to sync lifecycle changes in real time.

The Polling Problem

The most common approach to tracking sync status is polling: hit the API every few seconds, check if the sync completed, and proceed when it has. This is simple to implement and works fine during development.

It breaks down in production for a few reasons. First, most polls return nothing useful. If a sync takes three minutes and you poll every five seconds, 35 out of 36 requests are wasted. At scale, across dozens of source connections syncing on different schedules, this adds up to significant unnecessary API load.

Second, polling introduces latency gaps. If you poll every 30 seconds to be efficient, your downstream pipeline might wait up to 30 seconds after a sync completes before it even notices. For agents that serve user-facing queries, that delay means serving stale context when fresh context is already available.

Third, and most importantly, polling gives you no visibility into failures. If a sync fails silently between polling intervals, your pipeline continues operating as if nothing happened. The agent keeps serving results from the last successful sync without any indication that its context source is degraded. This is a form of context rot that's especially dangerous because it's invisible.

Event-Driven Sync with Webhooks

Airweave solves this with webhooks: instead of asking whether something happened, you register an endpoint and Airweave tells you the moment it does.

A webhook is an HTTP callback triggered by a system event. When the event occurs, the source system sends an HTTP POST request to a pre-registered URL with a payload describing what happened.

In Airweave, webhooks are tied to the sync job lifecycle. Every sync transitions through a series of states, and each transition can fire a webhook event:

  • sync.pending: job created and waiting to start

  • sync.running: job begins processing

  • sync.completed: all data synced without errors

  • sync.failed: job encountered an error

  • sync.cancelled: job was manually cancelled

Most production integrations only subscribe to sync.completed and sync.failed. The others are useful for granular progress tracking (powering real-time UI indicators, for example) but aren't required for reactive pipelines.

When an event fires, Airweave delivers it via Svix, which handles retries, signature verification, and delivery guarantees.

Each delivery is an HTTP POST with a JSON payload containing the event type, job ID, collection details, source type, and timestamp:

{
  "event_type": "sync.completed",
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "collection_readable_id": "engineering-context-ab123",
  "collection_name": "Engineering Context",
  "source_connection_id": "660e8400-e29b-41d4-a716-446655440001",
  "source_type": "github",
  "status": "completed",
  "timestamp": "2025-01-15T14:30:00Z"
}

The payload includes both collection_readable_id and source_type, which means your handler can route events precisely. You might handle a GitHub sync completion differently from a Slack sync completion, even within the same collection.

Patterns for Reactive Context Pipelines

Webhooks on their own are just a delivery mechanism. The value comes from what you build on top of them. Here are four patterns that show up consistently in production deployments.

Cache Invalidation

If your agent maintains a retrieval cache (storing recent search results to avoid repeated queries), you need a reliable signal to bust that cache when underlying data changes. Without webhooks, you either set aggressive TTLs (wasting the cache) or conservative TTLs (serving stale results).

With a sync.completed webhook, the answer is simple: invalidate cache entries for the affected collection the moment new data lands. Your agent serves cached results right up until fresh context is available, then seamlessly switches over. No polling, no arbitrary TTLs, no stale windows.
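A minimal in-memory sketch of this pattern (a real deployment would more likely back the cache with Redis or similar):

```python
# Sketch: a retrieval cache keyed by (collection, query), invalidated
# wholesale for a collection when its sync.completed webhook arrives.
class RetrievalCache:
    def __init__(self):
        self._entries = {}  # (collection_id, query) -> results

    def get(self, collection_id: str, query: str):
        return self._entries.get((collection_id, query))

    def put(self, collection_id: str, query: str, results):
        self._entries[(collection_id, query)] = results

    def invalidate_collection(self, collection_id: str):
        # Drop every cached query for the collection that just re-synced
        self._entries = {
            k: v for k, v in self._entries.items() if k[0] != collection_id
        }

cache = RetrievalCache()
cache.put("engineering-context-ab123", "rate limiting", ["result"])
cache.put("support-context", "refund policy", ["other"])
# On a sync.completed webhook for engineering-context-ab123:
cache.invalidate_collection("engineering-context-ab123")
```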

Failure Alerting

A sync.failed event is your signal that an agent's context source is degraded. The right response depends on your system, but common patterns include sending a Slack notification to the engineering channel, creating an incident in PagerDuty, or flagging the affected collection as stale in your agent's metadata so it can warn users that results may be outdated.

This is especially important for agents that operate autonomously. If a GitHub sync fails and your error monitoring agent (as described in the case study article) continues running, it will analyze errors without fresh code context. A webhook-triggered alert ensures someone knows about the degradation before it compounds.

Chained Pipelines

Some workflows depend on sequential data freshness. For example, your error monitoring pipeline might need both GitHub code context and Linear ticket context to be current before running an analysis cycle.

With webhooks, you can chain these dependencies: when the GitHub source completes syncing, check whether the Linear source has also completed recently. If both are fresh, trigger the analysis pipeline. If not, wait for the second event. This turns a cron-based "run every 5 minutes and hope everything is fresh" approach into a precise, event-driven pipeline that runs exactly when its dependencies are satisfied.
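The dependency check can be sketched by recording last-success timestamps as events arrive; the source names and freshness window below are illustrative:

```python
# Sketch: trigger the analysis pipeline only once every dependency has
# synced within the freshness window.
from datetime import datetime, timedelta, timezone

last_success: dict[str, datetime] = {}
FRESHNESS = timedelta(minutes=10)
DEPENDENCIES = ("github", "linear")
_NEVER = datetime.min.replace(tzinfo=timezone.utc)

def on_sync_completed(source_type: str, timestamp: datetime) -> bool:
    """Record the event; return True when every dependency is fresh."""
    last_success[source_type] = timestamp
    now = datetime.now(timezone.utc)
    return all(
        now - last_success.get(dep, _NEVER) < FRESHNESS
        for dep in DEPENDENCIES
    )
```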

Audit and Observability

Every webhook event is a structured log of when each source was last synced, whether it succeeded, and how long it took (derivable from the gap between sync.running and sync.completed timestamps). Routing these events to your logging system gives you a complete audit trail of context freshness across all collections.

This is valuable for debugging ("when was the last time the Slack source synced successfully?") and for compliance scenarios where you need to prove that your agent's context was current at the time it produced a given output.
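Deriving the duration from the event pair can be sketched like this, using the payload fields shown earlier:

```python
# Sketch: pair sync.running with its terminal event by job_id and
# compute the sync duration for audit logging.
from datetime import datetime

started: dict[str, datetime] = {}

def on_event(event: dict):
    """Record sync.running; on a terminal event, return duration in seconds."""
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    job = event["job_id"]
    if event["event_type"] == "sync.running":
        started[job] = ts
        return None
    if event["event_type"] in ("sync.completed", "sync.failed"):
        return (ts - started.pop(job, ts)).total_seconds()
    return None
```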

A Minimal Webhook Handler

To tie these patterns together, here's a minimal FastAPI handler that receives Airweave webhook events and routes them by type:

from fastapi import FastAPI, Request, Response

app = FastAPI()

@app.post("/webhooks/airweave")
async def handle_webhook(request: Request):
    payload = await request.json()
    event_type = payload["event_type"]

    if event_type == "sync.completed":
        collection_id = payload["collection_readable_id"]
        source_type = payload["source_type"]
        # Invalidate cache, trigger downstream pipeline, log event
        await on_sync_completed(collection_id, source_type)

    elif event_type == "sync.failed":
        # Alert team, mark collection as degraded
        await on_sync_failed(payload)

    return Response(status_code=200)

In production, you would also verify the Svix signature headers (svix-id, svix-timestamp, svix-signature) to ensure payloads are authentic. The webhooks setup guide covers the full verification flow.
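For illustration: Svix signs the string `{id}.{timestamp}.{payload}` with HMAC-SHA256, using the base64 key embedded in the `whsec_`-prefixed secret. A stdlib-only sketch of that check follows; in production, prefer the official svix library, which also enforces a timestamp tolerance against replay attacks:

```python
# Sketch of Svix-style signature verification using only the stdlib.
import base64, hashlib, hmac

def verify(secret: str, payload: bytes, headers: dict) -> bool:
    key = base64.b64decode(secret.removeprefix("whsec_"))
    signed = f"{headers['svix-id']}.{headers['svix-timestamp']}.".encode() + payload
    expected = base64.b64encode(hmac.new(key, signed, hashlib.sha256).digest()).decode()
    # The header may contain several space-separated "v1,<sig>" entries
    return any(
        hmac.compare_digest(expected, part.split(",", 1)[1])
        for part in headers["svix-signature"].split()
        if part.startswith("v1,")
    )
```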

Looking Ahead

The error monitoring case study in this series runs on a 5-minute cron schedule. With webhooks, that same pipeline could become fully event-driven: trigger a reanalysis the moment fresh code context or ticket data lands in the collection, rather than waiting for the next scheduled run.

This shift from scheduled to reactive is a broader pattern in context engineering. As agents take on more responsibility and operate over longer time horizons, the systems feeding them context need to be equally responsive. Webhooks are the mechanism that makes that possible.


MCP Server

AI agents are only useful if they can access the right information at the right time. The previous articles covered how Airweave syncs data into collections and how your code can search those collections via the SDK or REST API. But there's a growing class of AI applications where you don't write the search logic yourself. Tools like Cursor, Claude Desktop, VS Code Copilot, and the OpenAI Agent Builder all have their own agent loops. They decide when to search, what to search for, and how to use the results.

The question becomes: how do you give these agents access to your Airweave collections without building custom middleware?

The Integration Gap

When you build a custom agent using the SDK, you control the entire flow. You decide when to call collections.search(), how to format the results, and what to do with them. This works well for purpose-built systems like the error monitoring agent described earlier in this series.

But most developers also use general-purpose AI assistants throughout their day. You ask Cursor to refactor a function, and it needs to understand your codebase conventions. You ask Claude Desktop to draft a response to a customer, and it needs context from your support tickets. You ask an OpenAI agent to summarize project status, and it needs data from Linear and Slack.

Each of these assistants has its own way of discovering and calling external tools. Without a shared protocol, connecting Airweave to each one would require writing a separate integration for every client. This is the problem the Model Context Protocol solves.

What is MCP?

The Model Context Protocol (MCP) is an open standard for connecting AI applications to external data sources and tools. It defines how an AI assistant discovers available tools, understands their parameters, and calls them during its reasoning loop.

Think of MCP as a USB-C port for AI agents. Just as USB-C standardized how devices connect to peripherals, MCP standardizes how AI assistants connect to external capabilities. An MCP server exposes a set of tools. An MCP client (the AI assistant) discovers those tools and calls them when needed.

This matters for context retrieval because it turns search from something you code into something the agent does natively. Instead of writing glue code that intercepts the agent's reasoning and injects search results, the agent itself decides when to search your Airweave collection, formulates the query, and incorporates the results into its response.

How the Airweave MCP Server Works

Airweave ships an MCP server that exposes your collections as searchable tools. When an AI assistant connects to it, the assistant sees a tool called search-{collection} (for example, search-engineering-context) and can call it with a natural language query at any point during its reasoning.

The server supports two deployment modes:

Local mode runs as a process on your machine, communicating over stdio. This is the standard setup for desktop AI clients like Cursor, Claude Desktop, and VS Code. You configure it with your API key and collection ID, and the assistant discovers it automatically.

{
  "mcpServers": {
    "airweave-search": {
      "command": "npx",
      "args": ["-y", "airweave-mcp-search"],
      "env": {
        "AIRWEAVE_API_KEY": "your-api-key",
        "AIRWEAVE_COLLECTION": "your-collection-id"
      }
    }
  }
}

Hosted mode runs as a stateless HTTP service at https://mcp.airweave.ai/mcp, designed for cloud-based AI platforms. Each request is fully independent, with authentication and collection selection happening via HTTP headers. No sessions, no server-side state. This is the setup for platforms like the OpenAI Agent Builder, where you can't run a local process.

In both modes, the MCP server is a thin wrapper around the Airweave SDK. It validates the agent's parameters, calls the search API, and formats results for the assistant. The architecture is deliberately simple.

The server never caches results or maintains state between requests. Every search hits the live collection, which means results always reflect the latest synced data.

What the Agent Can Do

The primary tool, search-{collection}, accepts the same parameters as Airweave's search API. The agent can control the search strategy (hybrid, neural, or keyword), set result limits, apply recency bias, enable reranking, and choose between raw results or an AI-generated summary.

What makes this powerful is that the agent maps natural language to these parameters automatically. When a developer asks Cursor "find the most recent docs about authentication," the assistant translates that into a search call with an appropriate recency bias. When someone asks "give me a summary of our onboarding flow," the assistant sets response_type: "completion" to get a synthesized answer rather than raw chunks.

This is the key difference from SDK-based integration. With the SDK, you decide the search parameters at development time. With MCP, the agent adapts its search strategy to each query at runtime. It can start broad, narrow based on initial results, and adjust parameters on the fly, all within its own reasoning loop.

Where This Fits

MCP and SDK-based integration serve different use cases, and most teams end up using both.

Use MCP when the AI assistant controls the reasoning loop. Cursor deciding when to search your codebase. Claude Desktop pulling context while drafting a document. An OpenAI agent answering questions about your project. In these cases, you want the assistant to discover and use your collection as a native capability without you writing search logic.

Use the SDK when you control the reasoning loop. A custom pipeline that searches specific sources with specific filters at specific times, like the error monitoring agent. Programmatic workflows where search is one step in a larger orchestration. Scenarios where you need precise control over query construction and result processing.

The two approaches share the same underlying infrastructure. Whether a search comes from Cursor via MCP or from your Python script via the SDK, it hits the same Airweave search API, queries the same Vespa index, and returns results ranked by the same hybrid retrieval pipeline. The difference is who decides when and how to search.

A Practical Example

Consider a team that has connected GitHub, Linear, and Slack to an Airweave collection called engineering-context. Here's what their setup enables:

A developer working in Cursor asks: "What's the context behind the rate limiting changes in the payments service?" Cursor calls search-engineering-context with that query. Airweave returns relevant GitHub PRs, Linear tickets, and Slack discussions. Cursor synthesizes the context and explains the history behind the changes, with source attribution. The same MCP server works identically in Claude Desktop, VS Code, or any other MCP-compatible client.

The same collection also powers their error monitoring agent via the SDK, which runs on a schedule and searches with specific source filters (source_name: "GitHub", source_name: "Linear").

One collection, two access patterns (MCP for interactive assistants, SDK for programmatic pipelines), zero custom integration code per client.

Looking Ahead

MCP is still a young protocol, but adoption is accelerating across AI clients. As more assistants support MCP natively, the value of having your organizational context available through a single MCP server compounds. Every new AI tool your team adopts gets immediate access to the same retrieval layer.

This connects to a broader theme running through this series: context retrieval works best when it's treated as shared infrastructure rather than something each application builds from scratch. MCP makes that infrastructure accessible not just to your code, but to every AI assistant or application in your workflow.
