Webhooks

Airweave

5 min read
The previous articles in this series covered a pull-based workflow: create a collection, add source connections, let Airweave sync your data, then search. This works well for getting started, but production systems rarely operate in request-response mode. Your agent pipeline, your dashboard, and your alerting system all need to know when fresh context is available, not just that it exists somewhere.

This article covers how Airweave's webhook system enables event-driven context pipelines that react to sync lifecycle changes in real time.

The Polling Problem

The most common approach to tracking sync status is polling: hit the API every few seconds, check if the sync completed, and proceed when it has. This is simple to implement and works fine during development.

It breaks down in production for a few reasons. First, most polls return nothing useful. If a sync takes three minutes and you poll every five seconds, 35 out of 36 requests are wasted. At scale, across dozens of source connections syncing on different schedules, this adds up to significant unnecessary API load.
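The arithmetic above can be made concrete with a small helper (the three-minute sync and five-second interval are the example's assumptions, not Airweave defaults):

```python
import math

def wasted_polls(sync_duration_s: float, poll_interval_s: float) -> tuple[int, int]:
    """Return (total polls, polls that report 'still running')."""
    # Polls needed before one finally observes completion
    total = math.ceil(sync_duration_s / poll_interval_s)
    # Every poll except the last returns nothing useful
    return total, total - 1

wasted_polls(180, 5)  # a 3-minute sync polled every 5 seconds -> (36, 35)
```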

Second, polling introduces latency gaps. If you poll every 30 seconds to be efficient, your downstream pipeline might wait up to 30 seconds after a sync completes before it even notices. For agents that serve user-facing queries, that delay means serving stale context when fresh context is already available.

Third, and most importantly, polling gives you no visibility into failures. If a sync fails silently between polling intervals, your pipeline continues operating as if nothing happened. The agent keeps serving results from the last successful sync without any indication that its context source is degraded. This is a form of context rot that's especially dangerous because it's invisible.

Event-Driven Sync with Webhooks

Airweave solves this with webhooks: instead of asking whether something happened, you register an endpoint and Airweave tells you the moment it does.

A webhook is an HTTP callback triggered by a system event. When the event occurs, the source system sends an HTTP POST request to a pre-registered URL with a payload describing what happened.

In Airweave, webhooks are tied to the sync job lifecycle. Every sync transitions through a series of states, and each transition can fire a webhook event:

Event            When it fires
---------------  --------------------------------
sync.pending     Job created and waiting to start
sync.running     Job begins processing
sync.completed   All data synced without errors
sync.failed      Job encountered an error
sync.cancelled   Job was manually cancelled

Most production integrations only subscribe to sync.completed and sync.failed. The others are useful for granular progress tracking (powering real-time UI indicators, for example) but aren't required for reactive pipelines.

When an event fires, Airweave delivers it via Svix, which handles retries, signature verification, and delivery guarantees.

Each delivery is an HTTP POST with a JSON payload containing the event type, job ID, collection details, source type, and timestamp:

{
  "event_type": "sync.completed",
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "collection_readable_id": "engineering-context-ab123",
  "collection_name": "Engineering Context",
  "source_connection_id": "660e8400-e29b-41d4-a716-446655440001",
  "source_type": "github",
  "status": "completed",
  "timestamp": "2025-01-15T14:30:00Z"
}

The payload includes both collection_readable_id and source_type, which means your handler can route events precisely. You might handle a GitHub sync completion differently from a Slack sync completion, even within the same collection.
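One way to express that routing is a small dispatch table keyed by collection and source; the table entries and handler results below are illustrative, not part of Airweave's API:

```python
from typing import Callable, Optional

# Hypothetical route table; keys pair the payload's
# collection_readable_id with its source_type.
ROUTES: dict[tuple[str, str], Callable[[dict], str]] = {
    ("engineering-context-ab123", "github"): lambda p: "reindex code cache",
    ("engineering-context-ab123", "slack"): lambda p: "refresh chat summaries",
}

def route_event(payload: dict) -> Optional[str]:
    """Dispatch a sync event to the handler for its (collection, source) pair."""
    key = (payload["collection_readable_id"], payload["source_type"])
    handler = ROUTES.get(key)
    return handler(payload) if handler else None
```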

Patterns for Reactive Context Pipelines

Webhooks on their own are just a delivery mechanism. The value comes from what you build on top of them. Here are four patterns that show up consistently in production deployments.

Cache Invalidation

If your agent maintains a retrieval cache (storing recent search results to avoid repeated queries), you need a reliable signal to bust that cache when underlying data changes. Without webhooks, you either set aggressive TTLs (wasting the cache) or conservative TTLs (serving stale results).

With a sync.completed webhook, the answer is simple: invalidate cache entries for the affected collection the moment new data lands. Your agent serves cached results right up until fresh context is available, then seamlessly switches over. No polling, no arbitrary TTLs, no stale windows.
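A minimal sketch of collection-scoped invalidation, assuming cache keys are prefixed with the collection's readable ID (the cache class itself is hypothetical):

```python
class RetrievalCache:
    """Toy retrieval cache keyed by 'collection_id:query'."""

    def __init__(self) -> None:
        self._entries: dict[str, list] = {}

    def put(self, collection_id: str, query: str, results: list) -> None:
        self._entries[f"{collection_id}:{query}"] = results

    def get(self, collection_id: str, query: str):
        return self._entries.get(f"{collection_id}:{query}")

    def invalidate_collection(self, collection_id: str) -> None:
        # Drop every cached result for the collection that just re-synced
        prefix = f"{collection_id}:"
        for key in [k for k in self._entries if k.startswith(prefix)]:
            del self._entries[key]
```

A sync.completed handler would then call `invalidate_collection(payload["collection_readable_id"])` and nothing else: entries for other collections stay warm.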

Failure Alerting

A sync.failed event is your signal that an agent's context source is degraded. The right response depends on your system, but common patterns include sending a Slack notification to the engineering channel, creating an incident in PagerDuty, or flagging the affected collection as stale in your agent's metadata so it can warn users that results may be outdated.

This is especially important for agents that operate autonomously. If a GitHub sync fails and your error monitoring agent (as described in the case study article) continues running, it will analyze errors without fresh code context. A webhook-triggered alert ensures someone knows about the degradation before it compounds.
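As a sketch, a sync.failed payload can be turned into an alert message directly from the fields shown earlier (where it is sent, a Slack channel or a pager, is up to you):

```python
def format_sync_alert(payload: dict) -> str:
    """Render a human-readable alert from a sync.failed webhook payload."""
    # Field names match the webhook payload shown earlier
    return (
        f"Sync failed for source '{payload['source_type']}' "
        f"in collection '{payload['collection_name']}' "
        f"(job {payload['job_id']}) at {payload['timestamp']}. "
        f"Agent context for this collection may be stale."
    )
```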

Chained Pipelines

Some workflows depend on sequential data freshness. For example, your error monitoring pipeline might need both GitHub code context and Linear ticket context to be current before running an analysis cycle.

With webhooks, you can chain these dependencies: when the GitHub source completes syncing, check whether the Linear source has also completed recently. If both are fresh, trigger the analysis pipeline. If not, wait for the second event. This turns a cron-based "run every 5 minutes and hope everything is fresh" approach into a precise, event-driven pipeline that runs exactly when its dependencies are satisfied.
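That dependency check can be sketched by recording the last completion time per source; the 30-minute freshness window is an assumption for illustration:

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_WINDOW = timedelta(minutes=30)  # assumed tolerance, tune per pipeline
_last_completed: dict[str, datetime] = {}

def record_completion(source_type: str, timestamp: str) -> None:
    """Store a source's sync.completed time from its ISO 8601 payload timestamp."""
    _last_completed[source_type] = datetime.fromisoformat(
        timestamp.replace("Z", "+00:00")
    )

def dependencies_fresh(required: set[str], now: datetime) -> bool:
    # True only if every required source completed within the window
    return all(
        src in _last_completed and now - _last_completed[src] <= FRESHNESS_WINDOW
        for src in required
    )
```

On each sync.completed event, record the completion and then run the analysis pipeline only if `dependencies_fresh({"github", "linear"}, now)` holds.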

Audit and Observability

Every webhook event is a structured log of when each source was last synced, whether it succeeded, and how long it took (derivable from the gap between sync.running and sync.completed timestamps). Routing these events to your logging system gives you a complete audit trail of context freshness across all collections.

This is valuable for debugging ("when was the last time the Slack source synced successfully?") and for compliance scenarios where you need to prove that your agent's context was current at the time it produced a given output.
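Deriving that duration from the two event timestamps is a one-liner over the ISO 8601 strings in the payload:

```python
from datetime import datetime

def sync_duration_seconds(running_ts: str, completed_ts: str) -> float:
    """Seconds between the sync.running and sync.completed event timestamps."""
    def parse(ts: str) -> datetime:
        # Timestamps carry a trailing "Z", as in the payload shown earlier
        return datetime.fromisoformat(ts.replace("Z", "+00:00"))

    return (parse(completed_ts) - parse(running_ts)).total_seconds()

sync_duration_seconds("2025-01-15T14:27:00Z", "2025-01-15T14:30:00Z")  # 180.0
```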

A Minimal Webhook Handler

To tie these patterns together, here's a minimal FastAPI handler that receives Airweave webhook events and routes them by type:

from fastapi import FastAPI, Request, Response

app = FastAPI()

async def on_sync_completed(collection_id: str, source_type: str) -> None:
    ...  # invalidate cache, trigger downstream pipeline, log event

async def on_sync_failed(payload: dict) -> None:
    ...  # alert team, mark collection as degraded

@app.post("/webhooks/airweave")
async def handle_webhook(request: Request):
    payload = await request.json()
    event_type = payload["event_type"]

    if event_type == "sync.completed":
        await on_sync_completed(
            payload["collection_readable_id"],
            payload["source_type"],
        )
    elif event_type == "sync.failed":
        await on_sync_failed(payload)

    return Response(status_code=200)

In production, you would also verify the Svix signature headers (svix-id, svix-timestamp, svix-signature) to ensure payloads are authentic. The webhooks setup guide covers the full verification flow.
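Svix's SDKs expose a one-call verify, but the check itself, per Svix's documented scheme, is an HMAC-SHA256 over `{svix-id}.{svix-timestamp}.{body}` keyed by the base64 portion of the `whsec_` secret. A stdlib-only sketch, not a substitute for the official SDK:

```python
import base64
import hashlib
import hmac

def verify_svix_signature(secret: str, msg_id: str, timestamp: str,
                          body: bytes, signature_header: str) -> bool:
    """Check a svix-signature header against the raw request body."""
    # Secret looks like "whsec_<base64>"; the HMAC key is the decoded part
    key = base64.b64decode(secret.split("_", 1)[1])
    signed_content = f"{msg_id}.{timestamp}.".encode() + body
    expected = base64.b64encode(
        hmac.new(key, signed_content, hashlib.sha256).digest()
    ).decode()
    # The header may carry several space-separated "v1,<sig>" entries
    return any(
        hmac.compare_digest(expected, candidate.split(",", 1)[1])
        for candidate in signature_header.split()
        if candidate.startswith("v1,")
    )
```

Note that the body must be the raw bytes of the request, not a re-serialized JSON object, or the digest will not match.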

Looking Ahead

The error monitoring case study in this series runs on a 5-minute cron schedule. With webhooks, that same pipeline could become fully event-driven: trigger a reanalysis the moment fresh code context or ticket data lands in the collection, rather than waiting for the next scheduled run.

This shift from scheduled to reactive is a broader pattern in context engineering. As agents take on more responsibility and operate over longer time horizons, the systems feeding them context need to be equally responsive. Webhooks are the mechanism that makes that possible.
