Airweave

The Airweave CLI

10 min read

The previous articles in this series covered how to interact with Airweave through the SDK, the REST API, and the MCP Server. Each of these interfaces is designed for a specific context: the SDK for Python and Node.js applications, the API for language-agnostic HTTP access, and MCP for AI assistants that manage their own reasoning loops.

But there is a fourth interface that sits closer to the everyday workflow of developers and agents than any of these: the terminal. Whether you are debugging a query, verifying that a sync completed, or wiring Airweave into a shell script, the terminal is often the fastest path between a question and an answer.

This article introduces the Airweave CLI, a standalone command-line tool that brings the full Airweave workflow (authentication, collection management, source connections, search, and sync orchestration) into your terminal.

Why a CLI?

There is an increasingly clear pattern in how AI agents interact with external tools: many of them just shell out to a CLI. Rather than negotiating protocol handshakes, managing session state, or discovering tool schemas at runtime, an agent can invoke a CLI command as a subprocess, capture the JSON output, and move on. It is the simplest possible integration surface: a process that takes arguments and returns structured data on stdout.

This pattern has gained significant traction as developers build more agentic systems. MCP is powerful when the AI client supports it natively, but not every agent runtime speaks MCP. Not every orchestration framework has an MCP client. And even when MCP is available, a CLI call is often more predictable and easier to debug. There is no connection lifecycle, no tool discovery step, no protocol version mismatch. The agent runs a command, reads the output, and that's it.
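Seen from the agent's side, the whole integration is one function. The Bash sketch below wraps a single search for an agent. It relies only on the `search` command, the `--json` and `--top-k` flags, and the `AIRWEAVE_COLLECTION` environment variable, all described later in this article. The function name `fetch_context` is hypothetical. So is the assumption that the CLI exits with a non-zero status when a search fails.

```bash
# Hypothetical helper: run one Airweave search on behalf of an agent and return JSON.
# Assumes `airweave` is on PATH and authenticated, and (our assumption) that it
# exits non-zero when a search fails.
fetch_context() {
  local collection="$1" query="$2" top_k="${3:-5}"
  local output status

  # The AIRWEAVE_COLLECTION assignment applies to this single invocation only.
  output="$(AIRWEAVE_COLLECTION="$collection" airweave search "$query" --json --top-k "$top_k")"
  status=$?

  if [ "$status" -ne 0 ]; then
    echo "airweave search failed with exit code $status" >&2
    return "$status"
  fi

  # Only stdout is captured, so anything written to stderr never mixes into the JSON.
  printf '%s\n' "$output"
}

# Usage: fetch_context engineering-docs "rate limiter implementation" 3
```

The agent never manages a session: each call is an independent process, and the exit code plus stdout are the entire contract.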

The Airweave CLI was built with this pattern in mind. It is designed to serve both human developers working interactively and AI agents that need a reliable, scriptable interface to Airweave's search and management capabilities. The design follows two principles:

Immediacy: Every Airweave operation should be one command away. No boilerplate, no imports, no request bodies.

Dual-mode output: The CLI adapts its behavior based on who is using it. In an interactive terminal, it renders rich output with colors, spinners, and formatted tables. When piped to another command or run in a non-TTY environment, it automatically outputs clean JSON. This makes the same tool useful for both human developers and automated agents without any configuration changes.

The dual-mode design is what makes CLI-based agent integration work cleanly. A developer troubleshooting search relevance sees readable, formatted results. An AI agent calling airweave search as a subprocess gets structured JSON it can parse directly. A CI/CD pipeline triggering a sync gets the same structured output.
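The dual-mode switch relies on a standard check: whether stdout is attached to a terminal. The Bash sketch below illustrates that mechanism with `[ -t 1 ]`. It is not the CLI's own code, which is written in Python, and the `output_mode` function is hypothetical. An explicit `--json` argument forces JSON, mirroring the CLI's `--json` flag.

```bash
# Hypothetical illustration of TTY-based output selection.
# `[ -t 1 ]` is true only when file descriptor 1 (stdout) is a terminal.
output_mode() {
  if [ "${1:-}" = "--json" ]; then
    echo "json"   # explicit flag always wins
  elif [ -t 1 ]; then
    echo "rich"   # interactive terminal: colors, spinners, tables
  else
    echo "json"   # piped, redirected, or run as a CI/agent subprocess
  fi
}

output_mode          # prints "rich" when run directly in a terminal
output_mode | cat    # prints "json", because stdout is now a pipe
```

Because the decision is made per invocation, neither humans nor agents need to configure anything. The same command simply behaves differently depending on where its output goes.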

Installation

The CLI is distributed as a standalone Python package on PyPI:

It is also available via npm:

It can also be installed from source, which suits contributors and anyone who wants to pin a specific commit. After installation, verify it with:

The CLI has minimal dependencies: typer for command parsing, rich for terminal UI (spinners, tables, panels, markdown rendering), questionary for interactive prompts, and httpx for HTTP communication. It supports Python 3.9 through 3.13 and does not require the main Airweave repository or any backend services to be installed locally.

Authentication

Before using the CLI, you authenticate with your Airweave account. There are two methods, each suited to a different context.

Device Code login (OAuth) uses the Auth0 Device Code flow. The CLI displays a URL and a one-time code in your terminal. You open the URL in any browser (even on a different device), enter the code, and sign in. The CLI polls for completion and stores the resulting access token and organization ID in ~/.airweave/config.json. If your account belongs to multiple organizations, the CLI prompts you to select one. This is the recommended method for personal development environments.
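The device code flow is a standard OAuth 2.0 grant (RFC 8628). Auth0 exposes it through its `/oauth/device/code` and `/oauth/token` endpoints. The Bash sketch below walks through those two steps with curl to show what the CLI is waiting on while it polls. It sketches the generic protocol, not the CLI's implementation. The variable names reuse the `AIRWEAVE_AUTH0_*` settings mentioned under Self-Hosted Instances. The function names and the sed-based JSON handling are hypothetical simplifications.

```bash
# Sketch of the OAuth 2.0 Device Authorization Grant (RFC 8628) against Auth0.
# Assumes AIRWEAVE_AUTH0_DOMAIN, AIRWEAVE_AUTH0_CLIENT_ID and
# AIRWEAVE_AUTH0_AUDIENCE are set. All function names are hypothetical.

# Step 1: ask Auth0 for a device code, a user code and a verification URL.
request_device_code() {
  curl -s -X POST "https://${AIRWEAVE_AUTH0_DOMAIN}/oauth/device/code" \
    -d "client_id=${AIRWEAVE_AUTH0_CLIENT_ID}" \
    -d "audience=${AIRWEAVE_AUTH0_AUDIENCE}" \
    -d "scope=openid profile email"
}

# Step 2 (one attempt): try to exchange the device code for tokens.
request_token() {
  curl -s -X POST "https://${AIRWEAVE_AUTH0_DOMAIN}/oauth/token" \
    -d "grant_type=urn:ietf:params:oauth:grant-type:device_code" \
    -d "device_code=$1" \
    -d "client_id=${AIRWEAVE_AUTH0_CLIENT_ID}"
}

# Extract a string field from a flat JSON object (good enough for a sketch).
json_field() {
  printf '%s' "$1" | sed -n "s/.*\"$2\" *: *\"\([^\"]*\)\".*/\1/p"
}

# Poll until the user finishes signing in, then print the access token.
poll_for_token() {
  local device_code="$1" interval="${2:-5}" response token error
  while true; do
    response="$(request_token "$device_code")"
    token="$(json_field "$response" access_token)"
    if [ -n "$token" ]; then
      printf '%s\n' "$token"
      return 0
    fi
    error="$(json_field "$response" error)"
    case "$error" in
      authorization_pending) ;;                            # user has not finished yet
      slow_down) interval=$((interval + 5)) ;;             # RFC 8628: back off by 5 seconds
      *) echo "login failed: ${error:-no response}" >&2; return 1 ;;
    esac
    sleep "$interval"
  done
}

# Usage (needs network access and a configured Auth0 application):
#   start="$(request_device_code)"
#   echo "Open $(json_field "$start" verification_uri) and enter $(json_field "$start" user_code)"
#   token="$(poll_for_token "$(json_field "$start" device_code)")"
```

Because the user completes sign-in in a separate browser, the terminal only needs outbound HTTPS. That is why the flow works over SSH and on machines without a browser.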

API key login prompts for your API key, base URL, and an optional default collection. The key is validated against the API before being saved. This is the method for headless environments, CI/CD pipelines, and remote servers. You can also skip the prompt entirely by setting the AIRWEAVE_API_KEY environment variable.

You can check your current authentication state at any time with airweave auth status, which prints the active email, organization, base URL, and default collection. To clear stored credentials, use airweave auth logout.
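In a CI job, a small preflight step can fail fast when the key is missing, then confirm which configuration the job is running with. This Bash sketch uses only the `AIRWEAVE_API_KEY` variable and the `airweave auth status` command described above. The function name is hypothetical. The sketch also assumes that `auth status` reports credentials supplied through the environment, which follows from the resolution order below but is not shown directly.

```bash
# Hypothetical CI preflight: stop early instead of failing mid-pipeline.
airweave_preflight() {
  if [ -z "${AIRWEAVE_API_KEY:-}" ]; then
    echo "AIRWEAVE_API_KEY is not set; add it as a CI secret" >&2
    return 1
  fi
  # Reports the active configuration (email, organization, base URL, default collection).
  airweave auth status
}

# In a CI step:
#   airweave_preflight || exit 1
```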

The CLI resolves configuration in a strict priority order: CLI flags take precedence over environment variables, which take precedence over the config file, which falls back to defaults. This means you can override any stored configuration for a single command without modifying your saved settings.
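Conceptually, the resolution order means taking the first value that is set, checking from highest to lowest priority. The Bash sketch below expresses that rule with nested parameter expansion. `resolve_setting` and its four positional arguments are hypothetical, and reading the stored value out of ~/.airweave/config.json is left out.

```bash
# Hypothetical illustration of: CLI flag > environment variable > config file > default.
# Arguments: flag value, environment value, config-file value, built-in default.
# An empty string means "not set" at that level.
resolve_setting() {
  local flag="$1" env="$2" file="$3" default="$4"
  printf '%s\n' "${flag:-${env:-${file:-$default}}}"
}

# No flag given, environment set: the environment value wins.
resolve_setting "" "https://env.example" "https://file.example" "https://api.airweave.ai"
# -> https://env.example
```

Each level is consulted only when every level above it is empty. A one-off override, such as an environment variable set for a single command, therefore never touches the saved config file.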

Three environment variables control the core configuration:

| Variable | Description |
|---|---|
| AIRWEAVE_API_KEY | API key (overrides config file) |
| AIRWEAVE_BASE_URL | API base URL (default: https://api.airweave.ai) |
| AIRWEAVE_COLLECTION | Default collection readable ID |

Searching

Search is the primary use case. Once authenticated with a default collection set, searching is a single command:

If you have multiple collections, specify which one to search:

The --top-k flag (short: -k) controls how many results are returned, defaulting to 10. Behind the scenes, the CLI calls the same hybrid search pipeline described in earlier articles: semantic similarity via embeddings, keyword matching via BM25, and cross-encoder reranking.

In interactive mode, the CLI renders rich output: an AI-generated answer (when available) followed by scored result panels, each showing the source type, relevance score, a content snippet, and a URL back to the original document. In piped mode, the same data is emitted as a JSON object with a results array, making it trivial to chain with other tools:

airweave search "refund policy" | jq -r '.results[0]'

You can force JSON output even in an interactive terminal with the --json flag:

airweave search "deploy steps" --json | jq '.results[0]'

This piping capability is what makes the CLI useful beyond manual exploration. A shell script can search Airweave for context, extract the top result, and pass it as input to an LLM API call, all in a single pipeline. An AI coding agent running in a terminal environment can invoke airweave search as a subprocess and parse the JSON output directly.
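That pipeline can be sketched in Bash as follows. It narrows the search to the single best match with `--top-k 1` and forces JSON with `--json`, both documented above. The `llm_complete` command, which stands in for the LLM API call and reads a prompt on stdin, is hypothetical. Substitute whatever client your stack provides.

```bash
# Sketch: retrieve the top Airweave result and hand it to an LLM as context.
# `llm_complete` is a hypothetical command that reads a prompt on stdin.
answer_with_context() {
  local question="$1" context

  # --top-k 1 limits the response to the single highest-ranked result.
  context="$(airweave search "$question" --json --top-k 1)" || return 1

  # Build the prompt and stream it to the model.
  printf 'Answer using only this context:\n%s\n\nQuestion: %s\n' \
    "$context" "$question" | llm_complete
}

# Usage: answer_with_context "What is our refund policy?"
```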

Managing Collections

The CLI provides full collection lifecycle management. You can list all collections visible to your account, create new ones, and inspect their details.

Listing collections gives you a quick overview of your workspace:

Creating a collection requires a display name. You can optionally provide a custom readable ID; if omitted, one is auto-generated:

Inspecting a specific collection returns its metadata, including the readable ID, creation timestamp, and connected source count:

Managing Sources

Source connections represent authenticated links between a collection and an external data source. The CLI lets you list existing connections, add new ones, and trigger syncs.

Listing source connections for a collection:

Adding a new source connection requires specifying the connector type, the target collection, and a display name. For API-key-based sources, you pass credentials directly via the --credentials flag as a JSON string. For OAuth-based sources, the CLI returns an auth_url that you open in your browser to complete authorization:

Source-specific configuration can be passed via the --config flag as a JSON string when the connector supports additional options beyond authentication.

By default, adding a source connection triggers an immediate sync. You can suppress this with --no-sync if you want to configure the connection first and sync later.
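Because --credentials and --config take JSON strings, shell quoting is the usual stumbling block: a secret containing a quote or backslash produces invalid JSON. One way to build the string safely is sketched below in Bash. The `json_string` and `credentials_json` helpers are hypothetical, and the `api_key` field name is only an example, since each connector defines its own credential fields.

```bash
# Escape a value for use inside a JSON string literal
# (backslashes first, then double quotes, tabs and newlines).
json_string() {
  local s="$1"
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\t'/\\t}
  s=${s//$'\n'/\\n}
  printf '"%s"' "$s"
}

# Hypothetical credentials payload for an API-key-based connector.
credentials_json() {
  printf '{"api_key":%s}' "$(json_string "$1")"
}

# Usage (MY_SOURCE_API_KEY is a placeholder):
#   creds="$(credentials_json "$MY_SOURCE_API_KEY")"
#   then pass "$creds" as a single quoted argument to --credentials
```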

Triggering a sync manually on an existing source connection:

airweave sources sync <source-connection-id>

For situations where you need a complete re-index rather than an incremental update, use the --force flag:

airweave sources sync <source-connection-id> --force

A forced sync discards the incremental state and re-processes all data from the source. This is useful after schema changes or if you suspect the incremental state has drifted.
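In a CI/CD pipeline you may want to trigger several source connections in one step and report every failure rather than stopping at the first. The Bash sketch below loops over connection IDs using only `airweave sources sync` and its `--force` flag. The `sync_sources` function and the `FORCE_RESYNC` toggle are hypothetical. The sketch also assumes that a failed sync request returns a non-zero exit code.

```bash
# Hypothetical helper: trigger a sync for each source connection ID given.
# Set FORCE_RESYNC=1 to request a full re-index via --force.
sync_sources() {
  local id failed=0
  local -a extra=()
  if [ "${FORCE_RESYNC:-0}" = "1" ]; then
    extra=(--force)
  fi

  for id in "$@"; do
    if airweave sources sync "$id" "${extra[@]}"; then
      echo "sync triggered: $id" >&2
    else
      echo "sync failed: $id" >&2
      failed=$((failed + 1))
    fi
  done

  # Non-zero if any connection failed, so the CI step is marked as failed.
  [ "$failed" -eq 0 ]
}

# Usage: FORCE_RESYNC=1 sync_sources "$SOURCE_ID_A" "$SOURCE_ID_B"
```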

Self-Hosted Instances

The CLI works with both the hosted Airweave platform and self-hosted deployments. To point the CLI at your own instance, pass the base URL during authentication:

Or set the environment variable:

For self-hosted instances with custom Auth0 configurations, additional environment variables (AIRWEAVE_AUTH0_DOMAIN, AIRWEAVE_AUTH0_CLIENT_ID, AIRWEAVE_AUTH0_AUDIENCE) allow full control over the OAuth flow.
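For a self-hosted deployment, an environment-based setup might look like the fragment below. The variable names come from this article. Every value is a placeholder for your own deployment, not a real endpoint or credential.

```bash
# Point the CLI at a self-hosted instance (all values are placeholders).
export AIRWEAVE_BASE_URL="https://airweave.internal.example.com"

# Only needed when the instance uses its own Auth0 configuration for device-code login.
export AIRWEAVE_AUTH0_DOMAIN="your-tenant.eu.auth0.com"
export AIRWEAVE_AUTH0_CLIENT_ID="your-cli-client-id"
export AIRWEAVE_AUTH0_AUDIENCE="https://airweave.internal.example.com/api"
```

After exporting these, airweave auth status should report the self-hosted base URL.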

Where the CLI Fits

The CLI, SDK, REST API, and MCP Server all hit the same underlying Airweave infrastructure. A search from the CLI queries the same Vespa index and runs the same hybrid retrieval pipeline as a search from the SDK or MCP Server. The difference is in ergonomics and context of use.

Use the CLI for agents when your agent runtime does not support MCP, when you want the simplest possible integration path, or when you need predictable subprocess-based tool calling. An agent that can run shell commands can use Airweave immediately. No SDK installation, no protocol negotiation, no client library. The agent calls airweave search "query" --json, parses the output, and has its context. This is increasingly the preferred pattern for agentic systems that orchestrate multiple external tools, because CLIs are the lowest-common-denominator interface that works everywhere.

Use the CLI for exploration when you want to test search queries interactively, verify that a sync completed, create collections and add sources as part of an onboarding workflow, or debug search results by inspecting the raw JSON output.

Use the CLI in automation when you need Airweave in a shell script, a CI/CD pipeline, a Makefile, or any environment where installing the SDK is impractical. The JSON output mode makes the CLI a clean interface for scripting.

Use the SDK when you need programmatic control in a Python or Node.js application: custom error handling, async operations, advanced search filters, and tight integration with your application logic.

Use MCP when an AI assistant that natively supports MCP (Cursor, Claude Desktop, VS Code Copilot) should search your collections as part of its reasoning loop. MCP lets the assistant discover the tool and decide when to call it. For agents that do support MCP natively, it remains the richer protocol. But for everything else, the CLI is the pragmatic choice.

These interfaces are complementary. A common workflow is to prototype a search query in the CLI, verify the results look right, then move the query into SDK code for production use. Another is to give an autonomous agent CLI access to Airweave search alongside other CLI tools it already uses, keeping the integration surface uniform and debuggable.

A Practical Example

Consider a developer who has just connected their GitHub and Notion sources to an engineering-docs collection via the web UI. They want to verify that the sync worked and test a few search queries before integrating search into their agent.

First, authenticate and set the default collection:

During the login prompt, they enter their API key and set engineering-docs as the default collection. Now they can search without specifying the collection each time:

The CLI returns formatted results from both GitHub code files and Notion documentation. The developer refines their query:

Better results. They want to see the raw JSON to understand the score distribution:

airweave search "rate limiter implementation token bucket" --json | jq '[.results[] | {source: .source_name, score: .score}]'

This outputs a clean array of source names and scores, confirming that the most relevant code file ranks highest. Satisfied, the developer moves the query into their agent's SDK code, knowing exactly what results to expect.

Looking Ahead

The CLI brings Airweave's full capability set into the environment where developers and agents both operate: the terminal. It is designed to be fast for exploration, reliable for automation, and transparent in its output.

The trend toward CLI-based agent tooling reflects a broader insight: the best integration surface for an AI agent is often the simplest one. Protocols like MCP offer rich capabilities when the client supports them, but a CLI command that takes arguments and returns JSON works with any agent runtime, any orchestration framework, and any programming language. It is the universal interface.

As Airweave's feature set grows, the CLI grows with it. Every new API capability (new search modes, new source types, new collection configuration options) becomes available as a CLI command, following the same pattern: immediate access, dual-mode output, and predictable configuration resolution.

For developers building agents that need to search across multiple data sources, the CLI is the quickest way to give those agents access to Airweave, whether the agent is a sophisticated orchestration system or a simple loop that shells out to external tools.
