Google Gemini

Knwler supports Google Gemini as an LLM backend for all pipeline stages — schema discovery, extraction, consolidation, community labelling, and enrichments. It also has a dedicated Gemini Batch API processor that cuts costs by 50 % when processing large document collections.


Quick Start

1. Get an API key

Create a key at Google AI Studio and export it:

export GEMINI_API_KEY="your-key-here"

2. Extract a single document

python main.py extract -f document.pdf --backend gemini

That’s it. Knwler resolves the default model (gemini-3.1-flash-lite-preview), the base URL, and the API key automatically.


How Gemini is integrated

Knwler uses Gemini’s OpenAI-compatible endpoint, so no separate SDK is needed for standard extraction — just httpx (already a dependency).

flowchart LR
    KW[Knwler pipeline] -->|POST /chat/completions| GE["https://generativelanguage.googleapis.com\n/v1beta/openai"]
    GE -->|JSON response| KW
    KW --> C[(~/.knwler/cache/)]
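Because the integration talks to Gemini's OpenAI-compatible endpoint, the request the pipeline sends can be reproduced with a plain httpx POST. The sketch below is illustrative, not Knwler's internal code; the payload follows the standard OpenAI chat-completions schema, and the default temperature of 0.0 is an assumption:

```python
import os

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"

def build_chat_request(prompt: str, model: str = "gemini-3.1-flash-lite-preview") -> dict:
    """Assemble an OpenAI-style chat-completions payload for the Gemini endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # assumption: extraction favours deterministic output
    }

def send(prompt: str) -> str:
    """POST the request and return the model's reply text."""
    import httpx  # already a Knwler dependency

    headers = {"Authorization": f"Bearer {os.environ['GEMINI_API_KEY']}"}
    resp = httpx.post(
        f"{BASE_URL}/chat/completions",
        json=build_chat_request(prompt),
        headers=headers,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```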

Every request is cached by a hash of (prompt, model, temperature, num_predict), so re-runs on the same content are free.
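A deterministic key over those four fields might look like the following; this is a minimal sketch, and Knwler's actual hashing scheme may differ in serialization and digest choice:

```python
import hashlib
import json

def cache_key(prompt: str, model: str, temperature: float, num_predict: int) -> str:
    """Derive a stable cache key from the request parameters.

    Illustrative only: sorted-key JSON guarantees the same inputs always
    serialize to the same bytes, so the SHA-256 digest is reproducible.
    """
    blob = json.dumps(
        {
            "prompt": prompt,
            "model": model,
            "temperature": temperature,
            "num_predict": num_predict,
        },
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```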


Default models

Role                 Model
Schema discovery     gemini-3.1-flash-lite-preview
Graph extraction     gemini-3.1-flash-lite-preview

You can override either with --discovery-model / --extraction-model:

python main.py extract -f report.pdf --backend gemini \
    --discovery-model gemini-3-flash-preview \
    --extraction-model gemini-3-flash-preview

CLI options

All standard extract flags work with the Gemini backend:

Flag                  Default                         Description
--backend             gemini                          Select Gemini
--extraction-model    gemini-3.1-flash-lite-preview   Model for chunk extraction
--discovery-model     gemini-3.1-flash-lite-preview   Model for schema/language discovery
--max-tokens          400                             Chunk size in tokens
--max-concurrent      8                               Parallel API calls
--no-cache            off                             Disable response caching

Examples

# Minimal
python main.py extract -f paper.pdf --backend gemini

# Fetch a webpage and extract
python main.py fetch https://en.wikipedia.org/wiki/Alan_Turing --backend gemini

# Larger model, bigger chunks
python main.py extract -f report.pdf --backend gemini \
    --extraction-model gemini-3-flash-preview \
    --max-tokens 600

# Disable caching (always hit the API)
python main.py extract -f notes.md --backend gemini --no-cache

Programmatic API

import asyncio

from knwler.api import extract_file
from knwler.config import Config

config = Config(
    backend="gemini",
    # api_key="...",  # or use the GEMINI_API_KEY env var
    # extraction_model="gemini-3-flash-preview",
)

async def main():
    graph = await extract_file("report.pdf", config)
    print(graph["title"])

asyncio.run(main())  # extract_file is a coroutine, so it needs an event loop

You can also pass a custom base URL if you are using a proxy or a different Gemini-compatible endpoint:

config = Config(
    backend="gemini",
    base_url="https://my-proxy.example.com/v1beta/openai",
)

Batch Processing

For bulk document processing, Knwler ships a dedicated Gemini Batch processor that submits all LLM calls through the Gemini Batch API, delivering a 50 % cost reduction compared to real-time calls.

Additional dependency

uv add google-genai

Pipeline overview

The batch processor runs three sequential rounds. Within each round, all documents are processed in a single batch job submitted to Google’s Batch API, polled to completion, then parsed before the next round begins.

sequenceDiagram
    participant CLI as batch-gemini CLI
    participant DB as SQLite state
    participant GBA as Gemini Batch API
    participant FS as Output files

    CLI->>DB: scan & chunk all documents
    CLI->>GBA: Round 1 — language + schema requests (JSONL)
    GBA-->>CLI: poll until JOB_STATE_SUCCEEDED
    CLI->>DB: store language + schemas

    CLI->>GBA: Round 2 — title + summary + rephrase + extraction (JSONL)
    GBA-->>CLI: poll until JOB_STATE_SUCCEEDED
    CLI->>DB: store titles, summaries, rephrases, per-chunk graphs

    CLI->>GBA: Round 3 — consolidation summaries + community labels (JSONL)
    GBA-->>CLI: poll until JOB_STATE_SUCCEEDED
    CLI->>DB: store final consolidated graph

    CLI->>FS: write graph.json + index.html per document

Each round is resumable — if the process is interrupted, re-running the same command picks up from where it left off via the SQLite state database.
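The resume behaviour described above can be sketched as a small loop that records each finished round in SQLite and skips rounds already marked done on re-run. The `rounds` table and `done` status here are hypothetical; `batch_gemini.db`'s real schema may differ:

```python
import sqlite3

def run_rounds(db_path: str, rounds: dict) -> list:
    """Run named rounds in order, skipping any recorded as done in a prior run.

    `rounds` maps a round name to a zero-argument callable doing the work.
    Returns the names of the rounds actually executed this time.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS rounds (name TEXT PRIMARY KEY, status TEXT)")
    executed = []
    for name, work in rounds.items():
        row = conn.execute("SELECT status FROM rounds WHERE name = ?", (name,)).fetchone()
        if row and row[0] == "done":
            continue  # completed in a previous run: resume past it
        work()
        conn.execute("INSERT OR REPLACE INTO rounds VALUES (?, 'done')", (name,))
        conn.commit()  # persist immediately so a crash after this round is recoverable
        executed.append(name)
    conn.close()
    return executed
```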

Start or resume processing

python main.py batch-gemini run \
    --input ./documents \
    --output ./results

Check pipeline status

python main.py batch-gemini status \
    --input ./documents \
    --output ./results

Use specific models

python main.py batch-gemini run \
    --input ./documents \
    --output ./results \
    --discovery-model gemini-3-flash-preview \
    --extraction-model gemini-3-flash-preview

Consolidate all graphs at the end

python main.py batch-gemini run \
    --input ./documents \
    --output ./results \
    --consolidate

Or consolidate manually afterwards:

python main.py consolidate --dir ./results --output ./merged

Supported file types

.pdf, .txt, .md, .text, .markdown
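The scan step presumably filters the input directory by those suffixes. A hypothetical helper (the real scanner in batch-gemini may behave differently, e.g. around symlinks or hidden files):

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md", ".text", ".markdown"}

def scan_documents(root) -> list:
    """Collect files under root whose suffix is supported, case-insensitively."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```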

State database

A file called batch_gemini.db is created in the output directory. It stores:

  • Per-document text, chunks, language, schema, extracted graphs, and final output.
  • Per-round batch job names, statuses, and timings.

Important: If you add new documents or want to reprocess from scratch, delete the output directory first. The database is designed for crash recovery, not incremental updates.


Environment variables

Variable          Purpose
GEMINI_API_KEY    Primary API key (checked first)
GOOGLE_API_KEY    Fallback API key (batch processor only)
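The documented lookup order amounts to a two-step fallback; a minimal sketch (the error message and function name are illustrative, not Knwler's actual API):

```python
import os

def resolve_api_key() -> str:
    """Return the Gemini API key, preferring GEMINI_API_KEY over GOOGLE_API_KEY."""
    key = os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("Set GEMINI_API_KEY (or GOOGLE_API_KEY for the batch processor)")
    return key
```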

Polling behaviour

The batch processor polls with exponential backoff:

  • Initial interval: 30 seconds
  • Maximum interval: 5 minutes
  • Terminal states: JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, JOB_STATE_CANCELLED, JOB_STATE_EXPIRED

Google commits to processing batch jobs within 24 hours; in practice most jobs complete in minutes.
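The documented polling parameters yield a schedule like the one below. Doubling between polls is an assumption; the text states only the initial and maximum intervals:

```python
def backoff_intervals(initial: float = 30.0, maximum: float = 300.0):
    """Yield poll intervals that grow from `initial` and cap at `maximum` seconds."""
    interval = initial
    while True:
        yield interval
        interval = min(interval * 2, maximum)  # assumed doubling, capped at 5 minutes

# States at which polling stops, per the list above.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
}
```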


Cost comparison

Mode                    Relative cost    Latency
Real-time (extract)     1×               Immediate
Batch (batch-gemini)    ~0.5×            Minutes to hours

Use real-time extraction for interactive use and single documents. Use batch processing when handling tens or hundreds of documents.