Google Gemini

Knwler supports Google Gemini as an LLM backend for all pipeline stages — schema discovery, extraction, consolidation, community labelling, and enrichments. It also has a dedicated Gemini Batch API processor that cuts costs by 50 % when processing large document collections.


Quick Start

1. Get an API key

Create a key at Google AI Studio and export it:

export GEMINI_API_KEY="your-key-here"

2. Extract a single document

python main.py extract -f document.pdf --backend gemini

That’s it. Knwler resolves the default model (gemini-3.1-flash-lite-preview), the base URL, and the API key automatically.


How Gemini is integrated

Knwler uses Gemini’s OpenAI-compatible endpoint, so no separate SDK is needed for standard extraction — just httpx (already a dependency).

flowchart LR
    KW[Knwler pipeline] -->|POST /chat/completions| GE["https://generativelanguage.googleapis.com\n/v1beta/openai"]
    GE -->|JSON response| KW
    KW --> C[(~/.knwler/cache/)]
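Because the integration talks to Gemini's OpenAI-compatible endpoint, the request the pipeline sends can be reproduced with a plain httpx POST. The sketch below is illustrative, not Knwler's internal code; the payload follows the standard OpenAI chat-completions schema, and the default temperature of 0.0 is an assumption:

```python
import os

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"

def build_chat_request(prompt: str, model: str = "gemini-3.1-flash-lite-preview") -> dict:
    """Assemble an OpenAI-style chat-completions payload for the Gemini endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,  # assumption: extraction favours deterministic output
    }

def send(prompt: str) -> str:
    """POST the request and return the model's reply text."""
    import httpx  # already a Knwler dependency

    headers = {"Authorization": f"Bearer {os.environ['GEMINI_API_KEY']}"}
    resp = httpx.post(
        f"{BASE_URL}/chat/completions",
        json=build_chat_request(prompt),
        headers=headers,
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```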

Every request is cached by a hash of (prompt, model, temperature, num_predict), so re-runs on the same content are free.
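A deterministic key over those four fields might look like the following; this is a minimal sketch, and Knwler's actual hashing scheme may differ in serialization and digest choice:

```python
import hashlib
import json

def cache_key(prompt: str, model: str, temperature: float, num_predict: int) -> str:
    """Derive a stable cache key from the request parameters.

    Illustrative only: sorted-key JSON guarantees the same inputs always
    serialize to the same bytes, so the SHA-256 digest is reproducible.
    """
    blob = json.dumps(
        {
            "prompt": prompt,
            "model": model,
            "temperature": temperature,
            "num_predict": num_predict,
        },
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```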


Default models

Role                 Model
Schema discovery     gemini-3.1-flash-lite-preview
Graph extraction     gemini-3.1-flash-lite-preview

You can override either with --discovery-model / --extraction-model:

python main.py extract -f report.pdf --backend gemini \
    --discovery-model gemini-3-flash-preview \
    --extraction-model gemini-3-flash-preview

CLI options

All standard extract flags work with the Gemini backend:

Flag                  Default                         Description
--backend             gemini                          Select Gemini
--extraction-model    gemini-3.1-flash-lite-preview   Model for chunk extraction
--discovery-model     gemini-3.1-flash-lite-preview   Model for schema/language discovery
--max-tokens          400                             Chunk size in tokens
--max-concurrent      8                               Parallel API calls
--no-cache            off                             Disable response caching

Examples

# Minimal
python main.py extract -f paper.pdf --backend gemini

# Fetch a webpage and extract
python main.py fetch https://en.wikipedia.org/wiki/Alan_Turing --backend gemini

# Larger model, bigger chunks
python main.py extract -f report.pdf --backend gemini \
    --extraction-model gemini-3-flash-preview \
    --max-tokens 600

# Disable caching (always hit the API)
python main.py extract -f notes.md --backend gemini --no-cache

Programmatic API

import asyncio

from knwler.api import extract_file
from knwler.config import Config

config = Config(
    backend="gemini",
    # api_key="...",  # or use the GEMINI_API_KEY env var
    # extraction_model="gemini-3-flash-preview",
)

async def main():
    graph = await extract_file("report.pdf", config)
    print(graph["title"])

asyncio.run(main())  # extract_file is a coroutine, so it needs an event loop

You can also pass a custom base URL if you are using a proxy or a different Gemini-compatible endpoint:

config = Config(
    backend="gemini",
    base_url="https://my-proxy.example.com/v1beta/openai",
)

Batch Processing

For bulk document processing, Knwler ships a dedicated Gemini Batch processor that submits all LLM calls through the Gemini Batch API, delivering a 50 % cost reduction compared to real-time calls.

Additional dependency

uv add google-genai

Pipeline overview

The batch processor runs three sequential rounds. Within each round, all documents are processed in a single batch job submitted to Google’s Batch API, polled to completion, then parsed before the next round begins.

sequenceDiagram
    participant CLI as batch-gemini CLI
    participant DB as SQLite state
    participant GBA as Gemini Batch API
    participant FS as Output files

    CLI->>DB: scan & chunk all documents
    CLI->>GBA: Round 1 — language + schema requests (JSONL)
    GBA-->>CLI: poll until JOB_STATE_SUCCEEDED
    CLI->>DB: store language + schemas

    CLI->>GBA: Round 2 — title + summary + rephrase + extraction (JSONL)
    GBA-->>CLI: poll until JOB_STATE_SUCCEEDED
    CLI->>DB: store titles, summaries, rephrases, per-chunk graphs

    CLI->>GBA: Round 3 — consolidation summaries + community labels (JSONL)
    GBA-->>CLI: poll until JOB_STATE_SUCCEEDED
    CLI->>DB: store final consolidated graph

    CLI->>FS: write graph.json + index.html per document

Each round is resumable — if the process is interrupted, re-running the same command picks up from where it left off via the SQLite state database.
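The resume behaviour described above can be sketched as a small loop that records each finished round in SQLite and skips rounds already marked done on re-run. The `rounds` table and `done` status here are hypothetical; `batch_gemini.db`'s real schema may differ:

```python
import sqlite3

def run_rounds(db_path: str, rounds: dict) -> list:
    """Run named rounds in order, skipping any recorded as done in a prior run.

    `rounds` maps a round name to a zero-argument callable doing the work.
    Returns the names of the rounds actually executed this time.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS rounds (name TEXT PRIMARY KEY, status TEXT)")
    executed = []
    for name, work in rounds.items():
        row = conn.execute("SELECT status FROM rounds WHERE name = ?", (name,)).fetchone()
        if row and row[0] == "done":
            continue  # completed in a previous run: resume past it
        work()
        conn.execute("INSERT OR REPLACE INTO rounds VALUES (?, 'done')", (name,))
        conn.commit()  # persist immediately so a crash after this round is recoverable
        executed.append(name)
    conn.close()
    return executed
```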

Start or resume processing

python main.py batch-gemini run \
    --input ./documents \
    --output ./results

Check pipeline status

python main.py batch-gemini status \
    --input ./documents \
    --output ./results

Use specific models

python main.py batch-gemini run \
    --input ./documents \
    --output ./results \
    --discovery-model gemini-3-flash-preview \
    --extraction-model gemini-3-flash-preview

Consolidate all graphs at the end

python main.py batch-gemini run \
    --input ./documents \
    --output ./results \
    --consolidate

Or consolidate manually afterwards:

python main.py consolidate --dir ./results --output ./merged

Supported file types

.pdf, .txt, .md, .text, .markdown
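The scan step presumably filters the input directory by those suffixes. A hypothetical helper (the real scanner in batch-gemini may behave differently, e.g. around symlinks or hidden files):

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md", ".text", ".markdown"}

def scan_documents(root) -> list:
    """Collect files under root whose suffix is supported, case-insensitively."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```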

State database

A file called batch_gemini.db is created in the output directory. It stores:

  • Per-document text, chunks, language, schema, extracted graphs, and final output.
  • Per-round batch job names, statuses, and timings.

Important: If you add new documents or want to reprocess from scratch, delete the output directory first. The database is designed for crash recovery, not incremental updates.


Environment variables

Variable          Purpose
GEMINI_API_KEY    Primary API key (checked first)
GOOGLE_API_KEY    Fallback API key (batch processor only)
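The documented lookup order amounts to a two-step fallback; a minimal sketch (the error message and function name are illustrative, not Knwler's actual API):

```python
import os

def resolve_api_key() -> str:
    """Return the Gemini API key, preferring GEMINI_API_KEY over GOOGLE_API_KEY."""
    key = os.environ.get("GEMINI_API_KEY") or os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("Set GEMINI_API_KEY (or GOOGLE_API_KEY for the batch processor)")
    return key
```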

Polling behaviour

The batch processor polls with exponential backoff:

  • Initial interval: 30 seconds
  • Maximum interval: 5 minutes
  • Terminal states: JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, JOB_STATE_CANCELLED, JOB_STATE_EXPIRED

Google commits to processing batch jobs within 24 hours; in practice most jobs complete in minutes.
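The documented polling parameters yield a schedule like the one below. Doubling between polls is an assumption; the text states only the initial and maximum intervals:

```python
def backoff_intervals(initial: float = 30.0, maximum: float = 300.0):
    """Yield poll intervals that grow from `initial` and cap at `maximum` seconds."""
    interval = initial
    while True:
        yield interval
        interval = min(interval * 2, maximum)  # assumed doubling, capped at 5 minutes

# States at which polling stops, per the list above.
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
}
```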


Cost comparison

Mode                    Relative cost    Latency
Real-time (extract)     1×               Immediate
Batch (batch-gemini)    ~0.5×            Minutes to hours

Use real-time extraction for interactive use and single documents. Use batch processing when handling tens or hundreds of documents.