Google Gemini

```mermaid
flowchart LR
    KW[Knwler pipeline] -->|POST /chat/completions| GE["https://generativelanguage.googleapis.com\n/v1beta/openai"]
    GE -->|JSON response| KW
    KW --> C[(~/.knwler/cache/)]
```
Knwler supports Google Gemini as an LLM backend for all pipeline stages — schema discovery, extraction, consolidation, community labelling, and enrichments. It also has a dedicated Gemini Batch API processor that cuts costs by 50% when processing large document collections.
Quick Start
1. Get an API key
Create a key at Google AI Studio and export it:
```shell
export GEMINI_API_KEY="your-key-here"
```

2. Extract a single document
```shell
python main.py extract -f document.pdf --backend gemini
```

That's it. Knwler resolves the default model (gemini-3.1-flash-lite-preview), the base URL, and the API key automatically.
How Gemini is integrated
Knwler uses Gemini’s OpenAI-compatible endpoint, so no separate SDK is needed for standard extraction — just httpx (already a dependency).
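As a sketch of what one such request looks like: the payload below follows the standard OpenAI chat/completions schema that the compatibility endpoint accepts; the helper name is illustrative, not part of Knwler's actual internals.

```python
import json
import os

BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai"

def build_chat_request(prompt, model="gemini-3.1-flash-lite-preview"):
    # Assemble the URL, headers, and JSON body for one
    # OpenAI-compatible chat/completions call.
    url = f"{BASE_URL}/chat/completions"
    headers = {
        "Authorization": f"Bearer {os.environ.get('GEMINI_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    })
    return url, headers, body

# Sending it is then a single httpx call, roughly:
#   resp = httpx.post(url, headers=headers, content=body, timeout=60)
#   text = resp.json()["choices"][0]["message"]["content"]
```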
Every request is cached by a hash of (prompt, model, temperature, num_predict), so re-runs on the same content are free.
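A minimal sketch of such a cache key, assuming the four parameters are serialized deterministically and hashed (SHA-256 here is an assumption; the source does not specify the hash function):

```python
import hashlib
import json

def cache_key(prompt, model, temperature, num_predict):
    # Serialize the request parameters with sorted keys so the
    # same inputs always produce the same digest.
    blob = json.dumps(
        {"prompt": prompt, "model": model,
         "temperature": temperature, "num_predict": num_predict},
        sort_keys=True,
    )
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()
```

Identical (prompt, model, temperature, num_predict) tuples map to the same key, so a repeated run hits the cache instead of the API.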
Default models
| Role | Model |
|---|---|
| Schema discovery | gemini-3.1-flash-lite-preview |
| Graph extraction | gemini-3.1-flash-lite-preview |
You can override either with --discovery-model / --extraction-model:
```shell
python main.py extract -f report.pdf --backend gemini \
  --discovery-model gemini-3-flash-preview \
  --extraction-model gemini-3-flash-preview
```

CLI options
All standard extract flags work with the Gemini backend:
| Flag | Default | Description |
|---|---|---|
| --backend gemini | — | Select Gemini |
| --extraction-model | gemini-3.1-flash-lite-preview | Model for chunk extraction |
| --discovery-model | gemini-3.1-flash-lite-preview | Model for schema/language discovery |
| --max-tokens | 400 | Chunk size in tokens |
| --max-concurrent | 8 | Parallel API calls |
| --no-cache | — | Disable response caching |
Examples
```shell
# Minimal
python main.py extract -f paper.pdf --backend gemini

# Fetch a webpage and extract
python main.py fetch https://en.wikipedia.org/wiki/Alan_Turing --backend gemini

# Larger model, bigger chunks
python main.py extract -f report.pdf --backend gemini \
  --extraction-model gemini-3-flash-preview \
  --max-tokens 600

# Disable caching (always hit the API)
python main.py extract -f notes.md --backend gemini --no-cache
```

Programmatic API
```python
from knwler.api import extract_file
from knwler.config import Config

config = Config(
    backend="gemini",
    # api_key="...",  # or use GEMINI_API_KEY env var
    # extraction_model="gemini-3-flash-preview",
)

graph = await extract_file("report.pdf", config)
print(graph["title"])
```

You can also pass a custom base URL if you are using a proxy or a different Gemini-compatible endpoint:
```python
config = Config(
    backend="gemini",
    base_url="https://my-proxy.example.com/v1beta/openai",
)
```

Batch Processing
For bulk document processing, Knwler ships a dedicated Gemini Batch processor that submits all LLM calls through the Gemini Batch API, delivering a 50% cost reduction compared to real-time calls.
Additional dependency
```shell
uv add google-genai
```

Pipeline overview
The batch processor runs three sequential rounds. Within each round, all documents are processed in a single batch job submitted to Google's Batch API, which is polled to completion and then parsed before the next round begins.
```mermaid
sequenceDiagram
    participant CLI as batch-gemini CLI
    participant DB as SQLite state
    participant GBA as Gemini Batch API
    participant FS as Output files
    CLI->>DB: scan & chunk all documents
    CLI->>GBA: Round 1 — language + schema requests (JSONL)
    GBA-->>CLI: poll until JOB_STATE_SUCCEEDED
    CLI->>DB: store language + schemas
    CLI->>GBA: Round 2 — title + summary + rephrase + extraction (JSONL)
    GBA-->>CLI: poll until JOB_STATE_SUCCEEDED
    CLI->>DB: store titles, summaries, rephrases, per-chunk graphs
    CLI->>GBA: Round 3 — consolidation summaries + community labels (JSONL)
    GBA-->>CLI: poll until JOB_STATE_SUCCEEDED
    CLI->>DB: store final consolidated graph
    CLI->>FS: write graph.json + index.html per document
```
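The per-round JSONL files can be sketched roughly as below. The line schema (a caller-chosen `key` plus a `request` object in Gemini's `contents`/`parts` shape) is an assumption based on the Batch API's keyed-request format, not Knwler's exact output:

```python
import json

def batch_line(key, prompt):
    # One JSON object per line; the key lets results be matched
    # back to their source chunk when the job completes.
    return json.dumps({
        "key": key,
        "request": {"contents": [{"parts": [{"text": prompt}]}]},
    })

def build_round_jsonl(chunks):
    # One request per document chunk, keyed by index.
    return "\n".join(batch_line(f"chunk-{i}", text)
                     for i, text in enumerate(chunks))
```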
Each round is resumable — if the process is interrupted, re-running the same command picks up from where it left off via the SQLite state database.
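The resume logic can be sketched with a tiny SQLite table; this is an illustrative schema only, not the real (internal) layout of batch_gemini.db:

```python
import sqlite3

def open_state(path=":memory:"):
    # Track one row per pipeline round with its batch job name
    # and status.
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS rounds (
        round_no INTEGER PRIMARY KEY,
        job_name TEXT,
        status   TEXT NOT NULL DEFAULT 'pending')""")
    return db

def mark_succeeded(db, round_no, job_name):
    db.execute(
        "INSERT OR REPLACE INTO rounds (round_no, job_name, status) "
        "VALUES (?, ?, 'succeeded')", (round_no, job_name))
    db.commit()

def next_round(db, total=3):
    # Resume at the first round not yet marked succeeded.
    done = {r for (r,) in db.execute(
        "SELECT round_no FROM rounds WHERE status = 'succeeded'")}
    return next((n for n in range(1, total + 1) if n not in done), None)
```

On a re-run, the CLI would consult `next_round` and skip straight past any round whose batch job already finished.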
Start or resume processing
```shell
python main.py batch-gemini run \
  --input ./documents \
  --output ./results
```

Check pipeline status
```shell
python main.py batch-gemini status \
  --input ./documents \
  --output ./results
```

Use specific models
```shell
python main.py batch-gemini run \
  --input ./documents \
  --output ./results \
  --discovery-model gemini-3-flash-preview \
  --extraction-model gemini-3-flash-preview
```

Consolidate all graphs at the end
```shell
python main.py batch-gemini run \
  --input ./documents \
  --output ./results \
  --consolidate
```

Or consolidate manually afterwards:

```shell
python main.py consolidate --dir ./results --output ./merged
```

Supported file types
.pdf, .txt, .md, .text, .markdown
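The scan step amounts to filtering an input directory by those suffixes; the helper name below is hypothetical:

```python
from pathlib import Path

SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md", ".text", ".markdown"}

def scan_documents(input_dir):
    # Collect every supported file under the input directory,
    # sorted for a stable processing order.
    return sorted(
        p for p in Path(input_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )
```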
State database
A file called batch_gemini.db is created in the output directory. It stores:
- Per-document text, chunks, language, schema, extracted graphs, and final output.
- Per-round batch job names, statuses, and timings.
Important: If you add new documents or want to reprocess from scratch, delete the output directory first. The database is designed for crash recovery, not incremental updates.
Environment variables
| Variable | Purpose |
|---|---|
| GEMINI_API_KEY | Primary API key (checked first) |
| GOOGLE_API_KEY | Fallback API key (batch processor only) |
Polling behaviour
The batch processor polls with exponential backoff:
- Initial interval: 30 seconds
- Maximum interval: 5 minutes
- Terminal states: JOB_STATE_SUCCEEDED, JOB_STATE_FAILED, JOB_STATE_CANCELLED, JOB_STATE_EXPIRED
Google commits to processing batch jobs within 24 hours; in practice most jobs complete in minutes.
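That backoff schedule can be sketched as follows (the function is illustrative; only the intervals and terminal states come from the text above):

```python
import time

TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED",
                   "JOB_STATE_CANCELLED", "JOB_STATE_EXPIRED"}

def poll_until_done(get_state, initial=30.0, maximum=300.0, sleep=time.sleep):
    # Check the job state, wait, double the interval, capped at
    # the maximum; stop when the job reaches a terminal state.
    interval = initial
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        sleep(interval)
        interval = min(interval * 2, maximum)
```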
Cost comparison
| Mode | Relative cost | Latency |
|---|---|---|
| Real-time (extract) | 1× | Immediate |
| Batch (batch-gemini) | ~0.5× | Minutes – hours |
Use real-time extraction for interactive use and single documents. Use batch processing when handling tens or hundreds of documents.