Knwler CLI

Knwler can be used via your terminal or as a Python package. See the API page for more details on how to use individual methods.

The CLI offers a lot of functionality; rather than listing every possible action, this page walks through concrete examples. Note that if you have installed Knwler via pipx you can simply use knwler ... rather than uv run main.py ....

There is a demo which uses the Human Rights declaration as a short PDF:

uv run main.py demo

This will run the whole pipeline with all options enabled.

The file cli_demo.py contains all of these steps and can serve as a guide to using Knwler as a Python package. You can use any of the steps individually or leave out what you don't need (e.g. the rephrasing of chunks).

Of course, the main reason you would try Knwler is to extract knowledge from your own document:

uv run main.py -f https://knwler.com/pdfs/HumanRights.pdf

and if you have a local file:

uv run main.py -f ./HumanRights.pdf

This will write a number of files to a results directory.

You can write the output to a different directory with:

uv run main.py -f ./HumanRights.pdf --output ./stuff

Each time you run the same document, a new directory is created. You can override this with:

uv run main.py -f ./HumanRights.pdf --output ./stuff --overwrite
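The exact naming scheme is Knwler's own, but versioned output directories of this kind typically work like the following sketch (function name and suffix format are hypothetical, for illustration only):

```python
from pathlib import Path

def next_run_dir(base: Path, overwrite: bool = False) -> Path:
    """Pick an output directory: reuse `base` when overwriting,
    otherwise append an increasing suffix until the name is free.
    Illustrative only -- Knwler's actual scheme may differ."""
    if overwrite or not base.exists():
        return base
    n = 1
    while (candidate := base.with_name(f"{base.name}_{n}")).exists():
        n += 1
    return candidate
```

With --overwrite the base directory is reused in place; without it, each run lands in a fresh sibling directory.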

The above commands will all use Ollama as the backend. If you want OpenAI instead, simply use:

uv run main.py -f ./HumanRights.pdf --output ./stuff --overwrite --openai

and make sure the API key is in your environment. If not, use

export OPENAI_API_KEY=sk-....

and similarly for Anthropic (with ANTHROPIC_API_KEY in your environment):

uv run main.py -f ./HumanRights.pdf --output ./stuff --overwrite --anthropic

which, according to our benchmarks, will give you the highest quality output.
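Before kicking off a paid run, you may want to fail fast if the relevant key is not exported. A small helper like this (not part of Knwler; the variable names match the exports above) does the job:

```python
import os
import sys

def require_key(var: str) -> str:
    """Return the API key from the environment, or exit with a clear
    message before any paid calls are made. Illustrative helper only."""
    key = os.environ.get(var, "")
    if not key:
        sys.exit(f"{var} is not set -- export it before running Knwler.")
    return key

# e.g. require_key("OPENAI_API_KEY") or require_key("ANTHROPIC_API_KEY")
```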

The rendered HTML uses a template and you can switch to another one via:

uv run main.py -f ./HumanRights.pdf --output ./stuff --overwrite --anthropic --template columns

Note that this will reuse the cached LLM exchanges, so you won't pay again for a different style.
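This kind of caching usually keys on the prompt and model, so anything that doesn't change the LLM exchanges (like switching templates) is a pure cache hit. A toy sketch of the idea (the function and cache layout are hypothetical, not Knwler's actual implementation):

```python
import hashlib
import json
from pathlib import Path

def cached_call(prompt: str, model: str, cache_dir: Path, call_llm):
    """Answer from the on-disk cache when this prompt/model pair has
    been seen before; only hit the (paid) API on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(f"{model}\n{prompt}".encode()).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text())["response"]
    response = call_llm(prompt)
    path.write_text(json.dumps({"response": response}))
    return response
```

The second call with the same prompt never reaches the API, which is why re-rendering with another template costs nothing.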

If you wish to understand the quality of a model's output you can use the benchmark utility:

uv run main.py benchmark run

The grid.json file defines the items to be benchmarked. The command renders a comprehensive report, sorting the results by a knowledge yield score.

Performance Tips

  • Run Ollama with OLLAMA_NUM_PARALLEL=8 ollama serve to fully saturate --concurrent requests locally.
  • Tune --max-tokens (default 400) — smaller values create more chunks and finer-grained graphs at the cost of more LLM calls; larger values do the opposite.
  • The cache is enabled by default. Re-runs with different export flags (e.g. adding --html-report) are instant and free.
  • For large document sets, use --dir with --consolidate to batch-process and merge in a single command.
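The --max-tokens trade-off in the tips above can be pictured with a toy chunker, where whitespace-split words stand in for real tokens (purely illustrative, not Knwler's tokenizer):

```python
def chunk_by_token_budget(words: list[str], max_tokens: int) -> list[list[str]]:
    """Greedy chunking: pack words until the budget is reached, then
    start a new chunk. Smaller budgets yield more, finer-grained chunks
    (and therefore more LLM calls); larger budgets yield fewer."""
    chunks: list[list[str]] = []
    current: list[str] = []
    for word in words:
        if len(current) == max_tokens:
            chunks.append(current)
            current = []
        current.append(word)
    if current:
        chunks.append(current)
    return chunks
```

Halving the budget roughly doubles the chunk count, which is where the extra LLM calls come from.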