Ollama
Ollama is simply a convenient local LLM service; you can use LM Studio or any other service instead. The default model is Qwen 2.5, but here too, experiment and see what works best for you. We have run lots of benchmarks, and bigger models are not better — sometimes quite the opposite. Small models of 3 or 7 billion parameters will work fine and are a lot faster. Thinking mode, in particular, really gets in the way of graph extraction. Whatever you do, don't enable thinking and don't use advanced MoE models.
There are two Ollama model parameters you can set:
- `--extraction-model`: the model used for summary, title, and graph extraction. Default is `qwen2.5:7b`.
- `--discovery-model`: used for language and schema detection. Default is `qwen2.5:14b`.
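As an illustration only — the exact command name and subcommand depend on how you invoke Knwler, so treat `knwler run` below as a placeholder — overriding both parameters might look like:

```shell
# Hypothetical invocation: substitute your actual Knwler entry point.
# Both flags accept any model tag that Ollama has pulled locally.
knwler run \
  --extraction-model qwen2.5:7b \
  --discovery-model qwen2.5:14b
```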
Tip: When running Ollama locally, launch it via CLI with parallel processing for best throughput:
```shell
OLLAMA_NUM_PARALLEL=8 ollama serve
```

Adjust the number based on your machine's specs (8 is suitable for a Mac M4 Pro with 64 GB of RAM).
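If you use the Ollama macOS app rather than starting the server from a terminal, the environment variable has to be set through `launchctl` instead (this is the approach described in Ollama's FAQ; restart the app afterwards for it to take effect):

```shell
# Set the variable for GUI-launched apps, then restart Ollama.
launchctl setenv OLLAMA_NUM_PARALLEL "8"
```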
Linux
When installing on Linux:

```shell
curl -fsSL https://ollama.com/install.sh | sh
```

Ollama will run automatically, and you only need to pull the default models to run Knwler:
```shell
ollama pull qwen2.5:7b
ollama pull qwen2.5:14b
```
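To confirm the models were pulled and the server is reachable, you can list the local models — either with the CLI or via Ollama's HTTP API on its default port:

```shell
# Show all locally available models and their sizes.
ollama list

# Same information as JSON from the running server (default port 11434).
curl http://localhost:11434/api/tags
```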