Every AI-powered Blender tool needs a language model behind it. For most of the past two years, that meant sending your prompts to a cloud API — OpenAI, Anthropic, or similar. You pay per request, your data travels to external servers, and you need an internet connection to work.
Ollama changes that equation. It lets you run capable AI models locally on your own machine, completely free, with no API keys and no data leaving your computer. For Blender artists and indie developers who care about privacy, cost, or just want to work offline, it is a practical alternative that has matured significantly in 2026.
This guide covers everything you need to get Ollama running for Blender workflows — installation, model selection, hardware requirements, and practical use cases.
What Ollama Actually Is
Ollama is an open-source tool that runs large language models (LLMs) locally on your machine. Think of it as a local version of the ChatGPT API, but running entirely on your own hardware. It handles model downloading, memory management, GPU acceleration, and provides a simple API that other tools can connect to.
The key benefits for Blender users:
- Free. No per-request costs. No subscription. No usage limits.
- Private. Your prompts, scene descriptions, and project details never leave your machine.
- Offline. Works without internet once models are downloaded.
- Fast for local. GPU acceleration means response times of 1-5 seconds for most queries on modern hardware.
- Simple API. Compatible with tools built for the OpenAI API format, which most Blender AI integrations support.
The tradeoff is that local models are generally less capable than the largest cloud models. A 7B parameter model running on your GPU will not match GPT-4 or Claude on complex reasoning tasks. But for the specific tasks Blender artists need — generating material descriptions, writing short Python scripts, suggesting node setups — smaller models are often more than sufficient.
Installation
Ollama is straightforward to install on all major platforms.
macOS
# Using Homebrew
brew install ollama
# Alternatively, download the macOS app directly from ollama.com
Linux
curl -fsSL https://ollama.com/install.sh | sh
Windows
Download the installer from ollama.com. Run it. The installation adds Ollama as a system service that runs in the background.
Verify Installation
After installing, open a terminal and run:
ollama --version
Then start the Ollama service (on Linux/macOS it may start automatically):
ollama serve
This starts the local API server on http://localhost:11434. Leave this running in the background.
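As a quick sanity check you can also query the server for installed models. The `GET /api/tags` endpoint returns a JSON object with a `models` array; a small helper (sketched here with a canned response, so it runs without a live server) parses that into a list of names:

```python
import json

def installed_models(tags_json: str) -> list[str]:
    """Parse the response body of GET http://localhost:11434/api/tags
    into a list of installed model names."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Canned example of what /api/tags returns (trimmed to the "name" field):
sample = '{"models": [{"name": "llama3:latest"}, {"name": "mistral:latest"}]}'
print(installed_models(sample))
```

Against a running server you would fetch the JSON from `http://localhost:11434/api/tags` and pass the body to the same function.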
Choosing the Right Models for 3D Work
Not all models are equally useful for Blender workflows. Here are the models we have found most effective for 3D-related tasks, ranked by practical utility.
Llama 3 (8B) — Best All-Rounder
ollama pull llama3
Llama 3 8B is the best general-purpose model for most Blender tasks. It handles natural language material descriptions, basic Python scripting, and workflow questions well. Response quality is solid for its size, and it runs comfortably on 8GB+ VRAM GPUs.
Good for: Material descriptions, workflow questions, general Blender knowledge, light scripting tasks, summarizing documentation.
Memory: ~5GB VRAM for the 8B model.
CodeLlama (7B/13B) — Best for Python Scripting
ollama pull codellama
# Or for more capability at the cost of more memory:
ollama pull codellama:13b
If your primary use case is generating Blender Python scripts — automating tasks, creating operators, writing material node setup code — CodeLlama is the better choice. It produces more syntactically correct Python and understands code structure better than general-purpose models.
Good for: Blender Python API scripts, addon development, node setup automation, batch processing scripts.
Memory: ~4.5GB VRAM (7B), ~8GB VRAM (13B).
Mistral (7B) — Fast and Capable
ollama pull mistral
Mistral 7B is slightly faster than Llama 3 on most hardware while maintaining good quality. It is a solid alternative if Llama 3 feels slow on your machine, or if you want snappier responses for interactive workflows.
Good for: Quick queries, material suggestions, parameter recommendations, interactive chat workflows.
Memory: ~4.5GB VRAM.
Llama 3 (70B) — Maximum Quality
ollama pull llama3:70b
If you have 48GB+ VRAM (or enough system RAM for CPU inference at slower speeds), the 70B model is significantly more capable. It writes better code, produces more nuanced material descriptions, and handles complex multi-step instructions more reliably.
Good for: Complex scripting tasks, detailed technical explanations, multi-step workflows. Only practical on high-end workstations.
Memory: ~40GB VRAM (or 40GB+ system RAM for CPU-only, but expect 10-30x slower inference).
Hardware Requirements
Ollama works on a range of hardware, but performance varies dramatically.
GPU Inference (Recommended)
| GPU VRAM | Recommended Models | Response Speed |
|---|---|---|
| 6GB | Mistral 7B, CodeLlama 7B | 15-25 tokens/sec |
| 8GB | Llama 3 8B, Mistral 7B | 20-35 tokens/sec |
| 12GB | Any 7-8B model comfortably | 30-50 tokens/sec |
| 16GB | 13B models, quantized 30B | 20-40 tokens/sec |
| 24GB+ | 30B models, quantized 70B | 15-30 tokens/sec |
Both NVIDIA and AMD GPUs are supported. NVIDIA generally has better performance due to more mature CUDA support, but AMD ROCm support has improved substantially.
CPU-Only Inference
If you do not have a dedicated GPU or your VRAM is too limited, Ollama can run on CPU using system RAM. Expect 3-10x slower inference compared to GPU. For quick queries this is still usable (5-15 seconds per response). For generating long scripts, it gets tedious.
Minimum: 8GB RAM for 7B models. 16GB recommended for comfortable operation alongside Blender.
Apple Silicon
Ollama runs well on Apple M-series chips. The unified memory architecture means models can use all available RAM efficiently. An M1 Pro with 16GB handles 7-8B models smoothly. M2/M3 Max with 32-64GB can run larger models.
Setting Up the Local API
Ollama exposes a local HTTP API, including an OpenAI-compatible endpoint. This is important because most Blender AI tools that support cloud APIs can connect to Ollama with minimal configuration.
The API runs on http://localhost:11434 by default. You can test it with curl:
curl http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt": "Write a Blender Python script that creates a Principled BSDF material with a base color of dark red."
}'
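By default, `/api/generate` streams its output as newline-delimited JSON: each line is an object carrying a `"response"` fragment, and the final line has `"done": true`. A small Python helper can stitch the fragments back into the full completion (shown here against a canned two-chunk stream, so it runs without a live server):

```python
import json

def join_stream(ndjson_text: str) -> str:
    """Concatenate the "response" fragments from Ollama's streaming
    newline-delimited JSON output into the full completion text."""
    parts = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # the last chunk signals the end of the stream
            break
    return "".join(parts)

# Canned example of what a two-chunk stream looks like:
sample = (
    '{"model": "llama3", "response": "Hello", "done": false}\n'
    '{"model": "llama3", "response": " world", "done": true}\n'
)
print(join_stream(sample))  # Hello world
```

If you prefer a single JSON response instead of a stream, add `"stream": false` to the request body.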
For tools that use the OpenAI API format, Ollama also provides a compatible endpoint at http://localhost:11434/v1/chat/completions.
Configuration for Blender Tools
When configuring Blender addons or MCP servers to use Ollama instead of a cloud API, you typically need to set:
- API Base URL: http://localhost:11434/v1
- API Key: any string (Ollama does not require authentication, but some tools require a non-empty key field; use ollama or none)
- Model name: the exact model name you pulled (e.g., llama3, mistral, codellama)
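If a tool does not support custom base URLs, you can also talk to the OpenAI-compatible endpoint directly from Python with nothing but the standard library. A minimal sketch (the `ask` function needs a running `ollama serve`; the payload builder runs standalone):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # default local endpoint

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload that Ollama's /v1 endpoint accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(model: str, prompt: str) -> str:
    """Send one chat request to a local Ollama server and return the reply.
    The API key can be any non-empty string; Ollama ignores it."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage would be `ask("llama3", "Suggest Principled BSDF values for brushed steel.")` with the server running in the background.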
Practical Use Cases in Blender Workflows
Material Generation and Description
Local AI excels at translating natural language into material parameters. Describe a material in plain English and get back specific Principled BSDF values, suggested node setups, or texture recommendations.
Example prompt: "I need a weathered copper material. What Principled BSDF values should I use for the oxidized green patina areas vs the exposed copper areas?"
A 7-8B model produces useful parameter suggestions: base color values, roughness ranges, metallic settings. Not as detailed as GPT-4, but accurate enough to get you 80% of the way to the final material.
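Model replies like this arrive as prose, so a small amount of glue code helps turn them into something you can feed into a node tree. The helper below is a hypothetical sketch (not part of any addon) that pulls scalar `Name: value` suggestions out of a reply; real model output varies, so treat it as a starting point:

```python
import re

def parse_bsdf_values(reply: str) -> dict:
    """Pull scalar 'Name: number' Principled BSDF suggestions out of a
    model reply, e.g. 'Roughness: 0.45' -> {"Roughness": 0.45}.
    Color triples would need separate handling; this only covers scalars."""
    pattern = r"(Metallic|Roughness|Specular|IOR)\s*[:=]\s*([0-9]*\.?[0-9]+)"
    return {name: float(num) for name, num in re.findall(pattern, reply)}

reply = "For the patina areas try Roughness: 0.6 and Metallic: 0.1."
print(parse_bsdf_values(reply))
```

Inside Blender, the resulting dict maps directly onto the Principled BSDF node's input sockets (e.g. `node.inputs["Roughness"].default_value = values["Roughness"]`).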
Python Script Generation
This is where CodeLlama particularly shines. You can describe a Blender operation in natural language and get back a functional Python script.
Example: "Write a Blender Python script that selects all objects in the active collection, applies all modifiers, and exports each object as a separate FBX file to a specified directory."
CodeLlama 13B generates working scripts for this type of task more often than not. You will still need to review and sometimes fix the output, but it saves significant time compared to writing from scratch or searching documentation.
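Since code models typically wrap the script in a fenced code block surrounded by prose, it helps to strip the fences before pasting into Blender's text editor. A minimal extractor (hypothetical helper, not a library function) might look like this:

```python
import re

FENCE = "`" * 3  # triple backtick, built programmatically to keep this example readable

def extract_python(reply: str) -> str:
    """Return the contents of the first fenced code block in a model
    reply; fall back to the whole reply if no fence is present."""
    match = re.search(FENCE + r"(?:python)?\n(.*?)" + FENCE, reply, re.DOTALL)
    return match.group(1).strip() if match else reply.strip()

reply = f"Here is the script:\n{FENCE}python\nimport bpy\nprint('ok')\n{FENCE}\nReview before running."
print(extract_python(reply))
```

Always read the extracted script before running it; model-generated bpy code can delete or overwrite data just as easily as correct code can save you time.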
Batch Processing Assistance
For repetitive tasks across many objects — renaming, material assignment, UV operations — describing the pattern to a local AI and getting a batch script is faster than doing it manually.
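The scripts a model produces for this are usually a loop around a simple pattern function. As an illustration (a hypothetical naming helper, with the bpy loop shown only as a comment so the snippet runs anywhere):

```python
def numbered_name(prefix: str, index: int, pad: int = 3) -> str:
    """Zero-padded naming pattern a batch rename script would apply,
    e.g. index 0 -> 'Crate_001'."""
    return f"{prefix}_{index + 1:0{pad}d}"

# Inside Blender the loop would be:
#   for i, ob in enumerate(bpy.context.collection.objects):
#       ob.name = numbered_name("Crate", i)
print(numbered_name("Crate", 0))  # Crate_001
```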
Shader Node Recommendations
Describe the visual effect you want and ask for a node setup recommendation. Local models know common Blender node graphs and can suggest combinations of noise textures, color ramps, and mix nodes for specific effects.
Performance Tips
Keep Models Loaded
The first request after loading a model takes longer because the model weights need to load into VRAM; subsequent requests are faster. By default Ollama keeps the most recently used model in memory for about five minutes, then unloads it. If you switch between models frequently, each switch pays the load cost again, so responses will be slower.
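Ollama's API accepts a keep_alive field on requests to control how long a model stays resident, and a request with an empty prompt loads the model without generating anything. A sketch of a warm-up payload you could POST to /api/generate before a work session (the duration string "30m" is an example value, not a required setting):

```python
import json

def warmup_payload(model: str, keep_alive: str = "30m") -> str:
    """An empty-prompt request body that just loads the model into VRAM
    and asks Ollama to keep it resident. keep_alive accepts durations
    like "30m", or -1 to keep the model loaded indefinitely."""
    return json.dumps({"model": model, "prompt": "", "keep_alive": keep_alive})

print(warmup_payload("llama3"))
```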
Use Quantized Models for Limited VRAM
If your GPU VRAM is limited, use quantized model variants:
ollama pull llama3:8b-q4_0 # 4-bit quantized, ~4GB VRAM
ollama pull llama3:8b-q5_1 # 5-bit quantized, ~5GB VRAM
Quantization reduces quality slightly but makes larger models fit in less memory. The quality difference between q4 and q5 is subtle for most practical tasks.
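The memory math behind these tags is simple: weights take roughly parameters times bits-per-weight divided by eight bytes, plus some working memory for activations and the KV cache. A rough estimator (the 1 GB overhead figure is an assumption for short Blender-style prompts, not a measured constant):

```python
def approx_vram_gb(params_billion: float, bits: int, overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate for a quantized model: weights take about
    params * bits / 8 bytes (1B params at 8-bit ~ 1 GB), plus an assumed
    fixed allowance for activations and KV cache."""
    weights_gb = params_billion * bits / 8
    return round(weights_gb + overhead_gb, 1)

print(approx_vram_gb(8, 4))  # 8B model at 4-bit
print(approx_vram_gb(8, 5))  # 8B model at 5-bit
```

The estimate lines up with the figures above: an 8B model at 4-bit lands around 5 GB, which is why it fits comfortably on an 8 GB card but is tight on 6 GB.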
Close Other GPU Applications
If Blender is using your GPU for viewport rendering or Cycles rendering, there may not be enough VRAM left for Ollama. Either pause rendering while using AI features, or use a smaller quantized model that fits alongside Blender's VRAM usage.
Ollama Integration with StraySpark Tools
The AI Material Generator supports Ollama as a backend, meaning you can run the entire material generation workflow locally without any cloud API dependency. Point it at your local Ollama instance, select your preferred model, and generate materials with the same interface — just powered by your own hardware instead of a cloud service.
This is particularly useful for studios with data sensitivity requirements, artists working in locations with unreliable internet, or anyone who prefers to avoid per-request API costs. The material quality with Llama 3 8B is solid for most use cases, and if you have the hardware for a 13B or larger model, the results improve further.
Summary
Ollama in 2026 is a practical, mature tool for running AI locally. For Blender workflows, it covers the most common use cases — material generation, Python scripting, workflow assistance — with acceptable quality and no ongoing costs.
Start with Llama 3 8B if you have 8GB+ VRAM. Switch to CodeLlama if scripting is your primary use case. Use Mistral if you need faster responses. And if you have a high-end workstation, the 70B models bring quality close to cloud APIs.
The setup takes 10 minutes. The models download once. After that, you have a free, private, offline AI assistant that runs directly alongside Blender.