Every AI-powered Blender tool needs a language model behind it. For most of the past two years, that meant sending your prompts to a cloud API — OpenAI, Anthropic, or similar. You pay per request, your data travels to external servers, and you need an internet connection to work.
Ollama changes that equation. It lets you run capable AI models locally on your own machine, completely free, with no API keys and no data leaving your computer. For Blender artists and indie developers who care about privacy, cost, or just want to work offline, it is a practical alternative that has matured significantly in 2026.
This guide covers everything you need to get Ollama running for Blender workflows — installation, model selection, hardware requirements, and practical use cases.
What Ollama Actually Is
Ollama is an open-source tool that runs large language models (LLMs) locally on your machine. Think of it as a local version of the ChatGPT API, but running entirely on your own hardware. It handles model downloading, memory management, GPU acceleration, and provides a simple API that other tools can connect to.
The key benefits for Blender users:
- Free. No per-request costs. No subscription. No usage limits.
- Private. Your prompts, scene descriptions, and project details never leave your machine.
- Offline. Works without internet once models are downloaded.
- Fast for local. GPU acceleration means response times of 1-5 seconds for most queries on modern hardware.
- Simple API. Compatible with tools built for the OpenAI API format, which most Blender AI integrations support.
The tradeoff is that local models are generally less capable than the largest cloud models. A 7B parameter model running on your GPU will not match GPT-4 or Claude on complex reasoning tasks. But for the specific tasks Blender artists need — generating material descriptions, writing short Python scripts, suggesting node setups — smaller models are often more than sufficient.
Installation
Ollama is straightforward to install on all major platforms.
macOS
# Using Homebrew
brew install ollama
# Alternatively, download the macOS app directly from ollama.com
Linux
curl -fsSL https://ollama.com/install.sh | sh
Windows
Download the installer from ollama.com. Run it. The installation adds Ollama as a system service that runs in the background.
Verify Installation
After installing, open a terminal and run:
ollama --version
Then start the Ollama service (on Linux/macOS it may start automatically):
ollama serve
This starts the local API server on http://localhost:11434. Leave this running in the background.
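As a quick sanity check you can also query the server for installed models. The `GET /api/tags` endpoint returns a JSON object with a `models` array; a small helper (sketched here with a canned response, so it runs without a live server) parses that into a list of names:

```python
import json

def installed_models(tags_json: str) -> list[str]:
    """Parse the response body of GET http://localhost:11434/api/tags
    into a list of installed model names."""
    return [m["name"] for m in json.loads(tags_json).get("models", [])]

# Canned example of what /api/tags returns (trimmed to the "name" field):
sample = '{"models": [{"name": "llama3:latest"}, {"name": "mistral:latest"}]}'
print(installed_models(sample))
```

Against a running server you would fetch the JSON from `http://localhost:11434/api/tags` and pass the body to the same function.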
Choosing the Right Models for 3D Work
Not all models are equally useful for Blender workflows. Here are the models we have found most effective for 3D-related tasks, ranked by practical utility.
Llama 3 (8B) — Best All-Rounder
ollama pull llama3
Llama 3 8B is the best general-purpose model for most Blender tasks. It handles natural language material descriptions, basic Python scripting, and workflow questions well. Response quality is solid for its size, and it runs comfortably on 8GB+ VRAM GPUs.
Good for: Material descriptions, workflow questions, general Blender knowledge, light scripting tasks, summarizing documentation.
Memory: ~5GB VRAM for the 8B model.
CodeLlama (7B/13B) — Best for Python Scripting
ollama pull codellama
# Or for more capability at the cost of more memory:
ollama pull codellama:13b
If your primary use case is generating Blender Python scripts — automating tasks, creating operators, writing material node setup code — CodeLlama is the better choice. It produces more syntactically correct Python and understands code structure better than general-purpose models.
Good for: Blender Python API scripts, addon development, node setup automation, batch processing scripts.
Memory: ~4.5GB VRAM (7B), ~8GB VRAM (13B).
Mistral (7B) — Fast and Capable
ollama pull mistral
Mistral 7B is slightly faster than Llama 3 on most hardware while maintaining good quality. It is a solid alternative if Llama 3 feels slow on your machine, or if you want snappier responses for interactive workflows.
Good for: Quick queries, material suggestions, parameter recommendations, interactive chat workflows.
Memory: ~4.5GB VRAM.
Llama 3 (70B) — Maximum Quality
ollama pull llama3:70b
If you have 48GB+ VRAM (or enough system RAM for CPU inference at slower speeds), the 70B model is significantly more capable. It writes better code, produces more nuanced material descriptions, and handles complex multi-step instructions more reliably.
Good for: Complex scripting tasks, detailed technical explanations, multi-step workflows. Only practical on high-end workstations.
Memory: ~40GB VRAM (or 40GB+ system RAM for CPU-only, but expect 10-30x slower inference).
Hardware Requirements
Ollama works on a range of hardware, but performance varies dramatically.
GPU Inference (Recommended)
| GPU VRAM | Recommended Models | Response Speed |
|---|---|---|
| 6GB | Mistral 7B, CodeLlama 7B | 15-25 tokens/sec |
| 8GB | Llama 3 8B, Mistral 7B | 20-35 tokens/sec |
| 12GB | Any 7-8B model comfortably | 30-50 tokens/sec |
| 16GB | 13B models, quantized 30B | 20-40 tokens/sec |
| 24GB+ | 30B models, quantized 70B | 15-30 tokens/sec |
Both NVIDIA and AMD GPUs are supported. NVIDIA generally has better performance due to more mature CUDA support, but AMD ROCm support has improved substantially.
CPU-Only Inference
If you do not have a dedicated GPU or your VRAM is too limited, Ollama can run on CPU using system RAM. Expect 3-10x slower inference compared to GPU. For quick queries this is still usable (5-15 seconds per response). For generating long scripts, it gets tedious.
Minimum: 8GB RAM for 7B models. 16GB recommended for comfortable operation alongside Blender.
Apple Silicon
Ollama runs well on Apple M-series chips. The unified memory architecture means models can use all available RAM efficiently. An M1 Pro with 16GB handles 7-8B models smoothly. M2/M3 Max with 32-64GB can run larger models.
Setting Up the Local API
Ollama exposes a local HTTP API, including an OpenAI-compatible endpoint. This is important because most Blender AI tools that support cloud APIs can connect to Ollama with minimal configuration.
The API runs on http://localhost:11434 by default. You can test it with curl:
curl http://localhost:11434/api/generate -d '{
"model": "llama3",
"prompt": "Write a Blender Python script that creates a Principled BSDF material with a base color of dark red."
}'
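By default, `/api/generate` streams its output as newline-delimited JSON: each line is an object carrying a `"response"` fragment, and the final line has `"done": true`. A small Python helper can stitch the fragments back into the full completion (shown here against a canned two-chunk stream, so it runs without a live server):

```python
import json

def join_stream(ndjson_text: str) -> str:
    """Concatenate the "response" fragments from Ollama's streaming
    newline-delimited JSON output into the full completion text."""
    parts = []
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):  # the last chunk signals the end of the stream
            break
    return "".join(parts)

# Canned example of what a two-chunk stream looks like:
sample = (
    '{"model": "llama3", "response": "Hello", "done": false}\n'
    '{"model": "llama3", "response": " world", "done": true}\n'
)
print(join_stream(sample))  # Hello world
```

If you prefer a single JSON response instead of a stream, add `"stream": false` to the request body.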
For tools that use the OpenAI API format, Ollama also provides a compatible endpoint at http://localhost:11434/v1/chat/completions.
Configuration for Blender Tools
When configuring Blender addons or MCP servers to use Ollama instead of a cloud API, you typically need to set:
- API Base URL: http://localhost:11434/v1
- API Key: any string (Ollama does not require authentication, but some tools require a non-empty key field; use ollama or none)
- Model name: the exact model name you pulled (e.g., llama3, mistral, codellama)
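If a tool does not support custom base URLs, you can also talk to the OpenAI-compatible endpoint directly from Python with nothing but the standard library. A minimal sketch (the `ask` function needs a running `ollama serve`; the payload builder runs standalone):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"  # default local endpoint

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat payload that Ollama's /v1 endpoint accepts."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def ask(model: str, prompt: str) -> str:
    """Send one chat request to a local Ollama server and return the reply.
    The API key can be any non-empty string; Ollama ignores it."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_chat_request(model, prompt)).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer ollama",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Usage would be `ask("llama3", "Suggest Principled BSDF values for brushed steel.")` with the server running in the background.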
Practical Use Cases in Blender Workflows
Material Generation and Description
Local AI excels at translating natural language into material parameters. Describe a material in plain English and get back specific Principled BSDF values, suggested node setups, or texture recommendations.
Example prompt: "I need a weathered copper material. What Principled BSDF values should I use for the oxidized green patina areas vs the exposed copper areas?"
A 7-8B model produces useful parameter suggestions: base color values, roughness ranges, metallic settings. Not as detailed as GPT-4, but accurate enough to get you 80% of the way to the final material.
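Model replies like this arrive as prose, so a small amount of glue code helps turn them into something you can feed into a node tree. The helper below is a hypothetical sketch (not part of any addon) that pulls scalar `Name: value` suggestions out of a reply; real model output varies, so treat it as a starting point:

```python
import re

def parse_bsdf_values(reply: str) -> dict:
    """Pull scalar 'Name: number' Principled BSDF suggestions out of a
    model reply, e.g. 'Roughness: 0.45' -> {"Roughness": 0.45}.
    Color triples would need separate handling; this only covers scalars."""
    pattern = r"(Metallic|Roughness|Specular|IOR)\s*[:=]\s*([0-9]*\.?[0-9]+)"
    return {name: float(num) for name, num in re.findall(pattern, reply)}

reply = "For the patina areas try Roughness: 0.6 and Metallic: 0.1."
print(parse_bsdf_values(reply))
```

Inside Blender, the resulting dict maps directly onto the Principled BSDF node's input sockets (e.g. `node.inputs["Roughness"].default_value = values["Roughness"]`).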
Python Script Generation
This is where CodeLlama particularly shines. You can describe a Blender operation in natural language and get back a functional Python script.
Example: "Write a Blender Python script that selects all objects in the active collection, applies all modifiers, and exports each object as a separate FBX file to a specified directory."
CodeLlama 13B generates working scripts for this type of task more often than not. You will still need to review and sometimes fix the output, but it saves significant time compared to writing from scratch or searching documentation.
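Since code models typically wrap the script in a fenced code block surrounded by prose, it helps to strip the fences before pasting into Blender's text editor. A minimal extractor (hypothetical helper, not a library function) might look like this:

```python
import re

FENCE = "`" * 3  # triple backtick, built programmatically to keep this example readable

def extract_python(reply: str) -> str:
    """Return the contents of the first fenced code block in a model
    reply; fall back to the whole reply if no fence is present."""
    match = re.search(FENCE + r"(?:python)?\n(.*?)" + FENCE, reply, re.DOTALL)
    return match.group(1).strip() if match else reply.strip()

reply = f"Here is the script:\n{FENCE}python\nimport bpy\nprint('ok')\n{FENCE}\nReview before running."
print(extract_python(reply))
```

Always read the extracted script before running it; model-generated bpy code can delete or overwrite data just as easily as correct code can save you time.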
Batch Processing Assistance
For repetitive tasks across many objects — renaming, material assignment, UV operations — describing the pattern to a local AI and getting a batch script is faster than doing it manually.
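The scripts a model produces for this are usually a loop around a simple pattern function. As an illustration (a hypothetical naming helper, with the bpy loop shown only as a comment so the snippet runs anywhere):

```python
def numbered_name(prefix: str, index: int, pad: int = 3) -> str:
    """Zero-padded naming pattern a batch rename script would apply,
    e.g. index 0 -> 'Crate_001'."""
    return f"{prefix}_{index + 1:0{pad}d}"

# Inside Blender the loop would be:
#   for i, ob in enumerate(bpy.context.collection.objects):
#       ob.name = numbered_name("Crate", i)
print(numbered_name("Crate", 0))  # Crate_001
```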
Shader Node Recommendations
Describe the visual effect you want and ask for a node setup recommendation. Local models know common Blender node graphs and can suggest combinations of noise textures, color ramps, and mix nodes for specific effects.
Performance Tips
Keep Models Loaded
The first request after loading a model takes longer because the model weights need to load into VRAM; subsequent requests are faster. By default Ollama keeps the most recently used model in memory for about five minutes, then unloads it. If you switch between models frequently, each switch pays the load cost again, so responses will be slower.
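Ollama's API accepts a keep_alive field on requests to control how long a model stays resident, and a request with an empty prompt loads the model without generating anything. A sketch of a warm-up payload you could POST to /api/generate before a work session (the duration string "30m" is an example value, not a required setting):

```python
import json

def warmup_payload(model: str, keep_alive: str = "30m") -> str:
    """An empty-prompt request body that just loads the model into VRAM
    and asks Ollama to keep it resident. keep_alive accepts durations
    like "30m", or -1 to keep the model loaded indefinitely."""
    return json.dumps({"model": model, "prompt": "", "keep_alive": keep_alive})

print(warmup_payload("llama3"))
```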
Use Quantized Models for Limited VRAM
If your GPU VRAM is limited, use quantized model variants:
ollama pull llama3:8b-q4_0 # 4-bit quantized, ~4GB VRAM
ollama pull llama3:8b-q5_1 # 5-bit quantized, ~5GB VRAM
Quantization reduces quality slightly but makes larger models fit in less memory. The quality difference between q4 and q5 is subtle for most practical tasks.
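The memory math behind these tags is simple: weights take roughly parameters times bits-per-weight divided by eight bytes, plus some working memory for activations and the KV cache. A rough estimator (the 1 GB overhead figure is an assumption for short Blender-style prompts, not a measured constant):

```python
def approx_vram_gb(params_billion: float, bits: int, overhead_gb: float = 1.0) -> float:
    """Rough VRAM estimate for a quantized model: weights take about
    params * bits / 8 bytes (1B params at 8-bit ~ 1 GB), plus an assumed
    fixed allowance for activations and KV cache."""
    weights_gb = params_billion * bits / 8
    return round(weights_gb + overhead_gb, 1)

print(approx_vram_gb(8, 4))  # 8B model at 4-bit
print(approx_vram_gb(8, 5))  # 8B model at 5-bit
```

The estimate lines up with the figures above: an 8B model at 4-bit lands around 5 GB, which is why it fits comfortably on an 8 GB card but is tight on 6 GB.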
Close Other GPU Applications
If Blender is using your GPU for viewport rendering or Cycles rendering, there may not be enough VRAM left for Ollama. Either pause rendering while using AI features, or use a smaller quantized model that fits alongside Blender's VRAM usage.
Ollama Integration with StraySpark Tools
The AI Material Generator supports Ollama as a backend, meaning you can run the entire material generation workflow locally without any cloud API dependency. Point it at your local Ollama instance, select your preferred model, and generate materials with the same interface — just powered by your own hardware instead of a cloud service.
This is particularly useful for studios with data sensitivity requirements, artists working in locations with unreliable internet, or anyone who prefers to avoid per-request API costs. The material quality with Llama 3 8B is solid for most use cases, and if you have the hardware for a 13B or larger model, the results improve further.
Summary
Ollama in 2026 is a practical, mature tool for running AI locally. For Blender workflows, it covers the most common use cases — material generation, Python scripting, workflow assistance — with acceptable quality and no ongoing costs.
Start with Llama 3 8B if you have 8GB+ VRAM. Switch to CodeLlama if scripting is your primary use case. Use Mistral if you need faster responses. And if you have a high-end workstation, the 70B models bring quality close to cloud APIs.
The setup takes 10 minutes. The models download once. After that, you have a free, private, offline AI assistant that runs directly alongside Blender.