AZMX AI

Guide · 2026-05-25 · 12 min read

How to Run Claude Locally: Step-by-Step Guide

Cut the cloud cord. Get a Claude-like workflow fully offline on your Mac, Windows, or Linux machine with an open-weight model and an approval-gated agent.

Running a Claude-style assistant locally is no longer a futuristic fantasy; it is a practical, privacy-first option. Anthropic does not publish Claude's weights, so Claude itself cannot be downloaded or served from Ollama or LM Studio. What you can run offline is an open-weight model behind an offline-capable agent, which keeps your data on your machine, avoids per-token costs, and leaves you in full control of your workflow. In the rest of this guide, "local Claude" is shorthand for that setup. The steps below set up a local Claude-like experience on your desktop, complete with terminal access, code editing, and MCP tool integration, with no account required.

Why Run Claude Locally?

In 2026, the default AI ecosystem is cloud-dependent: every prompt, every file, every conversation flows through someone else's server. That's fine for casual use, but if you're working with sensitive code, proprietary data, or simply value your privacy, it's a liability. Running Claude locally eliminates that dependency. Your data stays on your machine. No telemetry. No per-token billing. No rate limits.

Local inference also means you can integrate AI directly into your development workflow — inside your terminal, your editor, your project. Agents like AZMX AI combine a real PTY terminal with granular approval gates, so you get AI assistance without the risk of silent file mutations. When you run Claude locally, you're not just chatting with an LLM; you're commissioning an agent that can read files, execute commands, and write code — all under your supervision.

What You Need to Run Claude Locally

Before you start, understand the constraints. Local models require hardware. A modern laptop with 16 GB of RAM and a capable GPU (NVIDIA, AMD, or the integrated GPU in Apple Silicon) can run 7B-13B parameter models comfortably, provided the weights are quantized. No open-weight model that fits on consumer hardware matches Sonnet or Opus; to get closer, plan on 32 GB+ of memory and a powerful GPU so you can load a larger model, or accept a smaller model in exchange for faster responses.
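The hardware guidance above follows from a rough rule of thumb: the weights alone take about (parameter count × bits per weight ÷ 8) bytes. The sketch below applies that arithmetic. The function names and the 20% headroom factor are illustrative assumptions, not measurements; real usage also depends on context length and the runtime.

```python
def estimate_weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in decimal gigabytes."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits_in_memory(params_billion: float, bits_per_weight: float,
                   memory_gb: float, headroom: float = 1.2) -> bool:
    """True if the weights plus an assumed 20% headroom fit in memory_gb.

    The headroom factor is a placeholder for KV cache and runtime overhead.
    """
    needed_gb = estimate_weight_memory_gb(params_billion, bits_per_weight) * headroom
    return needed_gb <= memory_gb
```

For example, fits_in_memory(7, 4, 8) checks whether a 7B model quantized to 4 bits per weight is plausible on an 8 GB GPU, while fits_in_memory(13, 16, 16) shows why an unquantized 13B model is not a fit for 16 GB.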

The tools you need:

  • Ollama or LM Studio for model serving
  • AZMX AI or a similar agent platform for terminal + editor integration
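  • An open-weight model to stand in for Claude (Anthropic does not distribute Claude weights): for example qwen2.5-coder:14b from the Ollama library, or a GGUF build of a similar model from Hugging Face for LM Studio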
  • A BYOK-compatible client if you want to mix local with cloud (e.g., use local for quick edits, cloud for heavy lifts)

Step-by-Step: Set Up Claude Locally

1. Install and Run an Ollama Model

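Ollama is the easiest path. Download it from azmx.ai or directly from ollama.ai. Then pull an open-weight model to stand in for Claude. The Ollama library has no Claude models, so the tag below is one example of a coding-oriented alternative; check the library for current names and sizes:

ollama pull qwen2.5-coder:14b
ollama run qwen2.5-coder:14b

If your hardware is limited, try a smaller tag such as qwen2.5-coder:7b. It is faster and still good for code completions.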
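Once a model is pulled, Ollama also answers over HTTP on port 11434, which is what an agent frontend talks to. The following is a minimal sketch of a non-streaming call to Ollama's /api/generate endpoint using only the standard library. The helper names are illustrative, and the model tag you pass is whatever you pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # address used throughout this guide


def build_generate_request(model: str, prompt: str,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a POST request for /api/generate with streaming turned off."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, base_url: str = OLLAMA_URL,
             timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the reply text."""
    request = build_generate_request(model, prompt, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

With the server running, generate("your-model:tag", "Explain what a PTY is in one sentence.") returns the model's answer as a string; "your-model:tag" is a placeholder.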

2. Configure AZMX AI as Your Agent Frontend

AZMX AI is a ~7 MB desktop app that connects to any local endpoint. Open it, go to Settings > Provider, and select "Ollama". Point it to http://localhost:11434, the address Ollama listens on by default. Now you have a terminal that can call on your local model for every command you type.

AZMX's approval gate is critical: when the local model suggests a command, AZMX shows you the exact command before anything executes, and file edits are shown as diffs. You approve or deny. No silent `rm -rf /`. This is the difference between a toy and a tool.
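The idea behind an approval gate can be sketched in a few lines: nothing runs until a callback says yes. This is an illustration of the pattern only, not AZMX's implementation; run_with_approval and ask_on_terminal are hypothetical names.

```python
import subprocess
import sys
from typing import Callable, Optional, Sequence

# Harmless example command: asks the current Python interpreter to print "hi".
EXAMPLE_COMMAND = [sys.executable, "-c", "print('hi')"]


def run_with_approval(command: Sequence[str],
                      approve: Callable[[Sequence[str]], bool]) -> Optional[str]:
    """Run command only if approve(command) returns True.

    Returns the command's stdout, or None when the command was denied.
    """
    if not approve(command):
        return None
    completed = subprocess.run(list(command), capture_output=True,
                               text=True, check=False)
    return completed.stdout


def ask_on_terminal(command: Sequence[str]) -> bool:
    """Interactive approver: show the command and require an explicit 'y'."""
    answer = input(f"Run {' '.join(command)!r}? [y/N] ")
    return answer.strip().lower() == "y"
```

Calling run_with_approval(EXAMPLE_COMMAND, ask_on_terminal) prompts before executing. Passing the command as a list rather than a shell string avoids shell interpretation of whatever the model suggested.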

3. Integrate with Your Editor

AZMX includes a CodeMirror 6 editor with per-hunk AI diffs. Open any file, select text, and press Ctrl+. to ask your local Claude for a change. The diff appears inline. Approve it, and the edit is applied. This workflow works offline — no network calls.
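Per-hunk approval means each changed region can be accepted or rejected independently. A minimal sketch of that idea with Python's difflib follows; it is not how AZMX or CodeMirror implements it, and apply_approved_hunks is a hypothetical helper.

```python
from difflib import SequenceMatcher
from typing import List, Set


def apply_approved_hunks(old: List[str], new: List[str],
                         approved: Set[int]) -> List[str]:
    """Merge old and new line lists, taking only the approved change hunks.

    Hunks are numbered from 0 in document order. An unapproved hunk keeps
    the original lines.
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    result: List[str] = []
    hunk_index = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            result.extend(old[i1:i2])
            continue
        # replace, insert, or delete: pick the proposed or the original side
        if hunk_index in approved:
            result.extend(new[j1:j2])
        else:
            result.extend(old[i1:i2])
        hunk_index += 1
    return result
```

Given a file's current lines and the model's proposed lines, approving only hunk 0 applies the first change and leaves later changes untouched.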

4. Add MCP Tools (Optional but Powerful)

AZMX speaks the Model Context Protocol (MCP) over stdio and HTTP. You can connect tools like filesystem access, database querying, or web scraping, all running locally via your self-hosted MCP server. This turns your local Claude from a chatbot into an agent that can read your project's AZMX.md memory file, run tests, and install dependencies.
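MCP messages are JSON-RPC 2.0, and on the stdio transport each message travels as a single line of JSON. The sketch below frames a tools/call request that way. The tool name and arguments in the usage note are hypothetical, and a real session starts with an initialize handshake that is omitted here.

```python
import json
from itertools import count

_request_ids = count(1)  # simple increasing JSON-RPC ids


def frame_tool_call(tool_name: str, arguments: dict) -> bytes:
    """Encode an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the line stays intact.
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")
```

For instance, frame_tool_call("read_file", {"path": "AZMX.md"}) produces the bytes a client would write to a server's stdin, assuming that server exposes a tool with that name.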

Comparing Local vs. Cloud Claude

Cloud Claude (via Anthropic's API) is faster and smarter for complex reasoning. A local open-weight model is usually slower and less capable, but it has no per-token cost, so heavy use gets cheaper once the hardware is paid for. The real advantage is privacy: when you run the model locally, nothing leaves your machine. For codebases under NDA, financial models, or personal vaults, that's non-negotiable.

Cursor, Windsurf, and GitHub Copilot are built around cloud services and expect an account, and Claude Code sends prompts to hosted Claude models. Aider, Continue, and Cline can be pointed at local models, but they are a CLI tool and editor extensions rather than a standalone desktop app with its own PTY terminal, editor, and approval gate. AZMX AI's pitch is that combination: a fully offline, approval-gated, BYOK agent with built-in terminal and editor, with no account and no telemetry.

Troubleshooting Common Issues

Model Not Responding

Check that Ollama is running. Run curl http://localhost:11434/api/tags — if it returns JSON, your server is up. If not, restart Ollama or reinstall.
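The same check can be scripted. The /api/tags response is a JSON object with a "models" array, so an empty list means the server is up but nothing has been pulled yet. A small sketch, with illustrative helper names:

```python
import json
import urllib.request


def model_names(tags_body: str) -> list[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    data = json.loads(tags_body)
    return [model["name"] for model in data.get("models", [])]


def installed_models(base_url: str = "http://localhost:11434",
                     timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama server which models are installed.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return model_names(response.read().decode("utf-8"))
```

If installed_models() raises a connection error, the server is not running; if it returns an empty list, pull a model first.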

Slow Performance

Reduce model size. A 14B model is comfortable on 24 GB of VRAM; with 8 GB, drop to a 7B model such as qwen2.5-coder:7b. Close other GPU-intensive apps.

AZMX Can't Connect

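Ensure your provider URL is correct. Ollama serves plain HTTP by default, so a URL starting with https:// will fail unless you have put a TLS proxy in front of it; adjust the scheme in Settings. Local firewalls and security suites occasionally block loopback traffic. If you suspect one, allow port 11434 on localhost rather than turning the firewall off.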
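Most connection failures come down to a malformed provider URL. The sketch below flags the common mistakes before you go hunting elsewhere. check_provider_url is a hypothetical helper, and its rules encode assumptions from this guide (plain HTTP on port 11434 for a default Ollama install).

```python
from urllib.parse import urlsplit

LOOPBACK_HOSTS = ("localhost", "127.0.0.1", "::1")


def check_provider_url(url: str) -> list[str]:
    """Return a list of likely problems with a provider URL (empty if none)."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return ["URL should start with http:// or https://"]
    if parts.hostname is None:
        return ["URL has no host"]

    problems = []
    if parts.scheme == "https" and parts.hostname in LOOPBACK_HOSTS:
        problems.append("https on a loopback host; a default Ollama install serves plain http")
    try:
        port = parts.port  # raises ValueError when the port is not numeric
    except ValueError:
        problems.append("port is not a valid number")
    else:
        if port is None:
            problems.append("no port given; this guide uses 11434")
    return problems
```

An empty list for http://localhost:11434 means the URL itself is fine and the problem lies with the server or the network.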

Security Considerations

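Local inference keeps your data off third-party servers, but it is not immune to mistakes: a local model can still propose destructive commands or read secrets that sit on disk. Always: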

  • Use AZMX's denial list to block access to .env, .ssh, and credential files
  • Review every diff in the approval gate before approving
  • Keep your Ollama server isolated: leave it bound to localhost (the default) so other machines on the network cannot reach it
  • Regularly update both Ollama and AZMX for security patches
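A denial list of the kind mentioned above can be pictured as a set of filename patterns checked against every component of a path. This is a sketch of the concept, not AZMX's actual rule syntax; the patterns and the is_denied helper are assumptions for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical patterns covering .env files, the .ssh directory, and key files.
DENY_PATTERNS = (".env", ".env.*", ".ssh", "*.pem", "id_rsa*")


def is_denied(path: str, patterns=DENY_PATTERNS) -> bool:
    """True if any component of path matches a deny pattern.

    Matching on every component means files inside a denied directory
    (for example .ssh/known_hosts) are blocked too.
    """
    parts = PurePosixPath(path).parts
    return any(fnmatchcase(part, pattern) for part in parts for pattern in patterns)
```

Note that this simple check works on the path as written. A production gate would also resolve symlinks and ".." segments before matching.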

AZMX's signed updater is the only network call it makes on its own — no telemetry, no analytics, no tracking.

Use Cases for Local Claude

  1. Private code review — scan your entire repo for vulnerabilities without uploading to any cloud.
  2. Offline pair programming — use your local Claude as a sounding board during flights or in air-gapped environments.
  3. Automated refactoring — let the agent suggest changes to your TypeScript or Python files, approve per-hunk.
  4. Learning and experimentation — test your own fine-tuned models without sharing data.

Conclusion

Running Claude locally is straightforward: install Ollama, pull a model, point AZMX AI at it, and you have a private, approval-gated AI agent in about 10 minutes. It won't match cloud Claude's absolute intelligence, but for daily development work — where privacy, control, and cost matter — it's the better choice. Download AZMX AI from azmx.ai/download and try it today. No account. No telemetry. Just your machine, your model, your code.

One window. The whole loop.