
How to run AI models locally on your Mac

Running AI on your own Mac keeps your data private, works offline, and costs nothing per message. Here's how it works — and the easiest way to do it.

Why run AI locally?

Step-by-step

  1. Check your Mac. Apple Silicon (M1 or later) is ideal. 16 GB of RAM comfortably runs models with 7–8 billion parameters (7–8B); 8 GB can run smaller 3B models.
  2. Get a local runtime. You can install Ollama yourself — or use Lyra, which sets up and manages the runtime for you.
  3. Choose a model. Match the model to your RAM (see the table below).
  4. Download & run. Pull the model once; it then runs entirely on your Mac.
  5. Stay private in the cloud, too. If you ever add cloud models, use a tool that redacts personal data before any cloud call (Lyra does this by default).
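
If you go the do-it-yourself route with Ollama, steps 2 to 4 can be sketched in a few lines of Python. This is a minimal sketch, not the only way to do it: it assumes Ollama is running at its default local address (http://localhost:11434), that a model has already been pulled once (the tag `llama3.1:8b` mentioned below is an assumption), and the helper names `build_generate_request` and `ask_local_model` are made up for illustration.

```python
import json
import urllib.request

# Assumption: Ollama's default local address. Change it if your runtime listens elsewhere.
OLLAMA_URL = "http://localhost:11434"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local runtime and return the generated text."""
    request = build_generate_request(model, prompt)
    # The request goes to localhost only, so the prompt never leaves the Mac.
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read())["response"]
```

With the runtime up and the model downloaded, `ask_local_model("llama3.1:8b", "Draft a two-line thank-you note.")` would return the model's reply as a string. Building the request in its own function keeps the network call separate, so the payload can be inspected without a running runtime.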

Which model should I pick?

Model              Good for                           Approx. size
Gemma 3 4B         Lighter Macs, fast everyday chat   ~3.3 GB
Mistral 7B         Everyday writing & drafting        ~4.4 GB
Llama 3.1 8B       Writing & analysis                 ~4.9 GB
Qwen 3 8B          Multilingual & reasoning           ~5 GB
DeepSeek Coder V2  Code                               ~8.9 GB
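
One rough way to match a model to your RAM is to compare its download size against a memory budget. The sketch below uses the sizes from the table above. The rule that a model should take no more than half of your RAM (the `headroom` parameter) is an assumption for illustration, not an official requirement, and the names `Model` and `models_that_fit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    good_for: str
    size_gb: float  # approximate download size from the table above


MODELS = [
    Model("Gemma 3 4B", "Lighter Macs, fast everyday chat", 3.3),
    Model("Mistral 7B", "Everyday writing & drafting", 4.4),
    Model("Llama 3.1 8B", "Writing & analysis", 4.9),
    Model("Qwen 3 8B", "Multilingual & reasoning", 5.0),
    Model("DeepSeek Coder V2", "Code", 8.9),
]


def models_that_fit(ram_gb: float, headroom: float = 0.5) -> list[Model]:
    """Return models whose size fits within a fraction of total RAM.

    Assumption: leaving half of RAM free for macOS and other apps is a
    reasonable rule of thumb. Tune `headroom` to taste.
    """
    budget_gb = ram_gb * headroom
    return [m for m in MODELS if m.size_gb <= budget_gb]


for ram in (8, 16):
    names = [m.name for m in models_that_fit(ram)]
    print(f"{ram} GB RAM: {', '.join(names)}")
```

Under that assumption, an 8 GB Mac gets only Gemma 3 4B, while a 16 GB Mac gets the four models up to Qwen 3 8B, which lines up with the guidance in step 1.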

The easiest way: Lyra

Lyra is a native macOS app that manages the local runtime for you. Its built-in model picker shows new models automatically as they become available to download, and everything runs on-device, with smart routing and PII redaction built in. Lyra Local is free for the first 100,000 users.


FAQ

What Mac do I need?

Apple Silicon (M1+). 16 GB RAM for 7–8B models; 8 GB for smaller 3B models.
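
If you are not sure what your Mac has, a short script can check. This is a sketch under a few assumptions: it reads the CPU architecture with `platform.machine()` and total memory with the macOS command `sysctl -n hw.memsize`, and the thresholds simply mirror the guidance above. The function names `read_mac_specs` and `advise` are hypothetical.

```python
import platform
import subprocess


def read_mac_specs() -> tuple[str, float]:
    """Return (CPU architecture, RAM in GB) for the current Mac. macOS only."""
    # "arm64" on Apple Silicon, "x86_64" on Intel. A Python build running
    # under Rosetta can also report "x86_64" on Apple Silicon.
    arch = platform.machine()
    mem_bytes = int(subprocess.check_output(["sysctl", "-n", "hw.memsize"]).strip())
    return arch, mem_bytes / 1024**3


def advise(arch: str, ram_gb: float) -> str:
    """Map architecture and RAM to this guide's rule of thumb."""
    if arch != "arm64":
        return "Not Apple Silicon: this guide recommends M1 or later."
    if ram_gb >= 16:
        return "Good fit for 7-8B models."
    if ram_gb >= 8:
        return "Stick to smaller 3B models."
    return "Under 8 GB of RAM: not covered by this guide."
```

On a Mac, `print(advise(*read_mac_specs()))` prints the matching line. Keeping `advise` free of system calls means the thresholds are easy to change or test on their own.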

Is it free?

Yes — open-source models have no per-message cost after a one-time download.

Does it work offline?

Yes. Once a model is downloaded, it runs entirely offline, including in airplane mode.
