How to run AI models locally on your Mac
Running AI on your own Mac keeps your data private, works offline, and costs nothing per message. Here's how it works — and the easiest way to do it.
Why run AI locally?
- Privacy — your prompts and files never leave your device.
- Offline — works on a plane, on the road, or with no Wi-Fi.
- No per-message cost — open-source models run free on your hardware.
Step-by-step
- Check your Mac. Apple Silicon (M1 or later) is ideal. 16 GB of RAM comfortably runs 7–8B models; 8 GB can run smaller 3–4B models.
- Get a local runtime. You can install Ollama yourself — or use Lyra, which sets up and manages the runtime for you.
- Choose a model. Match the model to your RAM (see the table below).
- Download & run. Pull the model once; it then runs entirely on your Mac.
- Stay private in the cloud, too. If you ever add cloud models, use a tool that redacts personal data before any cloud call (Lyra does this by default).
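
If you install Ollama yourself, the download-and-run step can be sketched in Python roughly as follows. This is a minimal sketch, not the only way to do it: it assumes Ollama is running at its default local address (`http://localhost:11434`) and uses its `/api/pull` and `/api/generate` endpoints. The helper names (`build_request`, `pull_model`, `ask`) are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is installed and serving on its default local address.
OLLAMA_URL = "http://localhost:11434"


def build_request(path: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request for the local Ollama server."""
    return urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def pull_model(model: str) -> None:
    """Download a model once; later runs reuse the local copy."""
    req = build_request("/api/pull", {"model": model, "stream": False})
    with urllib.request.urlopen(req) as resp:
        resp.read()  # with stream=False this returns when the pull is done


def ask(model: str, prompt: str) -> str:
    """Send one prompt to the local model and return its reply text."""
    req = build_request(
        "/api/generate", {"model": model, "prompt": prompt, "stream": False}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

With the runtime running, calling `pull_model("llama3.1:8b")` once and then `ask("llama3.1:8b", "Summarize this note: ...")` keeps the whole exchange on `localhost`. The model tag is an example from the Ollama library; swap in whichever model you chose.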
Which model should I pick?
| Model | Good for | Approx. size |
|---|---|---|
| Gemma 3 4B | Lighter Macs, fast everyday chat | ~3.3 GB |
| Mistral 7B | Everyday writing & drafting | ~4.4 GB |
| Llama 3.1 8B | Writing & analysis | ~4.9 GB |
| Qwen 3 8B | Multilingual & reasoning | ~5 GB |
| DeepSeek Coder V2 (16B Lite) | Code | ~8.9 GB |
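
One way to turn the table into a quick check is a small Python helper that lists which models fit a given amount of RAM. The sizes are copied from the table above. The rule of keeping a model under half of total RAM is an assumption made for this sketch, not a figure from any vendor, and the function names are hypothetical.

```python
import subprocess

# (name, approximate download size in GB), copied from the table above.
MODELS = [
    ("Gemma 3 4B", 3.3),
    ("Mistral 7B", 4.4),
    ("Llama 3.1 8B", 4.9),
    ("Qwen 3 8B", 5.0),
    ("DeepSeek Coder V2", 8.9),
]

# Assumption for this sketch: leave half of RAM for macOS and other apps.
RAM_FRACTION_FOR_MODEL = 0.5


def total_ram_gb() -> float:
    """Read installed RAM on macOS via `sysctl -n hw.memsize` (bytes)."""
    out = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip()) / 1024**3


def models_that_fit(ram_gb: float) -> list[str]:
    """Return the table's models whose size is within the RAM budget."""
    budget = ram_gb * RAM_FRACTION_FOR_MODEL
    return [name for name, size_gb in MODELS if size_gb <= budget]
```

On a Mac, `models_that_fit(total_ram_gb())` gives a starting shortlist. Under this rule an 8 GB machine gets only Gemma 3 4B and a 16 GB machine gets the four smaller models, which lines up with the guidance in the steps above. Treat it as a rough filter and adjust the fraction to taste.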
The easiest way: Lyra
Lyra is a native macOS app that manages the local runtime, gives you a built-in model picker (new models appear automatically for you to download), and runs everything on-device with smart routing and PII redaction built in. Lyra Local is free for the first 100,000 users.
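
To make "redaction before a cloud call" concrete, here is a deliberately simple Python sketch of the idea. It is not Lyra's implementation, which this guide does not describe. The two regular expressions and the `redact` function are illustrative assumptions, and real PII detection needs much broader coverage (names, addresses, IDs, and so on).

```python
import re

# Illustrative patterns only; they catch common email and phone formats.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> str:
    """Replace each matched span with a placeholder such as [EMAIL]."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Mail jane.doe@example.com or call +1 (555) 123-4567 today"))
# prints: Mail [EMAIL] or call [PHONE] today
```

The point of the design is ordering: the prompt passes through `redact` on your Mac first, so only the placeholder version would ever be sent to a cloud model.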
Download Lyra free →
FAQ
What Mac do I need?
Apple Silicon (M1 or later). 16 GB of RAM for 7–8B models; 8 GB for smaller 3–4B models.
Is it free?
Yes — open-source models have no per-message cost after a one-time download.
Does it work offline?
Yes. Once a model is downloaded, it runs entirely on your Mac, including in airplane mode.