A neighbor's guide

Run Llama 3 8B locally

Llama 3 · 8B parameters · Context: 8K (Llama 3), 128K (Llama 3.1) · Released 2024

Llama 3 8B is the model most people should try first. It's small enough to run on almost any laptop from the last five years, capable enough to feel genuinely useful, and free to download. If you're new to local AI, this is the one.

One command to run it
$ hivebear run llama-3-8b

HiveBear will profile your hardware, pick the right quantization for your pool, and fall back to the hive if your machine can't carry it alone.

Hardware: running it alone

Any laptop with 16 GB of RAM, any M1+ Mac, or any PC with a discrete GPU from the last five years can run this model comfortably.

Memory: ~5 GB (Q4) to ~16 GB (fp16)
GPU: runs on CPU, M-series Macs, any modern laptop GPU, or a discrete GPU with 6 GB+ VRAM

Q4_K_M quantization gets you to ~5 GB on disk and ~6-7 GB of active memory. The Raspberry Pi 5 with 8 GB of RAM will run it, just slowly.
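The memory figures above follow from simple arithmetic on the parameter count. A minimal sketch, assuming Q4_K_M averages roughly 4.5 bits per weight and fp16 uses 16 (these are rules of thumb, not HiveBear internals; real GGUF files add embeddings and metadata, and the KV cache adds active memory on top):

```python
# Back-of-the-envelope memory estimate for an 8B-parameter model.
# Assumption: Q4_K_M ~4.5 bits/weight, fp16 = 16 bits/weight.
PARAMS = 8e9

def weight_gb(bits_per_weight: float) -> float:
    """Size of the weights alone, in GiB (1 GiB = 2**30 bytes)."""
    return PARAMS * bits_per_weight / 8 / 2**30

print(f"Q4_K_M weights: ~{weight_gb(4.5):.1f} GB")  # ~4.2 GB
print(f"fp16 weights:   ~{weight_gb(16):.1f} GB")   # ~14.9 GB
```

The gap between these raw weight sizes and the quoted ~5 GB / ~16 GB is file overhead and runtime buffers, which is why active memory lands around 6-7 GB for Q4.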

Hardware: running it on the hive

You don't really need the hive for this one — it fits on almost anything alone. Where the hive helps is if you want faster tokens/sec: splitting across two peers can roughly double throughput on weaker hardware.
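Why "roughly double" rather than exactly double: a toy model makes it concrete. Assume (purely for illustration; these are not HiveBear measurements) that splitting the model across two peers halves per-token compute but adds a fixed per-token network cost:

```python
# Toy model: splitting the model across peers divides per-token compute
# but adds one network hop per extra peer per token.
# All numbers are illustrative assumptions, not benchmarks.

def tokens_per_sec(solo_tps: float, peers: int, hop_ms: float) -> float:
    """Estimated throughput when per-token compute is split across `peers`."""
    compute_s = 1.0 / solo_tps / peers          # compute time shrinks with peers
    network_s = (peers - 1) * hop_ms / 1000.0   # fixed cost per extra peer
    return 1.0 / (compute_s + network_s)

# Weak laptop at 3 tok/s alone, 5 ms LAN hop: close to 2x with two peers.
print(f"{tokens_per_sec(3, 2, 5):.1f} tok/s")
# Fast machine at 40 tok/s alone: the hop eats most of the gain.
print(f"{tokens_per_sec(40, 2, 5):.1f} tok/s")
```

The takeaway matches the text: the slower the solo machine, the more the split helps, because the network hop is a smaller fraction of each token's time.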

Llama 3 8B is the model we recommend starting with before attempting the bigger ones on the hive. It's the best way to get a feel for what 'fast enough' vs 'too slow' means on your hardware.

Things to know

Real gotchas from the hive. No sales pitch.

  • The stock instruct model is trained to refuse some requests — if you're hitting refusals on benign tasks, try a community fine-tune.
  • The context window on base Llama 3 (not 3.1) is only 8K tokens — fine for chat, short for long documents.

What Llama 3 8B is great at

Starter local LLM. Chat, quick questions, coding help, summarization. Fast enough on modern hardware to feel interactive.

If this isn't the one, try these instead

  • Mistral 7B — similar size, different training data, often better at non-English tasks.
  • Phi-3 Mini — even smaller (~4B), stronger on reasoning than its size suggests.
  • Qwen 2.5 7B — strong all-rounder, especially good at multilingual and code.

Give it a run on your hive

Free, open-source, no sign-up. The hive helps when your machine can't carry it alone.

Download HiveBear · Ask in Discord · Hugging Face model card

More models the hive is running

  • Llama 3 70B — Llama · 70B
  • DeepSeek R1 — DeepSeek · 671B (MoE, ~37B active), plus distilled variants
  • Qwen 2.5 72B — Qwen · 72B
  • Mistral 7B — Mistral · 7B

See all models

Free, open-source, self-hosted AI that actually fits your machine. A P2P mesh of neighbors pooling everyday hardware to run big local AI models together. Written in Rust, powered by the hive.

Product

  • Download
  • Documentation
  • Playground
  • FAQ

Run a model

  • Run Llama 3 70B
  • Run DeepSeek R1
  • Run Qwen 2.5 72B
  • Run Mistral 7B
  • All models →

Compare

  • HiveBear vs Ollama
  • HiveBear vs LM Studio
  • HiveBear vs exo
  • HiveBear vs Jan.ai

Community

  • Discord
  • GitHub
  • Discussions
  • Community hub

Built with Rust. MIT License. © 2026 BeckhamLabs.

Privacy Policy