Documentation

Quick Start

Get HiveBear installed and chatting with your first model in under a minute.

1. Install

$ curl -fsSL https://hivebear.com/install.sh | sh

2. Run quickstart

$ hivebear quickstart
 
🐾 Profiling your hardware...
🔍 Finding your beary best match...
📦 Installing recommended model...
✅ Ready! Starting chat...

3. Or start the API server

$ hivebear run llama-3.1-8b --api --port 8080

CLI Reference

Hardware & Recommendations

Show hardware profile
$ hivebear profile
Get model recommendations
$ hivebear recommend --json --top 5
Recommendations with community data
$ hivebear recommend --community
Run inference benchmark
$ hivebear benchmark --duration 30
Benchmark and share with community
$ hivebear benchmark --model llama-3.1-8b-q4_k_m --share
Profile → recommend → install → chat
$ hivebear quickstart

Model Management

Search models on HuggingFace
$ hivebear search "llama 8b"
Download and install a model
$ hivebear install llama-3.1-8b
List installed models
$ hivebear list
Remove an installed model
$ hivebear remove llama-3.1-8b
Convert model format
$ hivebear convert llama-3.1-8b --to gguf
Show disk usage and cleanup
$ hivebear storage --cleanup

Inference

Interactive chat
$ hivebear run llama-3.1-8b
Single generation
$ hivebear run llama-3.1-8b --prompt "Explain quantum computing"
Start OpenAI-compatible API server
$ hivebear run llama-3.1-8b --api --port 8080
List available inference engines
$ hivebear engines

P2P Mesh

Join the P2P mesh network
$ hivebear mesh start --port 7878
Show connected peers
$ hivebear mesh status
Distributed inference across mesh
$ hivebear mesh run llama-3.1-70b --prompt "Hello"
Leave the mesh
$ hivebear mesh stop

Configuration

Show current configuration
$ hivebear config show
Show config file location
$ hivebear config path
Reset to defaults
$ hivebear config reset

API Reference

HiveBear provides an OpenAI-compatible API server. Point any OpenAI client at http://localhost:8080 and use local models with zero code changes.

Start the server

$ hivebear run llama-3.1-8b --api --port 8080

Chat completions

curl
$ curl http://localhost:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "llama-3.1-8b",
"messages": [{"role": "user", "content": "Hello!"}],
"stream": true
}'

Python client

python
from openai import OpenAI
 
client = OpenAI(
base_url="http://localhost:8080/v1",
api_key="not-needed"
)
 
response = client.chat.completions.create(
model="llama-3.1-8b",
messages=[{"role": "user", "content": "Hello!"}]
)

Available endpoints

Endpoint                     Description
POST /v1/chat/completions    Chat completions (streaming supported)
POST /v1/completions         Text completions
GET /v1/models               List available models
POST /v1/embeddings          Generate embeddings
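With "stream": true (as in the curl example above), an OpenAI-compatible server emits Server-Sent Events: each data: line carries a JSON chunk with a delta, and the stream ends with data: [DONE]. A minimal sketch of accumulating those deltas in Python, assuming HiveBear mirrors the OpenAI chunk format exactly; it is run here against hardcoded sample lines, not a live server:

```python
import json

def accumulate_stream(lines):
    """Join the content deltas from OpenAI-style SSE lines into one string."""
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            parts.append(delta["content"])
    return "".join(parts)

# Sample SSE lines in the shape an OpenAI-compatible server emits
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": "!"}}]}',
    "data: [DONE]",
]
print(accumulate_stream(sample))  # Hello!
```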

Mesh Guide

The P2P mesh network (called “the hive”) lets you pool multiple devices together to run models that are too large for any single machine. Your laptop + your desktop + your friend's PC = one distributed inference cluster.

How it works

HiveBear uses QUIC transport with TLS encryption for peer-to-peer connections. Model layers are distributed across devices based on available memory and compute. Inference runs as a pipeline: each device processes its assigned layers and passes the intermediate activations to the next.
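The layer assignment can be pictured as a proportional split: each peer gets a contiguous slice of the model's layers sized by its share of total memory. A hypothetical sketch only (the real scheduler also weighs compute and link speed, and partition_layers is an illustrative name, not HiveBear's API):

```python
def partition_layers(n_layers, peers):
    """Split n_layers into contiguous ranges proportional to each peer's memory (GB)."""
    total_mem = sum(mem for _, mem in peers)
    ranges, start = {}, 0
    for i, (name, mem) in enumerate(peers):
        if i == len(peers) - 1:
            count = n_layers - start  # last peer absorbs rounding remainder
        else:
            count = round(n_layers * mem / total_mem)
        ranges[name] = (start, start + count)
        start += count
    return ranges

# An 80-layer model across the three peers shown by `hivebear mesh status`
peers = [("desktop-01", 24), ("laptop-02", 16), ("rpi-03", 8)]
print(partition_layers(80, peers))
# {'desktop-01': (0, 40), 'laptop-02': (40, 67), 'rpi-03': (67, 80)}
```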

Start a mesh node

$ hivebear mesh start --port 7878

This starts a mesh node that listens for peer connections and advertises your hardware profile.

Check mesh status

$ hivebear mesh status
 
Mesh: Active
Peers: 3 connected
→ desktop-01 (RTX 4090, 24GB VRAM)
→ laptop-02 (M2 Pro, 16GB)
→ rpi-03 (ARM, 8GB)
Total compute: 48GB combined

Run distributed inference

$ hivebear mesh run llama-3.1-70b --prompt "Explain quantum computing"

The model is automatically distributed across available peers based on their capabilities.

Configuration

HiveBear stores its configuration in a TOML file. Use the CLI to view and manage settings.

Config file location
$ hivebear config path
~/.config/hivebear/config.toml
View current configuration
$ hivebear config show
Reset to defaults
$ hivebear config reset

Config file reference

~/.config/hivebear/config.toml
"text-text-muted italic"># Model storage directory
model_dir = "~/.local/share/hivebear/models"
 
"text-text-muted italic"># Default inference engine (auto, llama-cpp, candle)
engine = "auto"
 
"text-text-muted italic"># Share benchmark results with the community (anonymized)
share_benchmarks = true
 
"text-text-muted italic"># API server settings
[api]
host = "127.0.0.1"
port = 8080
 
"text-text-muted italic"># P2P mesh settings
[mesh]
port = 7878
max_peers = 16
announce = true

Community Benchmarks

HiveBear can share anonymized benchmark results with the community, so everyone gets real-world performance data instead of guesses. When you run a benchmark and share it, people with similar hardware see actual tok/s numbers in their recommendations.

Sharing Your Results

Enable sharing globally in your config, or use the --share flag for a one-off share. You need to be logged in (or have an active device key) for sharing to work.

Enable sharing in config
"text-text-muted italic"># In ~/.config/hivebear/config.toml
share_benchmarks = true
Run a benchmark and share
$ hivebear benchmark --model llama-3.1-8b-q4_k_m --share
 
HiveBear Benchmark
 
Running real inference benchmark(s)...
Loading model: llama-3.1-8b-q4_k_m
Engine: llama.cpp
 
Results
Model: llama-3.1-8b-q4_k_m
Type: inference
Tokens generated: 256
Time to 1st tok: 182 ms
Generate tok/s: 24.3
Total duration: 10736 ms
Peak memory: 4.8 GB
Benchmark shared with community.
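The generate tok/s figure above can be sanity-checked by hand, assuming (as is conventional) that it excludes prefill, i.e. the time to first token is subtracted from the total duration:

```python
tokens = 256     # tokens generated
total_ms = 10736 # total duration
ttft_ms = 182    # time to first token

gen_seconds = (total_ms - ttft_ms) / 1000  # 10.554 s of pure generation
tok_per_s = tokens / gen_seconds
print(round(tok_per_s, 1))  # 24.3
```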

Getting Community Recommendations

Use --community to see what real users with similar hardware are getting. The Community column shows the median tok/s and sample count from the community.

Recommendations with community data
$ hivebear recommend --community
 
HiveBear Model Recommendations
 
Based on: AMD Ryzen 9 7950X | 32.0 GB RAM | 25.6 GB available
NVIDIA RTX 4070 | 12.0 GB VRAM
 
"text-text-muted italic"># Model Quant Engine Est. tok/s Community Memory Conf.
────────────────────────────────────────────────────────────────────────────────────────────
1 Llama 3.1 8B Q4_K_M llama.cpp 25.3 24.1 (47) 4.8 GB 85%
2 Mistral 7B Q4_K_M llama.cpp 28.1 27.5 (32) 4.0 GB 82%
3 Phi-3 Mini 3.8B Q4_K_M llama.cpp 52.4 -- 2.3 GB 70%
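The Community column is a median over shared runs; the "24.1 (47)" entry above, for instance, is the median tok/s across 47 samples. A trivial sketch with Python's statistics.median, using made-up sample data:

```python
from statistics import median

# Hypothetical shared tok/s results from peers with similar hardware
samples = [22.8, 23.5, 24.1, 24.9, 26.0]
print(f"{median(samples)} ({len(samples)})")  # 24.1 (5)
```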

Privacy

All hardware data is bucketed into broad categories before leaving your machine. Your identity is hashed — the server cannot connect benchmarks to your account.

Shared (anonymized)             NOT shared
CPU core count (bucketed)       CPU model name
RAM size (bucketed)             Exact RAM amount
GPU class (Low/Mid/High/Ultra)  GPU name
VRAM size (bucketed)            IP address
OS + architecture               Conversations
Tokens/sec, TTFT, memory        Model file paths
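Bucketing means a broad range is reported instead of the exact value. A hypothetical sketch of what GPU-class bucketing might look like; the boundaries and the bucket_vram name are illustrative assumptions, not HiveBear's documented thresholds:

```python
def bucket_vram(gb):
    """Map exact VRAM to a broad class before any data leaves the machine."""
    if gb < 6:
        return "Low"
    if gb < 12:
        return "Mid"
    if gb < 24:
        return "High"
    return "Ultra"

print(bucket_vram(12.0))  # High  (the exact "12.0 GB" is never sent)
```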

Free, open-source, self-hosted AI that actually fits your machine. A P2P mesh of neighbors pooling everyday hardware to run big local AI models together. Written in Rust, powered by the hive.


Built with Rust. MIT License. © 2026 BeckhamLabs.
