Quick Start
Get HiveBear installed and chatting with your first model in under a minute.
1. Install
$ curl -fsSL https://hivebear.com/install.sh | sh
2. Run quickstart
$ hivebear quickstart
🐾 Profiling your hardware...
🔍 Finding your beary best match...
📦 Installing recommended model...
✅ Ready! Starting chat...
3. Or start the API server
$ hivebear run llama-3.1-8b --api --port 8080
CLI Reference
Hardware & Recommendations
$ hivebear profile
$ hivebear recommend --json --top 5
$ hivebear recommend --community
$ hivebear benchmark --duration 30
$ hivebear benchmark --model llama-3.1-8b-q4_k_m --share
$ hivebear quickstart
Model Management
$ hivebear search "llama 8b"
$ hivebear install llama-3.1-8b
$ hivebear list
$ hivebear remove llama-3.1-8b
$ hivebear convert llama-3.1-8b --to gguf
$ hivebear storage --cleanup
Inference
$ hivebear run llama-3.1-8b
$ hivebear run llama-3.1-8b --prompt "Explain quantum computing"
$ hivebear run llama-3.1-8b --api --port 8080
$ hivebear engines
P2P Mesh
$ hivebear mesh start --port 7878
$ hivebear mesh status
$ hivebear mesh run llama-3.1-70b --prompt "Hello"
$ hivebear mesh stop
Configuration
$ hivebear config show
$ hivebear config path
$ hivebear config reset
API Reference
HiveBear provides an OpenAI-compatible API server. Point any OpenAI client at http://localhost:8080 and use local models with zero code changes.
Start the server
$ hivebear run llama-3.1-8b --api --port 8080
Chat completions
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "llama-3.1-8b",
      "messages": [{"role": "user", "content": "Hello!"}],
      "stream": true
    }'
Python client
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
Available endpoints
| Endpoint | Description |
|---|---|
| POST /v1/chat/completions | Chat completions (streaming supported) |
| POST /v1/completions | Text completions |
| GET /v1/models | List available models |
| POST /v1/embeddings | Generate embeddings |
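With `"stream": true`, the chat-completions endpoint emits OpenAI-style server-sent events, one JSON chunk per `data:` line. A minimal stdlib-only consumer might look like this (the helper names and the assumption that HiveBear follows the standard SSE chunk shape, ending with a `[DONE]` sentinel, are illustrative):

```python
import json
import urllib.request

def delta_from_sse_line(line: str) -> str:
    """Extract the content delta from one SSE 'data:' line, if any."""
    if not line.startswith("data: "):
        return ""
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
        return ""
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    return delta.get("content") or ""

def stream_chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST to /v1/chat/completions with stream=true and accumulate the text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps({
            "model": "llama-3.1-8b",
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    text = ""
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            text += delta_from_sse_line(raw.decode())
    return text
```

In a real client you would print each delta as it arrives rather than accumulate it, but the parsing step is the same.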
Mesh Guide
The P2P mesh network (called “the hive”) lets you pool multiple devices together to run models that are too large for any single machine. Your laptop + your desktop + your friend's PC = one distributed inference cluster.
How it works
HiveBear uses QUIC transport with TLS encryption for peer-to-peer connections. Model layers are distributed across devices based on available memory and compute. Inference happens in a pipeline: each device processes its assigned layers and passes the result to the next.
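The actual scheduler isn't documented here, but the layer-assignment step can be sketched as a contiguous split proportional to each peer's free memory (the function name and the proportional heuristic are illustrative, not HiveBear's real algorithm):

```python
def assign_layers(n_layers: int, peers: dict[str, float]) -> dict[str, range]:
    """Split n_layers contiguous pipeline stages across peers,
    proportionally to each peer's free memory (in GB)."""
    total = sum(peers.values())
    assignments: dict[str, range] = {}
    start = 0
    items = list(peers.items())
    for i, (name, mem) in enumerate(items):
        if i == len(items) - 1:
            count = n_layers - start  # last peer absorbs rounding remainder
        else:
            count = round(n_layers * mem / total)
        assignments[name] = range(start, start + count)
        start += count
    return assignments

# 80 transformer layers over the three peers shown by `hivebear mesh status`
plan = assign_layers(80, {"desktop-01": 24, "laptop-02": 16, "rpi-03": 8})
```

Contiguous ranges matter here: pipeline inference only passes activations between adjacent stages, so each peer should own an unbroken run of layers.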
Start a mesh node
$ hivebear mesh start --port 7878
This starts a mesh node that listens for peer connections and advertises your hardware profile.
Check mesh status
$ hivebear mesh status
Mesh: Active
Peers: 3 connected
  → desktop-01 (RTX 4090, 24GB VRAM)
  → laptop-02 (M2 Pro, 16GB)
  → rpi-03 (ARM, 8GB)
Total compute: 48GB combined
Run distributed inference
$ hivebear mesh run llama-3.1-70b --prompt "Explain quantum computing"
The model is automatically distributed across available peers based on their capabilities.
Configuration
HiveBear stores its configuration in a TOML file. Use the CLI to view and manage settings.
$ hivebear config path
~/.config/hivebear/config.toml
$ hivebear config show
$ hivebear config reset
Config file reference
"text-text-muted italic"># Model storage directorymodel_dir = "~/.local/share/hivebear/models" "text-text-muted italic"># Default inference engine (auto, llama-cpp, candle)engine = "auto" "text-text-muted italic"># Share benchmark results with the community (anonymized)share_benchmarks = true "text-text-muted italic"># API server settings[api]host = "127.0.0.1"port = 8080 "text-text-muted italic"># P2P mesh settings[mesh]port = 7878max_peers = 16announce = trueCommunity Benchmarks
HiveBear can share anonymized benchmark results with the community, so everyone gets real-world performance data instead of guesses. When you run a benchmark and share it, people with similar hardware see actual tok/s numbers in their recommendations.
Sharing Your Results
Enable sharing globally in your config, or use the --share flag for a one-off share. You need to be logged in (or have an active device key) for sharing to work.
"text-text-muted italic"># In ~/.config/hivebear/config.tomlshare_benchmarks = true$ hivebear benchmark --model llama-3.1-8b-q4_k_m --share HiveBear Benchmark Running real inference benchmark(s)...Loading model: llama-3.1-8b-q4_k_m Engine: llama.cpp Results Model: llama-3.1-8b-q4_k_m Type: inference Tokens generated: 256 Time to 1st tok: 182 ms Generate tok/s: 24.3 Total duration: 10736 ms Peak memory: 4.8 GB Benchmark shared with community.Getting Community Recommendations
Use --community to see what real users with similar hardware are getting. The Community column shows the median tok/s and sample count from the community.
$ hivebear recommend --community

HiveBear Model Recommendations
Based on: AMD Ryzen 9 7950X | 32.0 GB RAM | 25.6 GB available
          NVIDIA RTX 4070 | 12.0 GB VRAM

#  Model            Quant   Engine     Est. tok/s  Community  Memory  Conf.
───────────────────────────────────────────────────────────────────────────
1  Llama 3.1 8B     Q4_K_M  llama.cpp  25.3        24.1 (47)  4.8 GB  85%
2  Mistral 7B       Q4_K_M  llama.cpp  28.1        27.5 (32)  4.0 GB  82%
3  Phi-3 Mini 3.8B  Q4_K_M  llama.cpp  52.4        --         2.3 GB  70%
Privacy
All hardware data is bucketed into broad categories before leaving your machine. Your identity is hashed — the server cannot connect benchmarks to your account.
| Shared (anonymized) | NOT shared |
|---|---|
| CPU core count (bucketed) | CPU model name |
| RAM size (bucketed) | Exact RAM amount |
| GPU class (Low/Mid/High/Ultra) | GPU name |
| VRAM size (bucketed) | IP address |
| OS + architecture | Conversations |
| Tokens/sec, TTFT, memory | Model file paths |
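The exact bucket boundaries and hash scheme aren't documented, but the anonymization step can be sketched as coarse bucketing plus a one-way hash of the account identifier (both the boundaries and the truncated-SHA-256 choice below are assumptions for illustration):

```python
import hashlib

def bucket_memory_gb(gb: float) -> str:
    """Coarse RAM/VRAM bucket; the boundaries here are illustrative."""
    for limit, label in [(8, "<=8GB"), (16, "9-16GB"), (32, "17-32GB"), (64, "33-64GB")]:
        if gb <= limit:
            return label
    return ">64GB"

def anonymous_id(account_id: str) -> str:
    """One-way hash so the server can group a device's benchmark runs
    without being able to link them back to the account."""
    return hashlib.sha256(account_id.encode()).hexdigest()[:16]
```

Bucketing before upload means the exact values in the right-hand column above never leave the machine; only the bucket label and the hashed identifier do.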
