Quick Start
Get HiveBear installed and chatting with your first model in under a minute.
1. Install
$ curl -fsSL https://hivebear.com/install.sh | sh
2. Run quickstart
$ hivebear quickstart
🐾 Profiling your hardware...
🔍 Finding your beary best match...
📦 Installing recommended model...
✅ Ready! Starting chat...
3. Or start the API server
$ hivebear run llama-3.1-8b --api --port 8080
CLI Reference
Hardware & Recommendations
$ hivebear profile
$ hivebear recommend --json --top 5
$ hivebear recommend --community
$ hivebear benchmark --duration 30
$ hivebear benchmark --model llama-3.1-8b-q4_k_m --share
$ hivebear quickstart
Model Management
$ hivebear search "llama 8b"
$ hivebear install llama-3.1-8b
$ hivebear list
$ hivebear remove llama-3.1-8b
$ hivebear convert llama-3.1-8b --to gguf
$ hivebear storage --cleanup
Inference
$ hivebear run llama-3.1-8b
$ hivebear run llama-3.1-8b --prompt "Explain quantum computing"
$ hivebear run llama-3.1-8b --api --port 8080
$ hivebear engines
P2P Mesh
$ hivebear mesh start --port 7878
$ hivebear mesh status
$ hivebear mesh run llama-3.1-70b --prompt "Hello"
$ hivebear mesh stop
Configuration
$ hivebear config show
$ hivebear config path
$ hivebear config reset
API Reference
HiveBear provides an OpenAI-compatible API server. Point any OpenAI client at http://localhost:8080 and use local models with zero code changes.
Start the server
$ hivebear run llama-3.1-8b --api --port 8080
Chat completions
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "llama-3.1-8b",
      "messages": [{"role": "user", "content": "Hello!"}],
      "stream": true
    }'
Python client
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="not-needed",
)

response = client.chat.completions.create(
    model="llama-3.1-8b",
    messages=[{"role": "user", "content": "Hello!"}],
)
Available endpoints
| Endpoint | Description |
|---|---|
| POST /v1/chat/completions | Chat completions (streaming supported) |
| POST /v1/completions | Text completions |
| GET /v1/models | List available models |
| POST /v1/embeddings | Generate embeddings |
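With `"stream": true`, the chat-completions endpoint emits OpenAI-style server-sent events, one JSON chunk per `data:` line. A minimal stdlib-only consumer might look like this (the helper names and the assumption that HiveBear follows the standard SSE chunk shape, ending with a `[DONE]` sentinel, are illustrative):

```python
import json
import urllib.request

def delta_from_sse_line(line: str) -> str:
    """Extract the content delta from one SSE 'data:' line, if any."""
    if not line.startswith("data: "):
        return ""
    payload = line[len("data: "):].strip()
    if payload == "[DONE]":  # OpenAI-style end-of-stream sentinel
        return ""
    chunk = json.loads(payload)
    delta = chunk["choices"][0].get("delta", {})
    return delta.get("content") or ""

def stream_chat(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST to /v1/chat/completions with stream=true and accumulate the text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps({
            "model": "llama-3.1-8b",
            "messages": [{"role": "user", "content": prompt}],
            "stream": True,
        }).encode(),
        headers={"Content-Type": "application/json"},
    )
    text = ""
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            text += delta_from_sse_line(raw.decode())
    return text
```

In a real client you would print each delta as it arrives rather than accumulate it, but the parsing step is the same.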
Mesh Guide
The P2P mesh network (called “the hive”) lets you pool multiple devices together to run models that are too large for any single machine. Your laptop + your desktop + your friend's PC = one distributed inference cluster.
How it works
HiveBear uses QUIC transport with TLS encryption for peer-to-peer connections. Model layers are distributed across devices based on available memory and compute. Inference happens in a pipeline: each device processes its assigned layers and passes the result to the next.
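The actual scheduler isn't documented here, but the layer-assignment step can be sketched as a contiguous split proportional to each peer's free memory (the function name and the proportional heuristic are illustrative, not HiveBear's real algorithm):

```python
def assign_layers(n_layers: int, peers: dict[str, float]) -> dict[str, range]:
    """Split n_layers contiguous pipeline stages across peers,
    proportionally to each peer's free memory (in GB)."""
    total = sum(peers.values())
    assignments: dict[str, range] = {}
    start = 0
    items = list(peers.items())
    for i, (name, mem) in enumerate(items):
        if i == len(items) - 1:
            count = n_layers - start  # last peer absorbs rounding remainder
        else:
            count = round(n_layers * mem / total)
        assignments[name] = range(start, start + count)
        start += count
    return assignments

# 80 transformer layers over the three peers shown by `hivebear mesh status`
plan = assign_layers(80, {"desktop-01": 24, "laptop-02": 16, "rpi-03": 8})
```

Contiguous ranges matter here: pipeline inference only passes activations between adjacent stages, so each peer should own an unbroken run of layers.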
Start a mesh node
$ hivebear mesh start --port 7878
This starts a mesh node that listens for peer connections and advertises your hardware profile.
Check mesh status
$ hivebear mesh status
Mesh: Active
Peers: 3 connected
  → desktop-01 (RTX 4090, 24GB VRAM)
  → laptop-02 (M2 Pro, 16GB)
  → rpi-03 (ARM, 8GB)
Total compute: 48GB combined
Run distributed inference
$ hivebear mesh run llama-3.1-70b --prompt "Explain quantum computing"
The model is automatically distributed across available peers based on their capabilities.
Configuration
HiveBear stores its configuration in a TOML file. Use the CLI to view and manage settings.
$ hivebear config path
~/.config/hivebear/config.toml
$ hivebear config show
$ hivebear config reset
Config file reference
"text-text-muted italic"># Model storage directorymodel_dir = "~/.local/share/hivebear/models" "text-text-muted italic"># Default inference engine (auto, llama-cpp, candle)engine = "auto" "text-text-muted italic"># Share benchmark results with the community (anonymized)share_benchmarks = true "text-text-muted italic"># API server settings[api]host = "127.0.0.1"port = 8080 "text-text-muted italic"># P2P mesh settings[mesh]port = 7878max_peers = 16announce = trueCommunity Benchmarks
HiveBear can share anonymized benchmark results with the community, so everyone gets real-world performance data instead of guesses. When you run a benchmark and share it, people with similar hardware see actual tok/s numbers in their recommendations.
Sharing Your Results
Enable sharing globally in your config, or use the --share flag for a one-off share. You need to be logged in (or have an active device key) for sharing to work.
"text-text-muted italic"># In ~/.config/hivebear/config.tomlshare_benchmarks = true$ hivebear benchmark --model llama-3.1-8b-q4_k_m --share HiveBear Benchmark Running real inference benchmark(s)...Loading model: llama-3.1-8b-q4_k_m Engine: llama.cpp Results Model: llama-3.1-8b-q4_k_m Type: inference Tokens generated: 256 Time to 1st tok: 182 ms Generate tok/s: 24.3 Total duration: 10736 ms Peak memory: 4.8 GB Benchmark shared with community.Getting Community Recommendations
Use --community to see what real users with similar hardware are getting. The Community column shows the median tok/s and sample count from the community.
$ hivebear recommend --community

HiveBear Model Recommendations
Based on: AMD Ryzen 9 7950X | 32.0 GB RAM | 25.6 GB available
          NVIDIA RTX 4070 | 12.0 GB VRAM

#  Model            Quant   Engine     Est. tok/s  Community  Memory  Conf.
───────────────────────────────────────────────────────────────────────────
1  Llama 3.1 8B     Q4_K_M  llama.cpp  25.3        24.1 (47)  4.8 GB  85%
2  Mistral 7B       Q4_K_M  llama.cpp  28.1        27.5 (32)  4.0 GB  82%
3  Phi-3 Mini 3.8B  Q4_K_M  llama.cpp  52.4        --         2.3 GB  70%
Privacy
All hardware data is bucketed into broad categories before leaving your machine. Your identity is hashed — the server cannot connect benchmarks to your account.
| Shared (anonymized) | NOT shared |
|---|---|
| CPU core count (bucketed) | CPU model name |
| RAM size (bucketed) | Exact RAM amount |
| GPU class (Low/Mid/High/Ultra) | GPU name |
| VRAM size (bucketed) | IP address |
| OS + architecture | Conversations |
| Tokens/sec, TTFT, memory | Model file paths |
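The exact bucket boundaries and hash scheme aren't documented, but the anonymization step can be sketched as coarse bucketing plus a one-way hash of the account identifier (both the boundaries and the truncated-SHA-256 choice below are assumptions for illustration):

```python
import hashlib

def bucket_memory_gb(gb: float) -> str:
    """Coarse RAM/VRAM bucket; the boundaries here are illustrative."""
    for limit, label in [(8, "<=8GB"), (16, "9-16GB"), (32, "17-32GB"), (64, "33-64GB")]:
        if gb <= limit:
            return label
    return ">64GB"

def anonymous_id(account_id: str) -> str:
    """One-way hash so the server can group a device's benchmark runs
    without being able to link them back to the account."""
    return hashlib.sha256(account_id.encode()).hexdigest()[:16]
```

Bucketing before upload means the exact values in the right-hand column above never leave the machine; only the bucket label and the hashed identifier do.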
