llamactl

A command-line interface for managing and interacting with ollama_proxy_3.

Author: Guilhem Lavaux
Copyright: CNRS

What is this?

llamactl is the companion CLI for ollama_proxy_3, a managed reverse proxy that sits in front of one or more Ollama-compatible instances running on dedicated or SLURM-managed HPC infrastructure.

Instead of talking directly to an Ollama server, users connect through the proxy, which handles authentication, server lifecycle (starting/stopping SLURM jobs), model management, and request routing. llamactl exposes all of that control surface from your terminal.

 you ──► llamactl ──► ollama_proxy_3 ──► SLURM ──► Ollama instance(s)

Features

  • Server lifecycle — start and stop Ollama backend servers via SLURM
  • Job tracking — monitor server startup progress and model pulls via streaming (SSE)
  • Model management — list available models, pull new ones onto a specific server
  • Local proxy — run a local pass-through server on port 11434 so standard Ollama clients (e.g. ollama, Open WebUI) work transparently without reconfiguration
  • User administration — manage users, passwords, admin roles, SLURM access, and API keys
  • SLURM visibility — inspect node status and GPU availability
  • Metrics — fetch Prometheus-format metrics from the proxy

Installation

Prerequisites: Rust 1.94+

cargo install --path .
# or build manually
cargo build --release
# binary: target/release/llamactl

Pre-built binaries for Linux x86_64 are attached to each release.

Configuration

Set these environment variables to avoid passing flags on every invocation:

Variable                Flag         Description
OLLAMA_PROX_API_URL     -u, --url    Base URL of the ollama_proxy_3 instance
OLLAMA_PROX_API_TOKEN   -t, --token  Bearer token for authentication

export OLLAMA_PROX_API_URL=https://your-proxy.example.com
export OLLAMA_PROX_API_TOKEN=your-api-token
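
The same settings can also be passed per invocation using the flags from the table above, for example with the list subcommand described under Usage:

llamactl --url https://your-proxy.example.com --token your-api-token list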

Usage

llamactl [OPTIONS] <COMMAND>

Server management

# List all servers
llamactl list

# Show detailed info for a server
llamactl show my-server

# Start / stop a server (SLURM job)
llamactl start my-server
llamactl stop my-server

# Check SLURM node status and GPU availability
llamactl status-slurm my-server
llamactl status-slurm my-server --avail

Job tracking

# List pending startup jobs
llamactl progress list

# Stream startup progress for a job
llamactl progress query <JOB_ID>

# Cancel a job
llamactl progress cancel <JOB_ID>
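
Putting these together, a typical server startup flow (exact output may vary) looks like:

# Start the server, find its startup job, then stream its progress
llamactl start my-server
llamactl progress list
llamactl progress query <JOB_ID>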

Models

# List models (all servers, or filtered)
llamactl list-models
llamactl list-models my-server

# Pull a model onto a server
llamactl pull start llama3.2 my-server

# Check pull progress
llamactl pull status <JOB_ID>

Local proxy

Starts a local HTTP server on port 11434 (by default) that forwards requests to the upstream proxy with your credentials automatically injected. Any standard Ollama client pointed at http://localhost:11434 will work as-is.

# Forward all requests, let the proxy pick the server
llamactl serve

# Pin to a specific backend server
llamactl serve --server my-server

# Use a different port
llamactl serve --port 8080

# Print request/response debug info
llamactl serve --debug
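
For example, assuming the standard ollama CLI is installed on your machine, it can talk to the local proxy through its OLLAMA_HOST environment variable (which already defaults to localhost:11434):

# In another terminal, while `llamactl serve` is running
export OLLAMA_HOST=http://localhost:11434
ollama list
ollama run llama3.2 "Hello"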

Shell completions

Generate and install a completion script for your shell:

# Bash
llamactl completions bash > ~/.local/share/bash-completion/completions/llamactl

# Zsh (add to a directory on your $fpath)
llamactl completions zsh > ~/.zfunc/_llamactl
# then add `fpath=(~/.zfunc $fpath)` and `autoload -Uz compinit && compinit` to ~/.zshrc

# Fish
llamactl completions fish > ~/.config/fish/completions/llamactl.fish

# PowerShell
llamactl completions powershell >> $PROFILE

# Elvish
llamactl completions elvish

Proxy worker status & metrics

llamactl worker-status
llamactl metrics
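
The metrics output is in Prometheus text format, so it pipes cleanly into ordinary shell tools; the metric prefix below is only illustrative:

# Filter for a (hypothetical) metric family
llamactl metrics | grep '^ollama_proxy'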

User management (admin only)

# List users
llamactl user list

# Add / remove users
llamactl user add alice secret123
llamactl user remove alice

# Grant or revoke admin / SLURM access
llamactl user set-admin alice --is_admin true
llamactl user set-slurm-access alice --can_use_slurm true

# Change your own password
llamactl user password current-pass new-pass

# API key management
llamactl user api-key new "my-script"
llamactl user api-key list
llamactl user api-key remove <KEY_ID>

# Manage another user's keys (admin)
llamactl user api-key --user alice list

Relation to ollama_proxy_3

llamactl is a pure client — it has no business logic of its own beyond formatting requests and displaying responses. All state (servers, users, jobs, models) lives in ollama_proxy_3.

The proxy exposes two API namespaces that llamactl consumes:

  • /proxy/v1/… — server lifecycle, job tracking, model pulls, SLURM status, metrics
  • /proxy/v2/user/… — user and API key management

The serve subcommand additionally forwards the standard Ollama API (/api/…) so that unmodified Ollama-compatible tools can connect through the proxy without knowing about it.
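
As a concrete illustration, with llamactl serve running locally you can hit the standard Ollama model-listing endpoint directly:

# /api/tags is the stock Ollama "list models" endpoint, forwarded by `serve`
curl http://localhost:11434/api/tags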

License

MIT