Set Up LiteLLM Unified Model Gateway to Route Tasks, Control Costs, and Enforce Privacy

Outcome

By the end of this guide, you will have a LiteLLM gateway that gives OpenClaw and other clients a single, stable OpenAI-compatible API while you route tasks to the right model, keep private work on local-only paths, and keep cloud spend intentional.

Audience and Scope

Audience: New OpenClaw or home lab users who want one model gateway instead of per-client model sprawl
Difficulty: Beginner
Estimated Time: 45 to 90 minutes for a first working setup
Assumptions: You can run Docker Compose, you have at least one model provider or local model backend, and you want a stable routing layer

This guide focuses on the decisions that matter most for a first deployment. It is not tied to one specific cloud vendor, and it does not require you to use the exact same model names shown here.

Before You Start

  • Decide whether you actually need a model gateway. If you only use one model from one client, LiteLLM is optional.
  • List the types of work you expect to run: private work, routine drafting, coding, web-heavy work, and premium reasoning.
  • Decide which tasks are allowed to leave your network.
  • Choose one host that will run LiteLLM and expose a stable base URL for your clients.

Hardware and Software

Hardware

  • One always-on host that can run Docker Compose
  • Enough memory and CPU for the local models you plan to keep private
  • Reliable local storage for active runtime configuration

Software

  • Docker and Docker Compose
  • LiteLLM
  • One or more model backends such as Ollama or a cloud provider
  • OpenClaw or another client that can speak an OpenAI-compatible API
  • A secret manager if you want clean separation between config and provider keys

Step-by-Step

Step 1: Define the work lanes before you choose any models

Objective: Build routing around task types, not around vendor names.

Actions:

  • Create a short list of work lanes.
  • Keep the list small enough that you will actually use it.
  • Start with intent, not providers.

Recommended beginner lanes:

  • private
  • routine
  • coding
  • web
  • premium

Verification:

  • You can explain what belongs in each lane in one sentence.
  • You know which lanes must stay local-only.

Common failure and fix: If you start by picking providers instead of work lanes, stop and write down the task groups first.

Step 2: Pick one starting model for each lane

Objective: Give each lane a default model before you worry about fallbacks.

Actions:

  • Assign one primary model to each lane.
  • Favor local models for sensitive work.
  • Favor cheaper models for routine work.
  • Save premium models for premium tasks.

Example starter map:

  • private -> local model
  • routine -> low-cost local or cloud-fast model
  • coding -> coding-capable model
  • web -> stronger research or coding model
  • premium -> best reasoning model you are willing to pay for

Verification:

  • Every lane has a clear first-choice model.
  • Your private lane does not require a cloud provider.

Common failure and fix: If one expensive model is doing everything, force yourself to choose a cheaper default for routine work.

Step 3: Create stable route names instead of exposing raw model IDs

Objective: Make your clients depend on your policy, not vendor naming.

Actions:

  • Create aliases such as secure-core, local-model, openai-fast, web-model, and general-smart.
  • Point clients to those aliases.
  • Keep vendor-specific names inside LiteLLM only.
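
The alias layer can be sketched in a LiteLLM config.yaml. Everything below is illustrative: the model IDs, port, and backend choices are placeholders for whatever you actually run, not recommendations.

```yaml
# Sketch of a LiteLLM proxy config. Clients call the stable alias in
# model_name; the vendor-specific model ID stays inside litellm_params.
model_list:
  - model_name: secure-core            # stable alias, local-only lane
    litellm_params:
      model: ollama/llama3             # example local backend; swap for yours
      api_base: http://localhost:11434
  - model_name: openai-fast            # stable alias, cheap cloud lane
    litellm_params:
      model: openai/gpt-4o-mini        # example cloud model ID, never exposed to clients
```

Swapping the backing model later means editing one litellm_params block here, with no client changes.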

Verification:

  • OpenClaw and scripts can call a route alias without knowing the underlying provider.
  • You could swap the backing model later without changing client config.

Common failure and fix: If clients still call raw provider model IDs directly, replace those calls with task-oriented aliases.

Step 4: Set privacy rules before you add fallbacks

Objective: Prevent private work from leaking to the wrong provider.

Actions:

  • Mark each route as local only, local first with approved cloud fallback, or cloud allowed.
  • Keep at least one strict local-only route for sensitive work.
  • Treat legal, financial, identity, and private household work as local by default.

Good beginner rules:

  • secure-core = local only
  • local-model = local only
  • routine-router = local first, then cheap approved cloud fallback
  • web-model = cloud allowed
  • general-smart = premium cloud allowed
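
One way to express the fallback side of these rules in config is shown below. Treat this as the shape rather than exact syntax: key placement for fallbacks has moved between LiteLLM versions, so check the docs for your release. The route and model names are the example aliases from this guide.

```yaml
# Sketch: fallback policy. Only routes listed here are allowed to fail over.
litellm_settings:
  fallbacks:
    - routine-router: ["openai-fast"]  # local first, approved cheap cloud second
# secure-core and local-model deliberately get NO fallback entry, so a
# local backend failure surfaces as an error instead of leaking to cloud.
```

The important property is the absence of entries: a local-only route enforces privacy by having nowhere else to go.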

Verification:

  • You can name which routes are allowed to leave your network.
  • Sensitive work has a route that cannot silently jump to cloud.

Common failure and fix: If you add cloud fallbacks before privacy rules are defined, lock the privacy boundaries first and only then add fallbacks.

Step 5: Keep secrets and runtime config separate from the main routing file

Objective: Make the gateway portable and safer to operate.

Actions:

  • Keep the main LiteLLM config focused on aliases, route behavior, and fallback policy.
  • Keep provider secrets outside the main config.
  • Resolve secrets at runtime through your secret manager or wrapper script.
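
LiteLLM supports referencing environment variables from the config file with an os.environ/ prefix, which keeps the routing file free of live keys. The model and variable names below are examples.

```yaml
# The routing file stays safe to commit; the key is resolved at runtime
# from the environment (populated by your secret manager or wrapper script).
model_list:
  - model_name: openai-fast
    litellm_params:
      model: openai/gpt-4o-mini
      api_key: os.environ/OPENAI_API_KEY   # never written into this file
```

Rotating the key then means updating the environment source only, with no config edit.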

Verification:

  • Your routing file can be checked into source control without provider passwords in it.
  • Rotating a provider key does not require rewriting the whole config file.

Common failure and fix: If live provider keys are sitting in the main config, move them into a runtime-only secret source.

Step 6: If your config source lives on a NAS, keep the runtime copy local

Objective: Avoid brittle container startup failures during reboot.

Actions:

  • Keep the source-of-truth config in Git or on NAS storage if that fits your lab.
  • Copy it to a local runtime path on the host before LiteLLM starts.
  • Mount the local runtime directory into the container instead of bind-mounting a single NAS-backed file.
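
The sync pattern can be sketched as a small pre-start script. The demo below uses temp directories so it is safe to run anywhere; in a real lab the source would be your NAS path and the runtime directory would be local disk, with the script run before the container starts (for example from a systemd unit or compose wrapper).

```shell
# Sketch of a local runtime sync. Paths here are stand-ins.
SRC_DIR="$(mktemp -d)"      # stands in for e.g. /mnt/nas/litellm (source of truth)
RUNTIME_DIR="$(mktemp -d)"  # stands in for e.g. /opt/litellm/runtime (local disk)
printf 'model_list: []\n' > "$SRC_DIR/config.yaml"

# The sync step, run before LiteLLM starts:
cp "$SRC_DIR/config.yaml" "$RUNTIME_DIR/config.yaml"

# In docker-compose, bind-mount the local DIRECTORY, not the NAS file:
#   volumes:
#     - /opt/litellm/runtime:/app/config:ro
ls "$RUNTIME_DIR"
```

Because the container only ever sees local disk, a slow or missing NAS mount at boot delays the sync step, not the container itself.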

Verification:

  • The container reads its live config from local disk.
  • A NAS mount timing issue will not stop LiteLLM from starting cleanly.

Common failure and fix: If you bind-mount one config file directly from NFS, replace it with a local runtime sync pattern.

Step 7: Point OpenClaw and your other clients at one LiteLLM base URL

Objective: Make LiteLLM the policy layer all clients share.

Actions:

  • Configure OpenClaw to use the LiteLLM base URL.
  • Update scripts and internal tools to use the same base URL.
  • Have each client request the route alias that matches the task.
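
For clients built on OpenAI-compatible SDKs, pointing at the gateway is often just two environment variables. The host, port, and key below are examples; some clients want a trailing /v1 on the base URL and some do not, so check your client's own setting names.

```shell
# Example: one shared gateway endpoint for every OpenAI-compatible client.
# Hostname, port, and key value are placeholders for your lab.
export OPENAI_BASE_URL="http://gateway-host.lan:4000"   # your LiteLLM base URL
export OPENAI_API_KEY="sk-example-litellm-key"          # a key LiteLLM accepts
echo "clients will call: $OPENAI_BASE_URL"
```

Each client then requests a route alias such as secure-core or openai-fast as its "model" name, and LiteLLM applies the policy.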

Verification:

  • OpenClaw does not need raw provider config for each model lane.
  • Multiple clients can share the same model policy.

Common failure and fix: If every client still keeps its own routing logic, centralize routing through LiteLLM.

Step 8: Validate the routes with real tests before you trust them

Objective: Prove that the gateway matches your intended privacy and cost policy.

Actions:

  • Confirm LiteLLM is reachable.
  • List available models and aliases.
  • Run one prompt through a local-only route.
  • Run one prompt through a cloud-allowed route.
  • Confirm your private route does not fall back to cloud.
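
These checks can be run with curl against the OpenAI-compatible endpoints LiteLLM exposes. The base URL, key, and route names below are the examples used in this guide; substitute your own.

```shell
BASE="${LITELLM_BASE:-http://localhost:4000}"   # your gateway base URL
KEY="${LITELLM_KEY:-sk-example}"                # a key with access to the routes

# 1. Reachability plus alias listing
curl -s -H "Authorization: Bearer $KEY" "$BASE/v1/models" \
  || echo "gateway unreachable at $BASE"

# 2. One prompt through a local-only route
curl -s -H "Authorization: Bearer $KEY" -H "Content-Type: application/json" \
  -d '{"model": "secure-core", "messages": [{"role": "user", "content": "ping"}]}' \
  "$BASE/v1/chat/completions" || echo "local-only route failed"

# 3. Repeat step 2 with a cloud-allowed route such as general-smart, then
#    stop the local backend and re-run step 2: it should FAIL, which is the
#    proof that the private route cannot silently fall back to cloud.
```

A deliberate failure on the private route while the local backend is down is the strongest evidence your privacy boundary actually holds.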

Verification:

  • Local-only routes return successfully from the local backend.
  • Cloud-allowed routes work only where expected.
  • You can explain which route was used for each test.

Common failure and fix: If the container is healthy but the route behavior is wrong, test each route explicitly before trusting the gateway.

Validation Checklist

  • LiteLLM is reachable from your client host.
  • Route aliases are visible and documented.
  • At least one route is strictly local-only.
  • Routine work is not defaulting to your most expensive model.
  • OpenClaw points to LiteLLM rather than raw vendor model IDs.
  • Secrets live outside the main routing config.
  • If the source config is NAS-backed, the container uses a local runtime copy.

Operations and Maintenance

  • Review route aliases before adding new clients.
  • Re-test local-only privacy routes after any model or fallback change.
  • Keep a short route map in documentation so future edits stay intentional.
  • If you store config on NAS, verify the local runtime sync still works after reboots and upgrades.
  • Review costs periodically to confirm routine work is still landing on the intended route.

Troubleshooting and Rollback

  • LiteLLM does not start after reboot: check whether the container still depends on a brittle single-file NAS bind and roll back to the last known-good local runtime config.
  • Private work hits the cloud: remove or tighten the fallback chain for that route immediately.
  • Clients behave inconsistently: confirm all clients use the same LiteLLM base URL and route aliases.
  • Costs spike: check whether a client bypassed your aliases and started calling premium models directly.

My Implementation Notes

The most durable setup is to treat LiteLLM as a routing layer, not as the only way to reach a local model. In NAS-backed labs, a strong pattern is to keep the source config in versioned storage, sync it to a local runtime directory, and mount that local runtime directory into the container. For private records or similar sensitive work, pair LiteLLM route policy with separate agent lanes so privacy is enforced by both routing and filesystem boundaries.