Deployment

Self-host on your own hardware. Clone the repository, run the install script, and start the services.

Getting Started

Docker is not yet supported. To deploy Lancy, clone the Git repository and run the provided start commands or scripts directly on the host.

The stack has three moving parts: the backend, the frontend, and an LLM you bring yourself. They start independently and can run on the same machine or on separate hosts.

Backend

The backend is a FastAPI application written in Python. It owns the retrieval pipeline, embedding models, and vector store, and listens on port 8080. An install script handles the full setup: creating the virtual environment, installing dependencies, and pre-downloading the embedding models. Once installed, start and stop scripts manage the process. For persistent deployments, the backend can be registered as a systemd user service so that it comes back up after a reboot without manual intervention.
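As an illustration of the systemd user service mentioned above, the following minimal sketch renders a unit file for the backend. The script name start.sh and the repository path passed in are hypothetical placeholders, not names confirmed by the project, and the sketch assumes the start script keeps the server in the foreground. Prefer the unit shipped with the repository if one exists.

```python
from pathlib import Path

# Minimal user-unit template. The description text is illustrative.
UNIT_TEMPLATE = """\
[Unit]
Description=Lancy backend (FastAPI)

[Service]
WorkingDirectory={repo_dir}
ExecStart={start_script}
Restart=on-failure

[Install]
WantedBy=default.target
"""


def render_user_unit(repo_dir: Path, start_script_name: str = "start.sh") -> str:
    """Return unit-file text for a checkout located at repo_dir.

    start_script_name is a hypothetical default; use the real script name
    from your checkout.
    """
    repo_dir = repo_dir.expanduser()
    if not repo_dir.is_absolute():
        # systemd expects absolute paths for WorkingDirectory and ExecStart.
        raise ValueError("repo_dir must be an absolute path")
    return UNIT_TEMPLATE.format(
        repo_dir=repo_dir,
        start_script=repo_dir / start_script_name,
    )
```

The rendered text would typically be saved under ~/.config/systemd/user/ (for example as lancy-backend.service, a hypothetical name) and enabled with systemctl --user enable --now. On most distributions a user service only starts at boot without a login session if lingering is enabled for the account with loginctl enable-linger.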

Frontend

The frontend is a Next.js application that serves the web UI and proxies all API calls to the backend on the server side, so the backend never needs to be publicly reachable. It runs on port 3000 and requires no separate install step: dependencies are installed automatically on first start. The only manual step before starting is to create a .env file from the provided example, where you set the login password and, for split deployments, the backend URL.
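A quick pre-start check of the .env file can catch a missing password or backend URL before the frontend boots. The sketch below is one way to do that. The key names LOGIN_PASSWORD and BACKEND_URL are assumptions made for illustration; the real names are whatever the provided example file uses.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(env: dict[str, str], split_deployment: bool = False) -> list[str]:
    """Return the required keys that are absent or empty.

    Key names here are hypothetical; adjust them to match the example file.
    """
    required = ["LOGIN_PASSWORD"]
    if split_deployment:
        # The backend URL only matters when the backend runs on another host.
        required.append("BACKEND_URL")
    return [key for key in required if not env.get(key)]
```

To use it, read the file contents, pass them to parse_env, and treat a non-empty result from missing_settings as a reason to stop before starting the frontend.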

Scaling Up

Everything can run on a single machine for personal use or small teams. As load grows, the components separate cleanly. The backend and LLM are GPU-hungry and typically move to a dedicated GPU server first, while the frontend stays lightweight and can run on any modest Node.js host. The next step up adds a dedicated PostgreSQL instance for the vector store and conversation history, plus a reverse proxy in front of the frontend for TLS termination.
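Once components live on separate hosts, it helps to confirm that each one is reachable from where it is expected to be called. The sketch below does a plain TCP connect to the documented ports (8080 for the backend, 3000 for the frontend). The host addresses are assumptions for a single-machine setup, and the LLM endpoint is left out because its address depends on what you bring. A successful connect shows only that something is listening, not that the service is healthy.

```python
from __future__ import annotations

import socket

# Ports come from the text above; hosts are placeholders to replace
# with your own addresses in a split deployment.
DEFAULT_SERVICES = {
    "backend": ("127.0.0.1", 8080),
    "frontend": ("127.0.0.1", 3000),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_stack(
    services: dict[str, tuple[str, int]] | None = None,
) -> dict[str, bool]:
    """Map each service name to whether its port accepted a connection."""
    services = DEFAULT_SERVICES if services is None else services
    return {
        name: is_port_open(host, port)
        for name, (host, port) in services.items()
    }
```

Running check_stack() on the frontend host with the backend entry pointed at the GPU server is a reasonable first test after changing firewall rules.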

For a full walkthrough of each configuration, including environment variables, start scripts, systemd services, firewall rules, and persistent data paths, see the Deployment Guide in the docs.