LANCY

KNOWLEDGE SERVED TRANSPARENTLY

Open-source Retrieval-Augmented Generation (RAG) system for individuals and companies. Ingest your internal knowledge and retrieve answers from your local LLM with relevant context. See and tweak the RAG parameters in effect. Brand it however you like.

Start Building Free

rag-pipeline --status

llm: served

backend: online

latency: 14ms

status: ingesting...

Built for Transparency

A full stack of capabilities for making your internal knowledge easier to access. Lancy gives you insight into the RAG process and lets you adjust its settings to your environment.

LLM AGNOSTIC

MULTI-KB SUPPORT

VISION-AIDED RAG

STRUCTURED EVIDENCE

API-READY

ADVANCED RETRIEVAL METHODS

MODULAR DEPLOYMENT

SINGLE SIGN-ON

Keep Your Information Private

Because Lancy runs locally, there is no need to upload your confidential documents to a cloud provider.

Advanced RAG Features

Lancy's advanced RAG capabilities can be tuned to your specific needs and infrastructure. Too many abbreviations? Check the RAG Basics page for an explanation of the RAG pipeline.

Learn More

Admin Panel

Gain insight into the operation of Lancy and your users' interactions with it. Tailor it to your backend infrastructure and monitor database health and maintenance operations. Add custom branding to match your company style.

Monitor ingestion runs
Debug LLM calls and responses
Configure SSO

Transparent Insights

Gain insight into the state of your Knowledge Bases. Inspect how your documents were chunked to drill down on issues with answer quality and relevance. Simulate the retrieval process without hitting your LLM, and visualize the effect of features like reranking.
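
To make the idea of chunking concrete, here is a minimal sketch of fixed-size chunking with overlap, one common strategy in RAG systems. It is an illustration only and not Lancy's implementation: the function name `chunk_words` and the `size` and `overlap` parameters are hypothetical and do not correspond to actual Lancy settings.

```python
def chunk_words(text, size=8, overlap=2):
    """Split text into word windows of `size` words that share `overlap` words.

    Hypothetical helper for illustration; not part of Lancy.
    """
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


sample = "one two three four five six seven eight nine ten"
chunks = chunk_words(sample, size=4, overlap=1)
for number, chunk in enumerate(chunks):
    print(number, chunk)
```

A larger overlap reduces the chance that a relevant sentence is split across two chunks, at the cost of storing and searching more text. Seeing the resulting chunks side by side is the kind of inspection that helps when answers miss context.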

Frequently Asked Questions

What is Lancy?

Lancy is a self-hosted document question-answering system built on Retrieval-Augmented Generation. You point it at a collection of documents, it indexes them, and you can then ask questions in plain language and get answers grounded in the source material. Everything runs on your own infrastructure, so no data leaves your environment.
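
The index, retrieve, and answer flow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Lancy's code or API: it scores documents with simple word-count cosine similarity instead of an embedding model, the file names and texts are made up, and the final prompt would be sent to a local LLM in a real system.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    # Cosine similarity between two word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_index(docs):
    # "Indexing": one word-count vector per document (stand-in for embeddings).
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def retrieve(index, question, k=2):
    # Rank documents by similarity to the question and keep the top k matches.
    query = Counter(tokenize(question))
    scored = sorted(
        ((cosine(query, vec), doc_id) for doc_id, vec in index.items()),
        reverse=True,
    )
    return [doc_id for score, doc_id in scored[:k] if score > 0]


def build_prompt(docs, doc_ids, question):
    # Ground the answer by placing the retrieved text in front of the question.
    context = "\n".join(f"[{doc_id}] {docs[doc_id]}" for doc_id in doc_ids)
    return f"Answer using only the context below.\n\n{context}\n\nQuestion: {question}"


# Made-up example documents.
docs = {
    "vacation.txt": "Employees receive 25 vacation days per year.",
    "vpn.txt": "Connect to the VPN before accessing the build server.",
    "expenses.txt": "Submit expense reports within 30 days of travel.",
}
question = "How many vacation days do employees get?"
index = build_index(docs)
hits = retrieve(index, question, k=1)
prompt = build_prompt(docs, hits, question)
print(prompt)
```

Because the retrieved passage travels with the question, the model can answer from your documents, and the source of each answer stays traceable.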

What are the hardware requirements?

Lancy can run on modest hardware, but the quality of responses is directly tied to the models you can afford to run. On a standard laptop or small server you will get something working, but language models and embedding models are computationally heavy, and without dedicated GPU resources inference will be slow and model output quality will be limited. Typically, smaller models also struggle to follow directions for creating structured output. For anything beyond occasional personal use, a machine with a capable GPU is recommended. See the deployment section for specific guidance on supported configurations.

Is Lancy open source?

Yes, Lancy is open source under the Apache-2.0 license, allowing you to run it locally or host it yourself without vendor lock-in.

Is this vibe coded?

Without LLM coding agents, the current state of the app would not have been achievable. However, the code base has evolved in small, incremental steps on top of a solid foundation. Each step is planned in advance and documented in design documents. Requirements are defined and testing is done by applying the stack to real-world production scenarios.