LANCY
KNOWLEDGE SERVED TRANSPARENTLY
Open-source Retrieval-Augmented Generation (RAG) system for individuals and companies. Ingest your internal knowledge and get answers from your local LLM, grounded in relevant context. See and tweak the RAG parameters in effect. Brand it however you like.
Built for Transparency
A full stack of capabilities for making your internal knowledge more accessible, designed to give you insight into the RAG process and let you adjust its settings to your environment.
Keep your Information Private
Because Lancy runs locally, there is no need to upload your confidential documents to a cloud provider.
Advanced RAG features
Lancy offers advanced RAG capabilities that you can fine-tune to your specific needs and infrastructure. Too many abbreviations? Check the RAG Basics page for an explanation of the RAG pipeline.
Admin Panel
Gain transparent insight into how Lancy operates and how your users interact with it. Tailor it to your backend infrastructure and monitor database health and maintenance operations. Add custom branding to match your company style.
- Monitor ingestion runs
- Debug LLM calls and responses
- Configure SSO
Transparent Insights
Inspect the state of your Knowledge Bases and see how your documents were chunked, so you can drill down into issues with answer quality and relevance. Simulate the retrieval process without calling your LLM, and visualize the effect of features such as reranking.
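To give a rough idea of what simulating retrieval without the LLM involves, here is a minimal, self-contained Python sketch. It is not Lancy's implementation or API: fixed-size character chunking with overlap, bag-of-words cosine similarity standing in for an embedding model, and a query-term-coverage score standing in for a reranker are all simplifying assumptions, and every function name is hypothetical. The point is the two-stage shape: a cheap first-stage retrieval picks candidates, and a second stage re-scores only those candidates, which is where reranking can change the final order.

```python
import math
import re
from collections import Counter

# Illustrative sketch only: names and strategies are hypothetical, not Lancy's API.

def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    step = size - overlap
    # Stop once the remaining tail is already covered by the previous window's overlap.
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; a stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors (stand-in for embedding similarity)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 3) -> list[tuple[float, str]]:
    """First stage: score every chunk against the query and keep the top_k."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in chunks]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]


def rerank(query: str, hits: list[tuple[float, str]]) -> list[tuple[float, str]]:
    """Second stage: re-score only the retrieved hits by the fraction of query terms they contain."""
    terms = set(tokenize(query))
    rescored = [(len(terms & set(tokenize(c))) / len(terms) if terms else 0.0, c) for _, c in hits]
    return sorted(rescored, key=lambda s: s[0], reverse=True)


if __name__ == "__main__":
    corpus = ("The cafeteria opens at 8am. Expense reports are due on the fifth "
              "of each month. VPN access requires a hardware token.")
    query = "when are expense reports due"
    chunks = chunk_text(corpus, size=60, overlap=15)
    hits = retrieve(query, chunks, top_k=2)
    print("retrieved:", hits)
    print("reranked: ", rerank(query, hits))
```

Printing the chunks, the first-stage scores, and the reranked scores side by side is the kind of inspection that helps explain why a given answer was (or was not) grounded in the expected passage, without spending any LLM inference.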
Frequently Asked Questions
What is Lancy?
Lancy is a self-hosted document question-answering system built on Retrieval-Augmented Generation. You point it at a collection of documents, it indexes them, and you can then ask questions in plain language and get answers grounded in the source material. Everything runs on your own infrastructure — no data leaves your environment.
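"Grounded in the source material" usually means the retrieved passages are placed into the prompt, labeled so the model can cite them. The Python sketch below shows one common way to do that under a simple character budget. It is an illustration of the general technique, not Lancy's prompt template: the `Chunk` type, the prompt wording, and the `max_context_chars` budget are hypothetical.

```python
from dataclasses import dataclass

# Illustrative sketch only: the Chunk type, prompt wording and budget are hypothetical.

@dataclass
class Chunk:
    source: str  # where the text came from, e.g. a file name
    text: str


def build_grounded_prompt(question: str, chunks: list[Chunk], max_context_chars: int = 2000) -> str:
    """Pack ranked chunks into a prompt, stopping at the first one that would exceed the
    character budget (the blank-line separators between chunks are not counted)."""
    parts: list[str] = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        block = f"[{i}] ({chunk.source})\n{chunk.text.strip()}"
        if used + len(block) > max_context_chars:
            break  # keep ranked order rather than skipping ahead to smaller chunks
        parts.append(block)
        used += len(block)
    context = "\n\n".join(parts)
    return (
        "Answer the question using only the numbered context below and cite sources "
        "by their number. If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    retrieved = [
        Chunk("handbook.md", "Expense reports are due on the fifth of each month."),
        Chunk("it-policies.md", "VPN access requires a hardware token."),
    ]
    print(build_grounded_prompt("When are expense reports due?", retrieved))
```

Stopping at the first chunk that does not fit, rather than skipping to a smaller one, keeps the context in ranked order; either choice is defensible, and the budget itself depends on the context window of the model you run.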
What are the hardware requirements?
Lancy can run on modest hardware, but the quality of responses is directly tied to the models you can afford to run. On a standard laptop or small server you will get something working, but language models and embedding models are computationally heavy, and without dedicated GPU resources inference will be slow and output quality will be limited. Smaller models also typically struggle to follow instructions for producing structured output. For anything beyond occasional personal use, a machine with a capable GPU is recommended. See the deployment section for specific guidance on supported configurations.
Is Lancy open source?
Yes, Lancy is open source under the Apache-2.0 license, allowing you to run it locally or host it yourself without vendor lock-in.
Is this vibe coded?
Without LLM coding agents, the current state of the app would not have been achievable. However, the codebase has evolved in small, incremental steps, building on top of a solid foundation. Steps are planned in advance and documented in design documents. Requirements are defined, and testing is done, by applying the stack to real-world production scenarios.