- Clone the repository
- Run the start script

```bash
git clone https://github.com/mage-ai/rag-project
cd rag-project
./scripts/start.sh
```
Once started, go to http://localhost:6789/
For more setup information, refer to these instructions
In this section, we cover the ingestion of documents from a single data source.
TODO: what if we only have custom code? How do we edit it?
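As a minimal sketch of ingestion from a single source (the payload and field names below are illustrative stand-ins, not the course dataset), the step boils down to parsing raw records into document dicts and validating them:

```python
import json

# Hypothetical payload from a single data source; field names are illustrative.
raw = """
[
  {"course": "llm-zoomcamp", "question": "What is RAG?", "text": "Retrieval-Augmented Generation combines search with an LLM."},
  {"course": "llm-zoomcamp", "question": "How do I start?", "text": "Clone the repo and run the start script."}
]
"""

def ingest(payload: str) -> list[dict]:
    """Parse the raw payload into a list of document dicts."""
    documents = json.loads(payload)
    # Basic validation: keep only records that have the text field we need.
    return [doc for doc in documents if "text" in doc]

docs = ingest(raw)
print(len(docs))  # 2
```

In Mage this logic would live in a data-loader block; the same parse-and-validate shape applies whether the source is an API, a file, or a database.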
Once data is ingested, we break it into manageable chunks. This section explains the importance of chunking data and various techniques.
Why chunk at all? LLM context windows are limited, and retrieval quality drops when passages are long and unfocused; smaller chunks let the search step return only the most relevant passages, keeping prompts short and grounded.
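One common technique is a sliding window with overlap, so that sentences cut at a boundary still appear whole in the neighboring chunk. A minimal character-based sketch (the size and overlap values are arbitrary defaults, not the project's settings):

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows of at most `size` chars."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # step forward, keeping `overlap` chars shared
    return chunks

chunks = chunk_text("a" * 500, size=200, overlap=50)
print(len(chunks))  # 4
```

Token-based or sentence-aware chunking follows the same pattern; only the unit of measurement changes.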
Tokenization is a crucial step in text processing, preparing the data for effective retrieval.
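To make the idea concrete, here is a deliberately simple word-level tokenizer; real pipelines use the model's own tokenizer (e.g. tiktoken for OpenAI models), which splits into subword units rather than words:

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and extract alphanumeric word tokens.
    Illustrative only; production systems use a model-specific tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())

tokens = tokenize("Tokenization prepares text for retrieval!")
print(tokens)  # ['tokenization', 'prepares', 'text', 'for', 'retrieval']
```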
Embedding data translates text into numerical vectors that can be processed by models.
After processing, the chunks and their embeddings are exported to storage so they can be retrieved later to contextualize user queries.
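Whatever the backend (Elasticsearch, Qdrant, or another vector database), the export step writes the same record shape: an id, the chunk text, and its embedding. A minimal in-memory stand-in, with made-up field values:

```python
# In-memory stand-in for a vector store; a real pipeline would export to
# a vector database instead. All names and values here are illustrative.
store: list[dict] = []

def export_chunk(chunk_id: str, text: str, embedding: list[float]) -> None:
    """Persist one chunk together with its embedding for later retrieval."""
    store.append({"id": chunk_id, "text": text, "embedding": embedding})

export_chunk("doc1-0", "Clone the repo and run the start script.", [0.1, 0.9])
print(store[0]["id"])  # doc1-0
```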
After exporting the chunks and embeddings, we can test the search query to retrieve relevant documents on sample queries.
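The search step itself is a ranking problem: embed the query the same way as the chunks, then sort stored chunks by similarity. A self-contained sketch using cosine similarity over a tiny hand-made index (the vectors are invented for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Tiny stand-in index of (text, embedding) pairs; vectors are made up.
index = [
    ("Clone the repo and run the start script.", [0.9, 0.1]),
    ("Chunking splits documents into passages.", [0.1, 0.9]),
]

def search(query_vec: list[float], top_k: int = 1) -> list[str]:
    """Return the top_k stored chunks ranked by similarity to the query."""
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

print(search([0.95, 0.05]))  # ['Clone the repo and run the start script.']
```

Running a handful of sample queries like this against the exported data is a quick sanity check that ingestion, chunking, and embedding all worked end to end.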
Automation is key to maintaining and updating your system. This section demonstrates how to schedule and trigger daily runs for your data pipelines, ensuring up-to-date and consistent data processing.
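Inside Mage, daily runs are configured as schedule-type pipeline triggers in the UI. Outside Mage, the same daily cadence can be expressed as a plain cron entry; the path and script name below are hypothetical placeholders, not files from this repository:

```
# Run the pipeline every day at 05:00 (path and script name are hypothetical)
0 5 * * * cd /path/to/rag-project && ./scripts/run_pipeline.sh
```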
TBA
- First link goes here
- Did you take notes? Add them above this line (Send a PR with links to your notes)