05-orchestration
Data Preparation in RAG

Getting started

  1. Clone the repository
  2. Run `./scripts/start.sh`

```shell
git clone https://github.com/mage-ai/rag-project
cd rag-project
./scripts/start.sh
```

Once started, go to http://localhost:6789/

For more setup information, refer to these instructions.

0. Module overview

1. Ingest

In this section, we cover the ingestion of documents from a single data source.

TODO: what if we only have custom code? How do we edit it?
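As a minimal sketch of an ingestion step (the in-memory JSON source below stands in for a real file or API, and the field names are illustrative, not this project's schema):

```python
import io
import json

def ingest_documents(fp):
    """Read raw documents from a JSON source; one dict per document."""
    docs = json.load(fp)
    # Keep only the fields the downstream steps need.
    return [{"id": d["id"], "text": d["text"]} for d in docs]

# In-memory source standing in for a real file or HTTP response.
source = io.StringIO(json.dumps([
    {"id": "faq-1", "text": "How do I run the pipeline?", "extra": "ignored"},
]))
docs = ingest_documents(source)
print(docs[0]["id"])  # → faq-1
```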

2. Chunk

Once data is ingested, we break it into manageable chunks. This section explains the importance of chunking data and various techniques.

Why chunking? Embedding models and LLM context windows accept a limited number of tokens, and retrieval works best when each stored unit covers one focused idea. Chunking keeps pieces small enough to embed and precise enough to retrieve.
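As an illustration (the sizes are arbitrary, and real pipelines often chunk by tokens, sentences, or document structure rather than raw characters), a fixed-size chunker with overlap can be sketched as:

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into overlapping character windows.

    Overlap preserves context that would otherwise be cut at chunk borders.
    """
    if size <= overlap:
        raise ValueError("size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

chunks = chunk_text("a" * 450, size=200, overlap=50)
print(len(chunks))  # → 3
```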

3. Tokenization

Tokenization is a crucial step in text processing: it splits text into the units a model actually consumes, and token counts determine how large each chunk can be and what embedding calls cost.
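Production pipelines typically use the tokenizer that matches the target model (for example, tiktoken for OpenAI models). As a rough stand-in, a naive word-level tokenizer shows the idea of counting tokens to gate chunk sizes:

```python
import re

def tokenize(text):
    """Naive word-level tokenizer: lowercase runs of letters and digits.
    A real pipeline would use the model's own tokenizer instead."""
    return re.findall(r"[a-z0-9]+", text.lower())

def token_count(text):
    return len(tokenize(text))

print(token_count("Data Preparation in RAG"))  # → 4
```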

4. Embed

Embedding data translates text into numerical vectors that capture semantic similarity, so related texts end up close together in vector space and can be compared by models.
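In a real pipeline this step calls a trained embedding model (a sentence-transformer or a hosted embedding API). Purely to show the shape of the data flowing through the step, here is a toy deterministic "embedding" that hashes words into a fixed-size unit vector; the function and dimension are illustrative only:

```python
import hashlib
import math

def embed(text, dim=8):
    """Toy stand-in for an embedding model: hash words into a unit vector.
    Deterministic, but carries no semantics; a real pipeline calls a model."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

v = embed("chunking splits documents")
print(len(v))  # → 8
```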

5. Export

After processing, the chunks and their embeddings are exported to storage (a vector store) so they can later be retrieved to add context to user queries.
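As a minimal, storage-agnostic sketch (the file name and record fields are illustrative; in practice the destination is a vector database), the export step can be pictured as writing one JSON record per chunk:

```python
import json
import os
import tempfile

def export_embeddings(records, path):
    """Write one {id, text, vector} record per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for r in records:
            f.write(json.dumps(r) + "\n")

records = [
    {"id": "faq-1#0", "text": "How do I run the pipeline?", "vector": [0.0, 1.0]},
]
path = os.path.join(tempfile.mkdtemp(), "chunks.jsonl")
export_embeddings(records, path)
```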

6. Test Vector Search Query

After exporting the chunks and embeddings, we can run sample queries against the vector store to verify that relevant documents are retrieved.
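A minimal sketch of what such a test does under the hood, assuming an in-memory index of {id, vector} records and cosine similarity as the ranking function:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search(query_vec, index, top_k=3):
    """Rank stored chunks by cosine similarity to the query vector."""
    scored = [(cosine(query_vec, r["vector"]), r) for r in index]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [r for _, r in scored[:top_k]]

index = [
    {"id": "a", "vector": [1.0, 0.0]},
    {"id": "b", "vector": [0.0, 1.0]},
]
hits = search([0.9, 0.1], index, top_k=1)
print(hits[0]["id"])  # → a
```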

7. Trigger Daily Runs

Automation is key to maintaining and updating your system. This section demonstrates how to schedule and trigger daily runs for your data pipelines, ensuring up-to-date and consistent data processing.
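In Mage, daily runs are configured as pipeline triggers. Purely to illustrate the cadence (the script path and name below are hypothetical, not part of this project), the same daily schedule expressed as a crontab entry would be:

```cron
# Run the data-preparation pipeline every day at 02:00
0 2 * * * cd /path/to/rag-project && python run_pipeline.py
```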

Homework

TBA

Notes

  • First link goes here
  • Did you take notes? Add them above this line (Send a PR with links to your notes)