The DOCK_BYTE module provides tools for extracting text from PDF and TXT documents and enables interactive chat-based exploration of the extracted content using a language model. It leverages various libraries for document processing and integrates with Streamlit for a GUI-based interface.
- Extract text from PDF documents using PyMuPDF.
- Perform OCR on PDF documents using Tesseract.
- Extract text from TXT files.
- Use a language model to chat with the content of the documents.
- GUI support with Streamlit for interactive usage.
pip install DOCK_BYTE
from dock_byte import chat_with_doc
chat_with_doc("gemma:2b", "data.txt", use_gui=True)
streamlit CODE_FILE.py
This project is licensed under the MIT License - see the LICENSE file for details.
For more information and to contribute, please visit the GitHub repository.