codebytemirza/DockByte

DOCK_BYTE

The DOCK_BYTE module extracts text from PDF and TXT documents and lets you explore the extracted content interactively by chatting with a language model. It uses PyMuPDF and Tesseract for document processing and offers a Streamlit-based GUI.

Features

  • Extract text from PDF documents using PyMuPDF.
  • Perform OCR on PDF documents using Tesseract.
  • Extract text from TXT files.
  • Use a language model to chat with the content of the documents.
  • GUI support with Streamlit for interactive usage.
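The feature list does not show how DOCK_BYTE implements extraction internally. The following is a rough sketch of how these pieces could fit together. It assumes PyMuPDF (imported as `fitz`) for the PDF text layer, and the `pytesseract` wrapper plus Pillow for Tesseract OCR. Neither wrapper is named in this README. All helper names (`extract_txt`, `extract_pdf`, `ocr_pdf`, `extract_text`) are hypothetical and are not the package's API.

```python
from pathlib import Path


def extract_txt(path: str, encoding: str = "utf-8") -> str:
    """Read a plain-text file; undecodable bytes are replaced instead of raising."""
    return Path(path).read_text(encoding=encoding, errors="replace")


def extract_pdf(path: str) -> str:
    """Extract the embedded text layer of every page with PyMuPDF."""
    import fitz  # PyMuPDF; imported lazily so TXT-only use does not require it

    with fitz.open(path) as doc:
        return "\n".join(page.get_text() for page in doc)


def ocr_pdf(path: str, dpi: int = 300) -> str:
    """Render each page to an image and run Tesseract OCR on it (for scanned PDFs)."""
    import fitz
    import pytesseract  # assumption: Python wrapper around the tesseract binary
    from PIL import Image

    texts = []
    with fitz.open(path) as doc:
        for page in doc:
            pix = page.get_pixmap(dpi=dpi)  # RGB without alpha by default
            img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
            texts.append(pytesseract.image_to_string(img))
    return "\n".join(texts)


def extract_text(path: str) -> str:
    """Dispatch on file extension; fall back to OCR when a PDF has no text layer."""
    suffix = Path(path).suffix.lower()
    if suffix == ".txt":
        return extract_txt(path)
    if suffix == ".pdf":
        text = extract_pdf(path)
        return text if text.strip() else ocr_pdf(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

The sketch runs OCR only when the text layer is empty, so the slower Tesseract pass is reserved for scanned documents.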

Installation

pip install DOCK_BYTE

Usage

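from dock_byte import chat_with_doc

# Chat with the contents of data.txt using the "gemma:2b" model;
# use_gui=True enables the Streamlit GUI.
chat_with_doc("gemma:2b", "data.txt", use_gui=True)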
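This README does not describe what `chat_with_doc` does internally. A common way to chat with a document that may exceed a model's context window is to split the text into chunks, select the chunks most relevant to the question, and send those chunks to the model together with the question. The sketch below follows that approach. It uses naive keyword overlap in place of embeddings. All function names are hypothetical. The model call is a plain callable (`ask_model`) because the README does not say which client library DOCK_BYTE uses. The `gemma:2b` name resembles an Ollama model tag, but that is only an assumption.

```python
import re
from typing import Callable, List, Set


def _words(text: str) -> Set[str]:
    """Lowercased word set, ignoring punctuation."""
    return set(re.findall(r"\w+", text.lower()))


def split_into_chunks(text: str, chunk_size: int = 500) -> List[str]:
    """Split text into chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def top_chunks(chunks: List[str], question: str, k: int = 3) -> List[str]:
    """Rank chunks by how many question words they share (naive keyword overlap)."""
    q = _words(question)
    ranked = sorted(chunks, key=lambda c: len(q & _words(c)), reverse=True)
    return ranked[:k]


def build_prompt(context: List[str], question: str) -> str:
    """Combine the selected excerpts and the question into one prompt string."""
    joined = "\n---\n".join(context)
    return (
        "Answer the question using only the document excerpts below.\n\n"
        f"{joined}\n\nQuestion: {question}\nAnswer:"
    )


def answer(document_text: str, question: str, ask_model: Callable[[str], str],
           k: int = 3, chunk_size: int = 500) -> str:
    """One chat turn: retrieve relevant chunks, build a prompt, call the model."""
    chunks = split_into_chunks(document_text, chunk_size)
    return ask_model(build_prompt(top_chunks(chunks, question, k), question))
```

In practice, `ask_model` would wrap whichever client serves the chosen model. Keyword overlap is crude. An embedding-based retriever would rank paraphrased questions better.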

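Running

With use_gui=True, save the usage code in a Python file (CODE_FILE.py is a placeholder name) and launch it with Streamlit:

streamlit run CODE_FILE.py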

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository

For more information and to contribute, please visit the GitHub repository.

Demo

Watch the video
