allenai / papermage

library supporting NLP and CV research on scientific papers

python machine-learning natural-language-processing computer-vision scientific-papers multimodal pdf-processing

Updated Nov 8, 2024
Python

ahmedkhemiri95 / PDFs-TextExtract

Text extraction from multiple and large PDF documents.

python pdf parser data-science pdf-document text-analytics pdfs pypdf2 extract-text pdfminer pdf-processing pdfs-textextract

Updated Feb 10, 2025
Python

aws-samples / document-processing-pipeline-for-regulated-industries

A boilerplate solution for processing image and PDF documents for regulated industries, with lineage and pipeline operations metadata services.

Updated Oct 25, 2021
Python

Govind-S-B / pdf-to-text-chroma-search

Python scripts that convert PDF files to text, split them into chunks, and store their vector representations in a Chroma DB using GPT4All embeddings. A separate script queries the Chroma DB for similarity search based on user input.

text-extraction similarity-search pdf-processing vector-embeddings chromadb

Updated Oct 23, 2023
Python

ranguy9304 / LangGraphRAG

LangGraphRAG: A terminal-based Retrieval-Augmented Generation system using LangGraph. Features include message history caching, query transformation, and vector database retrieval. Ideal for NLP researchers and developers working on advanced conversational AI and information retrieval systems.

python natural-language-processing information-retrieval chatbot web-scraping nlp-machine-learning rag terminal-application pdf-processing vector-database openai-api langgraph

Updated Jul 13, 2024
Python

Inc44 / MaTools

An all-in-one GUI management toolkit built with PyQt6, offering a suite of tools for file synchronization, media organization, PDF merging, code formatting, and more.

python rust productivity application gui qt ocr image-processing video-processing speech-recognition youtube-downloader file-management audio-processing pdf-processing code-formatting

Updated Mar 15, 2025
Python

DioCrafts / ai-book-summarizer

📚 AI-Powered Book PDF Knowledge Extractor & Summarizer. Transform your PDF books into structured knowledge effortlessly! This tool uses AI to analyze books page by page, extracting key insights, definitions, and concepts, and organizes them into Markdown summaries for easier study.

python markdown pdf machine-learning natural-language-processing automation ai text-analysis openai text-summarization document-analysis study-materials pymupdf knowledge-extraction pdf-processing book-summary educational-tools pdf-summarization ai-powered-tools

Updated Jan 2, 2025
Python

Yardenrsk / PsychometryReceiverCV

A side project for easily collecting and annotating questions and answers for the PsychometryBot project DB using computer vision and PDF parsing.

pandas opencv-python pdf-processing

Updated Sep 18, 2022
Python

Aleptonic / PdfSnipper

PdfSnipper is a lightweight and efficient Python package designed to simplify the management of PDF files, pages, and their conversions during various NLP, Computer Vision (CV), or other data processing tasks. The package eliminates the need for repetitive code by providing intuitive, ready-to-use functions for common PDF-related operations.

utilities pdf-processing nlp-tools

Updated Feb 3, 2025
Python

thinhuos0913 / python_useful_mini_projects

Some useful mini projects that I worked on while teaching myself Python programming.

python opencv ocr image-processing pdf-processing

Updated May 20, 2024
Python

Al-shwaib / Book-Preparation-for-Printing

A web application for preparing books and magazines for offset printing. Automatically arranges PDF pages for commercial A3 printing, supporting both Arabic (RTL) and English (LTR) books.

flask-application pymupdf pdf-processing rtl-support offset-printing book-preparation arabic-books commercial-printing a3-printing order-to-print

Updated Jan 6, 2025
Python

arsath-eng / RAG1-NVIDIA-GENAI

A powerful Retrieval Augmented Generation (RAG) application built with NVIDIA AI endpoints and Streamlit. This solution enables intelligent document analysis and question-answering using state-of-the-art language models, featuring multi-PDF processing, FAISS vector store integration, and advanced prompt engineering.

embeddings question-answering document-analysis faiss rag pdf-processing streamlit llm langchain vector-store nvidia-ai-faundry llama-models

Updated Oct 31, 2024
Python

rithulkamesh / docproc

Opinionated and Sophisticated Document Region Analyzer.

python machine-learning ocr text-classification text-extraction data-extraction region-detection content-extraction document-analysis layout-analysis pdf-processing pdf-text-extraction document-parsing equation-detection mathematical-symbols

Updated Apr 13, 2025
Python

AkshayG999 / MistralOCR---AI-Powered-Document-Extraction

MistralOCR is an open-source application that transforms documents into structured data using Mistral AI's OCR capabilities. Built with FastAPI and Streamlit, it provides an intuitive interface for extracting and processing text from PDFs and images, making document digitization effortless and accurate.

Updated Mar 11, 2025
Python

akshatpunia26 / berrylit_pdf_chat

Berrylit is a simple chatbot interface that allows users to upload a PDF file and ask a question related to its contents. The chatbot uses the Berri API for processing.

python api natural-language-processing chatbot pdf-processing streamlit

Updated Jun 26, 2023
Python

FurqanHun / textnomnom-py

Extract text from PDFs, PPTs, & URLs (with OCR support). Converts PPT to PDF & handles files or folders. 🦍

windows linux automation cross-platform document-conversion text-extraction pptx pdf-to-text ppt pdf-processing ppt-to-text pptx-to-text automated-conversion image-text-extraction

Updated Apr 14, 2025
Python

salameaz / pdf-process-rag

A Python-based application that extracts and processes PDF content using a Retrieval-Augmented Generation (RAG) approach. It uses vector embeddings to enable efficient querying of both text-based and scanned PDFs, and lets you interact with your documents through a large language model.

python nlp machine-learning document-search rag pdf-processing streamlit vector-embeddings retrieval-augmented-generation

Updated Mar 27, 2025
Python

gs-ai / PDFProfessor

PDF Professor 2.0 extracts and processes PDF text, which is then analyzed by Ollama for summarization, data extraction, and insights. More coming soon!

python machine-learning natural-language-processing text-extraction data-extraction document-processing pdf-processing ollama ai-analysis

Updated Apr 17, 2025
Python

omritriki / BIU-Points-Calculator

A web application for calculating credit points and GPA from PDF transcripts. Built with FastAPI and pdfplumber, this tool simplifies the process for BIU engineering students.

css python html api education render web-application biu fastapi pdf-processing gpa-calculation-tool university-tools credit-points

Updated Apr 14, 2025
Python

Mateusz2734 / pdf-cli

CLI tool to merge and compress PDFs, and to extract or delete their pages.

python cli pdf pdf-processing pdf-tool

Updated Oct 28, 2023
Python

pdf-processing

Here are 55 public repositories matching this topic...
