eventsdc-doc-poc
Description

EventsDC Document POC

A proof-of-concept application that ingests documents in multiple formats (PDF, Word, PowerPoint, Text), extracts and indexes their content, and lets users query that information through both a chatbot and keyword search.

✨ Features

- Multi-format Document Ingestion: PDF, DOCX, PPTX, TXT with OCR support for scanned PDFs
- Intelligent Search: Keyword (BM25), Vector (Semantic), and Hybrid search methods
- Natural Language Chat: Q&A interface with source citations
- Batch Processing: Handle multiple documents efficiently
- Fast Performance: Sub-second response times for most queries
- Docker Ready: Containerized deployment with Docker Compose

🚀 Quick Start

Prerequisites

- Python 3.11+
- Tesseract OCR (for scanned PDFs)
- Git (for version control)

Installation

```powershell
# 1. Clone or navigate to the project
cd C:\Users\yomse\eventsdc-doc-poc

# 2. Create and activate virtual environment
python -m venv .venv
.venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Start the server
uvicorn app.main:app --reload
```

Access the Application

- API Documentation: http://127.0.0.1:8000/docs
- Streamlit UI: Run `streamlit run ui.py` in another terminal
- Health Check: http://127.0.0.1:8000/health

📖 Usage

1. Upload Documents

```bash
POST /ingest
# Upload PDF, DOCX, PPTX, or TXT files
```

2. Search Content

```bash
# Keyword search
GET /search/keyword?q=policy&k=5

# Vector search (semantic)
GET /search/vector?q=refund policy&k=5

# Hybrid search (recommended)
GET …
```
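The search requests above can also be issued from a script. The following is a minimal sketch, not the project's own client: the helper names `build_search_url` and `search` are hypothetical, the base URL assumes the server is running locally on uvicorn's default port over plain HTTP, and the response is assumed to be JSON (its schema is not documented in this excerpt). Only the keyword and vector endpoints are covered, because the hybrid endpoint's path is cut off in the text above. Note that a query containing a space, such as `refund policy`, has to be URL-encoded before it is sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the server was started locally with `uvicorn app.main:app --reload`.
BASE_URL = "http://127.0.0.1:8000"

# Only the two endpoints whose full paths appear in the README excerpt.
SUPPORTED_METHODS = ("keyword", "vector")


def build_search_url(method: str, query: str, k: int = 5, base_url: str = BASE_URL) -> str:
    """Build a search URL such as /search/keyword?q=policy&k=5 (hypothetical helper)."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported search method: {method!r}")
    # urlencode escapes the query string, so "refund policy" becomes "refund+policy".
    return f"{base_url}/search/{method}?{urlencode({'q': query, 'k': k})}"


def search(method: str, query: str, k: int = 5, base_url: str = BASE_URL, timeout: float = 10.0):
    """Send the GET request and decode the body as JSON (assumed response format)."""
    with urlopen(build_search_url(method, query, k, base_url), timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, `search("vector", "refund policy")` would request `/search/vector?q=refund+policy&k=5`. Keeping URL construction separate from the network call makes the encoding step easy to check without a live server.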
