# 🤖 AskMyPDF - AI-Powered PDF Chat Application

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Node.js](https://img.shields.io/badge/Node.js-18+-green.svg)](https://nodejs.org/)
[![React](https://img.shields.io/badge/React-18+-blue.svg)](https://reactjs.org/)
[![TypeScript](https://img.shields.io/badge/TypeScript-5+-blue.svg)](https://www.typescriptlang.org/)
[![MongoDB](https://img.shields.io/badge/MongoDB-6+-green.svg)](https://mongodb.com/)
[![Qdrant](https://img.shields.io/badge/Qdrant-Vector%20DB-orange.svg)](https://qdrant.tech/)
[![Alchemyst AI](https://img.shields.io/badge/Alchemyst%20AI-Primary%20Engine-purple.svg)](https://alchemyst.ai/)
[![Gemini AI](https://img.shields.io/badge/Gemini%20AI-Fallback-blue.svg)](https://gemini.google.com/)
[![Framer Motion](https://img.shields.io/badge/Framer%20Motion-Animations-ff69b4.svg)](https://www.framer.com/motion/)
[![Tailwind CSS](https://img.shields.io/badge/Tailwind%20CSS-3+-teal.svg)](https://tailwindcss.com/)

> 🚀 **A cutting-edge full-stack MERN application with TypeScript that revolutionizes PDF interaction through intelligent conversations.
> Features an advanced dual AI engine architecture (Alchemyst AI with dynamic workflow planning + Google Gemini fallback), a sophisticated RAG pipeline, a Qdrant vector database, and a stunning glassmorphism UI with real-time streaming responses.**

## 🌟 Features

### 📄 **Advanced Document Processing**

- **Multi-format Support**: PDF, DOCX, images (JPG, PNG, GIF), and audio files (MP3, WAV, M4A)
- **Multimodal Processing**: Unified processing pipeline for text, visual, and audio content
- **Intelligent Chunking**: Multiple strategies (sentence, paragraph, semantic, hybrid)
- **OCR Integration**: Text extraction from images using Google Gemini Vision
- **Speech-to-Text**: Audio transcription with timestamp tracking
- **Vector Embeddings**: Google Gemini text-embedding-004 for semantic search
- **Progress Tracking**: Real-time document processing with status updates
- **Metadata Extraction**: Comprehensive document analysis, including page count and word count
- **Language Detection**: Automatic text language identification
- **Cross-Modal Linking**: Intelligent connections between different content types

### 🧠 **Dual AI Engine Architecture with Advanced RAG**

- **Primary Engine**: Alchemyst AI with dynamic workflow planning and Context Lake integration
- **Fallback Engine**: Google Gemini 2.0 Flash for maximum reliability
- **Intelligent Switching**: Automatic failover with real-time health monitoring
- **Advanced RAG Pipeline**: Enhanced Retrieval-Augmented Generation system
- **Multimodal RAG**: Cross-modal semantic search and retrieval
- **Query Expansion**: AI-powered query enhancement for better retrieval
- **Context-Aware Responses**: Maintains conversation history and document context
- **Streaming Responses**: Real-time token streaming for instant feedback
- **Engine Status Monitoring**: Live tracking via the `/api/chat/engine-status` endpoint
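
To make the primary/fallback behaviour concrete, here is a minimal TypeScript sketch of automatic failover between two engines. The `ChatEngine` interface, `generateWithFallback`, and the 15-second timeout are hypothetical: this README does not document the backend's actual engine classes or which errors trigger a switch.

```typescript
// Hypothetical engine abstraction; the backend's real Alchemyst/Gemini services may differ.
interface ChatEngine {
  name: string;
  generate(prompt: string): Promise<string>;
}

interface EngineResult {
  engine: string; // name of the engine that produced the answer
  text: string;
  fellBack: boolean; // true when the primary engine failed or timed out
}

// Reject if `promise` does not settle within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms} ms`)), ms);
    promise.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Try the primary engine first; on any error or timeout, retry once with the fallback.
async function generateWithFallback(
  primary: ChatEngine,
  fallback: ChatEngine,
  prompt: string,
  timeoutMs = 15000,
): Promise<EngineResult> {
  try {
    const text = await withTimeout(primary.generate(prompt), timeoutMs);
    return { engine: primary.name, text, fellBack: false };
  } catch (err) {
    console.warn(`${primary.name} failed, switching to ${fallback.name}:`, err);
    const text = await fallback.generate(prompt);
    return { engine: fallback.name, text, fellBack: true };
  }
}
```

Returning `fellBack` together with the answer is one way a route could report which engine responded, in the same spirit as the engine-status monitoring listed above.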
### 🔍 **Enhanced Vector Search & Database**

- **Qdrant Integration**: High-performance vector database with collection management
- **Hybrid Search**: Semantic + keyword search with advanced filtering
- **Batch Operations**: Efficient bulk vector operations
- **Real-time Analytics**: Comprehensive search result evaluation
- **Document-Specific Filtering**: Precise vector search within documents
- **Cross-Modal Search**: Search across text, image, and audio content simultaneously
- **Caching Layer**: Optimized performance with intelligent caching

### 🎨 **Modern Glassmorphism UI/UX**

- **TypeScript React Frontend**: Type-safe React 18 with Vite build system
- **Glassmorphism Design**: Beautiful backdrop-blur effects and transparency
- **Framer Motion Animations**: Smooth, engaging micro-interactions
- **Dark Mode Support**: Adaptive theming with seamless transitions
- **Responsive Design**: Mobile-first approach with Tailwind CSS
- **Real-time Chat Interface**: Markdown support with syntax highlighting
- **Animated Components**: Custom animated buttons and loading states
- **Floating UI Elements**: Dynamic background animations
- **Multimodal Interface**: Specialized UI for different content types
- **Citation System**: Transparent source attribution with modality indicators

### 🔒 **Enterprise-Grade Security & Authentication**

- **JWT Authentication**: Secure token-based user session management
- **User Tiers**: Free (50 messages/month) and Premium (1000 messages/month)
- **Rate Limiting**: API protection against abuse and spam
- **Input Validation**: Comprehensive data sanitization with express-validator
- **Password Security**: bcrypt hashing with salt rounds
- **CORS Protection**: Configurable cross-origin request security
- **Error Handling**: Secure error responses without sensitive data exposure

### 📊 **Advanced Analytics & Monitoring**

- **Processing Metrics**: Detailed performance analytics for all operations
- **AI Engine Health**: Real-time monitoring of both Alchemyst and Gemini engines
- **Token Usage Tracking**: Comprehensive usage analytics per user
- **Response Time Monitoring**: Performance metrics for optimization
- **Error Tracking**: Detailed error logging and reporting
- **User Usage Statistics**: Monthly usage tracking and limits
- **Multimodal Analytics**: Processing metrics across different content types
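
The User Tiers limits (50 messages/month on Free, 1000 on Premium) and the monthly usage tracking listed above could be enforced with a check like the one below. This is a sketch under stated assumptions: `UsageRecord`, `consumeMessage`, and the reset at the start of each UTC calendar month are illustrative, not the project's documented schema or billing policy.

```typescript
// Monthly message limits from the README's User Tiers list.
type Tier = "free" | "premium";

const MONTHLY_LIMITS: Record<Tier, number> = { free: 50, premium: 1000 };

// Hypothetical per-user usage record; the real MongoDB schema is not documented here.
interface UsageRecord {
  tier: Tier;
  messagesThisMonth: number;
  periodStart: Date; // start of the month the counter refers to (UTC)
}

interface QuotaDecision {
  allowed: boolean;
  remaining: number; // messages left this month after this decision
  usage: UsageRecord; // record to persist
}

// First instant of the UTC calendar month containing `date`.
function monthStart(date: Date): Date {
  return new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), 1));
}

// Decide whether one more message is allowed, resetting the counter in a new month.
function consumeMessage(usage: UsageRecord, now: Date = new Date()): QuotaDecision {
  const currentPeriod = monthStart(now);
  const base: UsageRecord =
    usage.periodStart.getTime() < currentPeriod.getTime()
      ? { ...usage, messagesThisMonth: 0, periodStart: currentPeriod }
      : usage;
  const limit = MONTHLY_LIMITS[base.tier];
  if (base.messagesThisMonth >= limit) {
    return { allowed: false, remaining: 0, usage: base };
  }
  const updated: UsageRecord = { ...base, messagesThisMonth: base.messagesThisMonth + 1 };
  return { allowed: true, remaining: limit - updated.messagesThisMonth, usage: updated };
}
```

Keeping the decision a pure function makes it easy to unit-test; a route would persist the returned `usage` (for example on the user's MongoDB document) before calling an AI engine.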
## 🏗️ Architecture Overview

```mermaid
graph TB
    subgraph "Frontend Layer"
        A[React/TypeScript App] --> B[Authentication Context]
        A --> C[Glassmorphism Dashboard]
        A --> D[Real-time Chat Interface]
        A --> E[Animated File Upload]
        A --> F[Dark Mode Theme]
        A --> G[Multimodal Dashboard]
        A --> H[Cross-Modal Search]
    end

    subgraph "Backend API Layer"
        I[Express.js Server] --> J[JWT Auth Middleware]
        I --> K[Rate Limiting]
        I --> L[Error Handling]
        I --> M[Input Validation]
    end

    subgraph "Route Handlers"
        N[Auth Routes] --> O[User Management]
        P[PDF Routes] --> Q[Document Processing]
        R[Chat Routes] --> S[Dual AI RAG Pipeline]
        R --> T[Engine Status Monitor]
        U[Multimodal Routes] --> V[Cross-Format Processing]
        W[Multimodal Chat] --> X[Cross-Modal RAG]
    end

    subgraph "Dual AI Engine System"
        Y[Alchemyst AI Primary] --> Z[Dynamic Workflow Planning]
        Y --> AA[Context Lake Integration]
        BB[Google Gemini Fallback] --> CC[Reliable Processing]
        DD[Engine Health Monitor] --> EE[Automatic Failover]
    end

    subgraph "Enhanced RAG Pipeline"
        FF[Query Preprocessing] --> GG[Query Expansion]
        GG --> HH[Hybrid Retrieval]
        HH --> II[Context Ranking]
        II --> JJ[Response Generation]
        JJ --> KK[Streaming Output]
        LL[Cross-Modal Retrieval] --> MM[Modality Fusion]
    end

    subgraph "Vector Database Layer"
        NN[Qdrant Vector DB] --> OO[Semantic Search]
        NN --> PP[Document Filtering]
        NN --> QQ[Batch Operations]
        NN --> RR[Collection Management]
    end

    subgraph "Multimodal Processing"
        SS[Image Analysis] --> TT[OCR Extraction]
        SS --> UU[Visual Description]
        VV[Audio Processing] --> WW[Speech-to-Text]
        VV --> XX[Timestamp Tracking]
        YY[Cross-Modal Links] --> ZZ[Semantic Connections]
    end

    subgraph "Data Layer"
        AAA[MongoDB] --> BBB[User Profiles]
        AAA --> CCC[Document Metadata]
        AAA --> DDD[Chat History]
        AAA --> EEE[Usage Analytics]
        AAA --> FFF[Multimodal Documents]
    end

    A --> I
    S --> Y
    S --> BB
    HH --> NN
    JJ --> AAA
    DD --> Y
    DD --> BB
    V --> SS
    V --> VV
    X --> LL
```

## 📊 System Flow

```mermaid
sequenceDiagram
    participant U as User
    participant F as Frontend (React/TS)
    participant B as Backend (Express)
    participant Q as Qdrant Vector DB
    participant A as Alchemyst AI Engine
    participant G as Gemini AI Engine
    participant M as MongoDB

    U->>F: Upload multimodal document
    F->>B: POST /api/multimodal/upload
    B->>B: Multimodal processing pipeline
    Note over B: Text extraction, OCR, Speech-to-Text
    B->>Q: Store vector embeddings
    B->>M: Save document metadata
    B->>F: Real-time progress updates
    F->>U: Processing complete notification

    U->>F: Initiate multimodal chat
    F->>B: POST /api/multimodal-chat/start
    B->>M: Create new chat session
    B->>F: Return chat instance

    U->>F: Send message query
    F->>B: POST /api/multimodal-chat/:id/message
    Note over B: Cross-modal query analysis
    B->>Q: Cross-modal vector search
    Q->>B: Return relevant chunks with scores

    alt Alchemyst AI Primary Engine
        B->>A: Multimodal RAG with Context Lake
        A-->>B: Streaming response with metadata
        Note over A: Dynamic workflow planning
    else Intelligent Fallback
        B->>G: Multimodal RAG processing
        G-->>B: Generated response
        Note over B: Automatic engine switching
    end

    B->>M: Save conversation history
    B->>F: Stream AI response with citations
    F->>U: Real-time response display
```

## 🚀 Quick Start

### Prerequisites

- 📦 Node.js 18+
- 🍃 MongoDB 6+
- 🔧 Docker (optional, for Qdrant)
- 🔑 Google Gemini API key
- 🧪 Alchemyst AI API key (optional, for enhanced features)
- 🎵 FFmpeg (for audio processing)

### 1. Clone Repository

```bash
git clone https://github.com/saksham-1304/AskMyPDF.git
cd AskMyPDF
```

### 2. Install Dependencies

```bash
npm run install:all
```
### 3. Set Up Environment Variables

**Backend (.env):**

```bash
# Server Configuration
PORT=5000
NODE_ENV=development
# Origin of the frontend dev server, used for CORS (5173 is Vite's default port)
FRONTEND_URL=http://localhost:5173

# Database
MONGODB_URI=mongodb://localhost:27017/pdfchat

# JWT
JWT_SECRET=your-super-secret-jwt-key-here

# Google Gemini API (Required - Fallback Engine)
GEMINI_API_KEY=your-gemini-api-key-here

# Alchemyst AI API (Optional - Primary Engine for Enhanced Features)
ALCHEMYST_API_KEY=your-alchemyst-api-key-here
ALCHEMYST_API_URL=https://platform-backend.getalchemystai.com/api/v1

# Qdrant Vector Database
QDRANT_URL=http://localhost:6333
QDRANT_API_KEY=your-qdrant-api-key-here
QDRANT_COLLECTION_NAME=multimodal_documents
```

**Frontend (.env):**

```bash
VITE_API_URL=http://localhost:5000/api
```

### 4. Start Qdrant Vector Database

```bash
docker run -p 6333:6333 qdrant/qdrant
```

### 5. Start Development Servers

```bash
# Development mode (both frontend and backend)
npm run dev

# Or start separately
npm run dev:backend
npm run dev:frontend
```

### 6. Production Build

```bash
npm run build
npm start
```
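
Because a missing key otherwise surfaces only later as a failed API call, it can help to validate these variables once at startup. The sketch below is hypothetical (`loadConfig` and `BackendConfig` are not taken from the repository). It treats `FRONTEND_URL`, `MONGODB_URI`, `JWT_SECRET`, and `GEMINI_API_KEY` as required, enables Alchemyst only when `ALCHEMYST_API_KEY` is set (matching its "optional" label above), and defaults the remaining values to the examples shown.

```typescript
// Typed view of the backend .env variables shown above (field names are illustrative).
interface BackendConfig {
  port: number;
  nodeEnv: string;
  frontendUrl: string;
  mongodbUri: string;
  jwtSecret: string;
  geminiApiKey: string;
  alchemyst?: { apiKey: string; apiUrl: string }; // set only when the optional key is present
  qdrant: { url: string; apiKey?: string; collection: string };
}

type Env = Record<string, string | undefined>;

// Return a trimmed, non-empty value or throw with the variable's name.
function required(env: Env, name: string): string {
  const value = env[name]?.trim();
  if (!value) throw new Error(`Missing required environment variable ${name}`);
  return value;
}

// Build the config once at startup, e.g. loadConfig(process.env).
function loadConfig(env: Env): BackendConfig {
  const port = Number(env["PORT"] ?? "5000");
  if (!(port > 0 && Math.floor(port) === port)) {
    throw new Error(`Invalid PORT: ${env["PORT"]}`);
  }
  const alchemystKey = env["ALCHEMYST_API_KEY"]?.trim();
  return {
    port,
    nodeEnv: env["NODE_ENV"] ?? "development",
    frontendUrl: required(env, "FRONTEND_URL"),
    mongodbUri: required(env, "MONGODB_URI"),
    jwtSecret: required(env, "JWT_SECRET"),
    geminiApiKey: required(env, "GEMINI_API_KEY"),
    alchemyst: alchemystKey
      ? {
          apiKey: alchemystKey,
          apiUrl: env["ALCHEMYST_API_URL"] ?? "https://platform-backend.getalchemystai.com/api/v1",
        }
      : undefined,
    qdrant: {
      url: env["QDRANT_URL"] ?? "http://localhost:6333",
      apiKey: env["QDRANT_API_KEY"] || undefined, // empty or unset means "no key" (e.g. local Docker)
      collection: env["QDRANT_COLLECTION_NAME"] ?? "multimodal_documents",
    },
  };
}
```

In the server this would be called as `loadConfig(process.env)` before connecting to MongoDB or Qdrant, so a bad `.env` stops the process with a clear message.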
## 📁 Project Structure

```
AskMyPDF/
├── 📁 backend/                          # Express.js backend
│   ├── 📁 middleware/                   # Authentication, error handling
│   ├── 📁 models/                       # MongoDB schemas
│   │   └── 📄 MultimodalDocument.js     # Enhanced document model
│   ├── 📁 routes/                       # API route handlers
│   │   ├── 📄 multimodal.js             # Multimodal document routes
│   │   └── 📄 multimodalChat.js         # Cross-modal chat routes
│   ├── 📁 services/                     # Business logic services
│   │   ├── 📄 multimodalProcessor.js    # Cross-format processing
│   │   └── 📄 multimodalRAGService.js   # Cross-modal RAG
│   └── 📄 server.js                     # Express server entry point
├── 📁 frontend/                         # React frontend
│   ├── 📁 src/
│   │   ├── 📁 components/               # Reusable React components
│   │   │   └── 📄 MultimodalFileUpload.tsx  # Enhanced upload
│   │   ├── 📁 contexts/                 # React contexts
│   │   ├── 📁 pages/                    # Route components
│   │   │   ├── 📄 MultimodalDashboard.tsx   # Multimodal interface
│   │   │   └── 📄 MultimodalChat.tsx        # Cross-modal chat
│   │   └── 📁 services/                 # API service layer
│   └── 📄 vite.config.ts                # Vite configuration
├── 📄 package.json                      # Root package configuration
└── 📄 README.md                         # This file
```

## 🛠️ Technology Stack

### Frontend Technologies

- **⚛️ React 18**: Modern UI library with hooks
- **📘 TypeScript**: Type-safe JavaScript
- **🎨 Tailwind CSS**: Utility-first CSS framework
- **🚀 Vite**: Fast development build tool
- **📱 React Router**: Client-side routing
- **🔄 React Query**: Data fetching and caching
- **📋 React Dropzone**: File upload handling
- **🔥 React Hot Toast**: Notifications

### Backend Technologies

- **🟢 Node.js**: JavaScript runtime
- **⚡ Express.js**: Web application framework
- **🍃 MongoDB**: NoSQL database with Mongoose ODM
- **🔐 JWT**: JSON Web Tokens for authentication
- **🛡️ bcrypt**: Password hashing
- **📊 Multer**: File upload middleware
- **🚦 Rate Limiting**: API protection

### AI/ML Technologies

- **🤖 Alchemyst AI**: Primary AI engine with dynamic workflow planning and Context Lake
- **🤖 Google Gemini**: Fallback AI model for embeddings and generation
- **👁️ Gemini Vision**: Image analysis and OCR capabilities
- **🎵 Whisper**: Speech-to-text transcription (integration ready)
- **🔍 Qdrant**: Vector database for semantic search
- **📄 PDF-Parse**: PDF text extraction
- **📄 Mammoth**: DOCX document processing
- **🖼️ Sharp**: Image processing and optimization
- **🎵 FFmpeg**: Audio format conversion and processing
- **🧠 Enhanced RAG Pipeline**: Dual-engine Retrieval-Augmented Generation
- **🔗 Cross-Modal RAG**: Unified semantic search across content types
- **⚡ Intelligent Fallback**: Automatic engine switching for reliability
- **📊 Engine Monitoring**: Real-time AI engine health tracking
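
The Intelligent Chunking feature lists sentence, paragraph, semantic, and hybrid strategies without describing them. As one possible interpretation of a paragraph-plus-sentence hybrid, the sketch below takes text extracted by PDF-Parse or Mammoth, splits it on blank lines, breaks oversized paragraphs into sentences, and greedily packs the pieces into chunks of at most `maxChars` characters. The function names and the 1000-character default are assumptions, and the regex sentence splitter is deliberately naive (abbreviations such as "e.g." and decimals will be mis-split).

```typescript
// Greedily pack text units into chunks of at most `maxChars` characters.
// A single unit longer than `maxChars` is kept whole as its own chunk.
function packUnits(units: string[], maxChars: number, separator: string): string[] {
  const chunks: string[] = [];
  let current = "";
  for (const unit of units) {
    const candidate = current ? current + separator + unit : unit;
    if (candidate.length <= maxChars) {
      current = candidate;
    } else {
      if (current) chunks.push(current);
      current = unit;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Naive sentence splitter: a sentence ends at ".", "!" or "?".
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

// Hypothetical "hybrid" strategy: paragraphs first, sentences for oversized paragraphs.
function chunkText(text: string, maxChars = 1000): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const units: string[] = [];
  for (const paragraph of paragraphs) {
    if (paragraph.length <= maxChars) {
      units.push(paragraph);
    } else {
      units.push(...packUnits(splitSentences(paragraph), maxChars, " "));
    }
  }
  return packUnits(units, maxChars, "\n\n");
}
```

Each resulting chunk would then be embedded (the README names Gemini text-embedding-004) and stored in Qdrant; a semantic strategy would instead place boundaries where embedding similarity between neighbouring sentences drops.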
## 🔧 API Documentation

### Authentication Endpoints

- `POST /api/auth/register` - User registration
- `POST /api/auth/login` - User login
- `GET /api/auth/me` - Get current user
- `PUT /api/auth/profile` - Update user profile

### Document Endpoints

- `POST /api/pdf/upload` - Upload PDF document
- `POST /api/multimodal/upload` - Upload multimodal document (PDF, DOCX, images, audio)
- `GET /api/pdf/documents` - List user documents
- `GET /api/multimodal/documents` - List multimodal documents with filtering
- `GET /api/pdf/documents/:id` - Get document details
- `GET /api/multimodal/documents/:id` - Get multimodal document details
- `DELETE /api/pdf/documents/:id` - Delete document
- `DELETE /api/multimodal/documents/:id` - Delete multimodal document
- `POST /api/multimodal/search` - Cross-modal search across documents

### Chat Endpoints

- `POST /api/chat/start` - Start new chat session
- `POST /api/multimodal-chat/start` - Start multimodal chat session
- `POST /api/chat/:id/message` - Send message with dual AI engine support
- `POST /api/multimodal-chat/:id/message` - Send message with cross-modal RAG
- `GET /api/chat/:id` - Get chat history with metadata
- `GET /api/multimodal-chat/:id` - Get multimodal chat with cross-modal info
- `GET /api/chat/user/chats` - List user chat sessions
- `DELETE /api/chat/:id` - Delete chat session
- `POST /api/chat/:id/follow-up` - Generate AI-powered follow-up questions
- `POST /api/multimodal-chat/:id/follow-up` - Generate cross-modal follow-up questions
- `POST /api/chat/:id/evaluate` - Evaluate response quality metrics
- `POST /api/chat/:id/search` - Advanced search within document using Qdrant
- `POST /api/multimodal-chat/:id/search` - Cross-modal search within document
- `GET /api/chat/engine-status` - **NEW**: Real-time AI engine status and health monitoring
- `GET /api/multimodal-chat/:id/citations` - Get citations with modality information
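
A small typed client for a few of the endpoints above might look like this. It is a sketch: the `AskMyPdfClient` class is hypothetical, the request and response fields (`email`/`password`, `token`, `message`) are guesses because this section lists routes but not payloads, and it assumes the JWT is sent as an `Authorization: Bearer` header. The fetch function is injected so the client can use the browser's `fetch` or a stub in tests; the base URL would be the `VITE_API_URL` value from the Quick Start.

```typescript
// Minimal fetch-compatible signature, so a stub can replace the browser's fetch in tests.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Hypothetical client for a few documented routes; payload field names are assumptions.
class AskMyPdfClient {
  private token: string | null = null;
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, fetchImpl: FetchLike) {
    this.baseUrl = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
    this.fetchImpl = fetchImpl;
  }

  // Send a JSON request, attaching the JWT as a Bearer token once logged in.
  private async request(method: string, path: string, body?: unknown): Promise<unknown> {
    const headers: Record<string, string> = { "Content-Type": "application/json" };
    if (this.token) headers["Authorization"] = `Bearer ${this.token}`;
    const res = await this.fetchImpl(`${this.baseUrl}${path}`, {
      method,
      headers,
      body: body === undefined ? undefined : JSON.stringify(body),
    });
    if (!res.ok) throw new Error(`${method} ${path} failed with HTTP ${res.status}`);
    return res.json();
  }

  // POST /api/auth/login (assumes a { token } field in the response).
  async login(email: string, password: string): Promise<void> {
    const data = (await this.request("POST", "/auth/login", { email, password })) as { token?: string };
    if (!data.token) throw new Error("login response did not contain a token");
    this.token = data.token;
  }

  // POST /api/chat/:id/message (assumes the text goes in a `message` field).
  sendMessage(chatId: string, message: string): Promise<unknown> {
    return this.request("POST", `/chat/${encodeURIComponent(chatId)}/message`, { message });
  }

  // GET /api/chat/engine-status
  engineStatus(): Promise<unknown> {
    return this.request("GET", "/chat/engine-status");
  }
}
```

In the frontend this could be constructed as `new AskMyPdfClient(import.meta.env.VITE_API_URL, (url, init) => fetch(url, init))`. Streaming replies would need a different code path that reads the response body incrementally instead of calling `json()`.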
## 📈 Performance Optimizations

### Backend Optimizations

- **🚀 Vector Indexing**: Efficient similarity search with Qdrant
- **🧪 Dual AI Engine**: Primary Alchemyst AI with Gemini fallback
- **🔄 Intelligent Fallback**: Automatic engine switching for reliability
- **🔗 Cross-Modal Indexing**: Unified indexing across content types
- **📊 Database Indexing**: Optimized MongoDB queries
- **🔄 Caching**: Response caching for frequently accessed data
- **⚡ Async Processing**: Background PDF processing
- **🎵 Audio Optimization**: Efficient audio format conversion
- **🖼️ Image Optimization**: Smart image resizing and compression
- **📦 Compression**: Response compression middleware
- **🔍 Hybrid Search**: Semantic + keyword search combination
- **🔗 Cross-Modal Search**: Simultaneous search across all modalities

## 🧪 Testing & Development

### Alchemyst AI Testing

```bash
# Test Alchemyst AI integration
cd backend
npm run test:alchemyst

# Test multimodal processing
npm run test:multimodal
```

This will verify:

- ✅ Service configuration and API key setup
- ✅ Connection to Alchemyst AI platform
- ✅ Health check and response generation
- ✅ Fallback mechanism to Gemini AI
- ✅ Multimodal document processing
- ✅ Cross-modal search functionality
- ✅ OCR and speech-to-text integration

### Development Scripts

```bash
# Install all dependencies
npm run install:all

# Start development servers
npm run dev

# Start backend only
npm run dev:backend

# Start frontend only
npm run dev:frontend

# Run production build
npm run build
npm start
```

### Performance Optimizations

#### Backend Optimizations

- **🚀 Vector Indexing**: Efficient similarity search with Qdrant
- **🔗 Cross-Modal Indexing**: Unified semantic indexing
- **📊 Connection Pooling**: MongoDB connection optimization
- **⚡ Batch Processing**: Bulk operations for document processing
- **🎵 Audio Processing**: Efficient transcription pipelines
- **🖼️ Image Processing**: Optimized OCR and visual analysis
- **🔄 Caching**: Intelligent caching for frequently accessed data
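
The Hybrid Search entries above combine semantic (Qdrant) and keyword results, but the README does not say how the two rankings are merged. One common choice that does not depend on score scales is Reciprocal Rank Fusion (RRF), where each chunk scores the sum of 1 / (k + rank) over every list it appears in. The sketch assumes each retriever returns chunk IDs in ranked order; `reciprocalRankFusion` is a hypothetical helper and `k = 60` is simply a widely used default.

```typescript
// A ranked hit from one retriever (e.g. Qdrant vector search or a keyword index).
interface RankedHit {
  id: string;
}

interface FusedHit {
  id: string;
  score: number;
}

// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank), ranks starting at 1.
function reciprocalRankFusion(lists: RankedHit[][], k = 60, limit = 10): FusedHit[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((hit, index) => {
      const rank = index + 1;
      scores.set(hit.id, (scores.get(hit.id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score || a.id.localeCompare(b.id)) // ties broken by id for stable output
    .slice(0, limit);
}
```

Because RRF only uses ranks, it avoids having to normalize cosine similarities against keyword (e.g. BM25) scores before combining them; a chunk found by both retrievers naturally rises to the top.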
#### Frontend Optimizations

- **🎯 Code Splitting**: Lazy loading of route components
- **🔄 React Query**: Intelligent data caching and synchronization
- **📱 Responsive Design**: Mobile-first approach with optimized assets
- **⚡ Bundle Optimization**: Vite's efficient bundling with tree shaking
- **🎨 Multimodal UI**: Specialized interfaces for different content types
- **🔗 Cross-Modal Navigation**: Seamless switching between content types

## 🔒 Security Features

- **🔐 JWT Authentication**: Secure token-based authentication with refresh tokens
- **🛡️ Password Hashing**: bcrypt with configurable salt rounds
- **🚦 Rate Limiting**: Advanced API protection against abuse and DDoS
- **🔍 Input Validation**: Comprehensive express-validator middleware
- **🛡️ Helmet**: Security headers middleware for XSS protection
- **🔒 CORS**: Configurable cross-origin resource sharing
- **🔐 API Key Security**: Secure handling of AI service credentials
- **📁 File Type Validation**: Comprehensive file format verification
- **🔍 Content Scanning**: Security checks for uploaded content

## 🌐 Environment Support

- **🔧 Development**: Hot reloading, source maps, debug logging, AI engine monitoring
- **🏭 Production**: Optimized builds, compression, security headers, error tracking
- **🐳 Docker**: Container support with multi-stage builds for easy deployment
- **☁️ Cloud**: Compatible with AWS, Google Cloud, and Azure, with environment-specific configs
- **📊 Monitoring**: Real-time AI engine health monitoring and performance metrics
- **🔗 Multimodal Support**: Cross-platform compatibility for all content types

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- **Alchemyst AI**: For providing advanced AI capabilities with dynamic workflow planning
- **Google Gemini**: For reliable AI capabilities and embeddings
- **Google Gemini Vision**: For advanced image analysis and OCR
- **Qdrant**: For efficient vector database operations
- **MongoDB**: For flexible document storage
- **React Community**: For excellent documentation and tools
- **FFmpeg**: For robust audio processing capabilities
- **Sharp**: For efficient image processing

## 🧪 Testing & Development

### Backend Testing

```bash
# Run Alchemyst AI integration test
cd backend
npm run test:alchemyst

# Run multimodal processing tests
npm run test:multimodal

# Run all backend tests
npm test
```

### Engine Status Monitoring

The application includes built-in AI engine monitoring:

- Real-time health checks for both AI engines
- Automatic fallback detection
- Performance metrics tracking
- Engine status API endpoint: `GET /api/chat/engine-status`
- Cross-modal processing monitoring
- Multimodal search performance metrics

## 📞 Support & Contact

For support or questions, feel free to reach out:

- 📧 **Email**: [sakshamsinghrathore1304@gmail.com](mailto:sakshamsinghrathore1304@gmail.com)
- 💼 **LinkedIn**: [Saksham Singh Rathore](https://www.linkedin.com/in/saksham-singh-rathore1304/)
- 🐛 **Issues**: Found a bug or have a feature request? Please [open an issue](https://github.com/saksham-1304/AskMyPDF/issues) on GitHub
- 💡 **Discussions**: Join the conversation in our [GitHub Discussions](https://github.com/saksham-1304/AskMyPDF/discussions)

---
🚀 Built with ❤️ using modern web technologies