# WebFind

A retro-styled search engine.

## Features

- **Web Search**: BM25-based search algorithm for finding websites
- **News Feed**: Location-based news aggregation from RSS feeds
- **Weather**: Real-time weather data based on the user's IP location
- **Stock Market**: Live stock quotes for major indices (DOW, NASDAQ, S&P 500)

## Project Structure

```
Search Engine/
├── front/                      # Frontend files
│   ├── index.html              # Main search page
│   ├── index.js                # Main JavaScript functionality
│   ├── news.html               # News page
│   ├── news.js                 # News page JavaScript
│   └── news.css                # News page styling
├── scripts/                    # Build and processing scripts
│   ├── build_bm25_index.js     # BM25 search index builder
│   ├── build_line_index.js     # Line index builder for document retrieval
│   └── build_index.js          # Basic search index builder
├── tools/                      # Data processing tools
│   ├── pythonhelpers/          # Python data processing scripts
│   └── domain_meta_fetcher/    # Rust domain metadata fetcher
├── data/                       # Static data files
│   ├── GeoLite2-City.mmdb      # MaxMind GeoIP database
│   ├── geoCOPYRIGHT.txt        # GeoIP database copyright
│   └── geoLICENSE.txt          # GeoIP database license
├── server.js                   # Main Express server
├── package.json                # Node.js dependencies
├── package-lock.json           # Node.js dependency lock file
├── README.md                   # Project documentation
├── bm25_index.json             # BM25 search index
├── line_index.json             # Document line index
├── domain_to_docId.json        # Domain to document ID mapping
├── promoted_sites.json         # Promoted search results
├── news_feeds.json             # RSS feed configurations
├── domains_meta.jsonl          # Main document database
├── sites.jsonl                 # Additional site data
└── top10milliondomains.csv     # Source domain list
```

## Prerequisites

- Node.js (v16 or higher)
- Python 3.7+ (for data processing tools)
- Rust (for domain metadata fetcher)

## Installation

1. **Clone the repository**
   ```bash
   git clone <repository-url>
   cd Search-Engine
   ```

2. **Install Node.js dependencies**
   ```bash
   npm install
   ```

3. **Install Python dependencies** (optional, for data processing)
   ```bash
   cd tools/pythonhelpers
   pip install -r requirements.txt
   ```

4. **Install Rust dependencies** (optional, for domain metadata fetcher)
   ```bash
   cd tools/domain_meta_fetcher
   cargo build
   ```

## Usage

### Running the Search Engine

1. **Start the server**
   ```bash
   npm start
   ```

2. **Access the application**
   - Open your browser and navigate to `http://localhost:30009`

### Building Search Indexes

If you need to rebuild the search indexes:

1. **Build BM25 index**
   ```bash
   node scripts/build_bm25_index.js
   ```

2. **Build line index**
   ```bash
   node scripts/build_line_index.js
   ```

3. **Build basic search index**
   ```bash
   node scripts/build_index.js
   ```

### Data Processing

For processing new domain data:

1. **Fetch domain metadata** (requires Rust)
   ```bash
   cd tools/domain_meta_fetcher
   cargo run
   ```

   **Or use Python** (requires Python)
   ```bash
   cd tools/pythonhelpers
   python getdata.py
   ```

## API Endpoints

### Search

- `GET /api/search?q=<query>` - Search for websites
- Returns: Array of search results with domain, title, and description

### News

- `GET /api/news?for_home_page=true` - Get news for the home page (5 articles)
- `GET /api/news` - Get the full news feed (15 articles)
- Returns: Array of news articles with title, link, and content

### Weather

- `GET /api/weather` - Get weather data based on IP location
- Returns: Weather information including temperature and wind speed

### Stocks

- `GET /api/stocks` - Get stock market data
- Returns: DOW, NASDAQ, and S&P 500 quotes with changes

## Configuration

### News Feeds

Edit `news_feeds.json` to configure RSS feeds for different countries:

```json
{
  "US": ["https://rss.cnn.com/rss/edition.rss"],
  "GB": ["https://feeds.bbci.co.uk/news/rss.xml"]
}
```

## Data Files

- **domains_meta.jsonl**: Main document database (2.2GB)
- **bm25_index.json**: BM25 search index (145MB)
- **line_index.json**: Document line offsets (7.1MB)
- **domain_to_docId.json**: Domain mapping (13MB)
- **top10milliondomains.csv**: Source domain list (357MB)

## Technologies Used

- **Backend**: Node.js, Express.js
- **Search**: BM25 algorithm implementation
- **Frontend**: Vanilla JavaScript, HTML5, CSS3
- **Data Processing**: Python, Rust
- **APIs**: Yahoo Finance, Open-Meteo, RSS feeds
- **Geolocation**: MaxMind GeoIP database

## Performance

- Search results are returned in under 100 ms
- Supports up to 100 search results per query
- Efficient document retrieval using line indexing
- BM25 ranking for relevant search results

## License

Free to use. No license is required: do what you want with it, with no restrictions and no credit needed. Enjoy!

## Acknowledgments

- News feed data from [news-feed-list-of-countries](https://github.com/yavuz/news-feed-list-of-countries)
- GeoIP data from MaxMind
- Stock data from Yahoo Finance
- Weather data from Open-Meteo
- Listing of top domains from [top10milliondomains](https://www.kaggle.com/datasets/joebeachcapital/top-10-million-websites-2023)
- I have probably forgotten some; more will be added as I remember them.
- Thanks for reading this far. Thanks :)
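
## Appendix: BM25 Scoring Sketch

The Features and Performance sections mention BM25 ranking with up to 100 results per query. As a rough illustration of how that kind of ranking works, here is a minimal in-memory sketch. It is not the code in `scripts/build_bm25_index.js` or `server.js`: the function names (`tokenize`, `buildIndex`, `search`), the `k1`/`b` values, and the index layout are all assumptions made for this example, and the real `bm25_index.json` format may differ.

```javascript
// Minimal BM25 sketch. Names, parameters (k1, b), and the index layout
// are illustrative assumptions, not taken from this project's scripts.

// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

// Build an inverted index: term -> Map(docId -> term frequency).
function buildIndex(docs) {
  const postings = new Map();
  const docLengths = [];
  docs.forEach((text, docId) => {
    const tokens = tokenize(text);
    docLengths.push(tokens.length);
    for (const term of tokens) {
      if (!postings.has(term)) postings.set(term, new Map());
      const entry = postings.get(term);
      entry.set(docId, (entry.get(docId) || 0) + 1);
    }
  });
  const total = docLengths.reduce((sum, len) => sum + len, 0);
  const avgDocLength = docs.length > 0 ? total / docs.length : 0;
  return { postings, docLengths, avgDocLength, docCount: docs.length };
}

// Score every document that contains at least one query term and
// return the best `limit` hits, highest score first.
function search(index, query, limit = 100, k1 = 1.2, b = 0.75) {
  const scores = new Map();
  for (const term of new Set(tokenize(query))) {
    const entry = index.postings.get(term);
    if (!entry) continue;
    // Smoothed IDF; the "1 +" inside the log keeps the value positive.
    const idf = Math.log(
      1 + (index.docCount - entry.size + 0.5) / (entry.size + 0.5)
    );
    for (const [docId, tf] of entry) {
      // Length normalization: longer-than-average documents are penalized.
      const norm = 1 - b + b * (index.docLengths[docId] / index.avgDocLength);
      const score = (idf * (tf * (k1 + 1))) / (tf + k1 * norm);
      scores.set(docId, (scores.get(docId) || 0) + score);
    }
  }
  return [...scores.entries()]
    .sort((x, y) => y[1] - x[1])
    .slice(0, limit)
    .map(([docId, score]) => ({ docId, score }));
}

// Example usage with three tiny made-up documents.
const exampleIndex = buildIndex([
  "retro search engine for the web",
  "weather forecast and live weather radar",
  "stock market quotes",
]);
console.log(search(exampleIndex, "weather radar"));
```

The default `limit` of 100 mirrors the per-query cap listed under Performance. A production index would normally be precomputed and stored on disk rather than rebuilt in memory on every start.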