# asyncmy Benchmark Suite

Comprehensive performance benchmarks for `asyncmy`, compared against `mysqlclient`, `pymysql`, and `aiomysql`.

## Test Environment

- **CPU:** Apple Mac Studio (M4 Max)
- **Memory:** 64GB
- **Python:** 3.14
- **MySQL:** 9.6.0
- **Test Data:** 100,000 rows with a realistic schema

## Performance Summary

| Test                                 | Winner         | asyncmy Rank | Notes                                 |
| ------------------------------------ | -------------- | ------------ | ------------------------------------- |
| **Large Result Set** (50k rows)      | mysqlclient    | #2/4         | ~0.090s - 2x faster than aiomysql     |
| **Concurrent Queries** (50 queries)  | Variable       | #1-2/2       | Very close to aiomysql                |
| **Connection Pool** (2k queries)     | 🏆 **asyncmy** | **#1/2**     | Consistently 22-28% faster            |
| **Batch Insert** (10k rows)          | Variable       | #1-4/4       | Results vary significantly            |

## Key Insights

- ✅ **Connection Pool**: asyncmy consistently shows 22-28% better throughput than aiomysql
- ✅ **Large Result Set**: asyncmy is 2x faster than pymysql/aiomysql, close to mysqlclient
- ⚡ **Concurrent Queries**: asyncmy and aiomysql are comparable (within the margin of error)
- ⚡ **Batch Insert**: Results vary by run; all libraries perform similarly

## Recent Optimizations (v0.2.12)

Five major performance improvements were implemented:

1. **Buffer Management** - Eliminated redundant memory copies in packet reading (fast-path optimization)
2. **DateTime Parsing** - Replaced regex with fast string slicing for datetime/date/time conversions
3. **Row Parsing** - Pre-allocated lists and C-level indexing replacing Python `list.append()`
4. **Protocol Parsing** - Inlined `read_length_coded_string` with a fast path for common cases (length < 251)
5.
   **Batch Operations** - Pre-process values and use list accumulation + join for `executemany`

These optimizations resulted in:

- **asyncmy consistently leads in connection pool benchmarks** (22-28% faster)
- **Large result set parsing improved** - now 2x faster than aiomysql
- **Close to mysqlclient performance** in data-intensive workloads

## Test Scenarios

### 1. Large Result Set (`large_resultset.py`)

Tests the efficiency of fetching and processing large datasets in a single query.

**What it measures:**

- Packet reading efficiency (optimized buffer management)
- Data parsing speed (optimized datetime conversion)
- Memory efficiency

**Test:** Fetch 50,000 rows with all column types (int, varchar, datetime, decimal, text)

**Results (typical):**

```text
1. mysqlclient  0.085s  (1.00x vs best)
2. asyncmy      0.090s  (0.94x vs best)
3. pymysql      0.165s  (0.52x vs best)
4. aiomysql     0.170s  (0.50x vs best)
```

### 2. Concurrent Queries (`concurrent.py`)

Tests async libraries' ability to run multiple queries concurrently.

**What it measures:**

- Connection handling under concurrent load
- Async I/O efficiency
- Query interleaving capabilities

**Test:** Execute 50 concurrent SELECT queries (each query gets its own connection)

**Results (typical):**

```text
1. asyncmy/aiomysql  ~0.015s - Results vary; both libraries perform similarly
```

**Note:** Synchronous libraries (mysqlclient, pymysql) cannot efficiently handle this scenario without threads. The difference between asyncmy and aiomysql is within the margin of error.

### 3. Connection Pool (`pool.py`)

Tests connection pool performance with concurrent query load.

**What it measures:**

- Connection acquisition/release efficiency
- Connection reuse patterns
- Pool management under concurrent access

**Test:** Execute 2,000 queries using a connection pool (5-20 connections)

**Results (typical):**

```text
1. asyncmy   ~0.190s  (1.00x vs best)  ⭐ WINNER - ~10,500 queries/sec
2.
   aiomysql  ~0.240s  (0.78x vs best)           - ~8,300 queries/sec
```

**Insight:** asyncmy's connection pool consistently shows **22-28% better throughput** than aiomysql across multiple runs.

### 4. Batch Insert (`batch_insert.py`)

Tests bulk insert performance using `executemany()`.

**What it measures:**

- Batch operation efficiency
- SQL statement building
- Network/protocol overhead

**Test:** Insert 10,000 rows using `executemany()`

**Results (variable):**

```text
Results vary significantly between runs. All libraries perform
similarly, with rankings changing between mysqlclient, pymysql,
asyncmy, and aiomysql. Typical throughput: 80,000-110,000 rows/sec
for all libraries.
```

## Running the Benchmarks

### Prerequisites

1. MySQL server running and accessible
2. Python ≥ 3.9 with required packages installed

### Setup

```bash
# Set database connection parameters
export MYSQL_HOST=localhost
export MYSQL_USER=root
export MYSQL_PASS=password

# Optional: specify port (default: 3306)
export MYSQL_PORT=3306
```

### Run All Benchmarks

```bash
# Run complete benchmark suite
python -m benchmark.run_all
```

This will:

1. Create the test database and populate it with 100,000 rows
2. Run all 4 benchmark tests
3. Generate a summary report
4.
   Clean up test data

### Run Individual Tests

```bash
# Test large result set performance
python -m benchmark.large_resultset

# Test concurrent query performance
python -m benchmark.concurrent

# Test connection pool performance
python -m benchmark.pool

# Test batch insert performance
python -m benchmark.batch_insert
```

## Test Data Schema

The benchmark uses a realistic table schema:

```sql
CREATE TABLE benchmark_data (
    id INT NOT NULL AUTO_INCREMENT,
    user_id INT NOT NULL,
    username VARCHAR(50) NOT NULL,
    email VARCHAR(100) NOT NULL,
    created_at DATETIME NOT NULL,
    updated_at DATETIME NOT NULL,
    score DECIMAL(10,2) NOT NULL,
    is_active TINYINT NOT NULL,
    data TEXT,
    PRIMARY KEY (id),
    INDEX idx_user_id (user_id),
    INDEX idx_created_at (created_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
```

## Benchmark Configuration

Default settings (can be modified in `benchmark/__init__.py`):

```python
ROW_COUNT = 100000        # Total test rows
BATCH_SIZE = 10000        # Batch operation size
CONCURRENT_COUNT = 50     # Number of concurrent operations
```

## Performance Tips

Based on benchmark results, asyncmy performs best when:

1. **Using connection pools** - asyncmy's pool implementation is highly optimized (22-28% faster than aiomysql)
2. **Fetching large result sets** - Use asyncmy for queries returning 1,000+ rows (2x faster than aiomysql)
3. **Processing datetime-heavy data** - Optimized datetime parsing provides significant gains
4.
   **Handling concurrent workloads** - Async architecture allows efficient I/O interleaving

## Comparison with Other Libraries

### vs mysqlclient (sync)

- ✅ **Competitive** for large result sets (only 5-10% slower)
- ✅ **Better** for concurrent workloads (asyncmy uses async I/O)
- ⚡ **Similar** for batch inserts (results vary)

### vs pymysql (sync)

- ✅ **2x faster** for large result sets
- ✅ **Much better** for concurrent workloads
- ✅ **Better** connection pool (asyncmy has a native pool)

### vs aiomysql (async)

- ✅ **2x faster** for large result sets (optimized parsing)
- ✅ **22-28% faster** connection pool throughput (consistent)
- ⚡ **Comparable** for concurrent queries (within the margin of error)
- ⚡ **Similar** for batch inserts (results vary)

## Contributing

To add new benchmarks:

1. Create a new file: `benchmark/your_benchmark.py`
2. Implement test functions following the existing pattern
3. Add a `run_benchmark()` function that returns sorted results
4. Import and call your test in `benchmark/run_all.py`

Example template:

```python
"""
Benchmark: Your Test Name

Description of what this test measures.
"""
import asyncio
import time

from benchmark import connection_kwargs


async def test_asyncmy():
    # Your test implementation
    start = time.perf_counter()
    # ... test code ...
    return time.perf_counter() - start


def run_benchmark():
    results = {}
    # Run tests and collect results; asyncio.run creates and
    # closes the event loop for us
    results["asyncmy"] = asyncio.run(test_asyncmy())
    # Return results sorted by elapsed time, fastest first
    return sorted(results.items(), key=lambda x: x[1])
```

## License

[Apache-2.0](../LICENSE)
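
## Appendix: Ranking Contract Sketch

The contributing template above returns results sorted by elapsed time, fastest first. The sketch below is a hypothetical, self-contained illustration of that ranking contract: the query round trip is simulated with `asyncio.sleep` so it runs without a MySQL server, and the library names and delays are stand-ins, not measurements.

```python
"""Hypothetical sketch of the run_benchmark() ranking contract.

The query is simulated with asyncio.sleep so this runs without a
MySQL server; names and delays are illustrative only.
"""
import asyncio
import time


async def simulated_query(delay: float) -> None:
    # Stand-in for an awaited cursor.execute()/fetchall() round trip.
    await asyncio.sleep(delay)


async def timed_run(name: str, delay: float) -> tuple[str, float]:
    start = time.perf_counter()
    await simulated_query(delay)
    return name, time.perf_counter() - start


def run_benchmark() -> list[tuple[str, float]]:
    async def run_all() -> list[tuple[str, float]]:
        # Run candidates sequentially so their timings don't interfere.
        return [
            await timed_run("asyncmy", 0.01),
            await timed_run("aiomysql", 0.03),
        ]

    results = asyncio.run(run_all())
    # Fastest first -- the shape run_all.py expects from each module.
    return sorted(results, key=lambda item: item[1])


if __name__ == "__main__":
    for rank, (name, elapsed) in enumerate(run_benchmark(), start=1):
        print(f"{rank}. {name}  {elapsed:.3f}s")
```

A real benchmark module would replace `simulated_query` with actual driver calls built from `connection_kwargs`, but the return shape stays the same.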