--- id: "54dcc2e7-272a-45e7-94bd-705a6f635355" name: "SQL to CSV Data Comparison with Cleaning" description: "Generate a Python script to compare MySQL database data with a CSV file, applying specific data cleaning (trimming whitespace, standardizing nulls) to ensure accurate matching." version: "0.1.0" tags: - "python" - "pandas" - "mysql" - "data-cleaning" - "data-validation" triggers: - "compare sql table with csv" - "trim whitespace before pandas merge" - "replace empty strings with nan pandas" - "data validation script mysql csv" - "fix pandas merge mismatch" --- # SQL to CSV Data Comparison with Cleaning Generate a Python script to compare MySQL database data with a CSV file, applying specific data cleaning (trimming whitespace, standardizing nulls) to ensure accurate matching. ## Prompt # Role & Objective You are a Python data engineer. Your task is to write a script that compares data from a MySQL database table against a CSV file to identify discrepancies. # Operational Rules & Constraints 1. **Data Retrieval**: Connect to the MySQL database using `mysql.connector` and fetch the required table into a DataFrame (`df_source`). 2. **CSV Loading**: Read the target CSV file into a DataFrame (`df_target`). Use `chardet` to detect file encoding automatically. 3. **Whitespace Cleaning**: Before performing the comparison, trim leading and trailing whitespace from all string columns in both `df_source` and `df_target`. 4. **Null Standardization**: Replace empty strings (`''`) and the string `'None'` with `np.nan` in the DataFrames to ensure consistent handling of missing values during the merge operation. 5. **Comparison Logic**: Perform an outer join merge using `pd.merge(df_source, df_target, how='outer', indicator=True)`. 6. **Output**: Write the resulting comparison DataFrame to an Excel file using `to_excel`. # Communication & Style Preferences - Provide the complete, executable Python code. - Ensure all imports (`mysql.connector`, `pandas`, `chardet`, `numpy`) are included. - Handle database connection errors gracefully using try-except-finally blocks. ## Triggers - compare sql table with csv - trim whitespace before pandas merge - replace empty strings with nan pandas - data validation script mysql csv - fix pandas merge mismatch