# Prompt Bypass Detector

## Overview

The Prompt Bypass Detector is a Python tool that helps developers detect and classify attempted bypasses of AI safety guardrails. It analyzes input prompts and model responses for suspicious patterns, which helps identify edge cases and improve model safety.

## Features

- Detect potential bypass attempts in input prompts and model responses.
- Classify inputs and responses as "safe" or "bypass."
- Provide anomaly scores for both input and response.

## Installation

Install the required dependencies using pip:

```bash
pip install scikit-learn numpy
```

## Usage

Run the tool from the command line, passing the input prompt and the model response as two quoted arguments:

```bash
python prompt_bypass_detector.py "" ""
```

Example:

```bash
python prompt_bypass_detector.py "This is a test prompt." "This is a test response."
```

## Testing

To run the tests, install `pytest` and then run the test file:

```bash
pip install pytest
pytest test_prompt_bypass_detector.py
```

## Files

- `prompt_bypass_detector.py`: The main tool for detecting prompt bypass attempts.
- `test_prompt_bypass_detector.py`: Test cases for the tool.

## Notes

- Ensure that the `bypass_detector_model.pkl` and `tfidf_vectorizer.pkl` files are present in the same directory as `prompt_bypass_detector.py`.
- If these files are missing, the tool will return an error indicating their absence.
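
To confirm that both files are in place before running the tool, a check along the following lines may help. This is a minimal sketch, not the tool's actual implementation: only the two file names come from the notes above, while the function names `find_missing_artifacts` and `check_artifacts` and the error message wording are hypothetical.

```python
from pathlib import Path

# File names taken from the notes above; everything else is illustrative.
REQUIRED_FILES = ("bypass_detector_model.pkl", "tfidf_vectorizer.pkl")


def find_missing_artifacts(directory):
    """Return the names of required files that are absent from `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


def check_artifacts(directory):
    """Raise FileNotFoundError naming every required file that is missing."""
    missing = find_missing_artifacts(directory)
    if missing:
        raise FileNotFoundError(
            f"Missing required file(s) in {Path(directory)}: {', '.join(missing)}"
        )
```

Calling `check_artifacts` with the directory that contains `prompt_bypass_detector.py` reports all missing files in one message rather than failing on the first one, which can make a broken setup quicker to diagnose.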