# LLM Quantization Evaluator ## Overview The `llm_quantization_eval` tool evaluates the performance of quantized versions of large language models (LLMs), comparing them against full-precision models in terms of speed, memory usage, and accuracy. This helps developers understand the trade-offs of model quantization for their specific workloads. ## Features - Quantize models using INT8 or FP16 precision. - Evaluate model accuracy on a given dataset. - Benchmark model performance in terms of inference time and memory usage. ## Requirements - Python 3.8+ - Required Python packages: - `torch` - `transformers` - `datasets` - `pytest` (for testing) Install the required packages using: ```bash pip install torch transformers datasets pytest ``` ## Usage Run the tool from the command line: ```bash python llm_quantization_eval.py --model --quantization --dataset --device ``` ### Example ```bash python llm_quantization_eval.py --model gpt2 --quantization INT8 --dataset squad --device cpu ``` ## Testing To run the tests, use: ```bash pytest test_llm_quantization_eval.py ``` The tests include: - Verifying the quantization methods. - Testing the evaluation of model accuracy with mocked data. - Testing the benchmarking of model performance with mocked data.