{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Transformers installation\n", "! pip install transformers datasets\n", "# To install from source instead of the last release, comment the command above and uncomment the following one.\n", "# ! pip install git+https://github.com/huggingface/transformers.git" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Benchmarks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's take a look at how 🤗 Transformers models can be benchmarked, best practices, and already available benchmarks.\n", "\n", "A notebook explaining in more detail how to benchmark 🤗 Transformers models can be found [here](https://github.com/huggingface/notebooks/tree/master/examples/benchmark.ipynb)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## How to benchmark 🤗 Transformers models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The classes `PyTorchBenchmark`and `TensorFlowBenchmark`allow to flexibly benchmark 🤗 Transformers models. The benchmark classes allow us to measure the _peak memory usage_ and _required time_ for both _inference_ and _training_.\n", "\n", "\n", "\n", "Hereby, _inference_ is defined by a single forward pass, and _training_ is defined by a single forward pass and\n", "backward pass.\n", "\n", "\n", "\n", "The benchmark classes `PyTorchBenchmark`and `TensorFlowBenchmark`expect an object of type `PyTorchBenchmarkArguments`and\n", "`TensorFlowBenchmarkArguments` respectively, for instantiation. `PyTorchBenchmarkArguments`and `TensorFlowBenchmarkArguments`are data classes and contain all relevant configurations for their corresponding benchmark class. In the following example, it is shown how a BERT model of type _bert-base-cased_ can be benchmarked." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments\n", "\n", "args = PyTorchBenchmarkArguments(models=[\"bert-base-uncased\"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512])\n", "benchmark = PyTorchBenchmark(args)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here, three arguments are given to the benchmark argument data classes, namely `models`, `batch_sizes`, and\n", "`sequence_lengths`. The argument `models` is required and expects a `list` of model identifiers from the\n", "[model hub](https://huggingface.co/models) The `list` arguments `batch_sizes` and `sequence_lengths` define\n", "the size of the `input_ids` on which the model is benchmarked. There are many more parameters that can be configured\n", "via the benchmark argument data classes. For more detail on these one can either directly consult the files\n", "`src/transformers/benchmark/benchmark_args_utils.py`, `src/transformers/benchmark/benchmark_args.py` (for PyTorch)\n", "and `src/transformers/benchmark/benchmark_args_tf.py` (for Tensorflow). Alternatively, running the following shell\n", "commands from root will print out a descriptive list of all configurable parameters for PyTorch and Tensorflow\n", "respectively.\n", "\n", "```bash\n", "python examples/pytorch/benchmarking/run_benchmark.py --help\n", "\n", "```\n", "\n", "An instantiated benchmark object can then simply be run by calling `benchmark.run()`." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "==================== INFERENCE - SPEED - RESULT ====================\n", "--------------------------------------------------------------------------------\n", "Model Name             Batch Size     Seq Length     Time in s\n", "--------------------------------------------------------------------------------\n", "bert-base-uncased          8               8             0.006\n", "bert-base-uncased          8              32             0.006\n", "bert-base-uncased          8             128             0.018\n", "bert-base-uncased          8             512             0.088\n", "--------------------------------------------------------------------------------\n", "\n", "==================== INFERENCE - MEMORY - RESULT ====================\n", "--------------------------------------------------------------------------------\n", "Model Name             Batch Size     Seq Length    Memory in MB\n", "--------------------------------------------------------------------------------\n", "bert-base-uncased          8               8             1227\n", "bert-base-uncased          8              32             1281\n", "bert-base-uncased          8             128             1307\n", "bert-base-uncased          8             512             1539\n", "--------------------------------------------------------------------------------\n", "\n", "==================== ENVIRONMENT INFORMATION ====================\n", "\n", "- transformers_version: 2.11.0\n", "- framework: PyTorch\n", "- use_torchscript: False\n", "- framework_version: 1.4.0\n", "- python_version: 3.6.10\n", "- system: Linux\n", "- cpu: x86_64\n", "- architecture: 64bit\n", "- date: 2020-06-29\n", "- time: 08:58:43.371351\n", "- fp16: False\n", "- use_multiprocessing: True\n", "- only_pretrain_model: False\n", "- cpu_ram_mb: 32088\n", "- use_gpu: True\n", "- num_gpus: 1\n", "- gpu: TITAN RTX\n", "- gpu_ram_mb: 24217\n", "- gpu_power_watts: 280.0\n", "- gpu_performance_state: 2\n", "- use_tpu: False" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results = benchmark.run()\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the _time_ and the _required memory_ for _inference_ are benchmarked. In the example output above, the first\n", "two sections show the results corresponding to _inference time_ and _inference memory_. In addition, all relevant\n", "information about the computing environment, _e.g._ the GPU type, the system, and the library versions, is printed\n", "out in the third section under _ENVIRONMENT INFORMATION_. This information can optionally be saved in a _.csv_ file\n", "by adding the argument `save_to_csv=True` to `PyTorchBenchmarkArguments` and\n", "`TensorFlowBenchmarkArguments`, respectively. In this case, every section is saved in a separate\n", "_.csv_ file. The path to each _.csv_ file can optionally be defined via the argument data classes, as sketched below."
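] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following cell is a minimal sketch of saving results to disk. The exact names of the file-path fields below (`inference_time_csv_file`, `inference_memory_csv_file`, `env_info_csv_file`) are assumptions; check them against the `--help` listing above." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Hypothetical sketch: write each result section to its own .csv file.\n", "# The *_csv_file field names are assumptions -- verify them via --help.\n", "csv_args = PyTorchBenchmarkArguments(\n", "    models=[\"bert-base-uncased\"],\n", "    batch_sizes=[8],\n", "    sequence_lengths=[8, 32],\n", "    save_to_csv=True,  # documented switch to persist results\n", "    inference_time_csv_file=\"inference_time.csv\",\n", "    inference_memory_csv_file=\"inference_memory.csv\",\n", "    env_info_csv_file=\"env_info.csv\",\n", ")\n", "PyTorchBenchmark(csv_args).run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Instead of benchmarking pre-trained models via their model identifier, _e.g._ `bert-base-uncased`, the user can\n", "alternatively benchmark an arbitrary configuration of any available model class. In this case, a `list` of\n", "configurations must be inserted with the benchmark args as follows."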
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "==================== INFERENCE - SPEED - RESULT ====================\n", "--------------------------------------------------------------------------------\n", "Model Name             Batch Size     Seq Length     Time in s\n", "--------------------------------------------------------------------------------\n", "bert-base                  8               8             0.006\n", "bert-base                  8              32             0.006\n", "bert-base                  8             128             0.018\n", "bert-base                  8             512             0.088\n", "bert-384-hid               8               8             0.006\n", "bert-384-hid               8              32             0.006\n", "bert-384-hid               8             128             0.011\n", "bert-384-hid               8             512             0.054\n", "bert-6-lay                 8               8             0.003\n", "bert-6-lay                 8              32             0.004\n", "bert-6-lay                 8             128             0.009\n", "bert-6-lay                 8             512             0.044\n", "--------------------------------------------------------------------------------\n", "\n", "==================== INFERENCE - MEMORY - RESULT ====================\n", "--------------------------------------------------------------------------------\n", "Model Name             Batch Size     Seq Length    Memory in MB\n", "--------------------------------------------------------------------------------\n", "bert-base                  8               8             1277\n", "bert-base                  8              32             1281\n", "bert-base                  8             128             1307\n", "bert-base                  8             512             1539\n", "bert-384-hid               8               8             1005\n", "bert-384-hid               8              32             1027\n", "bert-384-hid               8             128             1035\n", "bert-384-hid               8             512             1255\n", "bert-6-lay                 8               8             1097\n", "bert-6-lay                 8              32             1101\n", "bert-6-lay                 8             128             1127\n", "bert-6-lay                 8             512             1359\n", "--------------------------------------------------------------------------------\n", "\n", "==================== ENVIRONMENT INFORMATION ====================\n", "\n", "- transformers_version: 2.11.0\n", "- framework: PyTorch\n", "- use_torchscript: False\n", "- framework_version: 1.4.0\n", "- python_version: 3.6.10\n", "- system: Linux\n", "- cpu: x86_64\n", "- architecture: 64bit\n", "- date: 2020-06-29\n", "- time: 09:35:25.143267\n", "- fp16: False\n", "- use_multiprocessing: True\n", "- only_pretrain_model: False\n", "- cpu_ram_mb: 32088\n", "- use_gpu: True\n", "- num_gpus: 1\n", "- gpu: TITAN RTX\n", "- gpu_ram_mb: 24217\n", "- gpu_power_watts: 280.0\n", "- gpu_performance_state: 2\n", "- use_tpu: False" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from transformers import PyTorchBenchmark, PyTorchBenchmarkArguments, BertConfig\n", "\n", "args = PyTorchBenchmarkArguments(\n", "    models=[\"bert-base\", \"bert-384-hid\", \"bert-6-lay\"], batch_sizes=[8], sequence_lengths=[8, 32, 128, 512]\n", ")\n", "config_base = BertConfig()\n", "config_384_hid = BertConfig(hidden_size=384)\n", "config_6_lay = BertConfig(num_hidden_layers=6)\n", "\n", "benchmark = PyTorchBenchmark(args, configs=[config_base, config_384_hid, config_6_lay])\n", "benchmark.run()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Again, _inference time_ and _required memory_ for _inference_ are measured, but this time for customized configurations\n", "of the `BertModel` class. This feature can be especially helpful when deciding which configuration the model\n", "should be trained with." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Benchmark best practices" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This section lists a couple of best practices one should be aware of when benchmarking a model.\n", "\n",
"- Currently, only single-device benchmarking is supported. When benchmarking on GPU, it is recommended that the user\n", "  specify on which device the code should be run by setting the `CUDA_VISIBLE_DEVICES` environment variable in the\n", "  shell, _e.g._ `export CUDA_VISIBLE_DEVICES=0`, before running the code.\n", "- The option `no_multi_processing` should only be set to `True` for testing and debugging. To ensure accurate\n", "  memory measurement, it is recommended to run each memory benchmark in a separate process by making sure\n", "  `no_multi_processing` is set to `False`.\n", "- One should always state the environment information when sharing the results of a model benchmark. Results can vary\n", "  heavily between different GPU devices, library versions, etc., so benchmark results on their own are not very\n", "  useful for the community." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Sharing your benchmark" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Previously, all available core models (10 at the time) were benchmarked for _inference time_ across many different\n", "settings: using PyTorch, with and without TorchScript, and using TensorFlow, with and without XLA. All of those tests were\n", "done across CPUs (except for TensorFlow XLA) and GPUs.\n", "\n", "The approach is detailed in the [following blog post](https://medium.com/huggingface/benchmarking-transformers-pytorch-and-tensorflow-e2917fb891c2) and the results are\n", "available [here](https://docs.google.com/spreadsheets/d/1sryqufw2D0XlUH4sq3e9Wnxu5EAQkaohzrJbd5HdQ_w/edit?usp=sharing).\n", "\n", "With the new _benchmark_ tools, it is easier than ever to share your benchmark results with the community:\n", "\n", "- [PyTorch Benchmarking Results](https://github.com/huggingface/transformers/tree/master/examples/pytorch/benchmarking/README.md).\n", "- [TensorFlow Benchmarking Results](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/benchmarking/README.md)." ] } ], "metadata": {}, "nbformat": 4, "nbformat_minor": 4 }