{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "view-in-github", "colab_type": "text" }, "source": [ "<a href=\"https://colab.research.google.com/github/utensil/llm-playground/blob/main/notebooks/axolotl/runpod/axolotl-falcon-40b-qlora-deepspeed.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" ] }, { "cell_type": "markdown", "metadata": { "id": "pCpbQuaxDY7H" }, "source": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "id": "97yoSiRvDY7G" }, "source": [ "# Fine-tuning Falcon-40B\n", "\n", "- Axolotl + QLoRA\n", "- Minotaur datasets\n", "- DeepSpeed ZeRO-3 on 8x GPUs\n", "\n", "Entry script:\n", "\n", "https://github.com/utensil/llm-playground/blob/main/scripts/entry/ax_lite.sh\n", "\n", "Mount the volume disk at `/content`.\n", "\n", "Save to:\n", "\n", "- repo: `utensil/llm-playground`\n", "- path: `notebooks/axolotl/runpod/axolotl-falcon-40b-qlora-deepspeed.ipynb`\n" ] }, { "cell_type": "markdown", "metadata": { "id": "uZd-cf70HM2n" }, "source": [ "## Prepare" ] }, { "cell_type": "markdown", "metadata": { "id": "UjFhNeLIshBM" }, "source": [ "### Setup notify" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "KNM9aJoto7ZD", "scrolled": true, "outputId": "49e9a166-2ab2-4248-d54f-687a3b64333d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Collecting git+https://github.com/cphyc/jupyter-notify.git\n", " Cloning https://github.com/cphyc/jupyter-notify.git to /tmp/pip-req-build-zu3hp4aa\n", " Running command git clone --filter=blob:none --quiet https://github.com/cphyc/jupyter-notify.git /tmp/pip-req-build-zu3hp4aa\n", " Resolved https://github.com/cphyc/jupyter-notify.git to commit 8cff958cbd3f00f7e4eb59b457f9f915e2ddff37\n", " Preparing metadata (setup.py) ... 
\u001b[?25ldone\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting 
behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m" ] } ], "source": [ "!pip install git+https://github.com/cphyc/jupyter-notify.git" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "7j0Wga8io7ZD", "outputId": "29fb626b-9e53-4085-adfd-dd380f744901" }, "outputs": [ { "data": { "application/javascript": [ "if (!(\"Notification\" in window)) {\n", " alert(\"This browser does not support desktop notifications, so the %%notify magic will not work.\");\n", "} else if (Notification.permission !== 'granted' && Notification.permission !== 'denied') {\n", " Notification.requestPermission(function (permission) {\n", " if(!('permission' in Notification)) {\n", " Notification.permission = permission;\n", " }\n", " })\n", "}\n", "\n", "if(!window.jQuery) {\n", " var jq = document.createElement('script');\n", " jq.src = \"//ajax.googleapis.com/ajax/libs/jquery/2.1.4/jquery.min.js\";\n", " document.getElementsByTagName('head')[0].appendChild(jq);\n", "}\n", "\n", "// Detect if the window is out of focus.\n", "window.jupyterNotifyIsInBackground = undefined;\n", "(function() {\n", " // Check document.hidden support\n", " var hidden;\n", " if (typeof document.hidden !== \"undefined\") { // Opera 12.10 and Firefox 18 and later support\n", " hidden = \"hidden\";\n", " } else if (typeof document.msHidden !== \"undefined\") {\n", " hidden = \"msHidden\";\n", " } else if (typeof document.webkitHidden !== \"undefined\") {\n", " hidden = \"webkitHidden\";\n", " }\n", "\n", " // Set initial background state\n", " if (document[hidden]) {\n", " window.jupyterNotifyIsInBackground = true;\n", " } else {\n", " window.jupyterNotifyIsInBackground = false;\n", " }\n", "\n", " window.addEventListener('blur', function() { window.jupyterNotifyIsInBackground = true; }, false);\n", " window.addEventListener('focus', function() { window.jupyterNotifyIsInBackground = 
false; }, false);\n", "})();\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%load_ext jupyternotify" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iWWi1zMYo7ZD" }, "outputs": [], "source": [ "%autonotify -a 30 -o" ] }, { "cell_type": "markdown", "metadata": { "id": "DkrtaRkauKLL" }, "source": [ "### Set HF Cache" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hpIyW4yWuHvd" }, "outputs": [], "source": [ "# %env HF_DATASETS_CACHE" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "hHOfHUYQduRX" }, "outputs": [], "source": [ "#%env TRANSFORMERS_CACHE" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "oR9oblHTduRY" }, "outputs": [], "source": [ "!rm -rf /root/.cache" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "KkaKgWKnduRY" }, "outputs": [], "source": [ "!mkdir -p /content/cache" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "aJMsnPnOduRY" }, "outputs": [], "source": [ "!ln -s /content/cache /root/.cache" ] }, { "cell_type": "markdown", "metadata": { "id": "FkZ6RzFyiAue" }, "source": [ "### Speed up model download" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "huZ5eZUUgJ76", "outputId": "ae3286d2-d076-4885-e736-11f70e4df99a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Hit:1 http://archive.ubuntu.com/ubuntu jammy InRelease\n", "Get:2 http://security.ubuntu.com/ubuntu jammy-security InRelease [110 kB] \n", "Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [119 kB] \n", "Hit:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64 InRelease\n", "Get:5 http://archive.ubuntu.com/ubuntu jammy-backports InRelease [108 kB]\n", "Get:6 http://archive.ubuntu.com/ubuntu jammy-updates/universe amd64 Packages [1185 kB]\n", "Get:7 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 
Packages [919 kB]\n", "Get:8 http://archive.ubuntu.com/ubuntu jammy-updates/restricted amd64 Packages [545 kB]\n", "Fetched 2985 kB in 1s (3189 kB/s) \n", "Reading package lists... Done\n", "Reading package lists... Done\n", "Building dependency tree... Done\n", "Reading state information... Done\n", "The following additional packages will be installed:\n", " libaria2-0 libc-ares2 libicu70 libssh2-1 libxml2\n", "The following NEW packages will be installed:\n", " aria2 libaria2-0 libc-ares2 libicu70 libssh2-1 libxml2\n", "0 upgraded, 6 newly installed, 0 to remove and 58 not upgraded.\n", "Need to get 13.0 MB of archives.\n", "After this operation, 43.2 MB of additional disk space will be used.\n", "Get:1 http://archive.ubuntu.com/ubuntu jammy/main amd64 libicu70 amd64 70.1-2 [10.6 MB]\n", "Get:2 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 libxml2 amd64 2.9.13+dfsg-1ubuntu0.3 [763 kB]\n", "Get:3 http://archive.ubuntu.com/ubuntu jammy-updates/main amd64 libc-ares2 amd64 1.18.1-1ubuntu0.22.04.2 [45.0 kB]\n", "Get:4 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libssh2-1 amd64 1.10.0-3 [109 kB]\n", "Get:5 http://archive.ubuntu.com/ubuntu jammy/universe amd64 libaria2-0 amd64 1.36.0-1 [1086 kB]\n", "Get:6 http://archive.ubuntu.com/ubuntu jammy/universe amd64 aria2 amd64 1.36.0-1 [381 kB]\n", "Fetched 13.0 MB in 2s (8370 kB/s)\n", "debconf: delaying package configuration, since apt-utils is not installed\n", "Selecting previously unselected package libicu70:amd64.\n", "(Reading database ... 
17816 files and directories currently installed.)\n", "Preparing to unpack .../0-libicu70_70.1-2_amd64.deb ...\n", "Unpacking libicu70:amd64 (70.1-2) ...\n", "Selecting previously unselected package libxml2:amd64.\n", "Preparing to unpack .../1-libxml2_2.9.13+dfsg-1ubuntu0.3_amd64.deb ...\n", "Unpacking libxml2:amd64 (2.9.13+dfsg-1ubuntu0.3) ...\n", "Selecting previously unselected package libc-ares2:amd64.\n", "Preparing to unpack .../2-libc-ares2_1.18.1-1ubuntu0.22.04.2_amd64.deb ...\n", "Unpacking libc-ares2:amd64 (1.18.1-1ubuntu0.22.04.2) ...\n", "Selecting previously unselected package libssh2-1:amd64.\n", "Preparing to unpack .../3-libssh2-1_1.10.0-3_amd64.deb ...\n", "Unpacking libssh2-1:amd64 (1.10.0-3) ...\n", "Selecting previously unselected package libaria2-0:amd64.\n", "Preparing to unpack .../4-libaria2-0_1.36.0-1_amd64.deb ...\n", "Unpacking libaria2-0:amd64 (1.36.0-1) ...\n", "Selecting previously unselected package aria2.\n", "Preparing to unpack .../5-aria2_1.36.0-1_amd64.deb ...\n", "Unpacking aria2 (1.36.0-1) ...\n", "Setting up libc-ares2:amd64 (1.18.1-1ubuntu0.22.04.2) ...\n", "Setting up libssh2-1:amd64 (1.10.0-3) ...\n", "Setting up libicu70:amd64 (70.1-2) ...\n", "Setting up libxml2:amd64 (2.9.13+dfsg-1ubuntu0.3) ...\n", "Setting up libaria2-0:amd64 (1.36.0-1) ...\n", "Setting up aria2 (1.36.0-1) ...\n", "Processing triggers for libc-bin (2.35-0ubuntu3.1) ...\n", "Updated git hooks.\n", "Git LFS initialized.\n", "Requirement already satisfied: requests in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (2.31.0)\n", "Requirement already satisfied: huggingface_hub in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (0.15.1)\n", "Requirement already satisfied: charset-normalizer<4,>=2 in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from requests) (3.1.0)\n", "Requirement already satisfied: idna<4,>=2.5 in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from requests) (3.4)\n", "Requirement already satisfied: 
urllib3<3,>=1.21.1 in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from requests) (2.0.3)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m" ] } ], "source": [ "!apt-get update\n", "!apt-get install -y aria2\n", "!git lfs install\n", "!pip install requests huggingface_hub" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "dqzd-Ul4iMPf" }, "outputs": [], "source": [ "#%cd /content/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mmJ7tmaIifTR" }, "outputs": [], "source": [ "#!git clone https://github.com/utensil/llm-playground" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "29dpWyRujiSZ", "outputId": "e97d1254-edad-4510-c694-4973186323ab" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/content/llm-playground\n" ] } ], "source": [ "%cd /content/llm-playground" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "c8nAsJ4ggJ77", "scrolled": true, "outputId": "65f92918-64fe-4d5b-cf4c-bc7417fceac1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Working directory changed to: /content/llm-playground/helper/..\n", "Skipping: .gitattributes\n", "Downloading the model to models/tiiuae_falcon-40b\n", "Running: aria2c -c -x 16 -s 16 -k 1M https://huggingface.co/tiiuae/falcon-40b/resolve/main/README.md -d models/tiiuae_falcon-40b -o README.md\n", "Running: aria2c -c -x 16 -s 16 -k 1M https://huggingface.co/tiiuae/falcon-40b/resolve/main/config.json -d models/tiiuae_falcon-40b -o config.json\n", "Running: aria2c -c -x 16 -s 16 -k 1M https://huggingface.co/tiiuae/falcon-40b/resolve/main/configuration_RW.py -d models/tiiuae_falcon-40b -o configuration_RW.py\n", "Running: aria2c -c -x 16 -s 16 -k 1M https://huggingface.co/tiiuae/falcon-40b/resolve/main/generation_config.json -d models/tiiuae_falcon-40b -o generation_config.json\n", "Running: aria2c -c -x 16 -s 16 -k 1M https://huggingface.co/tiiuae/falcon-40b/resolve/main/modelling_RW.py -d 
models/tiiuae_falcon-40b -o modelling_RW.py\n", "\n", "06/16 05:46:00 [\u001b[1;32mNOTICE\u001b[0m] Downloading 1 item(s)\n", "Running: aria2c -c -x 16 -s 16 -k 1M https://huggingface.co/tiiuae/falcon-40b/resolve/main/pytorch_model-00001-of-00009.bin -d models/tiiuae_falcon-40b -o pytorch_model-00001-of-00009.bin\n", "\n", "06/16 05:46:00 [\u001b[1;32mNOTICE\u001b[0m] Downloading 1 item(s)\n", "\n", "06/16 05:46:00 [\u001b[1;32mNOTICE\u001b[0m] Downloading 1 item(s)\n", "Running: aria2c -c -x 16 -s 16 -k 1M https://huggingface.co/tiiuae/falcon-40b/resolve/main/pytorch_model-00002-of-00009.bin -d models/tiiuae_falcon-40b -o pytorch_model-00002-of-00009.bin\n", "\n", "06/16 05:46:00 [\u001b[1;32mNOTICE\u001b[0m] Downloading 1 item(s)\n", "Running: aria2c -c -x 16 -s 16 -k 1M https://huggingface.co/tiiuae/falcon-40b/resolve/main/pytorch_model-00003-of-00009.bin -d models/tiiuae_falcon-40b -o pytorch_model-00003-of-00009.bin\n", " 0%| | 0/18 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "!python /content/llm-playground/helper/download-model.py tiiuae/falcon-40b\n", "# /content/llm-playground/models/tiiuae_falcon-40b" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ydCdRNI9gJ78" }, "outputs": [], "source": [ "# Downloading shards: 100%|████████████████████████| 9/9 [19:19<00:00, 128.85s/it]\n", "# Loading checkpoint shards: 100%|██████████████████| 9/9 [01:30<00:00, 10.08s/it]\n", "# (OK):download completed.\n", "# 100%|████████████████████████████████████████████████████████████████████| 18/18 [01:44<00:00, 5.81s/it]" ] }, { "cell_type": "markdown", "metadata": { "id": "RFqQyPp5HbAm" }, "source": [ "### HF Login" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "s6wC1lP23B-4", "outputId": "3326297a-1f36-4f48-8793-7ce777b29cdb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Token is valid (permission: write).\n", "Your token has been saved in your configured git 
credential helpers (store).\n", "Your token has been saved to /root/.cache/huggingface/token\n", "Login successful\n" ] } ], "source": [ "# For axolotl push_dataset_to_hub\n", "import os\n", "from huggingface_hub import notebook_login, login\n", "# Colab:\n", "# notebook_login()\n", "# RunPod:\n", "login(os.environ.get(\"HUGGINGFACE_TOKEN\"), add_to_git_credential=True)" ] }, { "cell_type": "markdown", "metadata": { "id": "R2AigYR_DY7X" }, "source": [ "### Update axolotl" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "IQYJdcd7DY7X" }, "outputs": [], "source": [ "%cd /workspace/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "PLn5aNACDY7X", "scrolled": true }, "outputs": [], "source": [ "!git clone https://github.com/OpenAccess-AI-Collective/axolotl axolotl-update" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "vdY5tXtYDY7X" }, "outputs": [], "source": [ "!cp -r axolotl-update/* axolotl" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kHsKmIUmDY7Y" }, "outputs": [], "source": [ "%cd /workspace/axolotl" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ljX3Yir2DY7Y", "scrolled": true }, "outputs": [], "source": [ "!git status" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Adru8lVlzhLi" }, "outputs": [], "source": [ "!ds_report" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wbaorAF2DY7Y", "scrolled": true }, "outputs": [], "source": [ "!pip install -e ." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "p_9eIEonzhLi" }, "outputs": [], "source": [ "!pip list|grep torch" ] }, { "cell_type": "markdown", "metadata": { "id": "SD1Ahx1QG5xm" }, "source": [ "### Init Storage" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LBBF-lS2HVPG", "outputId": "4d389e27-531d-4bcc-c9ab-47cedde3b5eb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/content/axolotl-trained is already a clone of https://huggingface.co/utensil/axolotl-trained. Make sure you pull the latest changes with `repo.git_pull()`.\n" ] } ], "source": [ "!python /content/llm-playground/helper/storage.py utensil/axolotl-trained /content/ -m" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "0_FP2VeXH0Un", "outputId": "552f8ca9-1403-47e7-ec5e-098e3bd248c5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "falcon-qlora-40b-gsm8k\n" ] } ], "source": [ "!ls /content/axolotl-trained" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "G86XUuQ1gJ8J" }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "oOXz9ZCJgJ8J" }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": { "id": "raYU6HmYduRa" }, "source": [ "### Reinstall PyTorch with CUDA 11.8 (optional)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Ym6v1nl6duRa", "outputId": "7e5078cd-0aba-4d65-8342-56106d536f03" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Looking in indexes: https://download.pytorch.org/whl/cu118\n", "Requirement already satisfied: torch in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (2.0.1)\n", "Collecting torch\n", " Using cached https://download.pytorch.org/whl/cu118/torch-2.0.1%2Bcu118-cp39-cp39-linux_x86_64.whl (2267.3 MB)\n", "Requirement already satisfied: filelock in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages 
(from torch) (3.12.2)\n", "Requirement already satisfied: typing-extensions in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from torch) (4.6.3)\n", "Requirement already satisfied: sympy in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from torch) (1.12)\n", "Requirement already satisfied: networkx in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from torch) (3.1)\n", "Requirement already satisfied: jinja2 in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from torch) (3.1.2)\n", "Requirement already satisfied: triton==2.0.0 in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from torch) (2.0.0)\n", "Requirement already satisfied: cmake in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from triton==2.0.0->torch) (3.26.4)\n", "Requirement already satisfied: lit in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from triton==2.0.0->torch) (16.0.6)\n", "Requirement already satisfied: MarkupSafe>=2.0 in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from jinja2->torch) (2.1.3)\n", "Requirement already satisfied: mpmath>=0.19 in /root/miniconda3/envs/py3.9/lib/python3.9/site-packages (from sympy->torch) (1.3.0)\n", "Installing collected packages: torch\n", " Attempting uninstall: torch\n", " Found existing installation: torch 2.0.1\n", " Uninstalling torch-2.0.1:\n", " Successfully uninstalled torch-2.0.1\n", "\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", "torchaudio 2.0.1+cu118 requires torch==2.0.0, but you have torch 2.0.1+cu118 which is incompatible.\u001b[0m\u001b[31m\n", "\u001b[0mSuccessfully installed torch-2.0.1+cu118\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. 
It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\u001b[33m\n", "\u001b[0m" ] }, { "data": { "application/javascript": [ "$(document).ready(\n", " function() {\n", " function appendUniqueDiv(){\n", " // append a div with our uuid so we can check that it's already\n", " // been sent and avoid duplicates on page reload\n", " var notifiedDiv = document.createElement(\"div\")\n", " notifiedDiv.id = \"2a00f291-38a2-45ab-a95a-bb9362565c8c\"\n", " element.append(notifiedDiv)\n", " }\n", "\n", " // only send notifications if the pageload is complete; this will\n", " // help stop extra notifications when a saved notebook is loaded,\n", " // which during testing gives us state \"interactive\", not \"complete\"\n", " if (document.readyState === 'complete') {\n", " // check for the div that signifies that the notification\n", " // was already sent\n", " if (document.getElementById(\"2a00f291-38a2-45ab-a95a-bb9362565c8c\") === null) {\n", " var notificationPayload = {\"requireInteraction\": false, \"icon\": \"/static/base/images/favicon.ico\", \"body\": \"Cell Execution Has Finished!!\", \"autonotify_after\": \"30\", \"autonotify_output\": true, \"only_in_background\": false};\n", "\n", " // We have a notification but the window is active\n", " if (notificationPayload.only_in_background && !window.jupyterNotifyIsInBackground) {\n", " appendUniqueDiv();\n", " return;\n", " }\n", " if (Notification.permission !== 'denied') {\n", " if (Notification.permission !== 'granted') { \n", " Notification.requestPermission(function (permission) {\n", " if(!('permission' in Notification)) {\n", " Notification.permission = permission\n", " }\n", " })\n", " }\n", " if (Notification.permission === 'granted') {\n", " var notification = new Notification(\"Jupyter Notebook\", notificationPayload)\n", " appendUniqueDiv()\n", " notification.onclick = function () {\n", " window.focus();\n", " this.close();\n", " };\n", " } \n", " } \n", " }\n", " }\n", 
" }\n", ")\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "!pip3 install -U torch --index-url https://download.pytorch.org/whl/cu118" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "W0A0aPWJzhLk", "outputId": "06c4e42c-1953-4ff1-9fd0-96429b6fed84" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch 2.0.1+cu118\n", "torchaudio 2.0.1+cu118\n", "torchvision 0.15.2\n" ] } ], "source": [ "!pip list|grep torch" ] }, { "cell_type": "markdown", "metadata": { "id": "4sPSrhKPHIrS" }, "source": [ "### Reinstall deepspeed (optional)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "U_PIACM6duRa", "outputId": "3deaeef1-b635-4574-bc6f-2b0fc3649be8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setting ds_accelerator to cuda (auto detect)\n", "--------------------------------------------------\n", "DeepSpeed C++/CUDA extension op report\n", "--------------------------------------------------\n", "NOTE: Ops not installed will be just-in-time (JIT) compiled at\n", " runtime if needed. Op compatibility means that your system\n", " meet the required dependencies to JIT install the op.\n", "--------------------------------------------------\n", "JIT compiled ops requires ninja\n", "ninja .................. \u001b[92m[OKAY]\u001b[0m\n", "--------------------------------------------------\n", "op name ................ installed .. compatible\n", "--------------------------------------------------\n", "async_io ............... \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "cpu_adagrad ............ \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "cpu_adam ............... \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "fused_adam ............. \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "fused_lamb ............. \u001b[92m[YES]\u001b[0m ...... 
\u001b[92m[OKAY]\u001b[0m\n", "quantizer .............. \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "random_ltd ............. \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "\u001b[93m [WARNING] \u001b[0m sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0\n", "\u001b[93m [WARNING] \u001b[0m using untested triton version (2.0.0), only 1.0.0 is known to be compatible\n", "sparse_attn ............ \u001b[93m[NO]\u001b[0m ....... \u001b[93m[NO]\u001b[0m\n", "spatial_inference ...... \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "transformer ............ \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "stochastic_transformer . \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "transformer_inference .. \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "utils .................. \u001b[92m[YES]\u001b[0m ...... \u001b[92m[OKAY]\u001b[0m\n", "--------------------------------------------------\n", "DeepSpeed general environment info:\n", "torch install path ............... ['/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch']\n", "torch version .................... 2.0.1+cu118\n", "deepspeed install path ........... ['/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/deepspeed']\n", "deepspeed info ................... 0.9.3+52907a66, 52907a66, master\n", "torch cuda version ............... 11.8\n", "torch hip version ................ None\n", "nvcc version ..................... 11.8\n", "deepspeed wheel compiled w. ...... 
torch 2.0, cuda 11.8\n" ] } ], "source": [ "!ds_report" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "i1-u6vhWHvt6" }, "outputs": [], "source": [ "# !yes|pip uninstall deepspeed" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xIpqugJOHuhh", "scrolled": true }, "outputs": [], "source": [ "# !TORCH_CUDA_ARCH_LIST=\"3.5;5.0;6.0;6.1;7.0;7.5;8.0;8.6+PTX\" DS_BUILD_OPS=1 DS_BUILD_SPARSE_ATTN=0 pip install deepspeed --global-option=\"build_ext\" --global-option=\"-j8\" # --global-option=\"bdist_wheel\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "o8JzOr94dDm1" }, "outputs": [], "source": [ "# !pip install deepspeed" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xcKtKd_MdDm1" }, "outputs": [], "source": [ "# !ds_report" ] }, { "cell_type": "markdown", "metadata": { "id": "m_tw4OcVHQdK" }, "source": [ "### Init Configs" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zJvyUmZdktu0", "outputId": "14f03ce3-4cc1-4c5f-be1f-684bf1a4ec49" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/workspace/axolotl\n" ] } ], "source": [ "%cd /workspace/axolotl" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "mC48y25Lkqa5" }, "outputs": [], "source": [ "# Try no config\n", "# !accelerate config default" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "cL5E8urQEXiL", "outputId": "a5000955-4357-454b-fcf9-32b29e437ff6" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting ds_config.json\n" ] } ], "source": [ "%%writefile ds_config.json\n", "{\n", " \"zero_optimization\": {\n", " \"stage\": 3,\n", " \"offload_optimizer\": {\n", " \"device\": \"cpu\",\n", " \"pin_memory\": true\n", " },\n", " \"offload_param\": {\n", " \"device\": \"cpu\",\n", " \"pin_memory\": true\n", " },\n", " \"overlap_comm\": true,\n", " \"contiguous_gradients\": true,\n", " 
\"sub_group_size\": 0,\n", " \"reduce_bucket_size\": \"auto\",\n", " \"stage3_prefetch_bucket_size\": \"auto\",\n", " \"stage3_param_persistence_threshold\": \"auto\",\n", " \"stage3_max_live_parameters\": 0,\n", " \"stage3_max_reuse_distance\": 0,\n", " \"stage3_gather_16bit_weights_on_model_save\": true\n", " },\n", " \"bf16\": {\n", " \"enabled\": \"auto\"\n", " },\n", " \"fp16\": {\n", " \"enabled\": \"auto\",\n", " \"auto_cast\": false,\n", " \"loss_scale\": 0,\n", " \"initial_scale_power\": 32,\n", " \"loss_scale_window\": 1000,\n", " \"hysteresis\": 2,\n", " \"min_loss_scale\": 1\n", " },\n", " \"optimizer\": {\n", " \"type\": \"AdamW\",\n", " \"params\": {\n", " \"lr\": \"auto\",\n", " \"betas\": \"auto\",\n", " \"eps\": \"auto\",\n", " \"weight_decay\": \"auto\"\n", " }\n", " },\n", " \"scheduler\": {\n", " \"type\": \"WarmupDecayLR\",\n", " \"params\": {\n", " \"total_num_steps\": \"auto\",\n", " \"warmup_min_lr\": \"auto\",\n", " \"warmup_max_lr\": \"auto\",\n", " \"warmup_num_steps\": \"auto\"\n", " }\n", " },\n", " \"gradient_accumulation_steps\": \"auto\",\n", " \"train_batch_size\": \"auto\",\n", " \"train_micro_batch_size_per_gpu\": \"auto\",\n", " \"wall_clock_breakdown\": false\n", "}" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "efZJw_gCDY7J", "scrolled": true, "outputId": "e8106247-b926-4f3d-bd02-0781af7830b8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing examples/falcon/config-40b-qlora.yml\n" ] } ], "source": [ "%%writefile examples/falcon/config-40b-qlora.yml\n", "# 1b: tiiuae/falcon-rw-1b\n", "# 7b: tiiuae/falcon-7b\n", "# 40b: tiiuae/falcon-40b\n", "base_model: /content/llm-playground/models/tiiuae_falcon-40b\n", "base_model_config: /content/llm-playground/models/tiiuae_falcon-40b\n", "# required by falcon custom model code: https://huggingface.co/tiiuae/falcon-7b/tree/main\n", "trust_remote_code: true\n", "model_type: AutoModelForCausalLM\n", "tokenizer_type: AutoTokenizer\n", 
"load_in_8bit: false\n", "# enable 4bit for QLoRA\n", "load_in_4bit: true\n", "gptq: false\n", "strict: false\n", "\n", "push_dataset_to_hub: utensil\n", "hf_use_auth_token: true\n", "\n", "datasets:\n", " - path: winglian/evals\n", " data_files:\n", " - hf/ARC-Challenge.jsonl\n", " - hf/ARC-Easy.jsonl\n", " - hf/riddle_sense.jsonl\n", " type: explainchoice:chat\n", " - path: winglian/evals\n", " data_files:\n", " - hf/gsm8k.jsonl\n", " - hf/winogrande.jsonl\n", " type: alpaca_chat.load_qa\n", " - path: winglian/evals\n", " data_files:\n", " - custom/n_task.jsonl\n", " - custom/misconceptions.jsonl\n", " - custom/context_insensitivity.jsonl\n", " type: alpaca_chat\n", " - path: camel-ai/math\n", " type: alpaca_chat.load_camel_ai\n", " - path: camel-ai/biology\n", " type: alpaca_chat.load_camel_ai\n", " - path: camel-ai/physics\n", " type: alpaca_chat.load_camel_ai\n", " - path: camel-ai/chemistry\n", " type: alpaca_chat.load_camel_ai\n", " - path: winglian/evals\n", " data_files:\n", " - custom/in_context_qa.jsonl\n", " type: context_qa\n", " - path: winglian/evals\n", " data_files:\n", " - custom/in_context_qa.jsonl\n", " type: context_qa.load_404\n", " - path: winglian/evals\n", " data_files:\n", " - custom/jokes_explained_500up.jsonl\n", " type: sharegpt_jokes\n", " - path: winglian/evals\n", " data_files:\n", " - custom/classify-self-chat.sharegpt.jsonl\n", " - custom/coding-self-chat.sharegpt.jsonl\n", " - custom/prose-gpt4.sharegpt.jsonl\n", " - custom/prose-rewrite-gpt4.sharegpt.jsonl\n", " type: sharegpt_simple.load_role\n", " - path: winglian/evals\n", " data_files:\n", " - openai/tldr.jsonl\n", " type: summarizetldr:chat\n", " - path: winglian/evals\n", " data_files:\n", " - hellaswag/hellaswag.jsonl\n", " type: explainchoice:chat\n", " - path: metaeval/ScienceQA_text_only\n", " type: concisechoice:chat\n", " - path: teknium/GPT4-LLM-Cleaned\n", " type: alpaca_chat\n", " - path: teknium/GPTeacher-General-Instruct\n", " data_files: 
gpt4-instruct-similarity-0.6-dataset.json\n", " type: gpteacher:chat\n", " - path: QingyiSi/Alpaca-CoT\n", " data_files:\n", " - Chain-of-Thought/formatted_cot_data/aqua_train.json\n", " - Chain-of-Thought/formatted_cot_data/creak_train.json\n", " - Chain-of-Thought/formatted_cot_data/ecqa_train.json\n", " - Chain-of-Thought/formatted_cot_data/esnli_train.json\n", " - Chain-of-Thought/formatted_cot_data/qasc_train.json\n", " - Chain-of-Thought/formatted_cot_data/qed_train.json\n", " - Chain-of-Thought/formatted_cot_data/sensemaking_train.json\n", " - Chain-of-Thought/formatted_cot_data/strategyqa_train.json\n", " - GPTeacher/Roleplay/formatted_roleplay-similarity_0.6-instruct-dataset.json\n", " type: alpaca_chat\n", " - path: ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered\n", " type: alpaca_chat\n", " - path: ehartford/wizard_vicuna_70k_unfiltered\n", " type: sharegpt:chat\n", "\n", "dataset_prepared_path: last_run_prepared\n", "val_set_size: 0.01\n", "# enable QLoRA\n", "adapter: qlora\n", "lora_model_dir:\n", "sequence_len: 2048\n", "max_packed_sequence_len: 2048\n", "\n", "# hyperparameters from QLoRA paper Appendix B.2\n", "# \"We find hyperparameters to be largely robust across datasets\"\n", "lora_r: 64\n", "lora_alpha: 16\n", "# 0.1 for models up to 13B\n", "# 0.05 for 33B and 65B models\n", "lora_dropout: 0.05\n", "# add LoRA modules on all linear layers of the base model\n", "lora_target_modules:\n", "lora_target_linear: true\n", "lora_fan_in_fan_out:\n", "\n", "wandb_project: falcon-qlora\n", "wandb_watch:\n", "wandb_run_id:\n", "wandb_log_model:\n", "output_dir: /content/axolotl-trained/falcon-qlora-40b-minotaur/\n", "\n", "# QLoRA paper Table 9\n", "# - 16 for 7b & 13b\n", "# - 32 for 33b, 64 for 65b\n", "# Max size tested on A6000\n", "# - 7b: 40\n", "# - 40b: 4\n", "# decrease if OOM, increase for max VRAM utilization\n", "micro_batch_size: 4\n", "gradient_accumulation_steps: 1\n", "num_epochs: 3\n", "# Optimizer for QLoRA\n", "optimizer: 
paged_adamw_32bit\n", "torchdistx_path:\n", "lr_scheduler: cosine\n", "# QLoRA paper Table 9\n", "# - 2e-4 for 7b & 13b\n", "# - 1e-4 for 33b & 65b\n", "learning_rate: 0.0002\n", "train_on_inputs: false\n", "group_by_length: false\n", "bf16: true\n", "fp16: false\n", "tf32: true\n", "gradient_checkpointing: true\n", "# stop training after this many evaluation losses have increased in a row\n", "# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback\n", "# early_stopping_patience: 3\n", "resume_from_checkpoint:\n", "auto_resume_from_checkpoints: true\n", "local_rank:\n", "logging_steps: 1\n", "xformers_attention: true\n", "flash_attention:\n", "gptq_groupsize:\n", "gptq_model_v1:\n", "warmup_steps: 10\n", "eval_steps: 5\n", "save_steps: 10\n", "debug:\n", "deepspeed:\n", "weight_decay: 0.01\n", "adam_beta1:\n", "adam_beta2: 0.999\n", "adam_epsilon:\n", "# Gradient clipping max norm\n", "max_grad_norm: 0.3\n", "\n", "fsdp:\n", "fsdp_config:\n", "special_tokens:\n", " pad_token: \"<|endoftext|>\"\n", " bos_token: \">>ABSTRACT<<\"\n", " eos_token: \"<|endoftext|>\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "uTWOnrpzEr-1" }, "outputs": [], "source": [ "%env ACCELERATE_USE_DEEPSPEED=true" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "iz9sbxRAElkU", "outputId": "8d4a834e-f6a9-4fa3-91ce-ee5e3e87936b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Writing scripts/ft.py\n" ] } ], "source": [ "%%writefile scripts/ft.py\n", "import os\n", "from pathlib import Path\n", "import fire\n", "import logging\n", "import finetune\n", "from axolotl.utils.trainer import setup_trainer as setup_trainer_orig\n", "\n", "logging.basicConfig(level=os.getenv(\"LOG_LEVEL\", \"INFO\"))\n", "\n", "def train_ex(\n", " config: Path = Path(\"configs/\"),\n", " prepare_ds_only: bool = False,\n", " **kwargs,\n", "):\n", " logging.info('train_ex before')\n", " 
finetune.train(config, prepare_ds_only, **kwargs)\n", " logging.info('train_ex after')\n", "\n", "def setup_trainer_ex(cfg, train_dataset, eval_dataset, model, tokenizer):\n", " logging.info('setup_trainer_ex before')\n", " logging.info(f'cfg.some_config = {cfg.some_config}')\n", " trainer = setup_trainer_orig(cfg, train_dataset, eval_dataset, model, tokenizer)\n", " logging.info('setup_trainer_ex after')\n", " return trainer\n", "\n", "finetune.setup_trainer = setup_trainer_ex\n", "\n", "if __name__ == \"__main__\":\n", " fire.Fire(train_ex)" ] }, { "cell_type": "markdown", "metadata": { "id": "vj6CD_zZHUpG" }, "source": [ "# Training" ] }, { "cell_type": "markdown", "metadata": { "id": "2CqrhYq-DY7J" }, "source": [ "## Run #1: accelerate (no DeepSpeed), micro_batch_size 4, 2xA100, minotaur datasets" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "GsRPtribgJ8O", "outputId": "a3452f74-6422-4bcc-b57c-c15a3a9f264a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "env: HF_HUB_DISABLE_PROGRESS_BARS=1\n" ] } ], "source": [ "%env HF_HUB_DISABLE_PROGRESS_BARS=1" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "I_oAE22wgJ8Q", "outputId": "e3b97be5-81aa-427a-911c-fcebf1b97b30" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "env: ACCELERATE_USE_DEEPSPEED=false\n" ] } ], "source": [ "%env ACCELERATE_USE_DEEPSPEED=false" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "m66yoSyzK7uC", "outputId": "e1a64fea-1591-4205-b3ae-74d050ceae7b" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/workspace/axolotl\n" ] } ], "source": [ "%cd /workspace/axolotl" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "joQh6mGBm3_J", "outputId": "bfd4a19e-dc7e-4af9-8f85-081909bce3cb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# 1b: tiiuae/falcon-rw-1b\n", "# 7b: tiiuae/falcon-7b\n", "# 40b: tiiuae/falcon-40b\n", "base_model: 
/content/llm-playground/models/tiiuae_falcon-40b\n", "base_model_config: /content/llm-playground/models/tiiuae_falcon-40b\n", "# required by falcon custom model code: https://huggingface.co/tiiuae/falcon-7b/tree/main\n", "trust_remote_code: true\n", "model_type: AutoModelForCausalLM\n", "tokenizer_type: AutoTokenizer\n", "load_in_8bit: false\n", "# enable 4bit for QLoRA\n", "load_in_4bit: true\n", "gptq: false\n", "strict: false\n", "\n", "push_dataset_to_hub: utensil\n", "hf_use_auth_token: true\n", "\n", "datasets:\n", " - path: winglian/evals\n", " data_files:\n", " - hf/ARC-Challenge.jsonl\n", " - hf/ARC-Easy.jsonl\n", " - hf/riddle_sense.jsonl\n", " type: explainchoice:chat\n", " - path: winglian/evals\n", " data_files:\n", " - hf/gsm8k.jsonl\n", " - hf/winogrande.jsonl\n", " type: alpaca_chat.load_qa\n", " - path: winglian/evals\n", " data_files:\n", " - custom/n_task.jsonl\n", " - custom/misconceptions.jsonl\n", " - custom/context_insensitivity.jsonl\n", " type: alpaca_chat\n", " - path: camel-ai/math\n", " type: alpaca_chat.load_camel_ai\n", " - path: camel-ai/biology\n", " type: alpaca_chat.load_camel_ai\n", " - path: camel-ai/physics\n", " type: alpaca_chat.load_camel_ai\n", " - path: camel-ai/chemistry\n", " type: alpaca_chat.load_camel_ai\n", " - path: winglian/evals\n", " data_files:\n", " - custom/in_context_qa.jsonl\n", " type: context_qa\n", " - path: winglian/evals\n", " data_files:\n", " - custom/in_context_qa.jsonl\n", " type: context_qa.load_404\n", " - path: winglian/evals\n", " data_files:\n", " - custom/jokes_explained_500up.jsonl\n", " type: sharegpt_jokes\n", " - path: winglian/evals\n", " data_files:\n", " - custom/classify-self-chat.sharegpt.jsonl\n", " - custom/coding-self-chat.sharegpt.jsonl\n", " - custom/prose-gpt4.sharegpt.jsonl\n", " - custom/prose-rewrite-gpt4.sharegpt.jsonl\n", " type: sharegpt_simple.load_role\n", " - path: winglian/evals\n", " data_files:\n", " - openai/tldr.jsonl\n", " type: summarizetldr:chat\n", " - path: 
winglian/evals\n", " data_files:\n", " - hellaswag/hellaswag.jsonl\n", " type: explainchoice:chat\n", " - path: metaeval/ScienceQA_text_only\n", " type: concisechoice:chat\n", " - path: teknium/GPT4-LLM-Cleaned\n", " type: alpaca_chat\n", " - path: teknium/GPTeacher-General-Instruct\n", " data_files: gpt4-instruct-similarity-0.6-dataset.json\n", " type: gpteacher:chat\n", " - path: QingyiSi/Alpaca-CoT\n", " data_files:\n", " - Chain-of-Thought/formatted_cot_data/aqua_train.json\n", " - Chain-of-Thought/formatted_cot_data/creak_train.json\n", " - Chain-of-Thought/formatted_cot_data/ecqa_train.json\n", " - Chain-of-Thought/formatted_cot_data/esnli_train.json\n", " - Chain-of-Thought/formatted_cot_data/qasc_train.json\n", " - Chain-of-Thought/formatted_cot_data/qed_train.json\n", " - Chain-of-Thought/formatted_cot_data/sensemaking_train.json\n", " - Chain-of-Thought/formatted_cot_data/strategyqa_train.json\n", " - GPTeacher/Roleplay/formatted_roleplay-similarity_0.6-instruct-dataset.json\n", " type: alpaca_chat\n", " - path: ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered\n", " type: alpaca_chat\n", " - path: ehartford/wizard_vicuna_70k_unfiltered\n", " type: sharegpt:chat\n", "\n", "dataset_prepared_path: last_run_prepared\n", "val_set_size: 0.01\n", "# enable QLoRA\n", "adapter: qlora\n", "lora_model_dir:\n", "sequence_len: 2048\n", "max_packed_sequence_len: 2048\n", "\n", "# hyperparameters from QLoRA paper Appendix B.2\n", "# \"We find hyperparameters to be largely robust across datasets\"\n", "lora_r: 64\n", "lora_alpha: 16\n", "# 0.1 for models up to 13B\n", "# 0.05 for 33B and 65B models\n", "lora_dropout: 0.05\n", "# add LoRA modules on all linear layers of the base model\n", "lora_target_modules:\n", "lora_target_linear: true\n", "lora_fan_in_fan_out:\n", "\n", "wandb_project: falcon-qlora\n", "wandb_watch:\n", "wandb_run_id:\n", "wandb_log_model:\n", "output_dir: /content/axolotl-trained/falcon-qlora-40b-minotaur/\n", "\n", "# QLoRA paper Table 9\n", 
"# - 16 for 7b & 13b\n", "# - 32 for 33b, 64 for 64b\n", "# Max size tested on A6000\n", "# - 7b: 40\n", "# - 40b: 4\n", "# decrease if OOM, increase for max VRAM utilization\n", "micro_batch_size: 4\n", "gradient_accumulation_steps: 1\n", "num_epochs: 3\n", "# Optimizer for QLoRA\n", "optimizer: paged_adamw_32bit\n", "torchdistx_path:\n", "lr_scheduler: cosine\n", "# QLoRA paper Table 9\n", "# - 2e-4 for 7b & 13b\n", "# - 1e-4 for 33b & 64b\n", "learning_rate: 0.0002\n", "train_on_inputs: false\n", "group_by_length: false\n", "bf16: true\n", "fp16: false\n", "tf32: true\n", "gradient_checkpointing: true\n", "# stop training after this many evaluation losses have increased in a row\n", "# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback\n", "# early_stopping_patience: 3\n", "resume_from_checkpoint:\n", "auto_resume_from_checkpoints: true\n", "local_rank:\n", "logging_steps: 1\n", "xformers_attention: true\n", "flash_attention:\n", "gptq_groupsize:\n", "gptq_model_v1:\n", "warmup_steps: 10\n", "eval_steps: 5\n", "save_steps: 10\n", "debug:\n", "deepspeed:\n", "weight_decay: 0.01\n", "adam_beta1:\n", "adam_beta2: 0.999\n", "adam_epsilon:\n", "# Gradient clipping max norm\n", "max_grad_norm: 0.3\n", "\n", "fsdp:\n", "fsdp_config:\n", "special_tokens:\n", " pad_token: \"<|endoftext|>\"\n", " bos_token: \">>ABSTRACT<<\"\n", " eos_token: \"<|endoftext|>\"\n" ] } ], "source": [ "!cat examples/falcon/config-40b-qlora.yml" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ULwPlElbm73f" }, "outputs": [], "source": [ "#%%writefile examples/falcon/config-40b-qlora.yml" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jICMPJuomFsx", "scrolled": true, "outputId": "84276356-abdd-40a7-e9a1-916e276d47f2" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setting ds_accelerator to cuda (auto detect)\n", "The following values were not passed to `accelerate 
launch` and had defaults used instead:\n", "\t`--num_processes` was set to a value of `2`\n", "\t\tMore than one GPU was found, enabling multi-GPU training.\n", "\t\tIf this was unintended please pass in `--num_processes=1`.\n", "\t`--num_machines` was set to a value of `1`\n", "\t`--mixed_precision` was set to a value of `'no'`\n", "\t`--dynamo_backend` was set to a value of `'no'`\n", "To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.\n", "Setting ds_accelerator to cuda (auto detect)\n", "Setting ds_accelerator to cuda (auto detect)\n", "\n", "===================================BUG REPORT===================================\n", "Welcome to bitsandbytes. For bug reports, please run\n", "\n", "python -m bitsandbytes\n", "\n", " and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n", "================================================================================\n", "\n", "===================================BUG REPORT===================================\n", "Welcome to bitsandbytes. 
For bug reports, please run\n", "\n", "python -m bitsandbytes\n", "\n", " and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n", "================================================================================\n", "bin /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! 
Searching further paths...\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//matplotlib_inline.backend_inline'), PosixPath('module')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDvnE4umHheXhWsDJbbukYvvyc47/mC4z8syS93btA72T90WDrQagOy5O+DrhdXOvr5i/JwsTlAImy57eLRrtRFOrQq73jyi7Dzo0tvrAiNLVgX2q2dFLoplRyXDXiVYLPmPieMWQOeUCLeSb8FC5zzllcocZwjMXpxScDerZqnlAR0ccpSkGyKIod4ZMkn/29A/C5kHEb/wT8cOAq+MWJ/2okZZgbiR0AMV4DynAkrtcx9JnJnTs9chiMyH+dyCS42Ai24sHWJBkQo6TfxXkyKo9GOpu3Y2WLgrHyaot9Lk5mA1mujyIWdlReD2nvjeCQKjl3KW3xZ73m4nD97MydWSWoJfEWlr+VZvk8EWsZk3CYLZCIBLdod6xXJJ0DD0pvTIq11c8VB7XkgVjapuU/sC8M6HFzHW/NBeE+xX/txPkZkIGqrnxeQ0AtBXdN9ukyNGhGzTkPYJNliiYpY0dCvVuz/BJ2FawFTQGnD1EHOenUCRajREFGCbKoYZqi40j8= utensil@Utensils-MacBook-Pro.local')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_dvz1krnf/none_vrp7sfn2/attempt_0/1/error.json')}\n", " warn(msg)\n", "CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. 
We'll flip a coin and try one of these, in order to fail forward.\n", "Either way, this might cause trouble in the future:\n", "If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.\n", " warn(msg)\n", "CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so\n", "CUDA SETUP: Highest compute capability among GPUs detected: 8.0\n", "CUDA SETUP: Detected CUDA version 118\n", "CUDA SETUP: Loading binary /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so...\n", "bin /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! 
Searching further paths...\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('module'), PosixPath('//matplotlib_inline.backend_inline')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDvnE4umHheXhWsDJbbukYvvyc47/mC4z8syS93btA72T90WDrQagOy5O+DrhdXOvr5i/JwsTlAImy57eLRrtRFOrQq73jyi7Dzo0tvrAiNLVgX2q2dFLoplRyXDXiVYLPmPieMWQOeUCLeSb8FC5zzllcocZwjMXpxScDerZqnlAR0ccpSkGyKIod4ZMkn/29A/C5kHEb/wT8cOAq+MWJ/2okZZgbiR0AMV4DynAkrtcx9JnJnTs9chiMyH+dyCS42Ai24sHWJBkQo6TfxXkyKo9GOpu3Y2WLgrHyaot9Lk5mA1mujyIWdlReD2nvjeCQKjl3KW3xZ73m4nD97MydWSWoJfEWlr+VZvk8EWsZk3CYLZCIBLdod6xXJJ0DD0pvTIq11c8VB7XkgVjapuU/sC8M6HFzHW/NBeE+xX/txPkZkIGqrnxeQ0AtBXdN9ukyNGhGzTkPYJNliiYpY0dCvVuz/BJ2FawFTQGnD1EHOenUCRajREFGCbKoYZqi40j8= utensil@Utensils-MacBook-Pro.local')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_dvz1krnf/none_vrp7sfn2/attempt_0/0/error.json')}\n", " warn(msg)\n", "CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. 
We'll flip a coin and try one of these, in order to fail forward.\n", "Either way, this might cause trouble in the future:\n", "If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.\n", " warn(msg)\n", "CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0\n", "CUDA SETUP: Highest compute capability among GPUs detected: 8.0\n", "CUDA SETUP: Detected CUDA version 118\n", "CUDA SETUP: Loading binary /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so...\n", "WARNING:root:`trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.\n", "INFO:root:loading tokenizer... /content/llm-playground/models/tiiuae_falcon-40b\n", "WARNING:root:`trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.\n", "INFO:root:loading tokenizer... /content/llm-playground/models/tiiuae_falcon-40b\n", "Using bos_token, but it is not set yet.\n", "Using pad_token, but it is not set yet.\n", "Using unk_token, but it is not set yet.\n", "INFO:root:Checking for packed prepared dataset from hub... utensil/216aa4e2be3974d0a1d25d8c19db01a4\n", "Using bos_token, but it is not set yet.\n", "Using pad_token, but it is not set yet.\n", "Using unk_token, but it is not set yet.\n", "INFO:root:Checking for packed prepared dataset from hub... 
utensil/216aa4e2be3974d0a1d25d8c19db01a4\n", "WARNING:datasets.builder:Found cached dataset parquet (/root/.cache/huggingface/datasets/utensil___parquet/utensil--9a63aa2c07ace8350a0e8b32ab913f2a-9cb79670bc460bd7/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7)\n", "100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 127.56it/s]\n", "INFO:root:packing master dataset to len: 2048\n", "WARNING:datasets.builder:Found cached dataset parquet (/root/.cache/huggingface/datasets/utensil___parquet/utensil--9a63aa2c07ace8350a0e8b32ab913f2a-9cb79670bc460bd7/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7)\n", "100%|████████████████████████████████████████████| 1/1 [00:00<00:00, 129.49it/s]\n", "INFO:root:packing master dataset to len: 2048\n", "INFO:root:loading model and peft_config...\n", "Loading checkpoint shards: 0%| | 0/9 [00:00\n", " fire.Fire(train)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py\", line 141, in Fire\n", " component_trace = _Fire(component, args, parsed_flag_args, context, name)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py\", line 475, in _Fire\n", " component, remaining_args = _CallAndUpdateTrace(\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/fire/core.py\", line 691, in _CallAndUpdateTrace\n", " component = fn(*varargs, **kwargs)\n", " File \"/workspace/axolotl/scripts/finetune.py\", line 337, in train\n", " trainer.train(resume_from_checkpoint=resume_from_checkpoint)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py\", line 1540, in train\n", " return inner_training_loop(\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py\", line 1884, in _inner_training_loop\n", " self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)\n", " File 
\"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py\", line 2185, in _maybe_log_save_evaluate\n", " metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py\", line 2931, in evaluate\n", " output = eval_loop(\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py\", line 3120, in evaluation_loop\n", " loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py\", line 3363, in prediction_step\n", " loss, outputs = self.compute_loss(model, inputs, return_outputs=True)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/transformers/trainer.py\", line 2662, in compute_loss\n", " outputs = model(**inputs)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py\", line 1501, in _call_impl\n", " return forward_call(*args, **kwargs)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/peft/peft_model.py\", line 785, in forward\n", " return self.base_model(\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py\", line 1501, in _call_impl\n", " return forward_call(*args, **kwargs)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py\", line 165, in new_forward\n", " output = old_forward(*args, **kwargs)\n", " File \"/root/.cache/huggingface/modules/transformers_modules/tiiuae_falcon-40b/modelling_RW.py\", line 759, in forward\n", " transformer_outputs = self.transformer(\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py\", line 1501, in _call_impl\n", " return forward_call(*args, **kwargs)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py\", line 
165, in new_forward\n", " output = old_forward(*args, **kwargs)\n", " File \"/root/.cache/huggingface/modules/transformers_modules/tiiuae_falcon-40b/modelling_RW.py\", line 654, in forward\n", " outputs = block(\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py\", line 1501, in _call_impl\n", " return forward_call(*args, **kwargs)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py\", line 165, in new_forward\n", " output = old_forward(*args, **kwargs)\n", " File \"/root/.cache/huggingface/modules/transformers_modules/tiiuae_falcon-40b/modelling_RW.py\", line 396, in forward\n", " attn_outputs = self.self_attention(\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/torch/nn/modules/module.py\", line 1501, in _call_impl\n", " return forward_call(*args, **kwargs)\n", " File \"/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/accelerate/hooks.py\", line 165, in new_forward\n", " output = old_forward(*args, **kwargs)\n", " File \"/root/.cache/huggingface/modules/transformers_modules/tiiuae_falcon-40b/modelling_RW.py\", line 255, in forward\n", " (query_layer, key_layer, value_layer) = self._split_heads(fused_qkv)\n", " File \"/root/.cache/huggingface/modules/transformers_modules/tiiuae_falcon-40b/modelling_RW.py\", line 201, in _split_heads\n", " k = qkv[:, :, :, [-2]]\n", "KeyboardInterrupt\n" ] }, { "data": { "application/javascript": [ "$(document).ready(\n", " function() {\n", " function appendUniqueDiv(){\n", " // append a div with our uuid so we can check that it's already\n", " // been sent and avoid duplicates on page reload\n", " var notifiedDiv = document.createElement(\"div\")\n", " notifiedDiv.id = \"8a07ca3b-7744-4187-87f4-9deef5baf20d\"\n", " element.append(notifiedDiv)\n", " }\n", "\n", " // only send notifications if the pageload is complete; this will\n", " // help stop extra notifications when a saved notebook is loaded,\n", " // which during 
testing gives us state \"interactive\", not \"complete\"\n", " if (document.readyState === 'complete') {\n", " // check for the div that signifies that the notification\n", " // was already sent\n", " if (document.getElementById(\"8a07ca3b-7744-4187-87f4-9deef5baf20d\") === null) {\n", " var notificationPayload = {\"requireInteraction\": false, \"icon\": \"/static/base/images/favicon.ico\", \"body\": \"Cell Execution Has Finished!!\", \"autonotify_after\": \"30\", \"autonotify_output\": true, \"only_in_background\": false};\n", "\n", " // We have a notification but the window is active\n", " if (notificationPayload.only_in_background && !window.jupyterNotifyIsInBackground) {\n", " appendUniqueDiv();\n", " return;\n", " }\n", " if (Notification.permission !== 'denied') {\n", " if (Notification.permission !== 'granted') { \n", " Notification.requestPermission(function (permission) {\n", " if(!('permission' in Notification)) {\n", " Notification.permission = permission\n", " }\n", " })\n", " }\n", " if (Notification.permission === 'granted') {\n", " var notification = new Notification(\"Jupyter Notebook\", notificationPayload)\n", " appendUniqueDiv()\n", " notification.onclick = function () {\n", " window.focus();\n", " this.close();\n", " };\n", " } \n", " } \n", " }\n", " }\n", " }\n", ")\n" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "!accelerate launch scripts/finetune.py examples/falcon/config-40b-qlora.yml" ] }, { "cell_type": "markdown", "metadata": { "id": "yIfs-AbKDY7K" }, "source": [ "## #2 optimizer: adamw_bnb_8bit" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3gNWb1VKgJ8S", "outputId": "3892191c-932f-4966-ab0a-6b8d37b66604" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "env: ACCELERATE_USE_DEEPSPEED=false\n" ] } ], "source": [ "%env ACCELERATE_USE_DEEPSPEED=false" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "6U1u8RtbDY7K", 
"scrolled": true, "outputId": "6088cd8e-9579-4a5f-c9d3-44284bd577eb" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "# 1b: tiiuae/falcon-rw-1b\n", "# 7b: tiiuae/falcon-7b\n", "# 40b: tiiuae/falcon-40b\n", "base_model: /content/llm-playground/models/tiiuae_falcon-40b\n", "base_model_config: /content/llm-playground/models/tiiuae_falcon-40b\n", "# required by falcon custom model code: https://huggingface.co/tiiuae/falcon-7b/tree/main\n", "trust_remote_code: true\n", "model_type: AutoModelForCausalLM\n", "tokenizer_type: AutoTokenizer\n", "load_in_8bit: false\n", "# enable 4bit for QLoRA\n", "load_in_4bit: true\n", "gptq: false\n", "strict: false\n", "\n", "push_dataset_to_hub: utensil\n", "hf_use_auth_token: true\n", "\n", "datasets:\n", " - path: winglian/evals\n", " data_files:\n", " - hf/ARC-Challenge.jsonl\n", " - hf/ARC-Easy.jsonl\n", " - hf/riddle_sense.jsonl\n", " type: explainchoice:chat\n", " - path: winglian/evals\n", " data_files:\n", " - hf/gsm8k.jsonl\n", " - hf/winogrande.jsonl\n", " type: alpaca_chat.load_qa\n", " - path: winglian/evals\n", " data_files:\n", " - custom/n_task.jsonl\n", " - custom/misconceptions.jsonl\n", " - custom/context_insensitivity.jsonl\n", " type: alpaca_chat\n", " - path: camel-ai/math\n", " type: alpaca_chat.load_camel_ai\n", " - path: camel-ai/biology\n", " type: alpaca_chat.load_camel_ai\n", " - path: camel-ai/physics\n", " type: alpaca_chat.load_camel_ai\n", " - path: camel-ai/chemistry\n", " type: alpaca_chat.load_camel_ai\n", " - path: winglian/evals\n", " data_files:\n", " - custom/in_context_qa.jsonl\n", " type: context_qa\n", " - path: winglian/evals\n", " data_files:\n", " - custom/in_context_qa.jsonl\n", " type: context_qa.load_404\n", " - path: winglian/evals\n", " data_files:\n", " - custom/jokes_explained_500up.jsonl\n", " type: sharegpt_jokes\n", " - path: winglian/evals\n", " data_files:\n", " - custom/classify-self-chat.sharegpt.jsonl\n", " - custom/coding-self-chat.sharegpt.jsonl\n", " - 
custom/prose-gpt4.sharegpt.jsonl\n", " - custom/prose-rewrite-gpt4.sharegpt.jsonl\n", " type: sharegpt_simple.load_role\n", " - path: winglian/evals\n", " data_files:\n", " - openai/tldr.jsonl\n", " type: summarizetldr:chat\n", " - path: winglian/evals\n", " data_files:\n", " - hellaswag/hellaswag.jsonl\n", " type: explainchoice:chat\n", " - path: metaeval/ScienceQA_text_only\n", " type: concisechoice:chat\n", " - path: teknium/GPT4-LLM-Cleaned\n", " type: alpaca_chat\n", " - path: teknium/GPTeacher-General-Instruct\n", " data_files: gpt4-instruct-similarity-0.6-dataset.json\n", " type: gpteacher:chat\n", " - path: QingyiSi/Alpaca-CoT\n", " data_files:\n", " - Chain-of-Thought/formatted_cot_data/aqua_train.json\n", " - Chain-of-Thought/formatted_cot_data/creak_train.json\n", " - Chain-of-Thought/formatted_cot_data/ecqa_train.json\n", " - Chain-of-Thought/formatted_cot_data/esnli_train.json\n", " - Chain-of-Thought/formatted_cot_data/qasc_train.json\n", " - Chain-of-Thought/formatted_cot_data/qed_train.json\n", " - Chain-of-Thought/formatted_cot_data/sensemaking_train.json\n", " - Chain-of-Thought/formatted_cot_data/strategyqa_train.json\n", " - GPTeacher/Roleplay/formatted_roleplay-similarity_0.6-instruct-dataset.json\n", " type: alpaca_chat\n", " - path: ehartford/WizardLM_alpaca_evol_instruct_70k_unfiltered\n", " type: alpaca_chat\n", " - path: ehartford/wizard_vicuna_70k_unfiltered\n", " type: sharegpt:chat\n", "\n", "dataset_prepared_path: last_run_prepared\n", "val_set_size: 0.01\n", "# enable QLoRA\n", "adapter: qlora\n", "lora_model_dir:\n", "sequence_len: 2048\n", "max_packed_sequence_len: 2048\n", "\n", "# hyperparameters from QLoRA paper Appendix B.2\n", "# \"We find hyperparameters to be largely robust across datasets\"\n", "lora_r: 64\n", "lora_alpha: 16\n", "# 0.1 for models up to 13B\n", "# 0.05 for 33B and 65B models\n", "lora_dropout: 0.05\n", "# add LoRA modules on all linear layers of the base model\n", "lora_target_modules:\n", 
"lora_target_linear: true\n", "lora_fan_in_fan_out:\n", "\n", "wandb_project: falcon-qlora\n", "wandb_watch:\n", "wandb_run_id:\n", "wandb_log_model:\n", "output_dir: /content/axolotl-trained/falcon-qlora-40b-minotaur/\n", "\n", "# QLoRA paper Table 9\n", "# - 16 for 7b & 13b\n", "# - 32 for 33b, 64 for 65b\n", "# Max size tested on A6000\n", "# - 7b: 40\n", "# - 40b: 4\n", "# decrease if OOM, increase for max VRAM utilization\n", "micro_batch_size: 4\n", "gradient_accumulation_steps: 1\n", "num_epochs: 3\n", "# Optimizer for QLoRA\n", "optimizer: adamw_bnb_8bit\n", "torchdistx_path:\n", "lr_scheduler: cosine\n", "# QLoRA paper Table 9\n", "# - 2e-4 for 7b & 13b\n", "# - 1e-4 for 33b & 65b\n", "learning_rate: 0.0002\n", "train_on_inputs: false\n", "group_by_length: false\n", "bf16: true\n", "fp16: false\n", "tf32: true\n", "gradient_checkpointing: true\n", "# stop training after this many evaluation losses have increased in a row\n", "# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback\n", "# early_stopping_patience: 3\n", "resume_from_checkpoint:\n", "auto_resume_from_checkpoints: true\n", "local_rank:\n", "logging_steps: 1\n", "xformers_attention: true\n", "flash_attention:\n", "gptq_groupsize:\n", "gptq_model_v1:\n", "warmup_steps: 10\n", "eval_steps: 5\n", "save_steps: 10\n", "debug:\n", "deepspeed:\n", "weight_decay: 0.01\n", "adam_beta1:\n", "adam_beta2: 0.999\n", "adam_epsilon:\n", "# Gradient clipping max norm\n", "max_grad_norm: 0.3\n", "\n", "fsdp:\n", "fsdp_config:\n", "special_tokens:\n", " pad_token: \"<|endoftext|>\"\n", " bos_token: \">>ABSTRACT<<\"\n", " eos_token: \"<|endoftext|>\"\n" ] } ], "source": [ "!cat examples/falcon/config-40b-qlora.yml" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "3EeOND66dDnF" }, "outputs": [], "source": [ "!ds_report" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "jnz7AF94LJTa" }, "outputs": [], "source": [ 
"#%%writefile examples/falcon/config-40b-qlora.yml" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "xR8QPcw3DY7K", "scrolled": true, "outputId": "89641200-ca18-4720-c942-4836c079f203" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Setting ds_accelerator to cuda (auto detect)\n", "The following values were not passed to `accelerate launch` and had defaults used instead:\n", "\t`--num_processes` was set to a value of `2`\n", "\t\tMore than one GPU was found, enabling multi-GPU training.\n", "\t\tIf this was unintended please pass in `--num_processes=1`.\n", "\t`--num_machines` was set to a value of `1`\n", "\t`--mixed_precision` was set to a value of `'no'`\n", "\t`--dynamo_backend` was set to a value of `'no'`\n", "To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.\n", "Setting ds_accelerator to cuda (auto detect)\n", "Setting ds_accelerator to cuda (auto detect)\n", "\n", "===================================BUG REPORT===================================\n", "Welcome to bitsandbytes. For bug reports, please run\n", "\n", "python -m bitsandbytes\n", "\n", " and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n", "================================================================================\n", "\n", "===================================BUG REPORT===================================\n", "Welcome to bitsandbytes. 
For bug reports, please run\n", "\n", "python -m bitsandbytes\n", "\n", " and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues\n", "================================================================================\n", "bin /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! 
Searching further paths...\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//matplotlib_inline.backend_inline'), PosixPath('module')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDvnE4umHheXhWsDJbbukYvvyc47/mC4z8syS93btA72T90WDrQagOy5O+DrhdXOvr5i/JwsTlAImy57eLRrtRFOrQq73jyi7Dzo0tvrAiNLVgX2q2dFLoplRyXDXiVYLPmPieMWQOeUCLeSb8FC5zzllcocZwjMXpxScDerZqnlAR0ccpSkGyKIod4ZMkn/29A/C5kHEb/wT8cOAq+MWJ/2okZZgbiR0AMV4DynAkrtcx9JnJnTs9chiMyH+dyCS42Ai24sHWJBkQo6TfxXkyKo9GOpu3Y2WLgrHyaot9Lk5mA1mujyIWdlReD2nvjeCQKjl3KW3xZ73m4nD97MydWSWoJfEWlr+VZvk8EWsZk3CYLZCIBLdod6xXJJ0DD0pvTIq11c8VB7XkgVjapuU/sC8M6HFzHW/NBeE+xX/txPkZkIGqrnxeQ0AtBXdN9ukyNGhGzTkPYJNliiYpY0dCvVuz/BJ2FawFTQGnD1EHOenUCRajREFGCbKoYZqi40j8= utensil@Utensils-MacBook-Pro.local')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_av2jw5za/none_skuvdgy8/attempt_0/0/error.json')}\n", " warn(msg)\n", "CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. 
We'll flip a coin and try one of these, in order to fail forward.\n", "Either way, this might cause trouble in the future:\n", "If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.\n", " warn(msg)\n", "CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0\n", "CUDA SETUP: Highest compute capability among GPUs detected: 8.0\n", "CUDA SETUP: Detected CUDA version 118\n", "CUDA SETUP: Loading binary /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so...\n", "bin /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! 
Searching further paths...\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('//matplotlib_inline.backend_inline'), PosixPath('module')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABgQDvnE4umHheXhWsDJbbukYvvyc47/mC4z8syS93btA72T90WDrQagOy5O+DrhdXOvr5i/JwsTlAImy57eLRrtRFOrQq73jyi7Dzo0tvrAiNLVgX2q2dFLoplRyXDXiVYLPmPieMWQOeUCLeSb8FC5zzllcocZwjMXpxScDerZqnlAR0ccpSkGyKIod4ZMkn/29A/C5kHEb/wT8cOAq+MWJ/2okZZgbiR0AMV4DynAkrtcx9JnJnTs9chiMyH+dyCS42Ai24sHWJBkQo6TfxXkyKo9GOpu3Y2WLgrHyaot9Lk5mA1mujyIWdlReD2nvjeCQKjl3KW3xZ73m4nD97MydWSWoJfEWlr+VZvk8EWsZk3CYLZCIBLdod6xXJJ0DD0pvTIq11c8VB7XkgVjapuU/sC8M6HFzHW/NBeE+xX/txPkZkIGqrnxeQ0AtBXdN9ukyNGhGzTkPYJNliiYpY0dCvVuz/BJ2FawFTQGnD1EHOenUCRajREFGCbKoYZqi40j8= utensil@Utensils-MacBook-Pro.local')}\n", " warn(msg)\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_av2jw5za/none_skuvdgy8/attempt_0/1/error.json')}\n", " warn(msg)\n", "CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...\n", "/root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. 
We'll flip a coin and try one of these, in order to fail forward.\n", "Either way, this might cause trouble in the future:\n", "If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.\n", " warn(msg)\n", "CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so\n", "CUDA SETUP: Highest compute capability among GPUs detected: 8.0\n", "CUDA SETUP: Detected CUDA version 118\n", "CUDA SETUP: Loading binary /root/miniconda3/envs/py3.9/lib/python3.9/site-packages/bitsandbytes-0.39.0-py3.9.egg/bitsandbytes/libbitsandbytes_cuda118.so...\n", "WARNING:root:`trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.\n", "INFO:root:loading tokenizer... /content/llm-playground/models/tiiuae_falcon-40b\n", "WARNING:root:`trust_remote_code` is set to true. Please make sure that you reviewed the remote code/model.\n", "INFO:root:loading tokenizer... /content/llm-playground/models/tiiuae_falcon-40b\n", "Using bos_token, but it is not set yet.\n", "Using pad_token, but it is not set yet.\n", "Using unk_token, but it is not set yet.\n", "INFO:root:Checking for packed prepared dataset from hub... utensil/216aa4e2be3974d0a1d25d8c19db01a4\n", "Using bos_token, but it is not set yet.\n", "Using pad_token, but it is not set yet.\n", "Using unk_token, but it is not set yet.\n", "INFO:root:Checking for packed prepared dataset from hub... 
utensil/216aa4e2be3974d0a1d25d8c19db01a4\n", "Downloading readme: 100%|██████████████████████| 471/471 [00:00<00:00, 2.06MB/s]\n", "Downloading and preparing dataset None/None to /root/.cache/huggingface/datasets/utensil___parquet/utensil--216aa4e2be3974d0a1d25d8c19db01a4-be603db1612da2a9/0.0.0/14a00e99c0d15a23649d0db8944380ac81082d4b021f398733dd84f3a6c569a7...\n", "Downloading data files: 0%| | 0/1 [00:00 1 {print $9}'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "XK8IWmrtKvlJ" }, "outputs": [], "source": [ "# ls -lhta |grep checkpoint- | awk 'NR > 1 {print $9}' | xargs rm -rf" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "DgyQDH-YDY7R" }, "outputs": [], "source": [ "!python /workspace/llm-playground/helper/storage.py utensil/axolotl-trained /content/ -u" ] }, { "cell_type": "markdown", "metadata": { "id": "H5TnMCeLDY7M" }, "source": [ "## Below are ad hoc cells for handling issues during training\n", "\n", "Current output dir:\n", "\n", "```\n", "/content/axolotl-trained/falcon-qlora-40b-gsm8k/\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "UdCBx0UZDY7M" }, "source": [ "### Force release VRAM" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "_0VdzsqSDY7M" }, "outputs": [], "source": [ "# First interrupt the kernel, wait a few seconds, then run this to kill finetune and release VRAM\n", "!ps aux|grep python|grep finetune|awk '{print $2}'|xargs kill" ] }, { "cell_type": "markdown", "metadata": { "id": "bT7StF_PDY7N" }, "source": [ "### Clean the finetuned model and all checkpoints" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "FT4c-7KHDY7N" }, "outputs": [], "source": [ "# Only run this to start over\n", "!rm -rf /content/axolotl-trained/falcon-qlora-40b-gsm8k/" ] }, { "cell_type": "markdown", "metadata": { "id": "p6s4xi9jDY7N" }, "source": [ "### Zip the prepared dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": 
"MZ-aBbwEDY7N" }, "outputs": [], "source": [ "!apt install zip\n", "!zip -r last_run_prepared.zip -xi last_run_prepared" ] }, { "cell_type": "markdown", "metadata": { "id": "8BFC20jMDY7N" }, "source": [ "### Monitoring GPU" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "MgPaXew2DY7O" }, "outputs": [], "source": [ "# Run this in a separate terminal\n", "!nvitop -m full" ] }, { "cell_type": "markdown", "metadata": { "id": "pYFlPnbXDY7O" }, "source": [ "### Fix DISK FULL" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yv5CWZu7DY7O" }, "outputs": [], "source": [ "%cd /" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wvqYxcpZDY7P" }, "outputs": [], "source": [ "!du -d 2 -h|grep G" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "tIIJxrv1DY7P" }, "outputs": [], "source": [ "!du -d 2 -h /root/.local" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "25FTebpHDY7P" }, "outputs": [], "source": [ "!rm -rf /root/.local/share/Trash/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "8K4d-2ocDY7P" }, "outputs": [], "source": [ "!rm -rf /root/.local/share/wandb/" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "UtOoZLugDY7P" }, "outputs": [], "source": [ "!rm -rf /root/.cache/wandb/" ] }, { "cell_type": "markdown", "metadata": { "id": "17SexQnqDY7P" }, "source": [ "### Check who is using the GPU" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "kE8A5crNDY7P", "scrolled": true }, "outputs": [], "source": [ "!apt install lsof" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "dwNCASirDY7Q", "scrolled": true }, "outputs": [], "source": [ "!lsof /dev/nvidia*" ] }, { "cell_type": "markdown", "metadata": { "id": "gxZv_qjgDY7Y" }, "source": [ "### A new bash without tmux etc." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NDkJAXwTDY7Z" }, "outputs": [], "source": [ "!bash --norc --noprofile" ] }, { "cell_type": "markdown", "metadata": { "id": "pR3s54THDY7Z" }, "source": [ "### Clean up all checkpoints but the last one" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "4fSoQ8hteEDX" }, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "SJtsGJ0oeEDX", "outputId": "315d5568-a7d3-4e26-9a7c-a6b2fec2c7dc" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "checkpoint-10\n" ] } ], "source": [ "!cd /content/axolotl-trained/falcon-qlora-40b-gsm8k/ && ls -lhta |grep checkpoint- | awk '{print $9}'" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LdO1EybEDY7Z" }, "outputs": [], "source": [ "!cd /content/axolotl-trained/falcon-qlora-40b-gsm8k/ && ls -lhta |grep checkpoint- | awk 'NR > 1 {print $9}' | xargs rm -rf" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "yyk_zXK5eEDX" }, "outputs": [], "source": [ "#@title Keep this tab alive to prevent Colab from disconnecting you { display-mode: \"form\" }\n", "\n", "#@markdown Press play on the music player that will appear below:\n", "%%html\n", "