{ "cells": [ { "cell_type": "markdown", "id": "4f7a13de", "metadata": {}, "source": [ "## Overview\n", "\n", "In `Nb.2.2ma` I hit the \"[first structure error](https://discuss.ray.io/t/multi-agent-where-does-the-first-structure-comes-from/7010)\" and I want to pin down exactly what was wrong by comparing against this [example](https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_different_spaces_for_agents.py) :)\n", "\n", "```\n", "Example showing how one can create a multi-agent env, in which the different agents\n", "have different observation and action spaces.\n", "These spaces do NOT necessarily have to be specified manually by the user. Instead,\n", "RLlib will try to automatically infer them from the env provided spaces dicts\n", "(agentID -> obs/act space) and the policy mapping fn (mapping agent IDs to policy IDs).\n", "---\n", "Run this example with defaults (using Tune):\n", " $ python multi_agent_different_spaces_for_agents.py\n", "```" ] }, { "cell_type": "markdown", "id": "cb7047bd", "metadata": {}, "source": [ "### Boilerplate" ] }, { "cell_type": "code", "execution_count": 1, "id": "767ca024", "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "id": "280574b2", "metadata": {}, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": 2, "id": "255ee9cf", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\tqdm\\auto.py:22: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. 
See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n", " from .autonotebook import tqdm as notebook_tqdm\n" ] } ], "source": [ "import gym\n", "import os\n", "\n", "import ray\n", "from ray import tune\n", "from ray.rllib.env.multi_agent_env import MultiAgentEnv" ] }, { "cell_type": "code", "execution_count": 3, "id": "df446868", "metadata": {}, "outputs": [], "source": [ "from ray.rllib.agents import ppo\n", "from ray.tune.registry import register_env" ] }, { "cell_type": "markdown", "id": "12ed70ef", "metadata": {}, "source": [ "## Environment" ] }, { "cell_type": "code", "execution_count": 4, "id": "b2109c70", "metadata": {}, "outputs": [], "source": [ "class BasicMultiAgentMultiSpaces(MultiAgentEnv):\n", " \"\"\"A simple multi-agent example environment where agents have different spaces.\n", " agent0: obs=(10,), act=Discrete(2)\n", " agent1: obs=(20,), act=Discrete(3)\n", " The logic of the env doesn't really matter for this example. The point of this env\n", " is to show how one can use multi-agent envs, in which the different agents utilize\n", " different obs- and action spaces.\n", " \"\"\"\n", "\n", " def __init__(self, config=None):\n", " self.agents = {\"agent0\", \"agent1\"}\n", " self._agent_ids = set(self.agents)\n", "\n", " self.dones = set()\n", "\n", " # Provide full (preferred format) observation- and action-spaces as Dicts\n", " # mapping agent IDs to the individual agents' spaces.\n", " self._spaces_in_preferred_format = True\n", " self.observation_space = gym.spaces.Dict(\n", " {\n", " \"agent0\": gym.spaces.Box(low=-1.0, high=1.0, shape=(10,)),\n", " \"agent1\": gym.spaces.Box(low=-1.0, high=1.0, shape=(20,)),\n", " }\n", " )\n", " self.action_space = gym.spaces.Dict(\n", " {\"agent0\": gym.spaces.Discrete(2), \"agent1\": gym.spaces.Discrete(3)}\n", " )\n", "\n", " super().__init__()\n", "\n", " def reset(self):\n", " self.dones = set()\n", " return {i: self.observation_space[i].sample() for i in 
self.agents}\n", "\n", " def step(self, action_dict):\n", " obs, rew, done, info = {}, {}, {}, {}\n", " for i, action in action_dict.items():\n", " obs[i] = self.observation_space[i].sample()\n", " rew[i] = 1.0\n", " done[i] = False\n", " info[i] = {}\n", " # NOTE: self.dones is never added to, so '__all__' stays False and\n", " # episodes only end via the configured horizon.\n", " done[\"__all__\"] = len(self.dones) == len(self.agents)\n", " return obs, rew, done, info" ] }, { "cell_type": "markdown", "id": "a7cdebd0", "metadata": {}, "source": [ "## RLlib" ] }, { "cell_type": "code", "execution_count": 5, "id": "00a502f8", "metadata": {}, "outputs": [], "source": [ "env_config = {}" ] }, { "cell_type": "code", "execution_count": 6, "id": "251c25f3", "metadata": {}, "outputs": [], "source": [ "config = {\n", " \"env\": BasicMultiAgentMultiSpaces,\n", "# \"env_config\": env_config,\n", " # Use GPUs iff `RLLIB_NUM_GPUS` env var set to > 0.\n", " \"num_gpus\": int(os.environ.get(\"RLLIB_NUM_GPUS\", \"0\")),\n", " \"num_workers\": 1,\n", " \"multiagent\": {\n", " # Use a simple set of policy IDs. Spaces for the individual policies\n", " # will be inferred automatically using reverse lookup via the\n", " # `policy_mapping_fn` and the env provided spaces for the different\n", " # agents. 
Alternatively, you could use:\n", " # policies: {main0: PolicySpec(...), main1: PolicySpec(...)}\n", " \"policies\": {\"main0\", \"main1\"},\n", " # Simple mapping fn, mapping agent0 to main0 and agent1 to main1.\n", " \"policy_mapping_fn\": (\n", " lambda aid, episode, worker, **kw: f\"main{aid[-1]}\"\n", " ),\n", " # Only train main0.\n", " \"policies_to_train\": [\"main0\"],\n", " },\n", " \"framework\": \"torch\", # torch tf tf2\n", " # `eager_tracing` must be a bool and only matters for framework=\"tf2\";\n", " # \"store_true\" was a leftover from the script's argparse setup.\n", " \"eager_tracing\": False,\n", "}" ] }, { "cell_type": "code", "execution_count": 7, "id": "d4fc197b", "metadata": {}, "outputs": [], "source": [ "agent_config = ppo.DEFAULT_CONFIG.copy()\n", "agent_config.update(config)" ] }, { "cell_type": "markdown", "id": "81af49e9", "metadata": {}, "source": [ "## Training" ] }, { "cell_type": "code", "execution_count": 8, "id": "ed7597c4", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "RayContext(dashboard_url=None, python_version='3.8.12', ray_version='1.12.1', ray_commit='4863e33856b54ccf8add5cbe75e41558850a1b75', address_info={'node_ip_address': '127.0.0.1', 'raylet_ip_address': '127.0.0.1', 'redis_address': None, 'object_store_address': 'tcp://127.0.0.1:64403', 'raylet_socket_name': 'tcp://127.0.0.1:57948', 'webui_url': None, 'session_dir': 'C:\\\\Users\\\\milos\\\\AppData\\\\Local\\\\Temp\\\\ray\\\\session_2022-08-03_09-43-00_258476_15284', 'metrics_export_port': 56566, 'gcs_address': '127.0.0.1:57990', 'address': '127.0.0.1:57990', 'node_id': '605602b8041dae54fb0356d4e3ef35120be096c1fdd1e69b98f68f85'})" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# (Re)Start the ray runtime.\n", "if ray.is_initialized():\n", " ray.shutdown()\n", "ray.init(include_dashboard=False, ignore_reinit_error=True)" ] }, { "cell_type": "code", "execution_count": 9, "id": "636b41c7", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ 
"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\tune\\trial_runner.py:321: UserWarning: fail_fast='raise' detected. Be careful when using this mode as resources (such as Ray processes, file descriptors, and temporary files) may not be cleaned up properly. To use a safer mode, use fail_fast=True.\n", " warnings.warn(\n", "2022-08-03 09:43:04,761\tINFO trial_runner.py:803 -- starting PPOTrainer_BasicMultiAgentMultiSpaces_e890c_00000\n", "2022-08-03 09:43:04,818\tERROR syncer.py:119 -- Log sync requires rsync to be installed.\n", "\u001b[2m\u001b[36m(PPOTrainer pid=2020)\u001b[0m 2022-08-03 09:43:09,263\tINFO ppo.py:268 -- In multi-agent mode, policies will be optimized sequentially by the multi-GPU optimizer. Consider setting simple_optimizer=True if this doesn't work for you.\n", "\u001b[2m\u001b[36m(PPOTrainer pid=2020)\u001b[0m 2022-08-03 09:43:09,264\tINFO trainer.py:864 -- Current log_level is WARN. For more information, set 'log_level': 'INFO' / 'DEBUG' or use the -v and -vv flags.\n", "\u001b[2m\u001b[36m(RolloutWorker pid=13316)\u001b[0m 2022-08-03 09:43:13,216\tWARNING rollout_worker.py:498 -- We've added a module for checking environments that are used in experiments. It will cause your environment to fail if your environment is not set upcorrectly. You can disable check env by setting `disable_env_checking` to True in your experiment config dictionary. You can run the environment checking module standalone by calling ray.rllib.utils.check_env(env).\n", "2022-08-03 09:43:14,479\tERROR trial_runner.py:872 -- Trial PPOTrainer_BasicMultiAgentMultiSpaces_e890c_00000: Error processing event.\n", "\u001b[2m\u001b[36m(PPOTrainer pid=2020)\u001b[0m 2022-08-03 09:43:14,471\tWARNING trainer.py:1083 -- Worker crashed during call to `step_attempt()`. 
To try to continue training without the failed worker, set `ignore_worker_failures=True`.\n" ] }, { "ename": "RayTaskError(ValueError)", "evalue": "\u001b[36mray::PPOTrainer.train()\u001b[39m (pid=2020, ip=127.0.0.1, repr=PPOTrainer)\n File \"python\\ray\\_raylet.pyx\", line 663, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 667, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 614, in ray._raylet.execute_task.function_executor\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\_private\\function_manager.py\", line 701, in actor_method_executor\n return method(__ray_actor, *args, **kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\tune\\trainable.py\", line 349, in train\n result = self.step()\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\agents\\trainer.py\", line 1088, in step\n raise e\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\agents\\trainer.py\", line 1074, in step\n step_attempt_results = self.step_attempt()\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\agents\\trainer.py\", line 1155, in step_attempt\n step_results = self._exec_plan_or_training_iteration_fn()\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, 
*_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\agents\\trainer.py\", line 2174, in _exec_plan_or_training_iteration_fn\n results = next(self.train_exec_impl)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 779, in __next__\n return next(self.built_iterator)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 869, in apply_filter\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 869, in apply_filter\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n [Previous line repeated 1 more time]\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 904, in apply_flatten\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n [Previous line repeated 1 more time]\n File 
\"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 492, in base_iterator\n yield ray.get(futures, timeout=timeout)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\_private\\client_mode_hook.py\", line 105, in wrapper\n return func(*args, **kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\worker.py\", line 1809, in get\n raise value.as_instanceof_cause()\nray.exceptions.RayTaskError(ValueError): \u001b[36mray::RolloutWorker.par_iter_next()\u001b[39m (pid=13316, ip=127.0.0.1, repr=)\nValueError: The two structures don't have the same nested structure.\n\nFirst structure: type=ndarray str=[-0.6484811 -0.24129443 0.08446018 -0.60363895 -0.2602198 -0.5603006\n -0.61620265 0.35957032 0.97200704 -0.9894121 0.8001538 0.54838014\n 0.485548 0.69472945 -0.7663002 0.21971236 0.01151593 0.59717155\n 0.69245756 -0.2534561 ]\n\nSecond structure: type=OrderedDict str=OrderedDict([('agent0', array([ 0.665456 , 0.8671211 , -0.80171853, 0.930409 , -0.9449052 ,\n -0.21203883, -0.2598595 , 0.6611393 , -0.6856699 , 0.7166635 ],\n dtype=float32)), ('agent1', array([ 0.19255306, 0.8727036 , -0.09591831, -0.44907162, -0.47003892,\n 0.24288067, 0.03934164, 0.1409452 , -0.2482065 , 0.4656972 ,\n -0.62981224, -0.955172 , 0.5294034 , 0.34777784, -0.840635 ,\n -0.9268064 , 0.8727926 , 0.16473597, -0.2449859 , 0.0885732 ],\n dtype=float32))])\n\nMore specifically: Substructure \"type=OrderedDict str=OrderedDict([('agent0', array([ 0.665456 , 0.8671211 , -0.80171853, 0.930409 , -0.9449052 ,\n -0.21203883, -0.2598595 , 0.6611393 , -0.6856699 , 0.7166635 ],\n dtype=float32)), ('agent1', array([ 0.19255306, 0.8727036 , -0.09591831, -0.44907162, -0.47003892,\n 0.24288067, 0.03934164, 0.1409452 , -0.2482065 , 0.4656972 ,\n -0.62981224, -0.955172 , 0.5294034 , 0.34777784, -0.840635 ,\n -0.9268064 , 0.8727926 , 0.16473597, -0.2449859 , 0.0885732 ],\n dtype=float32))])\" is a sequence, while 
substructure \"type=ndarray str=[-0.6484811 -0.24129443 0.08446018 -0.60363895 -0.2602198 -0.5603006\n -0.61620265 0.35957032 0.97200704 -0.9894121 0.8001538 0.54838014\n 0.485548 0.69472945 -0.7663002 0.21971236 0.01151593 0.59717155\n 0.69245756 -0.2534561 ]\" is not\n\nDuring handling of the above exception, another exception occurred:\n\n\u001b[36mray::RolloutWorker.par_iter_next()\u001b[39m (pid=13316, ip=127.0.0.1, repr=)\n File \"python\\ray\\_raylet.pyx\", line 656, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 697, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 663, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 667, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 614, in ray._raylet.execute_task.function_executor\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\_private\\function_manager.py\", line 701, in actor_method_executor\n return method(__ray_actor, *args, **kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 1186, in par_iter_next\n return next(self.local_it)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\rollout_worker.py\", line 404, in gen_rollouts\n yield self.sample()\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\rollout_worker.py\", line 815, in sample\n batches = [self.input_reader.next()]\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\sampler.py\", line 116, in next\n batches 
= [self.get_data()]\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\sampler.py\", line 289, in get_data\n item = next(self._env_runner)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\sampler.py\", line 679, in _env_runner\n active_envs, to_eval, outputs = _process_observations(\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\sampler.py\", line 906, in _process_observations\n prep_obs = preprocessor.transform(raw_obs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\models\\preprocessors.py\", line 282, in transform\n self.check_shape(observation)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\models\\preprocessors.py\", line 69, in check_shape\n observation = convert_element_to_space_type(\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\utils\\spaces\\space_utils.py\", line 344, in convert_element_to_space_type\n return tree.map_structure(map_, element, sampled_element, check_types=False)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\tree\\__init__.py\", line 428, in map_structure\n assert_same_structure(structures[0], other, check_types=check_types)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\tree\\__init__.py\", line 284, in assert_same_structure\n raise type(e)(\"%s\\n\"\nValueError: The two structures don't have the same nested structure.\n\nFirst structure: type=ndarray str=[-0.6484811 -0.24129443 0.08446018 -0.60363895 -0.2602198 -0.5603006\n -0.61620265 0.35957032 0.97200704 -0.9894121 0.8001538 0.54838014\n 0.485548 0.69472945 -0.7663002 0.21971236 0.01151593 0.59717155\n 0.69245756 -0.2534561 ]\n\nSecond structure: type=OrderedDict str=OrderedDict([('agent0', array([ 0.665456 , 0.8671211 , -0.80171853, 0.930409 , -0.9449052 ,\n -0.21203883, -0.2598595 , 0.6611393 
, -0.6856699 , 0.7166635 ],\n dtype=float32)), ('agent1', array([ 0.19255306, 0.8727036 , -0.09591831, -0.44907162, -0.47003892,\n 0.24288067, 0.03934164, 0.1409452 , -0.2482065 , 0.4656972 ,\n -0.62981224, -0.955172 , 0.5294034 , 0.34777784, -0.840635 ,\n -0.9268064 , 0.8727926 , 0.16473597, -0.2449859 , 0.0885732 ],\n dtype=float32))])\n\nMore specifically: Substructure \"type=OrderedDict str=OrderedDict([('agent0', array([ 0.665456 , 0.8671211 , -0.80171853, 0.930409 , -0.9449052 ,\n -0.21203883, -0.2598595 , 0.6611393 , -0.6856699 , 0.7166635 ],\n dtype=float32)), ('agent1', array([ 0.19255306, 0.8727036 , -0.09591831, -0.44907162, -0.47003892,\n 0.24288067, 0.03934164, 0.1409452 , -0.2482065 , 0.4656972 ,\n -0.62981224, -0.955172 , 0.5294034 , 0.34777784, -0.840635 ,\n -0.9268064 , 0.8727926 , 0.16473597, -0.2449859 , 0.0885732 ],\n dtype=float32))])\" is a sequence, while substructure \"type=ndarray str=[-0.6484811 -0.24129443 0.08446018 -0.60363895 -0.2602198 -0.5603006\n -0.61620265 0.35957032 0.97200704 -0.9894121 0.8001538 0.54838014\n 0.485548 0.69472945 -0.7663002 0.21971236 0.01151593 0.59717155\n 0.69245756 -0.2534561 ]\" is not\nEntire first structure:\n.\nEntire second structure:\nOrderedDict([('agent0', .), ('agent1', .)])", "output_type": "error", "traceback": [ "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m", "\u001b[1;31mRayTaskError(ValueError)\u001b[0m Traceback (most recent call last)", "File \u001b[1;32m:1\u001b[0m, in \u001b[0;36m\u001b[1;34m\u001b[0m\n", "File \u001b[1;32m~\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\tune\\tune.py:672\u001b[0m, in \u001b[0;36mrun\u001b[1;34m(run_or_experiment, name, metric, mode, stop, time_budget_s, config, resources_per_trial, num_samples, local_dir, search_alg, scheduler, keep_checkpoints_num, checkpoint_score_attr, checkpoint_freq, checkpoint_at_end, verbose, progress_reporter, log_to_file, trial_name_creator, trial_dirname_creator, sync_config, 
export_formats, max_failures, fail_fast, restore, server_port, resume, reuse_actors, trial_executor, raise_on_failed_trial, callbacks, max_concurrent_trials, _experiment_checkpoint_dir, queue_trials, loggers, _remote)\u001b[0m\n\u001b[0;32m 670\u001b[0m progress_reporter\u001b[38;5;241m.\u001b[39mset_start_time(tune_start)\n\u001b[0;32m 671\u001b[0m \u001b[38;5;28;01mwhile\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m runner\u001b[38;5;241m.\u001b[39mis_finished() \u001b[38;5;129;01mand\u001b[39;00m \u001b[38;5;129;01mnot\u001b[39;00m state[signal\u001b[38;5;241m.\u001b[39mSIGINT]:\n\u001b[1;32m--> 672\u001b[0m \u001b[43mrunner\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mstep\u001b[49m\u001b[43m(\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 673\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m has_verbosity(Verbosity\u001b[38;5;241m.\u001b[39mV1_EXPERIMENT):\n\u001b[0;32m 674\u001b[0m _report_progress(runner, progress_reporter)\n", "File \u001b[1;32m~\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\tune\\trial_runner.py:767\u001b[0m, in \u001b[0;36mTrialRunner.step\u001b[1;34m(self)\u001b[0m\n\u001b[0;32m 761\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_callbacks\u001b[38;5;241m.\u001b[39mon_step_begin(\n\u001b[0;32m 762\u001b[0m iteration\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_iteration, trials\u001b[38;5;241m=\u001b[39m\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_trials\n\u001b[0;32m 763\u001b[0m )\n\u001b[0;32m 765\u001b[0m next_trial \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_update_trial_queue_and_get_next_trial()\n\u001b[1;32m--> 767\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_wait_and_handle_event\u001b[49m\u001b[43m(\u001b[49m\u001b[43mnext_trial\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 769\u001b[0m 
\u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_stop_experiment_if_needed()\n\u001b[0;32m 771\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n", "File \u001b[1;32m~\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\tune\\trial_runner.py:745\u001b[0m, in \u001b[0;36mTrialRunner._wait_and_handle_event\u001b[1;34m(self, next_trial)\u001b[0m\n\u001b[0;32m 743\u001b[0m \u001b[38;5;28;01mexcept\u001b[39;00m \u001b[38;5;167;01mException\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m e:\n\u001b[0;32m 744\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m e \u001b[38;5;129;01mis\u001b[39;00m TuneError \u001b[38;5;129;01mor\u001b[39;00m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_fail_fast \u001b[38;5;241m==\u001b[39m TrialRunner\u001b[38;5;241m.\u001b[39mRAISE:\n\u001b[1;32m--> 745\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m e\n\u001b[0;32m 746\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m 747\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m TuneError(traceback\u001b[38;5;241m.\u001b[39mformat_exc())\n", "File \u001b[1;32m~\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\tune\\trial_runner.py:730\u001b[0m, in \u001b[0;36mTrialRunner._wait_and_handle_event\u001b[1;34m(self, next_trial)\u001b[0m\n\u001b[0;32m 728\u001b[0m result \u001b[38;5;241m=\u001b[39m future_result\u001b[38;5;241m.\u001b[39mresult\n\u001b[0;32m 729\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m future_result\u001b[38;5;241m.\u001b[39mtype \u001b[38;5;241m==\u001b[39m ExecutorEventType\u001b[38;5;241m.\u001b[39mERROR:\n\u001b[1;32m--> 730\u001b[0m \u001b[38;5;28;43mself\u001b[39;49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43m_on_executor_error\u001b[49m\u001b[43m(\u001b[49m\u001b[43mtrial\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[43mresult\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 731\u001b[0m \u001b[38;5;28;01melif\u001b[39;00m future_result\u001b[38;5;241m.\u001b[39mtype \u001b[38;5;241m==\u001b[39m 
ExecutorEventType\u001b[38;5;241m.\u001b[39mRESTORING_RESULT:\n\u001b[0;32m 732\u001b[0m \u001b[38;5;28mself\u001b[39m\u001b[38;5;241m.\u001b[39m_on_restoring_result(trial)\n", "File \u001b[1;32m~\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\tune\\trial_runner.py:874\u001b[0m, in \u001b[0;36mTrialRunner._on_executor_error\u001b[1;34m(self, trial, result)\u001b[0m\n\u001b[0;32m 872\u001b[0m logger\u001b[38;5;241m.\u001b[39merror(error_msg)\n\u001b[0;32m 873\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(result[\u001b[38;5;241m0\u001b[39m], \u001b[38;5;167;01mException\u001b[39;00m)\n\u001b[1;32m--> 874\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m result[\u001b[38;5;241m0\u001b[39m]\n\u001b[0;32m 875\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m 876\u001b[0m logger\u001b[38;5;241m.\u001b[39mexception(error_msg)\n", "File \u001b[1;32m~\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\tune\\ray_trial_executor.py:901\u001b[0m, in \u001b[0;36mRayTrialExecutor.get_next_executor_event\u001b[1;34m(self, live_trials, next_trial_exists)\u001b[0m\n\u001b[0;32m 899\u001b[0m \u001b[38;5;28;01massert\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(trial, Trial)\n\u001b[0;32m 900\u001b[0m \u001b[38;5;28;01mtry\u001b[39;00m:\n\u001b[1;32m--> 901\u001b[0m future_result \u001b[38;5;241m=\u001b[39m \u001b[43mray\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43mget\u001b[49m\u001b[43m(\u001b[49m\u001b[43mready_future\u001b[49m\u001b[43m)\u001b[49m\n\u001b[0;32m 902\u001b[0m \u001b[38;5;66;03m# For local mode\u001b[39;00m\n\u001b[0;32m 903\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(future_result, _LocalWrapper):\n", "File \u001b[1;32m~\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\_private\\client_mode_hook.py:105\u001b[0m, in \u001b[0;36mclient_mode_hook..wrapper\u001b[1;34m(*args, **kwargs)\u001b[0m\n\u001b[0;32m 103\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m 
func\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m \u001b[38;5;241m!=\u001b[39m \u001b[38;5;124m\"\u001b[39m\u001b[38;5;124minit\u001b[39m\u001b[38;5;124m\"\u001b[39m \u001b[38;5;129;01mor\u001b[39;00m is_client_mode_enabled_by_default:\n\u001b[0;32m 104\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[38;5;28mgetattr\u001b[39m(ray, func\u001b[38;5;241m.\u001b[39m\u001b[38;5;18m__name__\u001b[39m)(\u001b[38;5;241m*\u001b[39margs, \u001b[38;5;241m*\u001b[39m\u001b[38;5;241m*\u001b[39mkwargs)\n\u001b[1;32m--> 105\u001b[0m \u001b[38;5;28;01mreturn\u001b[39;00m \u001b[43mfunc\u001b[49m\u001b[43m(\u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43margs\u001b[49m\u001b[43m,\u001b[49m\u001b[43m \u001b[49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[38;5;241;43m*\u001b[39;49m\u001b[43mkwargs\u001b[49m\u001b[43m)\u001b[49m\n", "File \u001b[1;32m~\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\worker.py:1809\u001b[0m, in \u001b[0;36mget\u001b[1;34m(object_refs, timeout)\u001b[0m\n\u001b[0;32m 1807\u001b[0m worker\u001b[38;5;241m.\u001b[39mcore_worker\u001b[38;5;241m.\u001b[39mdump_object_store_memory_usage()\n\u001b[0;32m 1808\u001b[0m \u001b[38;5;28;01mif\u001b[39;00m \u001b[38;5;28misinstance\u001b[39m(value, RayTaskError):\n\u001b[1;32m-> 1809\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m value\u001b[38;5;241m.\u001b[39mas_instanceof_cause()\n\u001b[0;32m 1810\u001b[0m \u001b[38;5;28;01melse\u001b[39;00m:\n\u001b[0;32m 1811\u001b[0m \u001b[38;5;28;01mraise\u001b[39;00m value\n", "\u001b[1;31mRayTaskError(ValueError)\u001b[0m: \u001b[36mray::PPOTrainer.train()\u001b[39m (pid=2020, ip=127.0.0.1, repr=PPOTrainer)\n File \"python\\ray\\_raylet.pyx\", line 663, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 667, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 614, in ray._raylet.execute_task.function_executor\n File 
\"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\_private\\function_manager.py\", line 701, in actor_method_executor\n return method(__ray_actor, *args, **kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\tune\\trainable.py\", line 349, in train\n result = self.step()\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\agents\\trainer.py\", line 1088, in step\n raise e\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\agents\\trainer.py\", line 1074, in step\n step_attempt_results = self.step_attempt()\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\agents\\trainer.py\", line 1155, in step_attempt\n step_results = self._exec_plan_or_training_iteration_fn()\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\agents\\trainer.py\", line 2174, in _exec_plan_or_training_iteration_fn\n results = next(self.train_exec_impl)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 779, in __next__\n return next(self.built_iterator)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n 
for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 869, in apply_filter\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 869, in apply_filter\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n [Previous line repeated 1 more time]\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 904, in apply_flatten\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 807, in apply_foreach\n for item in it:\n [Previous line repeated 1 more time]\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 492, in base_iterator\n yield ray.get(futures, timeout=timeout)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\_private\\client_mode_hook.py\", line 105, in wrapper\n return func(*args, **kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\worker.py\", line 1809, in get\n raise value.as_instanceof_cause()\nray.exceptions.RayTaskError(ValueError): 
\u001b[36mray::RolloutWorker.par_iter_next()\u001b[39m (pid=13316, ip=127.0.0.1, repr=)\nValueError: The two structures don't have the same nested structure.\n\nFirst structure: type=ndarray str=[-0.6484811 -0.24129443 0.08446018 -0.60363895 -0.2602198 -0.5603006\n -0.61620265 0.35957032 0.97200704 -0.9894121 0.8001538 0.54838014\n 0.485548 0.69472945 -0.7663002 0.21971236 0.01151593 0.59717155\n 0.69245756 -0.2534561 ]\n\nSecond structure: type=OrderedDict str=OrderedDict([('agent0', array([ 0.665456 , 0.8671211 , -0.80171853, 0.930409 , -0.9449052 ,\n -0.21203883, -0.2598595 , 0.6611393 , -0.6856699 , 0.7166635 ],\n dtype=float32)), ('agent1', array([ 0.19255306, 0.8727036 , -0.09591831, -0.44907162, -0.47003892,\n 0.24288067, 0.03934164, 0.1409452 , -0.2482065 , 0.4656972 ,\n -0.62981224, -0.955172 , 0.5294034 , 0.34777784, -0.840635 ,\n -0.9268064 , 0.8727926 , 0.16473597, -0.2449859 , 0.0885732 ],\n dtype=float32))])\n\nMore specifically: Substructure \"type=OrderedDict str=OrderedDict([('agent0', array([ 0.665456 , 0.8671211 , -0.80171853, 0.930409 , -0.9449052 ,\n -0.21203883, -0.2598595 , 0.6611393 , -0.6856699 , 0.7166635 ],\n dtype=float32)), ('agent1', array([ 0.19255306, 0.8727036 , -0.09591831, -0.44907162, -0.47003892,\n 0.24288067, 0.03934164, 0.1409452 , -0.2482065 , 0.4656972 ,\n -0.62981224, -0.955172 , 0.5294034 , 0.34777784, -0.840635 ,\n -0.9268064 , 0.8727926 , 0.16473597, -0.2449859 , 0.0885732 ],\n dtype=float32))])\" is a sequence, while substructure \"type=ndarray str=[-0.6484811 -0.24129443 0.08446018 -0.60363895 -0.2602198 -0.5603006\n -0.61620265 0.35957032 0.97200704 -0.9894121 0.8001538 0.54838014\n 0.485548 0.69472945 -0.7663002 0.21971236 0.01151593 0.59717155\n 0.69245756 -0.2534561 ]\" is not\n\nDuring handling of the above exception, another exception occurred:\n\n\u001b[36mray::RolloutWorker.par_iter_next()\u001b[39m (pid=13316, ip=127.0.0.1, repr=)\n File \"python\\ray\\_raylet.pyx\", line 656, in ray._raylet.execute_task\n 
File \"python\\ray\\_raylet.pyx\", line 697, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 663, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 667, in ray._raylet.execute_task\n File \"python\\ray\\_raylet.pyx\", line 614, in ray._raylet.execute_task.function_executor\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\_private\\function_manager.py\", line 701, in actor_method_executor\n return method(__ray_actor, *args, **kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\iter.py\", line 1186, in par_iter_next\n return next(self.local_it)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\rollout_worker.py\", line 404, in gen_rollouts\n yield self.sample()\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\util\\tracing\\tracing_helper.py\", line 462, in _resume_span\n return method(self, *_args, **_kwargs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\rollout_worker.py\", line 815, in sample\n batches = [self.input_reader.next()]\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\sampler.py\", line 116, in next\n batches = [self.get_data()]\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\sampler.py\", line 289, in get_data\n item = next(self._env_runner)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\sampler.py\", line 679, in _env_runner\n active_envs, to_eval, outputs = _process_observations(\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\evaluation\\sampler.py\", line 906, in 
_process_observations\n prep_obs = preprocessor.transform(raw_obs)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\models\\preprocessors.py\", line 282, in transform\n self.check_shape(observation)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\models\\preprocessors.py\", line 69, in check_shape\n observation = convert_element_to_space_type(\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\ray\\rllib\\utils\\spaces\\space_utils.py\", line 344, in convert_element_to_space_type\n return tree.map_structure(map_, element, sampled_element, check_types=False)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\tree\\__init__.py\", line 428, in map_structure\n assert_same_structure(structures[0], other, check_types=check_types)\n File \"C:\\Users\\milos\\Anaconda3\\envs\\EnvRL\\lib\\site-packages\\tree\\__init__.py\", line 284, in assert_same_structure\n raise type(e)(\"%s\\n\"\nValueError: The two structures don't have the same nested structure.\n\nFirst structure: type=ndarray str=[-0.6484811 -0.24129443 0.08446018 -0.60363895 -0.2602198 -0.5603006\n -0.61620265 0.35957032 0.97200704 -0.9894121 0.8001538 0.54838014\n 0.485548 0.69472945 -0.7663002 0.21971236 0.01151593 0.59717155\n 0.69245756 -0.2534561 ]\n\nSecond structure: type=OrderedDict str=OrderedDict([('agent0', array([ 0.665456 , 0.8671211 , -0.80171853, 0.930409 , -0.9449052 ,\n -0.21203883, -0.2598595 , 0.6611393 , -0.6856699 , 0.7166635 ],\n dtype=float32)), ('agent1', array([ 0.19255306, 0.8727036 , -0.09591831, -0.44907162, -0.47003892,\n 0.24288067, 0.03934164, 0.1409452 , -0.2482065 , 0.4656972 ,\n -0.62981224, -0.955172 , 0.5294034 , 0.34777784, -0.840635 ,\n -0.9268064 , 0.8727926 , 0.16473597, -0.2449859 , 0.0885732 ],\n dtype=float32))])\n\nMore specifically: Substructure \"type=OrderedDict str=OrderedDict([('agent0', array([ 0.665456 , 0.8671211 , -0.80171853, 0.930409 , -0.9449052 ,\n 
-0.21203883, -0.2598595 , 0.6611393 , -0.6856699 , 0.7166635 ],\n dtype=float32)), ('agent1', array([ 0.19255306, 0.8727036 , -0.09591831, -0.44907162, -0.47003892,\n 0.24288067, 0.03934164, 0.1409452 , -0.2482065 , 0.4656972 ,\n -0.62981224, -0.955172 , 0.5294034 , 0.34777784, -0.840635 ,\n -0.9268064 , 0.8727926 , 0.16473597, -0.2449859 , 0.0885732 ],\n dtype=float32))])\" is a sequence, while substructure \"type=ndarray str=[-0.6484811 -0.24129443 0.08446018 -0.60363895 -0.2602198 -0.5603006\n -0.61620265 0.35957032 0.97200704 -0.9894121 0.8001538 0.54838014\n 0.485548 0.69472945 -0.7663002 0.21971236 0.01151593 0.59717155\n 0.69245756 -0.2534561 ]\" is not\nEntire first structure:\n.\nEntire second structure:\nOrderedDict([('agent0', .), ('agent1', .)])" ] } ], "source": [ "%%time\n", "\n", "analysis = tune.run(\n", " ppo.PPOTrainer,\n", " stop={\n", " \"training_iteration\": 10,\n", " \"timesteps_total\": 10_000,\n", " \"episode_reward_mean\": 80.0,\n", " },\n", " config=agent_config,\n", " # Milos\n", " verbose=0,\n", " fail_fast=\"raise\", # for debugging!\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "EnvRL", "language": "python", "name": "envrl" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" } }, "nbformat": 4, "nbformat_minor": 5 }