{ "cells": [ { "cell_type": "markdown", "id": "cfe5ea27636d539", "metadata": {}, "source": [ "# Examples\n", "\n", "## Example #1\n", "In this example we will use **Fictitious Play** to solve the **Beach Bar** environment. We will pass ``verbose=True`` and ``print_every=5`` to view a status printout every five iterations." ] }, { "cell_type": "code", "execution_count": null, "id": "209016b743e68ca0", "metadata": { "jupyter": { "is_executing": true } }, "outputs": [], "source": [ "from mfglib.env import Environment\n", "from mfglib.alg import FictitiousPlay\n", "\n", "env = Environment.beach_bar()\n", "alg = FictitiousPlay(alpha=0.13)\n", "\n", "_ = alg.solve(env, verbose=True, print_every=5)" ] }, { "cell_type": "markdown", "id": "cc9aabd15159f61e", "metadata": {}, "source": [ "## Example #2\n", "\n", "In this example we will demonstrate how to use the tuning API to identify optimal hyperparameters for **MF-OMI** on the **Beach Bar** environment. While we will use **MF-OMI** in this demonstration, it is important to note that **all** algorithms may benefit from tuning." ] }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "import matplotlib.pyplot as plt\n", "import optuna\n", "\n", "from mfglib.env import Environment\n", "from mfglib.alg import OccupationMeasureInclusion\n", "from mfglib.tuning import GeometricMean\n", "\n", "# By default, optuna displays logs. 
This silences them.\n", "optuna.logging.set_verbosity(optuna.logging.WARNING)\n", "\n", "env = Environment.beach_bar()\n", "\n", "# Initialize the algorithm\n", "alg_orig = OccupationMeasureInclusion(alpha=0.09)\n", "\n", "# Note: atol=rtol=None would disable early stopping and let iterations run to max_iter\n", "_, expls_orig, _ = alg_orig.solve(env, atol=1e-8, rtol=1e-8)\n", "\n", "# tune() returns an optuna.Study object; pass solve_kwargs so the tuner's inner solves use the same settings as the final solve\n", "study = alg_orig.tune(metric=GeometricMean(shift=0.5), envs=[env], n_trials=40, solve_kwargs={\"atol\": 1e-8, \"rtol\": 1e-8})\n", "\n", "# The study can then be used to initialize a new, tuned instance\n", "alg_tuned = alg_orig.from_study(study)\n", "\n", "_, expls_tuned, _ = alg_tuned.solve(env, atol=1e-8, rtol=1e-8)\n", "\n", "plt.xlabel(\"Iteration\")\n", "plt.ylabel(\"Exploitability\")\n", "plt.plot(expls_orig, label=\"Original\")\n", "plt.plot(expls_tuned, label=\"Tuned\")\n", "plt.semilogy()\n", "plt.grid()\n", "plt.legend();" ], "id": "d4a9eb664b8dd089" }, { "cell_type": "markdown", "id": "6a9ea984d962a05c", "metadata": {}, "source": [ "## Example #3\n", "In this example we will use **Online Mirror Descent** to solve the **Building Evacuation** environment and visualize how the population behaves under the best policy found, as measured by exploitability." 
] }, { "metadata": {}, "cell_type": "code", "outputs": [], "execution_count": null, "source": [ "import numpy as np\n", "\n", "from mfglib.env import Environment\n", "from mfglib.alg import OnlineMirrorDescent\n", "from mfglib.utils import mean_field_from_policy\n", "\n", "N_TIMESTEPS = 20\n", "N_FLOORS = 3\n", "SIZE = 8\n", "\n", "env = Environment.building_evacuation(\n", "    T=N_TIMESTEPS,\n", "    n_floor=N_FLOORS,\n", "    floor_l=SIZE,\n", "    floor_w=SIZE,\n", "    eta=0.1,\n", ")\n", "alg = OnlineMirrorDescent(alpha=5e-3)\n", "\n", "pis, expls, _ = alg.solve(env, atol=None, rtol=None, max_iter=1000)\n", "\n", "# Identify the index of the \"best\" policy, as measured by exploitability\n", "i = np.argmin(expls)\n", "\n", "# Compute the corresponding mean-field and state marginal\n", "L_i = mean_field_from_policy(pis[i], env=env)\n", "mu_i = L_i.sum(dim=-1)" ], "id": "4ca5142ab964ffb9" }, { "cell_type": "markdown", "id": "fed713788aafa45f", "metadata": {}, "source": [ "For any state $s = (x, y, z) \\in \\mathbb{Z}_+^3$ representing a position in the building ($z$ being the height), let\n", "\n", "$$\\mu_t^i(s) = \\mu_t^i(x, y, z) = \\sum_{a \\in \\mathcal{A}} L_t^{\\pi^i}(x, y, z, a)$$\n", "\n", "denote the population state marginal at time $t$ of the induced mean-field $L^{\\pi^i}$, where $\\pi^i$ is the $i$-th policy iterate. Below we plot the population's state distribution at various times $t$ as a heatmap." 
] }, { "cell_type": "code", "execution_count": null, "id": "1e6d2d5e2523fdaa", "metadata": {}, "outputs": [], "source": [ "from matplotlib.colors import LogNorm\n", "\n", "T_PNTS = [0, N_TIMESTEPS // 2, N_TIMESTEPS]\n", "\n", "vmin = mu_i[T_PNTS].min().item()\n", "vmax = mu_i[T_PNTS].max().item()\n", "\n", "norm = LogNorm(vmin=vmin + 1e-12, vmax=vmax, clip=True)\n", "\n", "fig, axs = plt.subplots(\n", "    nrows=len(T_PNTS), ncols=N_FLOORS, sharex=True, sharey=True, layout=\"constrained\"\n", ")\n", "ims = []\n", "for j, t in enumerate(T_PNTS):\n", "    for z in range(N_FLOORS):\n", "        im = axs[j, z].imshow(mu_i[t, z], norm=norm)\n", "        ims.append(im)\n", "        axs[j, z].set_xlabel(\"x\")\n", "        axs[j, z].set_ylabel(\"y\")\n", "        axs[j, z].set_title(rf\"$\\mu_{{{t}}}^i(x, y, {z})$\")\n", "\n", "fig.colorbar(ims[0], ax=axs.ravel().tolist())\n", "\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "48898701ab8ff317", "metadata": {}, "source": [ "In the top row we see the population evenly distributed throughout the building. In the middle row we see the population making its way to the stairs. The bottom row indicates that the building is fully evacuated at time $t = 20$." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.22" } }, "nbformat": 4, "nbformat_minor": 5 }