API Documentation

class mfglib.env.Environment(*, T: int, S: tuple[int, ...], A: tuple[int, ...], mu0: Tensor, r_max: float, reward_fn: RewardFn, transition_fn: TransitionFn)

General environment class.

classmethod beach_bar(T: int = 2, n: int = 4, bar_loc: int = 2, log_eps: float = 1e-20, p_still: float = 0.5, mu0: Literal['uniform'] | Tensor = 'uniform') Environment

Beach Bar environment.

The beach bar process is a Markov Decision Process with \(|X|\) states disposed on a one dimensional torus (\(X = {0,..., |X|-1}\)), which represents a beach. A bar is located in one of the states. As the weather is very hot, players want to be as close as possible to the bar, while keeping away from too crowded areas.[#1bb]_

classmethod building_evacuation(T: int = 3, n_floor: int = 5, floor_l: int = 10, floor_w: int = 10, log_eps: float = 1e-20, eta: float = 1.0, evac_r: float = 10.0, mu0: Literal['uniform'] | Tensor = 'uniform') Environment

Building Evacuation environment.

In this problem, there is a multilevel building and each agent of the crowd wants to go downstairs as quickly as possible while favoring social distancing. At each floor, two staircases are located at two opposite corners, such as the crowd has to cross the whole floor to take the next staircase. Each agent can remain in place, move in the 4 directions (up, down, right, left) as well as go up or down when on a staircase location.[#be]_

classmethod conservative_treasure_hunting(T: int = 5, n: int = 3, r: tuple[float, ...] = (1.0, 1.0, 1.0), c: tuple[float, ...] = (1.0, 1.0, 1.0, 1.0, 1.0), mu0: Literal['uniform'] | Tensor = 'uniform') Environment

Conservative Treasure Hunting environment.

MF-OMO: An Optimization Formulation of Mean-Field Games Guo, X., Hu, A., & Zhang, J. (2022). arXiv:2206.09608.

classmethod crowd_motion(T: int = 3, torus_l: int = 20, torus_w: int = 20, loc_change_freq: int = 2, c: float = 10.0, log_eps: float = 1e-10, p_still: float = 0.5, seed: int = 0, mu0: Literal['uniform'] | Tensor = 'uniform') Environment

Crowd Motion environment.

An adaptation of Crowd Motion environment, which extends the Beach Bar environment in 2 dimensions, introduced in

Perolat, Julien, et al. “Scaling up mean field games with online mirror descent.” arXiv preprint arXiv:2103.00623 (2021).

classmethod equilibrium_price(T: int = 4, s_inv: int = 3, Q: int = 2, H: int = 2, d: float = 1.0, e0: float = 1.0, sigma: float = 1.0, c: tuple[float, float, float, float, float] = (1.0, 1.0, 1.0, 1.0, 1.0), mu0: Literal['uniform'] | Tensor = 'uniform') Environment

Equilibrium Price environment.

In this problem, a large number of homogeneous firms producing the same product under perfect competition are considered. The price of the product is determined endogenously by the supply-demand equilibrium. Each firm, meanwhile, maintains a certain inventory level of the raw materials for production, and decides about the quantity of raw materials to consume for production and the quantity of raw materials to replenish the inventory.

Guo, X., Hu, A., Xu, R., & Zhang, J. (2022). A general framework for learning mean-field games. Mathematics of Operations Research.

classmethod left_right(mu0: tuple[float, float, float] = (1.0, 0.0, 0.0)) Environment

Left-Right environment.

A large number of agents choose simultaneously between going left (L) or right (R). Afterwards, each agent shall be punished proportional to the number of agents that chose the same action, but more-so for choosing right than left.

Cui, Kai, and Heinz Koeppl. “Approximately solving mean field games via entropy-regularized deep reinforcement learning.” International Conference on Artificial Intelligence and Statistics. PMLR, 2021. https://proceedings.mlr.press/v130/cui21a.html

classmethod linear_quadratic(T: int = 3, el: int = 5, m: int = 2, sigma: float = 3.0, delta: float = 0.1, k: float = 1.0, q: float = 0.01, kappa: float = 0.5, c_term: float = 1.0, mu0: Literal['uniform'] | Tensor = 'uniform') Environment

Linear Quadratic environment.

Perrin, Sarah, et al. “Fictitious play for mean field games: Continuous time analysis and applications.” Advances in Neural Information Processing Systems 33 (2020): 13199-13213.

classmethod random_linear(T: int = 3, n: int = 5, m: float = 10.0, seed: int = 0, mu0: Literal['uniform'] | Tensor = 'uniform') Environment

General linear environment.

A custom environment in which the rewards and transition probabilities are random affine functions of the mean-field. For transition probabilities to be valid, a softmax function is applied on top of the corresponding affine function.

classmethod rock_paper_scissors(T: int = 1, mu0: tuple[float, float, float, float] = (1.0, 0.0, 0.0, 0.0)) Environment

Rock-Paper-Scissors environment.

This game is inspired by Shapley (1964) and their generalized non-zero-sum version of Rock-Paper-Scissors, for which classical fictitious play would not converge. Each of the agents can choose between rock, paper and scissors, and obtains a reward proportional to double the number of beaten agents minus the number of agents beating the agent.

Cui, Kai, and Heinz Koeppl. “Approximately solving mean field games via entropy-regularized deep reinforcement learning.” International Conference on Artificial Intelligence and Statistics. PMLR, 2021. https://proceedings.mlr.press/v130/cui21a.html

classmethod susceptible_infected(T: int = 50, mu0: tuple[float, float] = (0.4, 0.6)) Environment

SIS environment.

In this problem, a large number of agents can choose between social distancing (D) or going out (U). If a susceptible (S) agent chooses social distancing, they may not become infected (I). Otherwise, an agent may become infected with a probability proportional to the number of agents being infected. If infected, an agent will recover with a fixed chance every time step. Both social distancing and being infected have an associated cost.

Cui, Kai, and Heinz Koeppl. “Approximately solving mean field games via entropy-regularized deep reinforcement learning.” International Conference on Artificial Intelligence and Statistics. PMLR, 2021. https://proceedings.mlr.press/v130/cui21a.html

class mfglib.alg.FictitiousPlay(alpha: float | None = None)

Fictitious Play algorithm.

Notes

The implementation is based on Fictitious Play Damped.

When alpha=None, the algorithm is the same as the original Fictitious Play algorithm. When alpha=1, the algorithm is the same as Fixed Point Iteration algorithm.

See [3] and [4] for algorithm details.

solve(env_instance: Environment, *, pi: Literal['uniform'] | Tensor = 'uniform', max_iter: int = 100, atol: float | None = 0.001, rtol: float | None = 0.001, verbose: bool = False) tuple[list[torch.Tensor], list[float], list[float]]

Run the algorithm and solve for a Nash-Equilibrium policy.

Parameters:
  • env_instance – An instance of a specific environment.

  • pi – A tensor of size (T+1,)+S+A representing the initial policy. If ‘uniform’, the initial policy will be the uniform distribution.

  • max_iter – Maximum number of iterations to run.

  • atol – Absolute tolerance criteria for early stopping.

  • rtol – Relative tolerance criteria for early stopping.

  • verbose – Print convergence information during iteration.

tune(env_suite: list[mfglib.env.environment.Environment], *, max_iter: int = 100, atol: float = 0.001, rtol: float = 0.001, metric: Literal['shifted_geo_mean', 'failure_rate'] = 'shifted_geo_mean', n_trials: int | None = 10, timeout: float = 30.0) FictitiousPlay

Tune the algorithm over a given environment suite.

Parameters:
  • env_suite – A list of environment instances.

  • max_iter – The number of iterations to run the algorithm on each environment instance.

  • atol – Absolute tolerance criteria for early stopping.

  • rtol – Relative tolerance criteria for early stopping.

  • metric – Determines which metric to be used for scoring a trial. Either shifted_geo_mean or failure_rate.

  • n_trials – The number of trials. If this argument is not given, as many trials are run as possible.

  • timeout – Stop tuning after the given number of second(s) on each environment instance. If this argument is not given, as many trials are run as possible.

class mfglib.alg.MFOMO(L: Tensor | None = None, z: Tensor | None = None, y: Tensor | None = None, u: Tensor | None = None, v: Tensor | None = None, w: Tensor | None = None, loss: Literal['l1'] | Literal['l2'] | Literal['l1_l2'] = 'l1_l2', c1: float = 1.0, c2: float = 1.0, c3: float = 1.0, rb_freq: int | None = None, m1: float = 10.0, m2: float = 2.0, m3: float = 0.1, optimizer: dict[str, Any] = {'config': {'lr': 0.1}, 'name': 'Adam'}, parameterize: bool = False, hat_init: bool = False)

Mean-Field Occupation Measure Optimization algorithm.

Notes

See [5] for algorithm details.

solve(env_instance: Environment, *, pi: Literal['uniform'] | Tensor = 'uniform', max_iter: int = 100, atol: float | None = 0.001, rtol: float | None = 0.001, verbose: bool = False) tuple[list[torch.Tensor], list[float], list[float]]

Mean-Field Occupation Measure Optimization algorithm.

Parameters:
  • env_instance – An instance of a specific environment.

  • pi – A user-provided array of size (T+1,) + S + A representing the initial policy. If ‘uniform’, the initial policy will be uniformly distributed.

  • max_iter – Maximum number of iterations to run.

  • atol – Absolute tolerance criteria for early stopping.

  • rtol – Relative tolerance criteria for early stopping.

  • verbose – Print convergence information during iteration.

tune(env_suite: list[mfglib.env.environment.Environment], *, max_iter: int = 100, atol: float = 0.001, rtol: float = 0.001, metric: Literal['shifted_geo_mean', 'failure_rate'] = 'shifted_geo_mean', n_trials: int | None = 10, timeout: float = 30.0) MFOMO

Tune the algorithm over a given environment suite.

Parameters:
  • env_suite – A list of environment instances.

  • max_iter – The number of iterations to run the algorithm on each environment instance.

  • atol – Absolute tolerance criteria for early stopping.

  • rtol – Relative tolerance criteria for early stopping.

  • metric – Determines which metric to be used for scoring a trial. Either shifted_geo_mean or failure_rate.

  • n_trials – The number of trials. If this argument is not given, as many trials are run as possible.

  • timeout – Stop tuning after the given number of second(s) on each environment instance. If this argument is not given, as many trials are run as possible.

class mfglib.alg.OnlineMirrorDescent(alpha: float = 1.0)

Online Mirror Descent algorithm.

Notes

See [6] for algorithm details.

solve(env_instance: Environment, *, pi: Literal['uniform'] | Tensor = 'uniform', max_iter: int = 100, atol: float | None = 0.001, rtol: float | None = 0.001, verbose: bool = False) tuple[list[torch.Tensor], list[float], list[float]]

Run the algorithm and solve for a Nash-Equilibrium policy.

Parameters:
  • env_instance – An instance of a specific environment.

  • pi – A numpy array of size (T+1,)+S+A representing the initial policy. If ‘uniform’, the initial policy will be the uniform distribution.

  • max_iter – Maximum number of iterations to run.

  • atol – Absolute tolerance criteria for early stopping.

  • rtol – Relative tolerance criteria for early stopping.

  • verbose – Print convergence information during iteration.

tune(env_suite: list[mfglib.env.environment.Environment], *, max_iter: int = 100, atol: float = 0.001, rtol: float = 0.001, metric: Literal['shifted_geo_mean', 'failure_rate'] = 'shifted_geo_mean', n_trials: int | None = 10, timeout: float = 30.0) OnlineMirrorDescent

Tune the algorithm over a given environment suite.

Parameters:
  • env_suite – A list of environment instances.

  • max_iter – The number of iterations to run the algorithm on each environment instance.

  • atol – Absolute tolerance criteria for early stopping.

  • rtol – Relative tolerance criteria for early stopping.

  • metric – Determines which metric to be used for scoring a trial. Either shifted_geo_mean or failure_rate.

  • n_trials – The number of trials. If this argument is not given, as many trials are run as possible.

  • timeout – Stop tuning after the given number of second(s) on each environment instance. If this argument is not given, as many trials are run as possible.

class mfglib.alg.PriorDescent(eta: float = 1.0, n_inner: int | None = None)

Prior Descent algorithm.

Notes

When n_inner=None, the algorithm is the same as GMF-V.

See [7] and [8] for algorithm details.

solve(env_instance: Environment, *, pi: Literal['uniform'] | Tensor = 'uniform', max_iter: int = 100, atol: float | None = 0.001, rtol: float | None = 0.001, verbose: bool = False) tuple[list[torch.Tensor], list[float], list[float]]

Run the algorithm and solve for a Nash-Equilibrium policy.

Parameters:
  • env_instance – An instance of a specific environment.

  • pi – A numpy array of size (T+1,)+S+A representing the initial policy. If ‘uniform’, the initial policy will be the uniform distribution.

  • max_iter – Maximum number of iterations to run.

  • atol – Absolute tolerance criteria for early stopping.

  • rtol – Relative tolerance criteria for early stopping.

  • verbose – Print convergence information during iteration.

tune(env_suite: list[mfglib.env.environment.Environment], *, max_iter: int = 100, atol: float = 0.001, rtol: float = 0.001, metric: Literal['shifted_geo_mean', 'failure_rate'] = 'shifted_geo_mean', n_trials: int | None = 10, timeout: float = 30.0) PriorDescent

Tune the algorithm over a given environment suite.

Parameters:
  • env_suite – A list of environment instances.

  • max_iter – Notes number of iterations to run the algorithm on each environment instance.

  • atol – Absolute tolerance criteria for early stopping.

  • rtol – Relative tolerance criteria for early stopping.

  • metric – Determines which metric to be used for scoring a trial. Either shifted_geo_mean or failure_rate.

  • n_trials – The number of trials. If this argument is not given, as many trials are run as possible.

  • timeout – Stop tuning after the given number of second(s) on each environment instance. If this argument is not given, as many trials are run as possible.