Algorithms¶

The following algorithms come pre-implemented with MFGLib.

Algorithm	Reference
Fictitious Play	Perrin et al.⁹
Online Mirror Descent	Perolat et al.⁸
Prior Descent	Cui and Koeppl³
Mean-Field Occupation Measure Optimization	Guo et al.⁶
Mean-Field Occupation Measure Inclusion	Hu and Zhang⁷

Further algorithm details are provided below. To learn how to implement your own algorithm, see the section on User-Defined Algorithms.

API Reference¶

All the pre-implemented algorithms are importable from mfglib.alg. Each algorithm is equipped with a .solve() method that serves to find a Nash equilibrium of a supplied environment instance.

abstract Algorithm.solve(env: Environment, *, pi_0: Literal['uniform'] | Tensor = Ellipsis, max_iter: int = Ellipsis, atol: float | None = Ellipsis, rtol: float | None = Ellipsis, verbose: bool = Ellipsis, print_every: int = Ellipsis) → tuple[list[Tensor], list[float], list[float]]¶

In addition to the environment instance env, you can provide an initial policy pi_0 and various algorithm parameters such as max_iter and tolerance parameters atol and rtol. MFGLib provides the following algorithms:

class FictitiousPlay(alpha: float = 0.0)¶

Fictitious Play algorithm.

Notes

The implementation is based on Fictitious Play Damped.

When alpha=0.0, the algorithm is the same as the original Fictitious Play algorithm. When alpha=1, the algorithm is the same as Fixed Point Iteration algorithm.

See also

Refer to Cui and Koeppl³ and Guo et al.⁵ for additional details.

class MFOMO(L: Tensor | None = None, z: Tensor | None = None, y: Tensor | None = None, u: Tensor | None = None, v: Tensor | None = None, w: Tensor | None = None, loss: Literal['l1', 'l2', 'l1_l2'] = 'l1_l2', c1: float = 1.0, c2: float = 1.0, c3: float = 1.0, rb_freq: int | None = None, m1: float = 10.0, m2: float = 2.0, m3: float = 0.1, optimizer: dict[str, Any] = {'config': {'lr': 0.1}, 'name': 'Adam'}, parameterize: bool = False, hat_init: bool = True)¶

MF-OMO, or Mean-Field Occupation Measure Inclusion, reformulates the MFG as an optimization problem.

In its most basic form MF-OMO solves

\[\begin{split}\text{minimize}_{L, y, z} \quad& \left \lVert A_L L - b \right \rVert_2^2 + \left \lVert A_L^\top y + z - c_L \right \rVert_2^2 + z^\top L \\ \text{subject to} \quad& \mathbf{1}^\top L_t = 0 \quad \forall t \in \mathcal{T} \\ & \mathbf{1}^\top z \leq SA(T^2 + T + 2) r_{\max} \\ & \left \lVert y \right \rVert_2 \leq S ( T + 1 ) ( T + 2 ) r_{\max} / 2 \\ & L \in \mathbb{R}_+^{\mathcal{T} \mathcal{S} \mathcal{A}} \\ & y \in \mathbb{R}^{\mathcal{T} \mathcal{S}} \\ & z \in \mathbb{R}^{\mathcal{T} \mathcal{S} \mathcal{A}}\end{split}\]

This constrained optimization problem can be solved with a variety of methods, such as Projected Gradient Descent.

Parameterized Formulation. Replacing the constrained optimization problem with a smooth unconstrained problem enables us to use a broader range of optimization solvers. As explained in Appendix A.3[#], we can reparameterize the variables in MF-OMO to completely eliminate the constraints. The new problem is called the “parameterized” formulation. Pass parameterize=True to use this second formulation.

Hat Initialization. Given an initial mean-field \(L\), the “hat initialization” sets the initial \(y,z\) to \(\hat{y}(L), \hat{z}(L)\) as explained in Proposition 6[#]. Pass hat_init=True to enable this option.

Redesigned Objective. One can also assign different coefficients to the three terms in the objective function, and come up with a “redesigned objective”

\[c_1 \left \lVert A_L L - b \right \rVert_2^2 + c_2 \left \lVert A_L^\top y + z - c_L \right \rVert_2^2 + c_3 z^\top L\]

Note

Without loss of generality we can always let \(c_1 = 1\).

You can also apply different norm to the objective terms. The pre-configured options are

"l1" with objective \(c_1 \left \lVert A_L L - b \right \rVert_1 + c_2 \left \lVert A_L^\top y + z - c_L \right \rVert_1 + c_3 z^\top L\)
"l2" with objective \(c_1 \left \lVert A_L L - b \right \rVert_2^2 + c_2 \left \lVert A_L^\top y + z - c_L \right \rVert_2^2 + c_3 ( z^\top L)^2\)
"l1_l2" with objective \(c_1 \left \lVert A_L L - b \right \rVert_2^2 + c_2 \left \lVert A_L^\top y + z - c_L \right \rVert_2^2 + c_3 z^\top L\)

Adaptive Residual Balancing: We can adaptively change the coefficients (\(c_1\), \(c_2\), and \(c_3\)) of the redesigned objective based on the value of their corresponding objective term. This process is controlled by the three parameters, m1, m2, and m3.

Let’s denote by \(O_1\) the value of the first objective term (depending on the norm used, it could be either \(\left \lVert A_L L-b \right \rVert_1\) or \(\lVert A_L L-b \rVert_2^2\)), and let \(O_2\) and \(O_3\) be the values of the second and third objective terms, respectively. When adaptive residual balancing is applied, we modify the coefficients in the following way:

If \(O_1/ \max \{ O_2, O_3 \} > m_1\), then multiply c1 by m2.
If \(O_1/ \min \{ O_2, O_3 \} < m_3\), then divide c1 by m2.
If \(O_2/ \max\{ O_1, O_3 \} > m_1\), then multiply c2 by m2.
If \(O_2/ \max\{ O_1, O_3 \} > m_3\), then divide c2 by m2.

rb_freq determines how frequently the residual rebalancing is applied.

Initialization: We can set the initial policy for any algorithm using the input argument pi through the ` solve()` method. MF-OMO uses the initial policy to compute the initial values of the variables \(L\), \(z\), and \(y\). However, if you want to initialize these variables directly, you can do so by passing the L, y, and z parameters to the constructor. If you’re using the parameterized version, you should instead pass u, v, and w.

Parameters:

L – Initialization value, used only when parameterize=False and otherwise ignored.
z – Initialization value, used only when parameterize=False and otherwise ignored.
y – Initialization value, used only when parameterize=False and otherwise ignored.
u – Initialization value, used only when parameterize=True and otherwise ignored.
v – Initialization value, used only when parameterize=True and otherwise ignored.
w – Initialization value, used only when parameterize=True and otherwise ignored.
loss – Determines the type of norm used in the objective function.
c1 – Objective function coefficient.
c2 – Objective function coefficient.
c3 – Objective function coefficient.
rb_freq – Determines how often residual balancing is applied. If None, residual balancing will not be applied.
m1 – Residual balancing parameter.
m2 – Residual balancing parameter.
m3 – Residual balancing parameter.
optimizer – Name and configuration of a pytorch optimizer.
parameterize – Optionally solve the alternate “parameterized” formulation.
hat_init – A boolean determining whether to use hat initialization.
seealso:: (..) – Refer to Guo et al.⁶ for additional details.

class OccupationMeasureInclusion(alpha: float = 0.001, eta: float = 0.0, osqp_atol: float | None = None, osqp_rtol: float | None = None, osqp_warmstart: bool = True)¶

Mean-Field Occupation Measure Inclusion with Forward-Backward Splitting.

Notes

MF-OMI-FBS recasts the objective of finding a mean-field Nash equilibrium as an inclusion problem with occupation-measure variables. The algorithm is known to have polynomial regret bounds in games with the Lasry-Lions monotonicity property.

User-Defined Algorithms¶

To create an MFGLib-compatible custom algorithm, you can subclass the abstract base class mfglib.alg.abc.Algorithm. Your custom logic will be contained in the solve() method.

class Algorithm.solve(self, env: Environment, *, pi_0: Literal['uniform'] | Tensor = Ellipsis, max_iter: int = Ellipsis, atol: float | None = Ellipsis, rtol: float | None = Ellipsis, verbose: bool = Ellipsis, print_every: int = Ellipsis)

Here is a brief example, in which we design an iterative algorithm that applies some abstract rule \(F\) to update the policy. To keep the example is simple as possible, we won’t allow initialization with arbitrary policies, we won’t implement early stopping, and we won’t support verbose print-outs.

import typing
import time

import torch

from mfglib.alg.abc import Algorithm
from mfglib.env import Environment
from mfglib.scoring import exploitability_score

class MyAlgorithm(Algorithm):
    def solve(
        env_instance: Environment,
        *,
        pi: typing.Literal["uniform"] | torch.Tensor = "uniform",
        max_iter: int = 100,
        atol: float | None = 0.001,
        rtol: float | None = 0.001,
        verbose: int = 0,
    ) -> tuple[list[torch.Tensor], list[float], list[float]]:
        # Initialize the three lists we will return: policies, exploitability
        # scores, and runtimes.
        pis = []
        expls = []
        rts = []

        # Force uniform initialization. Recall our state and action spaces
        # are, in general, multi-dimensional and must be handled accordingly.
        T = env_instance.T
        S = env_instance.S
        A = env_instance.A
        pi = torch.ones(size=[T + 1, *S, *A]) / torch.tensor(A).prod()

        t0 = time.perf_counter()
        for _ in range(max_iter):
            expl = exploitability_score()
            pis.append(pi)
            expls.append(expl)
            rts.append(time.perf_counter() - t0)
            # F here is some generic policy update rule that, in practice,
            # might take additional arguments.
            pi = F(pi)

        return pis, expls, rts