Algorithms¶
The following algorithms come pre-implemented with MFGLib.
Algorithm |
Reference |
|---|---|
Fictitious Play |
Perrin et al.9 |
Online Mirror Descent |
Perolat et al.8 |
Prior Descent |
Cui and Koeppl3 |
Mean-Field Occupation Measure Optimization |
Guo et al.6 |
Mean-Field Occupation Measure Inclusion |
Hu and Zhang7 |
Further algorithm details are provided below. To learn how to implement your own algorithm, see the section on User-Defined Algorithms.
API Reference¶
All the pre-implemented algorithms are importable from mfglib.alg. Each algorithm is equipped with
a .solve() method that serves to find a Nash equilibrium of a supplied environment instance.
- abstract Algorithm.solve(env: Environment, *, pi_0: Literal['uniform'] | Tensor = Ellipsis, max_iter: int = Ellipsis, atol: float | None = Ellipsis, rtol: float | None = Ellipsis, verbose: bool = Ellipsis, print_every: int = Ellipsis) tuple[list[Tensor], list[float], list[float]]¶
In addition to the environment instance env, you can provide an initial policy pi_0 and various
algorithm parameters such as max_iter and tolerance parameters atol and rtol. MFGLib provides
the following algorithms:
- class FictitiousPlay(alpha: float = 0.0)¶
Fictitious Play algorithm.
Notes
The implementation is based on Fictitious Play Damped.
When
alpha=0.0, the algorithm is the same as the original Fictitious Play algorithm. Whenalpha=1, the algorithm is the same as Fixed Point Iteration algorithm.
- class OnlineMirrorDescent(alpha: float = 1.0)¶
Online Mirror Descent algorithm.
See also
Refer to Perolat et al.8 for additional details.
- class PriorDescent(eta: float = 1.0, n_inner: int | None = None)¶
Prior Descent algorithm.
Notes
When n_inner=None, the algorithm is the same as GMF-V.
- class MFOMO(L: Tensor | None = None, z: Tensor | None = None, y: Tensor | None = None, u: Tensor | None = None, v: Tensor | None = None, w: Tensor | None = None, loss: Literal['l1', 'l2', 'l1_l2'] = 'l1_l2', c1: float = 1.0, c2: float = 1.0, c3: float = 1.0, rb_freq: int | None = None, m1: float = 10.0, m2: float = 2.0, m3: float = 0.1, optimizer: dict[str, Any] = {'config': {'lr': 0.1}, 'name': 'Adam'}, parameterize: bool = False, hat_init: bool = True)¶
MF-OMO, or Mean-Field Occupation Measure Inclusion, reformulates the MFG as an optimization problem.
In its most basic form MF-OMO solves
\[\begin{split}\text{minimize}_{L, y, z} \quad& \left \lVert A_L L - b \right \rVert_2^2 + \left \lVert A_L^\top y + z - c_L \right \rVert_2^2 + z^\top L \\ \text{subject to} \quad& \mathbf{1}^\top L_t = 0 \quad \forall t \in \mathcal{T} \\ & \mathbf{1}^\top z \leq SA(T^2 + T + 2) r_{\max} \\ & \left \lVert y \right \rVert_2 \leq S ( T + 1 ) ( T + 2 ) r_{\max} / 2 \\ & L \in \mathbb{R}_+^{\mathcal{T} \mathcal{S} \mathcal{A}} \\ & y \in \mathbb{R}^{\mathcal{T} \mathcal{S}} \\ & z \in \mathbb{R}^{\mathcal{T} \mathcal{S} \mathcal{A}}\end{split}\]This constrained optimization problem can be solved with a variety of methods, such as Projected Gradient Descent.
Parameterized Formulation. Replacing the constrained optimization problem with a smooth unconstrained problem enables us to use a broader range of optimization solvers. As explained in Appendix A.3[#], we can reparameterize the variables in MF-OMO to completely eliminate the constraints. The new problem is called the “parameterized” formulation. Pass
parameterize=Trueto use this second formulation.Hat Initialization. Given an initial mean-field \(L\), the “hat initialization” sets the initial \(y,z\) to \(\hat{y}(L), \hat{z}(L)\) as explained in Proposition 6[#]. Pass
hat_init=Trueto enable this option.Redesigned Objective. One can also assign different coefficients to the three terms in the objective function, and come up with a “redesigned objective”
\[c_1 \left \lVert A_L L - b \right \rVert_2^2 + c_2 \left \lVert A_L^\top y + z - c_L \right \rVert_2^2 + c_3 z^\top L\]Note
Without loss of generality we can always let \(c_1 = 1\).
- You can also apply different norm to the objective terms. The pre-configured options are
"l1"with objective \(c_1 \left \lVert A_L L - b \right \rVert_1 + c_2 \left \lVert A_L^\top y + z - c_L \right \rVert_1 + c_3 z^\top L\)"l2"with objective \(c_1 \left \lVert A_L L - b \right \rVert_2^2 + c_2 \left \lVert A_L^\top y + z - c_L \right \rVert_2^2 + c_3 ( z^\top L)^2\)"l1_l2"with objective \(c_1 \left \lVert A_L L - b \right \rVert_2^2 + c_2 \left \lVert A_L^\top y + z - c_L \right \rVert_2^2 + c_3 z^\top L\)
Adaptive Residual Balancing: We can adaptively change the coefficients (\(c_1\), \(c_2\), and \(c_3\)) of the redesigned objective based on the value of their corresponding objective term. This process is controlled by the three parameters,
m1,m2, andm3.Let’s denote by \(O_1\) the value of the first objective term (depending on the norm used, it could be either \(\left \lVert A_L L-b \right \rVert_1\) or \(\lVert A_L L-b \rVert_2^2\)), and let \(O_2\) and \(O_3\) be the values of the second and third objective terms, respectively. When adaptive residual balancing is applied, we modify the coefficients in the following way:
If \(O_1/ \max \{ O_2, O_3 \} > m_1\), then multiply
c1bym2.If \(O_1/ \min \{ O_2, O_3 \} < m_3\), then divide
c1bym2.If \(O_2/ \max\{ O_1, O_3 \} > m_1\), then multiply
c2bym2.If \(O_2/ \max\{ O_1, O_3 \} > m_3\), then divide
c2bym2.
rb_freqdetermines how frequently the residual rebalancing is applied.Initialization: We can set the initial policy for any algorithm using the input argument
pithrough the ` solve()` method. MF-OMO uses the initial policy to compute the initial values of the variables \(L\), \(z\), and \(y\). However, if you want to initialize these variables directly, you can do so by passing theL,y, andzparameters to the constructor. If you’re using the parameterized version, you should instead passu,v, andw.- Parameters:
L – Initialization value, used only when
parameterize=Falseand otherwise ignored.z – Initialization value, used only when
parameterize=Falseand otherwise ignored.y – Initialization value, used only when
parameterize=Falseand otherwise ignored.u – Initialization value, used only when
parameterize=Trueand otherwise ignored.v – Initialization value, used only when
parameterize=Trueand otherwise ignored.w – Initialization value, used only when
parameterize=Trueand otherwise ignored.loss – Determines the type of norm used in the objective function.
c1 – Objective function coefficient.
c2 – Objective function coefficient.
c3 – Objective function coefficient.
rb_freq – Determines how often residual balancing is applied. If None, residual balancing will not be applied.
m1 – Residual balancing parameter.
m2 – Residual balancing parameter.
m3 – Residual balancing parameter.
optimizer – Name and configuration of a
pytorchoptimizer.parameterize – Optionally solve the alternate “parameterized” formulation.
hat_init – A boolean determining whether to use hat initialization.
seealso:: (..) – Refer to Guo et al.6 for additional details.
- class OccupationMeasureInclusion(alpha: float = 0.001, eta: float = 0.0, osqp_atol: float | None = None, osqp_rtol: float | None = None, osqp_warmstart: bool = True)¶
Mean-Field Occupation Measure Inclusion with Forward-Backward Splitting.
Notes
MF-OMI-FBS recasts the objective of finding a mean-field Nash equilibrium as an inclusion problem with occupation-measure variables. The algorithm is known to have polynomial regret bounds in games with the Lasry-Lions monotonicity property.
See also
Refer to Hu and Zhang7 for additional details.
User-Defined Algorithms¶
To create an MFGLib-compatible custom algorithm, you can subclass the abstract base class
mfglib.alg.abc.Algorithm. Your custom logic will be contained in the solve() method.
- class Algorithm.solve(self, env: Environment, *, pi_0: Literal['uniform'] | Tensor = Ellipsis, max_iter: int = Ellipsis, atol: float | None = Ellipsis, rtol: float | None = Ellipsis, verbose: bool = Ellipsis, print_every: int = Ellipsis)
Here is a brief example, in which we design an iterative algorithm that applies some abstract rule \(F\) to update the policy. To keep the example is simple as possible, we won’t allow initialization with arbitrary policies, we won’t implement early stopping, and we won’t support verbose print-outs.
import typing
import time
import torch
from mfglib.alg.abc import Algorithm
from mfglib.env import Environment
from mfglib.scoring import exploitability_score
class MyAlgorithm(Algorithm):
def solve(
env_instance: Environment,
*,
pi: typing.Literal["uniform"] | torch.Tensor = "uniform",
max_iter: int = 100,
atol: float | None = 0.001,
rtol: float | None = 0.001,
verbose: int = 0,
) -> tuple[list[torch.Tensor], list[float], list[float]]:
# Initialize the three lists we will return: policies, exploitability
# scores, and runtimes.
pis = []
expls = []
rts = []
# Force uniform initialization. Recall our state and action spaces
# are, in general, multi-dimensional and must be handled accordingly.
T = env_instance.T
S = env_instance.S
A = env_instance.A
pi = torch.ones(size=[T + 1, *S, *A]) / torch.tensor(A).prod()
t0 = time.perf_counter()
for _ in range(max_iter):
expl = exploitability_score()
pis.append(pi)
expls.append(expl)
rts.append(time.perf_counter() - t0)
# F here is some generic policy update rule that, in practice,
# might take additional arguments.
pi = F(pi)
return pis, expls, rts