4. Benchmark Problems¶
4.1. Overview¶
Benchmark sets such as MIPLIB or TSPLIB are usually employed to evaluate the performance of conventional MIP solvers. Two shortcomings, however, make existing benchmark sets less suitable for evaluating the performance of learning-enhanced MIP solvers: (i) while existing benchmark sets typically contain hundreds or thousands of instances, machine learning (ML) methods typically benefit from having orders of magnitude more instances available for training; (ii) current ML methods typically perform best on sets of homogeneous instances, but general-purpose benchmark sets contain relatively few examples of each problem type.
To tackle this challenge, MIPLearn provides random instance generators for a wide variety of classical optimization problems, covering applications from different fields, that can be used to evaluate new learning-enhanced MIP techniques in a measurable and reproducible way. As of MIPLearn 0.3, nine problem generators are available, each customizable with user-provided probability distributions and flexible parameters. The generators can be configured, for example, to produce large sets of very similar instances of the same size, where only the objective function changes, or more diverse sets of instances, with various sizes and characteristics, belonging to a particular problem class.
In the following sections, we describe each problem included in the library, its MIP formulation, and its random instance generator.
Warning
The random instance generators and formulations shown below are subject to change. If you use them in your research, for reproducibility, you should specify the MIPLearn version and all parameters.
Note
To make the instances easier to process, all formulations are written as minimization problems.
Some problem formulations, such as the one for the traveling salesman problem, contain an exponential number of constraints, which are enforced through constraint generation. The MPS files for these problems contain only the constraints that were generated during a trial run, not the entire set. Solving these MPS files directly, therefore, may produce solutions that are infeasible for the original problem.
4.2. Bin Packing¶
Bin packing is a combinatorial optimization problem that asks for the optimal way to pack a given set of items into a finite number of containers (or bins) of fixed capacity. More specifically, the problem is to assign indivisible items of different sizes to identical bins, while minimizing the number of bins used. The problem is NP-hard and has many practical applications, including logistics and warehouse management, where it is used to determine how to best store and transport goods using a limited amount of space.
Formulation¶
Let \(n\) be the number of items, and \(s_i\) the size of the \(i\)-th item. Also let \(B\) be the size of the bins. For each bin \(j\), let \(y_j\) be a binary decision variable which equals one if the bin is used. For every item-bin pair \((i,j)\), let \(x_{ij}\) be a binary decision variable which equals one if item \(i\) is assigned to bin \(j\). The bin packing problem is formulated as:
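Consistent with the variables just defined, and following the minimization convention noted in the overview, the formulation can be written with \(n\) available bins (since \(n\) bins always suffice):

```latex
\begin{align*}
\text{minimize} \quad & \sum_{j=1}^{n} y_j \\
\text{subject to} \quad & \sum_{j=1}^{n} x_{ij} = 1 & i = 1, \ldots, n \\
& \sum_{i=1}^{n} s_i x_{ij} \le B y_j & j = 1, \ldots, n \\
& x_{ij}, y_j \in \{0, 1\} & i, j = 1, \ldots, n
\end{align*}
```

The first set of constraints assigns each item to exactly one bin; the second enforces the bin capacities and ensures that only opened bins are used.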
Random instance generator¶
Random instances of the bin packing problem can be generated using the class BinPackGenerator.
If fix_items=False, the class samples the user-provided probability distributions n, sizes and capacity to decide, respectively, the number of items, the sizes of the items and the capacity of the bins. All values are sampled independently.

If fix_items=True, the class creates a reference instance, using the method previously described, then generates additional instances by perturbing its item sizes and bin capacity. More specifically, the size of each item is set to \(s_i \gamma_i\), where \(s_i\) is the size of the \(i\)-th item in the reference instance and \(\gamma_i\) is sampled from sizes_jitter. Similarly, the bin capacity is set to \(B \beta\), where \(B\) is the reference capacity and \(\beta\) is sampled from capacity_jitter. The number of items remains the same across all generated instances.
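The perturbation scheme above can be sketched in a few lines (illustrative standalone code, not the BinPackGenerator implementation; note that scipy's uniform(loc=a, scale=b) samples from the interval [a, a+b]):

```python
import numpy as np
from scipy.stats import uniform

np.random.seed(42)

# Reference instance: item sizes s_i and bin capacity B
sizes = np.array([8.5, 26.0, 19.5, 14.1])
capacity = 100.0

# uniform(loc=0.9, scale=0.2) draws multipliers from [0.9, 1.1]
sizes_jitter = uniform(loc=0.9, scale=0.2)
capacity_jitter = uniform(loc=0.9, scale=0.2)

# One perturbed instance: sizes s_i * gamma_i, capacity B * beta
gamma = sizes_jitter.rvs(size=len(sizes))
beta = capacity_jitter.rvs()
new_sizes = sizes * gamma
new_capacity = capacity * beta
```

Each perturbed size stays within 10% of its reference value, which is what makes the generated instances similar but not identical.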
Example¶
[1]:
import numpy as np
from scipy.stats import uniform, randint

from miplearn.problems.binpack import BinPackGenerator, build_binpack_model

# Set random seed, to make example reproducible
np.random.seed(42)

# Generate random instances of the binpack problem with ten items
data = BinPackGenerator(
    n=randint(low=10, high=11),
    sizes=uniform(loc=0, scale=25),
    capacity=uniform(loc=100, scale=0),
    sizes_jitter=uniform(loc=0.9, scale=0.2),
    capacity_jitter=uniform(loc=0.9, scale=0.2),
    fix_items=True,
).generate(10)

# Print sizes and capacities
for i in range(10):
    print(i, data[i].sizes, data[i].capacity)
print()

# Optimize first instance
model = build_binpack_model(data[0])
model.optimize()
0 [ 8.47 26. 19.52 14.11 3.65 3.65 1.4 21.76 14.82 16.96] 102.24
1 [ 8.69 22.78 17.81 14.83 4.12 3.67 1.46 22.05 13.66 18.08] 93.41
2 [ 8.55 25.9 20. 15.89 3.75 3.59 1.51 21.4 13.89 17.68] 90.69
3 [10.13 22.62 18.89 14.4 3.92 3.94 1.36 23.69 15.85 19.26] 107.9
4 [ 9.55 25.77 16.79 14.06 3.55 3.76 1.42 20.66 16.02 17.19] 95.62
5 [ 9.44 22.06 19.41 13.69 4.28 4.11 1.36 19.51 15.98 18.43] 104.58
6 [ 9.87 21.74 17.78 13.82 4.18 4. 1.4 19.76 14.46 17.08] 104.59
7 [ 9.62 25.61 18.2 13.83 4.07 4.1 1.47 22.83 15.01 17.78] 98.55
8 [ 8.47 21.9 16.58 15.37 3.76 3.91 1.57 20.57 14.76 18.61] 94.58
9 [ 8.57 22.77 17.06 16.25 4.14 4. 1.56 22.97 14.09 19.09] 100.79
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (linux64)
CPU model: AMD Ryzen 9 7950X 16-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 20 rows, 110 columns and 210 nonzeros
Model fingerprint: 0x1ff9913f
Variable types: 0 continuous, 110 integer (110 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+02]
Objective range [1e+00, 1e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+00]
Found heuristic solution: objective 5.0000000
Presolve time: 0.00s
Presolved: 20 rows, 110 columns, 210 nonzeros
Variable types: 0 continuous, 110 integer (110 binary)
Root relaxation: objective 1.274844e+00, 38 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 1.27484 0 4 5.00000 1.27484 74.5% - 0s
H 0 0 4.0000000 1.27484 68.1% - 0s
H 0 0 2.0000000 1.27484 36.3% - 0s
0 0 1.27484 0 4 2.00000 1.27484 36.3% - 0s
Explored 1 nodes (38 simplex iterations) in 0.01 seconds (0.00 work units)
Thread count was 32 (of 32 available processors)
Solution count 3: 2 4 5
Optimal solution found (tolerance 1.00e-04)
Best objective 2.000000000000e+00, best bound 2.000000000000e+00, gap 0.0000%
4.3. Multi-Dimensional Knapsack¶
The multi-dimensional knapsack problem is a generalization of the classic knapsack problem, which involves selecting a subset of items to be placed in a knapsack such that the total value of the items is maximized without exceeding a maximum weight. In this generalization, items have multiple weights (representing multiple resources), and multiple weight constraints must be satisfied.
Formulation¶
Let \(n\) be the number of items and \(m\) be the number of resources. For each item \(j\) and resource \(i\), let \(p_j\) be the price of the item, let \(w_{ij}\) be the amount of resource \(i\) that item \(j\) consumes (i.e. the \(i\)-th weight of the item), and let \(b_i\) be the total amount of resource \(i\) available (i.e. the size of the \(i\)-th knapsack). The formulation is given by:
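Following the minimization convention of the overview (the maximization objective is negated), the formulation can be written as:

```latex
\begin{align*}
\text{minimize} \quad & -\sum_{j=1}^{n} p_j x_j \\
\text{subject to} \quad & \sum_{j=1}^{n} w_{ij} x_j \le b_i & i = 1, \ldots, m \\
& x_j \in \{0, 1\} & j = 1, \ldots, n
\end{align*}
```

where \(x_j\) is a binary decision variable that equals one if item \(j\) is selected. The negated objective explains the negative objective values in the solver logs below.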
Random instance generator¶
The class MultiKnapsackGenerator can be used to generate random instances of this problem. The number of items \(n\) and knapsacks \(m\) are sampled from the user-provided probability distributions n and m. The weights \(w_{ij}\) are sampled independently from the provided distribution w. The capacity of knapsack \(i\) is set to

\(b_i = \alpha_i \sum_{j=1}^{n} w_{ij}\),

where \(\alpha_i\), the tightness ratio, is sampled from the provided probability distribution alpha. To make the instances more challenging, the prices of the items are linearly correlated with their average weights. More specifically, the price of each item \(j\) is set to

\(p_j = \frac{1}{m} \sum_{i=1}^{m} w_{ij} + K u_j\),

where \(K\), the correlation coefficient, and \(u_j\), the correlation multiplier, are sampled from the provided probability distributions K and u.
If fix_w=True is provided, then the \(w_{ij}\) are kept the same in all generated instances. This also implies that \(n\) and \(m\) are kept fixed. Although the prices and capacities are derived from \(w_{ij}\), as long as u and K are not constants, the generated instances will still not be completely identical.

If a probability distribution w_jitter is provided, then item weights are set to \(w_{ij} \gamma_{ij}\), where \(\gamma_{ij}\) is sampled from w_jitter. When combined with fix_w=True, this argument may be used to generate instances where the weight of each item is roughly the same, but not exactly identical, across all instances. The prices of the items and the capacities of the knapsacks are calculated as above, but using these perturbed weights instead.

By default, all generated prices, weights and capacities are rounded to the nearest integer. If round=False is provided, this rounding is disabled.
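The capacity rule above (tightness ratio times total weight) can be sketched as follows; this is illustrative standalone code, not the MultiKnapsackGenerator implementation:

```python
import numpy as np

np.random.seed(42)

# w[i, j]: amount of resource i consumed by item j
m, n = 5, 10
w = np.random.uniform(0, 1000, size=(m, n))

# Tightness ratio alpha_i = 0.25 for every knapsack:
# b_i = alpha_i * sum_j w_ij
alpha = np.full(m, 0.25)
b = alpha * w.sum(axis=1)
```

With a tightness ratio of 0.25, each knapsack can hold roughly a quarter of the total weight placed on its resource, which keeps the capacity constraints binding.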
References

Fréville, Arnaud, and Gérard Plateau. An efficient preprocessing procedure for the multidimensional 0–1 knapsack problem. Discrete Applied Mathematics 49.1-3 (1994): 189-212.
Fréville, Arnaud. The multidimensional 0–1 knapsack problem: An overview. European Journal of Operational Research 155.1 (2004): 1-21.
Example¶
[2]:
import numpy as np
from scipy.stats import uniform, randint

from miplearn.problems.multiknapsack import (
    MultiKnapsackGenerator,
    build_multiknapsack_model,
)

# Set random seed, to make example reproducible
np.random.seed(42)

# Generate ten similar random instances of the multiknapsack problem with
# ten items, five resources and weights in the [0, 1000] interval.
data = MultiKnapsackGenerator(
    n=randint(low=10, high=11),
    m=randint(low=5, high=6),
    w=uniform(loc=0, scale=1000),
    K=uniform(loc=100, scale=0),
    u=uniform(loc=1, scale=0),
    alpha=uniform(loc=0.25, scale=0),
    w_jitter=uniform(loc=0.95, scale=0.1),
    p_jitter=uniform(loc=0.75, scale=0.5),
    fix_w=True,
).generate(10)

# Print data for one of the instances
print("prices\n", data[0].prices)
print("weights\n", data[0].weights)
print("capacities\n", data[0].capacities)
print()

# Build model and optimize
model = build_multiknapsack_model(data[0])
model.optimize()
prices
[350. 692. 454. 709. 605. 543. 321. 674. 571. 341.]
weights
[[392. 977. 764. 622. 158. 163. 56. 840. 574. 696.]
[ 20. 948. 860. 209. 178. 184. 293. 541. 414. 305.]
[629. 135. 278. 378. 466. 803. 205. 492. 584. 45.]
[630. 173. 64. 907. 947. 794. 312. 99. 711. 439.]
[117. 506. 35. 915. 266. 662. 312. 516. 521. 178.]]
capacities
[1310. 988. 1004. 1269. 1007.]
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (linux64)
CPU model: AMD Ryzen 9 7950X 16-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 5 rows, 10 columns and 50 nonzeros
Model fingerprint: 0xaf3ac15e
Variable types: 0 continuous, 10 integer (10 binary)
Coefficient statistics:
Matrix range [2e+01, 1e+03]
Objective range [3e+02, 7e+02]
Bounds range [1e+00, 1e+00]
RHS range [1e+03, 1e+03]
Found heuristic solution: objective -804.0000000
Presolve removed 0 rows and 3 columns
Presolve time: 0.00s
Presolved: 5 rows, 7 columns, 34 nonzeros
Variable types: 0 continuous, 7 integer (7 binary)
Root relaxation: objective -1.428726e+03, 4 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 -1428.7265 0 4 -804.00000 -1428.7265 77.7% - 0s
H 0 0 -1279.000000 -1428.7265 11.7% - 0s
Cutting planes:
Cover: 1
Explored 1 nodes (4 simplex iterations) in 0.01 seconds (0.00 work units)
Thread count was 32 (of 32 available processors)
Solution count 2: -1279 -804
No other solutions better than -1279
Optimal solution found (tolerance 1.00e-04)
Best objective -1.279000000000e+03, best bound -1.279000000000e+03, gap 0.0000%
4.4. Capacitated P-Median¶
The capacitated \(p\)-median problem is a variation of the classic \(p\)-median problem, in which a set of customers must be served by a set of facilities. In the capacitated version, each facility has a fixed capacity, and the goal is to minimize the total cost of serving the customers while ensuring that the capacity of each facility is not exceeded. Variations of this problem are often used in logistics and supply chain management to determine the most efficient locations for warehouses or distribution centers.
Formulation¶
Let \(I=\{1,\ldots,n\}\) be the set of customers. For each customer \(i \in I\), let \(d_i\) be its demand and let \(y_i\) be a binary decision variable that equals one if we decide to open a facility at that customer’s location. For each pair \((i,j) \in I \times I\), let \(x_{ij}\) be a binary decision variable that equals one if customer \(i\) is assigned to facility \(j\). Furthermore, let \(w_{ij}\) be the cost of serving customer \(i\) from facility \(j\), let \(p\) be the number of facilities we must open, and let \(c_j\) be the capacity of facility \(j\). The problem is formulated as:
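Consistent with the definitions above, the problem can be written as:

```latex
\begin{align*}
\text{minimize} \quad & \sum_{i \in I} \sum_{j \in I} w_{ij} x_{ij} \\
\text{subject to} \quad & \sum_{j \in I} x_{ij} = 1 & \forall i \in I \\
& \sum_{i \in I} d_i x_{ij} \le c_j y_j & \forall j \in I \\
& \sum_{j \in I} y_j = p \\
& x_{ij}, y_j \in \{0, 1\} & \forall i, j \in I
\end{align*}
```

The second set of constraints both enforces the facility capacities and links the assignment variables to the opening decisions.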
Random instance generator¶
The class PMedianGenerator can be used to generate random instances of this problem. First, it decides the number of customers and the parameter \(p\) by sampling the provided distributions n and p, respectively. Then, for each customer \(i\), the class builds its geographical location \((x_i, y_i)\) by sampling the provided distributions x and y. The demand of customer \(i\) and the capacity of facility \(i\) are decided by sampling the provided distributions demands and capacities, respectively. Finally, the costs \(w_{ij}\) are set to the Euclidean distance between the locations of customers \(i\) and \(j\).

If fixed=True, then the number of customers, their locations, the parameter \(p\), the demands and the capacities are sampled from their respective distributions exactly once, to build a reference instance which is then randomly perturbed. Specifically, in each perturbation, the distances, demands and capacities are multiplied by random scaling factors sampled from the distributions distances_jitter, demands_jitter and capacities_jitter, respectively. The result is a list of instances that share the same set of customers, but have slightly different demands, capacities and distances.
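The cost computation can be sketched as follows (illustrative code, using numpy broadcasting to build the full Euclidean distance matrix between customer locations):

```python
import numpy as np

np.random.seed(42)

# Customer locations (x_i, y_i) in a 100x100 square
n = 10
x = np.random.uniform(0, 100, n)
y = np.random.uniform(0, 100, n)
points = np.column_stack([x, y])

# w[i, j]: Euclidean distance between customers i and j
diff = points[:, None, :] - points[None, :, :]
w = np.sqrt((diff ** 2).sum(axis=-1))
```

The resulting matrix is symmetric with a zero diagonal, matching the structure of the distance matrix printed in the example below.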
Example¶
[3]:
import numpy as np
from scipy.stats import uniform, randint

from miplearn.problems.pmedian import PMedianGenerator, build_pmedian_model

# Set random seed, to make example reproducible
np.random.seed(42)

# Generate random instances with ten customers located in a
# 100x100 square, with demands in [0, 10], capacities in [0, 250].
data = PMedianGenerator(
    x=uniform(loc=0.0, scale=100.0),
    y=uniform(loc=0.0, scale=100.0),
    n=randint(low=10, high=11),
    p=randint(low=5, high=6),
    demands=uniform(loc=0, scale=10),
    capacities=uniform(loc=0, scale=250),
    distances_jitter=uniform(loc=0.9, scale=0.2),
    demands_jitter=uniform(loc=0.9, scale=0.2),
    capacities_jitter=uniform(loc=0.9, scale=0.2),
    fixed=True,
).generate(10)

# Print data for one of the instances
print("p =", data[0].p)
print("distances =\n", data[0].distances)
print("demands =", data[0].demands)
print("capacities =", data[0].capacities)
print()

# Build and optimize model
model = build_pmedian_model(data[0])
model.optimize()
p = 5
distances =
[[ 0. 50.17 82.42 32.76 33.2 35.45 86.88 79.11 43.17 66.2 ]
[ 50.17 0. 72.64 72.51 17.06 80.25 39.92 68.93 43.41 42.96]
[ 82.42 72.64 0. 71.69 70.92 82.51 67.88 3.76 39.74 30.73]
[ 32.76 72.51 71.69 0. 56.56 11.03 101.35 69.39 42.09 68.58]
[ 33.2 17.06 70.92 56.56 0. 63.68 54.71 67.16 34.89 44.99]
[ 35.45 80.25 82.51 11.03 63.68 0. 111.04 80.29 52.78 79.36]
[ 86.88 39.92 67.88 101.35 54.71 111.04 0. 65.13 61.37 40.82]
[ 79.11 68.93 3.76 69.39 67.16 80.29 65.13 0. 36.26 27.24]
[ 43.17 43.41 39.74 42.09 34.89 52.78 61.37 36.26 0. 26.62]
[ 66.2 42.96 30.73 68.58 44.99 79.36 40.82 27.24 26.62 0. ]]
demands = [6.12 1.39 2.92 3.66 4.56 7.85 2. 5.14 5.92 0.46]
capacities = [151.89 42.63 16.26 237.22 241.41 202.1 76.15 24.42 171.06 110.04]
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (linux64)
CPU model: AMD Ryzen 9 7950X 16-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 21 rows, 110 columns and 220 nonzeros
Model fingerprint: 0x8d8d9346
Variable types: 0 continuous, 110 integer (110 binary)
Coefficient statistics:
Matrix range [5e-01, 2e+02]
Objective range [4e+00, 1e+02]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 5e+00]
Found heuristic solution: objective 368.7900000
Presolve time: 0.00s
Presolved: 21 rows, 110 columns, 220 nonzeros
Variable types: 0 continuous, 110 integer (110 binary)
Found heuristic solution: objective 245.6400000
Root relaxation: objective 0.000000e+00, 18 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 0.00000 0 6 245.64000 0.00000 100% - 0s
H 0 0 185.1900000 0.00000 100% - 0s
H 0 0 148.6300000 17.14595 88.5% - 0s
H 0 0 113.1800000 17.14595 84.9% - 0s
0 0 17.14595 0 10 113.18000 17.14595 84.9% - 0s
H 0 0 99.5000000 17.14595 82.8% - 0s
H 0 0 98.3900000 17.14595 82.6% - 0s
H 0 0 93.9800000 64.28872 31.6% - 0s
0 0 64.28872 0 15 93.98000 64.28872 31.6% - 0s
H 0 0 93.9200000 64.28872 31.5% - 0s
0 0 86.06884 0 15 93.92000 86.06884 8.36% - 0s
* 0 0 0 91.2300000 91.23000 0.00% - 0s
Explored 1 nodes (70 simplex iterations) in 0.02 seconds (0.00 work units)
Thread count was 32 (of 32 available processors)
Solution count 10: 91.23 93.92 93.98 ... 368.79
Optimal solution found (tolerance 1.00e-04)
Best objective 9.123000000000e+01, best bound 9.123000000000e+01, gap 0.0000%
4.5. Set Cover¶
The set cover problem is a classical NP-hard optimization problem which aims to minimize the number of sets needed to cover all elements in a given universe. Each set may contain a different number of elements, and sets may overlap with each other. This problem can be useful in various real-world scenarios such as scheduling, resource allocation, and network design.
Formulation¶
Let \(U = \{1,\ldots,n\}\) be a given universe set, and let \(S=\{S_1,\ldots,S_m\}\) be a collection of sets whose union equals \(U\). For each \(j \in \{1,\ldots,m\}\), let \(w_j\) be the weight of set \(S_j\), and let \(x_j\) be a binary decision variable that equals one if set \(S_j\) is chosen. The set cover problem is formulated as:
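Consistent with these definitions, the formulation can be written as:

```latex
\begin{align*}
\text{minimize} \quad & \sum_{j=1}^{m} w_j x_j \\
\text{subject to} \quad & \sum_{j \,:\, i \in S_j} x_j \ge 1 & \forall i \in U \\
& x_j \in \{0, 1\} & j = 1, \ldots, m
\end{align*}
```

Each constraint requires that element \(i\) be covered by at least one chosen set.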
Random instance generator¶
The class SetCoverGenerator can generate random instances of this problem. The class first decides the number of elements and sets by sampling the provided distributions n_elements and n_sets, respectively. Then it generates a random incidence matrix \(M\), as follows:

1. The density \(d\) of \(M\) is decided by sampling the provided probability distribution density.
2. Each entry of \(M\) is then sampled from the Bernoulli distribution with probability \(d\).
3. To ensure that each element belongs to at least one set, the class identifies elements that are not contained in any set, then assigns each of them to a random set (chosen uniformly).
4. Similarly, to ensure that each set contains at least one element, the class identifies empty sets, then modifies each of them to include one random element (chosen uniformly).

Finally, the weight of set \(j\) is set to \(w_j + K | S_j |\), where \(w_j\) and \(K\) are sampled from costs and K, respectively, and where \(|S_j|\) denotes the size of set \(S_j\). The parameter \(K\) is used to introduce some correlation between the size of a set and its weight, making the instances more challenging. Note that K is only sampled once for the entire instance.

If fix_sets=True, then all generated instances have exactly the same sets and elements. The costs of the sets, however, are multiplied by random scaling factors sampled from the provided probability distribution costs_jitter.
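The incidence-matrix construction described above can be sketched as follows (illustrative standalone code, not the SetCoverGenerator implementation):

```python
import numpy as np

np.random.seed(42)
n_elements, n_sets, d = 5, 10, 0.5

# Sample each entry of M from Bernoulli(d);
# M[i, j] = 1 means element i belongs to set j
M = (np.random.rand(n_elements, n_sets) < d).astype(int)

# Ensure each element belongs to at least one set
for i in range(n_elements):
    if M[i, :].sum() == 0:
        M[i, np.random.randint(n_sets)] = 1

# Ensure each set contains at least one element
for j in range(n_sets):
    if M[:, j].sum() == 0:
        M[np.random.randint(n_elements), j] = 1
```

After the two repair passes, every row and every column of the matrix contains at least one nonzero entry, so the generated instance is always feasible.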
Example¶
[4]:
import numpy as np
from scipy.stats import uniform, randint

from miplearn.problems.setcover import SetCoverGenerator, build_setcover_model_gurobipy

# Set random seed, to make example reproducible
np.random.seed(42)

# Build random instances with five elements, ten sets and costs
# in the [0, 1000] interval, with a correlation factor of 25 and
# an incidence matrix with 50% density.
data = SetCoverGenerator(
    n_elements=randint(low=5, high=6),
    n_sets=randint(low=10, high=11),
    costs=uniform(loc=0.0, scale=1000.0),
    costs_jitter=uniform(loc=0.90, scale=0.20),
    density=uniform(loc=0.5, scale=0.00),
    K=uniform(loc=25.0, scale=0.0),
    fix_sets=True,
).generate(10)

# Print problem data for one instance
print("matrix\n", data[0].incidence_matrix)
print("costs", data[0].costs)
print()

# Build and optimize model
model = build_setcover_model_gurobipy(data[0])
model.optimize()
matrix
[[1 0 0 0 1 1 1 0 0 0]
[1 0 0 1 1 1 1 0 1 1]
[0 1 1 1 1 0 1 0 0 1]
[0 1 1 0 0 0 1 1 0 1]
[1 1 1 0 1 0 1 0 0 1]]
costs [1044.58 850.13 1014.5 944.83 697.9 971.87 213.49 220.98 70.23
425.33]
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (linux64)
CPU model: AMD Ryzen 9 7950X 16-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 5 rows, 10 columns and 28 nonzeros
Model fingerprint: 0xe5c2d4fa
Variable types: 0 continuous, 10 integer (10 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [7e+01, 1e+03]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+00]
Found heuristic solution: objective 213.4900000
Presolve removed 5 rows and 10 columns
Presolve time: 0.00s
Presolve: All rows and columns removed
Explored 0 nodes (0 simplex iterations) in 0.00 seconds (0.00 work units)
Thread count was 1 (of 32 available processors)
Solution count 1: 213.49
Optimal solution found (tolerance 1.00e-04)
Best objective 2.134900000000e+02, best bound 2.134900000000e+02, gap 0.0000%
4.6. Set Packing¶
Set packing is a classical optimization problem that asks for a maximum-weight collection of pairwise disjoint sets within a given list. This problem often arises in real-world situations where a finite number of resources need to be allocated to tasks, such as airline flight crew scheduling.
Formulation¶
Let \(U=\{1,\ldots,n\}\) be a given universe set, and let \(S = \{S_1, \ldots, S_m\}\) be a collection of subsets of \(U\). For each subset \(j \in \{1, \ldots, m\}\), let \(w_j\) be the weight of \(S_j\) and let \(x_j\) be a binary decision variable which equals one if set \(S_j\) is chosen. The problem is formulated as:
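Following the minimization convention of the overview (the maximization objective is negated), the formulation can be written as:

```latex
\begin{align*}
\text{minimize} \quad & -\sum_{j=1}^{m} w_j x_j \\
\text{subject to} \quad & \sum_{j \,:\, i \in S_j} x_j \le 1 & \forall i \in U \\
& x_j \in \{0, 1\} & j = 1, \ldots, m
\end{align*}
```

Each constraint requires that element \(i\) belong to at most one chosen set, which is what makes the chosen sets pairwise disjoint. The negated objective explains the negative objective values in the solver log below.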
Random instance generator¶
The class SetPackGenerator can generate random instances of this problem. It accepts exactly the same arguments as SetCoverGenerator, and generates instance data in exactly the same way. For more details, please see the documentation for that class.
Example¶
[5]:
import numpy as np
from scipy.stats import uniform, randint

from miplearn.problems.setpack import SetPackGenerator, build_setpack_model

# Set random seed, to make example reproducible
np.random.seed(42)

# Build random instances with five elements, ten sets and costs
# in the [0, 1000] interval, with a correlation factor of 25 and
# an incidence matrix with 50% density.
data = SetPackGenerator(
    n_elements=randint(low=5, high=6),
    n_sets=randint(low=10, high=11),
    costs=uniform(loc=0.0, scale=1000.0),
    costs_jitter=uniform(loc=0.90, scale=0.20),
    density=uniform(loc=0.5, scale=0.00),
    K=uniform(loc=25.0, scale=0.0),
    fix_sets=True,
).generate(10)

# Print problem data for one instance
print("matrix\n", data[0].incidence_matrix)
print("costs", data[0].costs)
print()

# Build and optimize model
model = build_setpack_model(data[0])
model.optimize()
matrix
[[1 0 0 0 1 1 1 0 0 0]
[1 0 0 1 1 1 1 0 1 1]
[0 1 1 1 1 0 1 0 0 1]
[0 1 1 0 0 0 1 1 0 1]
[1 1 1 0 1 0 1 0 0 1]]
costs [1044.58 850.13 1014.5 944.83 697.9 971.87 213.49 220.98 70.23
425.33]
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (linux64)
CPU model: AMD Ryzen 9 7950X 16-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 5 rows, 10 columns and 28 nonzeros
Model fingerprint: 0x4ee91388
Variable types: 0 continuous, 10 integer (10 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [7e+01, 1e+03]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+00]
Found heuristic solution: objective -1265.560000
Presolve removed 5 rows and 10 columns
Presolve time: 0.00s
Presolve: All rows and columns removed
Explored 0 nodes (0 simplex iterations) in 0.00 seconds (0.00 work units)
Thread count was 1 (of 32 available processors)
Solution count 2: -1986.37 -1265.56
No other solutions better than -1986.37
Optimal solution found (tolerance 1.00e-04)
Best objective -1.986370000000e+03, best bound -1.986370000000e+03, gap 0.0000%
4.7. Stable Set¶
The maximum-weight stable set problem is a classical optimization problem in graph theory which asks for the maximum-weight subset of vertices in a graph such that no two vertices in the subset are adjacent. The problem often arises in real-world scheduling or resource allocation situations, where stable sets represent tasks or resources that can be chosen simultaneously without conflicts.
Formulation¶
Let \(G=(V,E)\) be a simple undirected graph, and for each vertex \(v \in V\), let \(w_v\) be its weight and let \(x_v\) be a binary decision variable which equals one if \(v\) is included in the stable set. Following the minimization convention of the overview, the problem is formulated as:

\begin{align*}
\text{minimize} \quad & -\sum_{v \in V} w_v x_v \\
\text{subject to} \quad & \sum_{v \in C} x_v \le 1 & \forall C \in \mathcal{C} \\
& x_v \in \{0, 1\} & \forall v \in V
\end{align*}

where \(\mathcal{C}\) is the set of cliques in \(G\). We recall that a clique is a subset of vertices in which every pair of vertices is adjacent.
Random instance generator¶
The class MaxWeightStableSetGenerator can be used to generate random instances of this problem. The class first samples the user-provided probability distributions n and p to decide the number of vertices and the density of the graph. Then it generates a random Erdős-Rényi graph \(G_{n,p}\). We recall that, in such a graph, each potential edge is included with probability \(p\), independently of all others. The class then samples the provided probability distribution w to decide the vertex weights.

If fix_graph=True, then all generated instances share the same random graph. For each instance, the weights are decided by sampling w, as described above.
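A comparable Erdős-Rényi graph can be generated directly with networkx, shown here as an illustration of the \(G_{n,p}\) model (whether MaxWeightStableSetGenerator calls this exact function internally is an assumption):

```python
import networkx as nx

# G(n, p) model: each of the n*(n-1)/2 potential edges is
# included independently with probability p
G = nx.gnp_random_graph(10, 0.25, seed=42)
print(sorted(G.edges))
```

With density 0.25, a 10-node graph contains about 11 of the 45 possible edges on average.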
Example¶
[6]:
import random

import numpy as np
from scipy.stats import uniform, randint

from miplearn.problems.stab import (
    MaxWeightStableSetGenerator,
    build_stab_model_gurobipy,
)

# Set random seed to make example reproducible
random.seed(42)
np.random.seed(42)

# Generate random instances with a fixed 10-node graph,
# 25% density and random weights in the [0, 100] interval.
data = MaxWeightStableSetGenerator(
    w=uniform(loc=0.0, scale=100.0),
    n=randint(low=10, high=11),
    p=uniform(loc=0.25, scale=0.0),
    fix_graph=True,
).generate(10)

# Print the graph and weights for two instances
print("graph", data[0].graph.edges)
print("weights[0]", data[0].weights)
print("weights[1]", data[1].weights)
print()

# Load and optimize the first instance
model = build_stab_model_gurobipy(data[0])
model.optimize()
graph [(0, 2), (0, 4), (0, 8), (1, 2), (1, 3), (1, 5), (1, 6), (1, 9), (2, 5), (2, 9), (3, 6), (3, 7), (6, 9), (7, 8), (8, 9)]
weights[0] [37.45 95.07 73.2 59.87 15.6 15.6 5.81 86.62 60.11 70.81]
weights[1] [ 2.06 96.99 83.24 21.23 18.18 18.34 30.42 52.48 43.19 29.12]
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (linux64)
CPU model: AMD Ryzen 9 7950X 16-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 10 rows, 10 columns and 24 nonzeros
Model fingerprint: 0xf4c21689
Variable types: 0 continuous, 10 integer (10 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [6e+00, 1e+02]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+00]
Found heuristic solution: objective -219.1400000
Presolve removed 2 rows and 2 columns
Presolve time: 0.00s
Presolved: 8 rows, 8 columns, 19 nonzeros
Variable types: 0 continuous, 8 integer (8 binary)
Root relaxation: objective -2.205650e+02, 4 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 infeasible 0 -219.14000 -219.14000 0.00% - 0s
Explored 1 nodes (4 simplex iterations) in 0.01 seconds (0.00 work units)
Thread count was 32 (of 32 available processors)
Solution count 1: -219.14
No other solutions better than -219.14
Optimal solution found (tolerance 1.00e-04)
Best objective -2.191400000000e+02, best bound -2.191400000000e+02, gap 0.0000%
4.8. Traveling Salesman¶
Given a list of cities and the distances between them, the traveling salesman problem asks for the shortest route that starts at the first city, visits each other city exactly once, and returns to the first city. The problem is a generalization of the Hamiltonian cycle problem, one of Karp's 21 NP-complete problems, and has many practical applications, including routing delivery trucks and scheduling airline routes.
Formulation¶
Let \(G=(V,E)\) be a simple undirected graph. For each edge \(e \in E\), let \(d_e\) be its weight (or distance) and let \(x_e\) be a binary decision variable which equals one if \(e\) is included in the route. The problem is formulated as:

\begin{align*}
\text{minimize} \quad & \sum_{e \in E} d_e x_e \\
\text{subject to} \quad & \sum_{e \in \delta(v)} x_e = 2 & \forall v \in V \\
& \sum_{e \in \delta(S)} x_e \ge 2 & \forall S \subsetneq V, \; S \neq \emptyset \\
& x_e \in \{0, 1\} & \forall e \in E
\end{align*}

where \(\delta(v)\) denotes the set of edges adjacent to vertex \(v\), and \(\delta(S)\) denotes the set of edges that have one extremity in \(S\) and one in \(V \setminus S\). Because of its exponential size, the second set of inequalities is enforced through lazy constraints.
Random instance generator¶
The class TravelingSalesmanGenerator can be used to generate random instances of this problem. Initially, the class samples the user-provided probability distribution n to decide how many cities to generate. Then, for each city \(i\), the class generates its geographical location \((x_i, y_i)\) by sampling the provided distributions x and y. The distance \(d_{ij}\) between cities \(i\) and \(j\) is then set to

\(d_{ij} = \gamma_{ij} \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}\),

where \(\gamma_{ij}\) is a random scaling factor sampled from the provided probability distribution gamma.

If fix_cities=True, then the list of cities is kept the same for all generated instances. The \(\gamma\) values, however, and therefore also the distances, still differ across instances. By default, all distances \(d_{ij}\) are rounded to the nearest integer. If round=False is provided, this rounding is disabled.
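The distance construction can be sketched as follows (illustrative standalone code; the per-pair scaling factors are symmetrized here so that the distance matrix remains symmetric):

```python
import numpy as np
from scipy.stats import uniform

np.random.seed(42)

# City locations in a 1000x1000 box
n = 5
pts = np.column_stack([
    np.random.uniform(0, 1000, n),
    np.random.uniform(0, 1000, n),
])

# Euclidean distances between all pairs of cities
diff = pts[:, None, :] - pts[None, :, :]
euclid = np.sqrt((diff ** 2).sum(axis=-1))

# Scale each pair by gamma_ij in [0.90, 1.10] and round
gamma = np.triu(uniform(loc=0.90, scale=0.20).rvs(size=(n, n)), k=1)
gamma = gamma + gamma.T
d = np.round(euclid * gamma)
```

Because each \(\gamma_{ij}\) is resampled per instance, the distance matrices differ entry by entry, as the two matrices printed in the example below illustrate.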
Example¶
[7]:
import random

import numpy as np
from scipy.stats import uniform, randint

from miplearn.problems.tsp import TravelingSalesmanGenerator, build_tsp_model

# Set random seed to make example reproducible
random.seed(42)
np.random.seed(42)

# Generate random instances with ten fixed cities in the 1000x1000 box
# and random distance scaling factors in the [0.90, 1.10] interval.
data = TravelingSalesmanGenerator(
    n=randint(low=10, high=11),
    x=uniform(loc=0.0, scale=1000.0),
    y=uniform(loc=0.0, scale=1000.0),
    gamma=uniform(loc=0.90, scale=0.20),
    fix_cities=True,
    round=True,
).generate(10)

# Print distance matrices for the first two instances
print("distances[0]\n", data[0].distances)
print("distances[1]\n", data[1].distances)
print()

# Load and optimize the first instance
model = build_tsp_model(data[0])
model.optimize()
distances[0]
[[ 0. 513. 762. 358. 325. 374. 932. 731. 391. 634.]
[ 513. 0. 726. 765. 163. 754. 409. 719. 446. 400.]
[ 762. 726. 0. 780. 756. 744. 656. 40. 383. 334.]
[ 358. 765. 780. 0. 549. 117. 925. 702. 422. 728.]
[ 325. 163. 756. 549. 0. 663. 526. 708. 377. 462.]
[ 374. 754. 744. 117. 663. 0. 1072. 802. 501. 853.]
[ 932. 409. 656. 925. 526. 1072. 0. 654. 603. 433.]
[ 731. 719. 40. 702. 708. 802. 654. 0. 381. 255.]
[ 391. 446. 383. 422. 377. 501. 603. 381. 0. 287.]
[ 634. 400. 334. 728. 462. 853. 433. 255. 287. 0.]]
distances[1]
[[ 0. 493. 900. 354. 323. 367. 841. 727. 444. 668.]
[ 493. 0. 690. 687. 175. 725. 368. 744. 398. 446.]
[ 900. 690. 0. 666. 728. 827. 736. 41. 371. 317.]
[ 354. 687. 666. 0. 570. 104. 1090. 712. 454. 648.]
[ 323. 175. 728. 570. 0. 655. 521. 650. 356. 469.]
[ 367. 725. 827. 104. 655. 0. 1146. 779. 476. 752.]
[ 841. 368. 736. 1090. 521. 1146. 0. 681. 565. 394.]
[ 727. 744. 41. 712. 650. 779. 681. 0. 374. 286.]
[ 444. 398. 371. 454. 356. 476. 565. 374. 0. 274.]
[ 668. 446. 317. 648. 469. 752. 394. 286. 274. 0.]]
Set parameter LazyConstraints to value 1
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (linux64)
CPU model: AMD Ryzen 9 7950X 16-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 10 rows, 45 columns and 90 nonzeros
Model fingerprint: 0x719675e5
Variable types: 0 continuous, 45 integer (45 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [4e+01, 1e+03]
Bounds range [1e+00, 1e+00]
RHS range [2e+00, 2e+00]
Presolve time: 0.00s
Presolved: 10 rows, 45 columns, 90 nonzeros
Variable types: 0 continuous, 45 integer (45 binary)
Root relaxation: objective 2.921000e+03, 17 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
* 0 0 0 2921.0000000 2921.00000 0.00% - 0s
Cutting planes:
Lazy constraints: 3
Explored 1 nodes (17 simplex iterations) in 0.01 seconds (0.00 work units)
Thread count was 32 (of 32 available processors)
Solution count 1: 2921
Optimal solution found (tolerance 1.00e-04)
Best objective 2.921000000000e+03, best bound 2.921000000000e+03, gap 0.0000%
User-callback calls 106, time in user-callback 0.00 sec
4.9. Unit Commitment¶
The unit commitment problem is a mixed-integer optimization problem which determines when each power generation unit should be turned on or off, and at what capacity it should operate, in order to meet the demand for electricity at the lowest cost. Numerous operational constraints are typically enforced, such as ramping constraints, which prevent generation units from changing power output levels too quickly from one time step to the next, and minimum-up and minimum-down constraints, which prevent units from switching on and off too frequently. The unit commitment problem is widely used in power systems planning and operations.
Note
MIPLearn includes a simple formulation for the unit commitment problem, which enforces only minimum and maximum power production, as well as minimum-up and minimum-down constraints. The formulation does not enforce, for example, ramping trajectories, piecewise-linear cost curves, start-up costs or transmission and n-1 security constraints. For a more complete set of formulations, solution methods and realistic benchmark instances for the problem, see UnitCommitment.jl.
Formulation¶
Let \(T\) be the number of time steps, \(G\) the number of generating units, and \(D_t\) the power demand (in MW) at time \(t\). For each generating unit \(g\), let \(P^\max_g\) and \(P^\min_g\) be the maximum and minimum amount of power the unit is able to produce when switched on; let \(L_g\) and \(l_g\) be the minimum up- and down-time for unit \(g\); let \(C^\text{fixed}_g\) be the cost to keep unit \(g\) on for one time step, regardless of its power output level; let \(C^\text{start}_g\) be the cost to switch unit \(g\) on; and let \(C^\text{var}_g\) be the cost for unit \(g\) to produce 1 MW of power. In this formulation, we assume linear production costs. For each unit \(g\) and time \(t\), let \(x_{gt}\) be a binary variable which equals one if unit \(g\) is on at time \(t\); let \(w_{gt}\) be a binary variable which equals one if unit \(g\) switches from being off at time \(t-1\) to being on at time \(t\); and let \(p_{gt}\) be a continuous variable which indicates the amount of power generated. The formulation is given by:

\[
\begin{align*}
\text{minimize} \quad & \sum_{t=1}^{T} \sum_{g=1}^{G} \left( C^\text{fixed}_g x_{gt} + C^\text{start}_g w_{gt} + C^\text{var}_g p_{gt} \right) \\
\text{subject to} \quad & \sum_{k=t-L_g+1}^{t} w_{gk} \leq x_{gt}, & \forall g,\; t \geq L_g, \\
& \sum_{k=t-l_g+1}^{t} w_{gk} \leq 1 - x_{g,t-l_g}, & \forall g,\; t > l_g, \\
& w_{gt} \geq x_{gt} - x_{g,t-1}, & \forall g,\; t > 1, \\
& \sum_{g=1}^{G} p_{gt} \geq D_t, & \forall t, \\
& p_{gt} \leq P^\max_g x_{gt}, & \forall g, t, \\
& p_{gt} \geq P^\min_g x_{gt}, & \forall g, t, \\
& x_{gt}, w_{gt} \in \{0, 1\}, & \forall g, t.
\end{align*}
\]
The first set of inequalities enforces the minimum up-time constraints: if unit \(g\) is down at time \(t\), then it cannot have started up during the previous \(L_g\) time steps. The second set enforces the minimum down-time constraints and is symmetric to the first. The third set ensures that if unit \(g\) starts up at time \(t\), then its startup variable must equal one. The fourth set ensures that demand is satisfied at each time step. The fifth and sixth sets enforce bounds on the quantity of power generated by each unit.
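To make the minimum up- and down-time requirements concrete, the hypothetical helper below checks whether a given on/off schedule respects them; it is a sketch for intuition, not part of MIPLearn or of the formulation above:

```python
def satisfies_min_up_down(x, min_up, min_down):
    """Check that a binary on/off schedule x (list of 0/1 values)
    respects minimum up-time and minimum down-time requirements.
    (Hypothetical helper, for illustration only.)"""
    # Break the schedule into maximal runs of identical values
    runs = []
    start = 0
    for t in range(1, len(x) + 1):
        if t == len(x) or x[t] != x[start]:
            runs.append((x[start], t - start))
            start = t
    # Interior runs must meet the minimum duration; the first and last
    # runs are left unconstrained, since the unit may have been in that
    # state before (or remain in it after) the planning horizon.
    for i, (state, length) in enumerate(runs):
        if i == 0 or i == len(runs) - 1:
            continue
        if state == 1 and length < min_up:
            return False
        if state == 0 and length < min_down:
            return False
    return True

print(satisfies_min_up_down([0, 1, 1, 1, 0, 0, 1], min_up=3, min_down=2))  # True
print(satisfies_min_up_down([0, 1, 1, 0, 0, 1, 1], min_up=3, min_down=2))  # False
```

The second schedule fails because the unit stays on for only two consecutive time steps, while min_up requires three.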
References
Bendotti, P., Fouilhoux, P. & Rottner, C. The min-up/min-down unit commitment polytope. J Comb Optim 36, 1024-1058 (2018). https://doi.org/10.1007/s10878-018-0273-y
Random instance generator¶
The class UnitCommitmentGenerator can be used to generate random instances of this problem. First, the user-provided probability distributions n_units and n_periods are sampled to determine the number of generating units and the number of time steps, respectively. Then, for each unit, the distributions max_power and min_power are sampled to determine the unit's maximum and minimum power output. To make it easier to generate valid ranges, min_power is specified not as an absolute power level in MW, but as a multiplier of max_power; for example, if max_power samples to 100 and min_power samples to 0.5, then the unit's power range is set to [50, 100]. Next, the distributions cost_startup, cost_prod and cost_fixed are sampled to determine the unit's startup, variable and fixed costs, and the distributions min_uptime and min_downtime are sampled to determine its minimum up- and down-time.
After the unit parameters have been generated, the class generates a periodic demand curve, with a peak every 12 time steps, in the range \((0.4C, 0.8C)\), where \(C\) is the sum of all units' maximum power output. Finally, all costs and demand values are perturbed by random scaling factors independently sampled from the distributions cost_jitter and demand_jitter, respectively.
If fix_units=True, then the list of generators (with their respective parameters) is kept the same for all generated instances. If cost_jitter and demand_jitter are provided, the instances still have slightly different costs and demands.
Example¶
[8]:
import random
import numpy as np
from scipy.stats import uniform, randint
from miplearn.problems.uc import UnitCommitmentGenerator, build_uc_model
# Set random seed to make example reproducible
random.seed(42)
np.random.seed(42)
# Generate 10 random instances, each with 5 generators and 24 time steps
data = UnitCommitmentGenerator(
    n_units=randint(low=5, high=6),
    n_periods=randint(low=24, high=25),
    max_power=uniform(loc=50, scale=450),
    min_power=uniform(loc=0.5, scale=0.25),
    cost_startup=uniform(loc=0, scale=10_000),
    cost_prod=uniform(loc=0, scale=50),
    cost_fixed=uniform(loc=0, scale=1_000),
    min_uptime=randint(low=2, high=8),
    min_downtime=randint(low=2, high=8),
    cost_jitter=uniform(loc=0.75, scale=0.5),
    demand_jitter=uniform(loc=0.9, scale=0.2),
    fix_units=True,
).generate(10)
# Print problem data for the first two instances
for i in range(2):
    print(f"min_power[{i}]", data[i].min_power)
    print(f"max_power[{i}]", data[i].max_power)
    print(f"min_uptime[{i}]", data[i].min_uptime)
    print(f"min_downtime[{i}]", data[i].min_downtime)
    print(f"cost_startup[{i}]", data[i].cost_startup)
    print(f"cost_prod[{i}]", data[i].cost_prod)
    print(f"cost_fixed[{i}]", data[i].cost_fixed)
    print(f"demand[{i}]\n", data[i].demand)
    print()
# Load and optimize the first instance
model = build_uc_model(data[0])
model.optimize()
min_power[0] [117.79 245.85 271.85 207.7 81.38]
max_power[0] [218.54 477.82 379.4 319.4 120.21]
min_uptime[0] [7 6 3 5 7]
min_downtime[0] [7 3 5 6 2]
cost_startup[0] [3042.42 5247.56 4319.45 2912.29 6118.53]
cost_prod[0] [ 6.97 14.61 18.32 22.8 39.26]
cost_fixed[0] [199.67 514.23 592.41 46.45 607.54]
demand[0]
[ 905.06 915.41 1166.52 1212.29 1127.81 953.52 905.06 796.21 783.78
866.23 768.62 899.59 905.06 946.23 1087.61 1004.24 1048.36 992.03
905.06 750.82 691.48 606.15 658.5 809.95]
min_power[1] [117.79 245.85 271.85 207.7 81.38]
max_power[1] [218.54 477.82 379.4 319.4 120.21]
min_uptime[1] [7 6 3 5 7]
min_downtime[1] [7 3 5 6 2]
cost_startup[1] [2458.08 6200.26 4585.74 2666.05 4783.34]
cost_prod[1] [ 6.31 13.33 20.42 24.37 46.86]
cost_fixed[1] [196.9 416.42 655.57 52.51 626.15]
demand[1]
[ 981.42 840.07 1095.59 1102.03 1088.41 932.29 863.67 848.56 761.33
828.28 775.18 834.99 959.76 865.72 1193.52 1058.92 985.19 893.92
962.16 781.88 723.15 639.04 602.4 787.02]
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (linux64)
CPU model: AMD Ryzen 9 7950X 16-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 578 rows, 360 columns and 2128 nonzeros
Model fingerprint: 0x4dc1c661
Variable types: 120 continuous, 240 integer (240 binary)
Coefficient statistics:
Matrix range [1e+00, 5e+02]
Objective range [7e+00, 6e+03]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+03]
Presolve removed 244 rows and 131 columns
Presolve time: 0.02s
Presolved: 334 rows, 229 columns, 842 nonzeros
Variable types: 116 continuous, 113 integer (113 binary)
Found heuristic solution: objective 440662.46430
Found heuristic solution: objective 429461.97680
Found heuristic solution: objective 374043.64040
Root relaxation: objective 3.361348e+05, 142 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 336134.820 0 18 374043.640 336134.820 10.1% - 0s
H 0 0 368600.14450 336134.820 8.81% - 0s
H 0 0 364721.76610 336134.820 7.84% - 0s
0 0 cutoff 0 364721.766 364721.766 0.00% - 0s
Cutting planes:
Gomory: 3
Cover: 8
Implied bound: 29
Clique: 222
MIR: 7
Flow cover: 7
RLT: 1
Relax-and-lift: 7
Explored 1 nodes (234 simplex iterations) in 0.04 seconds (0.02 work units)
Thread count was 32 (of 32 available processors)
Solution count 5: 364722 368600 374044 ... 440662
Optimal solution found (tolerance 1.00e-04)
Best objective 3.647217661000e+05, best bound 3.647217661000e+05, gap 0.0000%
4.10. Vertex Cover¶
Minimum weight vertex cover is a classical optimization problem in graph theory where the goal is to find a minimum-weight set of vertices that touches every edge of the graph; that is, every edge must have at least one endpoint in the set. The problem generalizes vertex cover, one of Karp's 21 NP-complete problems, and has applications in various fields, including bioinformatics and machine learning.
Formulation¶
Let \(G=(V,E)\) be a simple graph. For each vertex \(v \in V\), let \(w_v\) be its weight, and let \(x_v\) be a binary decision variable which equals one if \(v\) is included in the cover. The mixed-integer linear formulation for the problem is given by:

\[
\begin{align*}
\text{minimize} \quad & \sum_{v \in V} w_v x_v \\
\text{subject to} \quad & x_u + x_v \geq 1, & \forall \{u, v\} \in E, \\
& x_v \in \{0, 1\}, & \forall v \in V.
\end{align*}
\]
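To check the formulation on a tiny instance, one can enumerate all vertex subsets and keep the cheapest one that covers every edge; this exponential-time sketch is for illustration only and is unrelated to how MIPLearn solves the problem:

```python
from itertools import combinations

def min_weight_vertex_cover(n, edges, weights):
    """Brute-force minimum weight vertex cover for small graphs:
    enumerate all subsets of vertices and keep the cheapest one that
    touches every edge. Exponential time; for illustration only."""
    best_cover, best_weight = None, float("inf")
    for k in range(n + 1):
        for subset in combinations(range(n), k):
            chosen = set(subset)
            # A feasible cover must contain an endpoint of every edge
            if all(u in chosen or v in chosen for (u, v) in edges):
                w = sum(weights[v] for v in chosen)
                if w < best_weight:
                    best_cover, best_weight = chosen, w
    return best_cover, best_weight

# 4-cycle: vertices 0 and 2 together touch all four edges
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
weights = [1.0, 2.0, 1.0, 2.0]
cover, total = min_weight_vertex_cover(4, edges, weights)
print(cover, total)  # {0, 2} 2.0
```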
Random instance generator¶
The class MinWeightVertexCoverGenerator can be used to generate random instances of this problem. The class accepts exactly the same parameters and behaves exactly in the same way as MaxWeightStableSetGenerator. See the stable set section for more details.
Example¶
[9]:
import random
import numpy as np
from scipy.stats import uniform, randint
from miplearn.problems.vertexcover import (
    MinWeightVertexCoverGenerator,
    build_vertexcover_model,
)
# Set random seed to make example reproducible
random.seed(42)
np.random.seed(42)
# Generate random instances with a fixed 10-node graph,
# 25% density and random weights in the [0, 100] interval.
data = MinWeightVertexCoverGenerator(
    w=uniform(loc=0.0, scale=100.0),
    n=randint(low=10, high=11),
    p=uniform(loc=0.25, scale=0.0),
    fix_graph=True,
).generate(10)
# Print the graph and weights for two instances
print("graph", data[0].graph.edges)
print("weights[0]", data[0].weights)
print("weights[1]", data[1].weights)
print()
# Load and optimize the first instance
model = build_vertexcover_model(data[0])
model.optimize()
graph [(0, 2), (0, 4), (0, 8), (1, 2), (1, 3), (1, 5), (1, 6), (1, 9), (2, 5), (2, 9), (3, 6), (3, 7), (6, 9), (7, 8), (8, 9)]
weights[0] [37.45 95.07 73.2 59.87 15.6 15.6 5.81 86.62 60.11 70.81]
weights[1] [ 2.06 96.99 83.24 21.23 18.18 18.34 30.42 52.48 43.19 29.12]
Gurobi Optimizer version 10.0.1 build v10.0.1rc0 (linux64)
CPU model: AMD Ryzen 9 7950X 16-Core Processor, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 16 physical cores, 32 logical processors, using up to 32 threads
Optimize a model with 15 rows, 10 columns and 30 nonzeros
Model fingerprint: 0x2d2d1390
Variable types: 0 continuous, 10 integer (10 binary)
Coefficient statistics:
Matrix range [1e+00, 1e+00]
Objective range [6e+00, 1e+02]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 1e+00]
Found heuristic solution: objective 301.0000000
Presolve removed 7 rows and 2 columns
Presolve time: 0.00s
Presolved: 8 rows, 8 columns, 19 nonzeros
Variable types: 0 continuous, 8 integer (8 binary)
Root relaxation: objective 2.995750e+02, 8 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 infeasible 0 301.00000 301.00000 0.00% - 0s
Explored 1 nodes (8 simplex iterations) in 0.01 seconds (0.00 work units)
Thread count was 32 (of 32 available processors)
Solution count 1: 301
Optimal solution found (tolerance 1.00e-04)
Best objective 3.010000000000e+02, best bound 3.010000000000e+02, gap 0.0000%