7. Primal Components
In MIPLearn, a primal component is a class that uses machine learning to predict a (potentially partial) assignment of values to the decision variables of the problem. Predicting high-quality primal solutions may be beneficial, as they allow the MIP solver to prune potentially large portions of the search space. Alternatively, if proof of optimality is not required, the MIP solver can be used to complete the partial solution generated by the machine learning model and double-check its feasibility. MIPLearn allows both of these usage patterns.
On this page, we describe the four primal components currently included in MIPLearn, which employ machine learning in different ways. Each component is highly configurable and accepts a user-provided machine learning model, which it uses for all predictions. Each component can also be configured to provide the solution to the solver in multiple ways, depending on whether proof of optimality is required.
7.1. Primal component actions
Before presenting the primal components themselves, we briefly discuss the three ways a solution may be provided to the solver. Each approach has benefits and limitations, which we also discuss in this section. All primal components can be configured to use any of the following approaches.
The first approach is to provide the solution to the solver as a warm start. This is implemented by the class SetWarmStart. The main advantage is that this method maintains all optimality and feasibility guarantees of the MIP solver, while still providing significant performance benefits for various classes of problems. If the machine learning model is able to predict multiple solutions, it is also possible to set multiple warm starts. In this case, the solver evaluates each warm start, discards the infeasible ones, then proceeds with the one that has the best objective value. The main disadvantage of this approach, compared to the next two, is that it provides relatively modest speedups for most problem classes, and no speedup at all for many others, even when the machine learning predictions are 100% accurate.
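The way a solver handles several warm starts can be pictured with the small sketch below. It is only an illustration under stated assumptions, not how any particular MIP solver or MIPLearn implements it: `is_feasible` and `objective` are hypothetical callbacks standing in for the solver's own checks, and a minimization problem is assumed.

```python
from typing import Callable, Optional, Sequence

Solution = Sequence[int]


def pick_best_warm_start(
    candidates: Sequence[Solution],
    is_feasible: Callable[[Solution], bool],
    objective: Callable[[Solution], float],
) -> Optional[Solution]:
    """Return the feasible candidate with the lowest objective value, or None."""
    feasible = [x for x in candidates if is_feasible(x)]
    if not feasible:
        return None  # no usable warm start; the solver starts from scratch
    return min(feasible, key=objective)


# Toy knapsack written as a minimization problem (hypothetical data):
# minimize -(3*x0 + 2*x1 + 4*x2) subject to 2*x0 + 1*x1 + 3*x2 <= 4.
weights, profits, capacity = [2, 1, 3], [3, 2, 4], 4


def feasible(x: Solution) -> bool:
    return sum(w * v for w, v in zip(weights, x)) <= capacity


def cost(x: Solution) -> float:
    return -sum(p * v for p, v in zip(profits, x))


warm_starts = [(1, 1, 1), (1, 1, 0), (0, 1, 1)]
best = pick_best_warm_start(warm_starts, feasible, cost)
print(best)
```

Here the first candidate violates the capacity constraint and is discarded; of the remaining two, the one with the lower cost is kept. The solver still has to prove optimality afterwards, which is why the speedup can be modest.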
The second approach is to fix the decision variables to their predicted values, then solve a restricted optimization problem on the remaining variables. This approach is implemented by the class FixVariables. The main advantage is its potential speedup: if machine learning can accurately predict values for a significant portion of the decision variables, then the MIP solver can typically complete the solution in a small fraction of the time it would take to find the same solution from scratch. The main disadvantage of this approach is that it loses optimality guarantees; that is, the complete solution found by the MIP solver may no longer be globally optimal. Also, if the machine learning predictions are not sufficiently accurate, there might not even be a feasible assignment for the variables that were left free.
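The trade-off of fixing variables can be seen on a toy problem. The sketch below is not MIPLearn code: `solve_restricted` is a hypothetical brute-force helper for a tiny 0-1 knapsack with made-up data, used only to show that fixing variables shrinks the search space, and that inaccurate predictions can make the result suboptimal or infeasible.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple


def solve_restricted(
    profits: List[int],
    weights: List[int],
    capacity: int,
    fixed: Dict[int, int],
) -> Optional[Tuple[int, Tuple[int, ...]]]:
    """Brute-force a 0-1 knapsack (maximization) with some variables fixed.

    Returns (best profit, best solution), or None if the fixed values
    leave no feasible completion.
    """
    n = len(profits)
    free = [j for j in range(n) if j not in fixed]
    best = None
    # Only 2**len(free) assignments are enumerated instead of 2**n.
    for values in product([0, 1], repeat=len(free)):
        x = [0] * n
        for j, v in fixed.items():
            x[j] = v
        for j, v in zip(free, values):
            x[j] = v
        if sum(w * v for w, v in zip(weights, x)) > capacity:
            continue  # violates the capacity constraint
        profit = sum(p * v for p, v in zip(profits, x))
        if best is None or profit > best[0]:
            best = (profit, tuple(x))
    return best


profits, weights, capacity = [10, 7, 5, 3], [5, 4, 3, 2], 7

unrestricted = solve_restricted(profits, weights, capacity, fixed={})
good_fix = solve_restricted(profits, weights, capacity, fixed={0: 1, 1: 0})
bad_fix = solve_restricted(profits, weights, capacity, fixed={0: 0})
infeasible_fix = solve_restricted(profits, weights, capacity, fixed={0: 1, 1: 1})
print(unrestricted, good_fix, bad_fix, infeasible_fix)
```

With accurate fixings (`good_fix`) the restricted problem recovers the optimal solution while enumerating a quarter of the assignments. Fixing a variable to the wrong value (`bad_fix`) yields a feasible but worse solution, and fixing two items that do not fit together (`infeasible_fix`) leaves no feasible completion at all.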
Finally, the third approach, which tries to strike a balance between the two previous ones, is to enforce proximity to a given solution. This strategy is implemented by the class EnforceProximity. More precisely, given values \(\bar{x}_1,\ldots,\bar{x}_n\) for a subset of binary decision variables \(x_1,\ldots,x_n\), this approach adds the constraint
\[\sum_{i : \bar{x}_i=0} x_i + \sum_{i : \bar{x}_i=1} \left(1 - x_i\right) \leq k,\]
to the problem, where \(k\) is a user-defined parameter, which indicates how many of the predicted variables are allowed to deviate from the machine learning suggestion. The main advantage of this approach, compared to fixing variables, is its tolerance to lower-quality machine learning predictions. Its main disadvantage is that it typically leads to smaller speedups, especially for larger values of \(k\). This approach also loses optimality guarantees.
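For binary variables, a proximity constraint of this kind bounds the number of predicted variables whose value differs from the prediction: the sum of \(x_i\) over variables predicted as zero, plus the sum of \(1 - x_i\) over variables predicted as one, must not exceed \(k\). The sketch below evaluates that quantity for a candidate solution. It assumes this standard Hamming-distance linearization; the helper names `deviation` and `satisfies_proximity` are hypothetical and not part of MIPLearn.

```python
from typing import Dict, Sequence


def deviation(x: Sequence[int], x_bar: Dict[int, int]) -> int:
    """Count predicted binary variables whose value in x differs from x_bar.

    x_bar maps variable index -> predicted value (0 or 1); variables
    without a prediction are not counted.
    """
    # Variables predicted as 0 contribute x_i; those predicted as 1
    # contribute (1 - x_i). Both terms are linear in x.
    zeros = sum(x[i] for i, v in x_bar.items() if v == 0)
    ones = sum(1 - x[i] for i, v in x_bar.items() if v == 1)
    return zeros + ones


def satisfies_proximity(x: Sequence[int], x_bar: Dict[int, int], k: int) -> bool:
    return deviation(x, x_bar) <= k


x_bar = {0: 1, 1: 0, 2: 1, 4: 0}  # variable 3 has no prediction
x = [1, 1, 0, 1, 0]               # differs from x_bar on variables 1 and 2

print(deviation(x, x_bar))
print(satisfies_proximity(x, x_bar, k=2), satisfies_proximity(x, x_bar, k=1))
```

Setting \(k = 0\) recovers the behavior of fixing all predicted variables, while larger values of \(k\) give the solver more room to correct prediction errors.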
7.2. Memorizing primal component
A simple machine learning strategy for the prediction of primal solutions is to memorize all distinct solutions seen during training, then try to predict, during inference time, which of those memorized solutions are most likely to be feasible and to provide a good objective value for the current instance. The most promising solutions may alternatively be combined into a single partial solution, which is then provided to the MIP solver. Both variations of this strategy are implemented by the MemorizingPrimalComponent class. Note that it is only applicable if the problem size, and in fact the meaning of the decision variables, remain the same across problem instances.
More precisely, let \(I_1,\ldots,I_n\) be the training instances, and let \(\bar{x}^1,\ldots,\bar{x}^n\) be their respective optimal solutions. Given a new instance \(I_{n+1}\), MemorizingPrimalComponent expects a user-provided binary classifier that assigns (through the predict_proba method, following scikit-learn’s conventions) a score \(\delta_i\) to each solution \(\bar{x}^i\), such that solutions with higher scores are more likely to be good solutions for \(I_{n+1}\). The features provided to the classifier are the instance features computed by a user-provided extractor. Given these scores, the component then performs one of the following two actions, as decided by the user:
Selects the top \(k\) solutions with the highest scores and provides them to the solver; this is implemented by SelectTopSolutions, and it is typically used with the SetWarmStart action.
Merges the top \(k\) solutions into a single partial solution, then provides it to the solver. This is implemented by MergeTopSolutions. More precisely, suppose that the machine learning model ordered the solutions in the sequence \(\bar{x}^{i_1},\ldots,\bar{x}^{i_n}\), with the most promising solutions appearing first, and with ties being broken arbitrarily. The component starts by keeping only the \(k\) most promising solutions \(\bar{x}^{i_1},\ldots,\bar{x}^{i_k}\). Then it computes, for each binary decision variable \(x_l\), its average assigned value \(\tilde{x}_l\):
\[\tilde{x}_l = \frac{1}{k} \sum_{j=1}^k \bar{x}^{i_j}_l.\]
Finally, the component constructs a merged solution \(y\), defined as:
\[\begin{split}y_l = \begin{cases} 0 & \text{ if } \tilde{x}_l \le \theta_0 \\ 1 & \text{ if } \tilde{x}_l \ge \theta_1 \\ \square & \text{otherwise,} \end{cases}\end{split}\]
where \(\theta_0\) and \(\theta_1\) are user-specified parameters, and where \(\square\) indicates that the variable is left undefined. The solution \(y\) is then provided to the solver using any of the three approaches defined in the previous section.
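The merging rule can be sketched in a few lines of plain Python. This is an illustration of the averaging-and-thresholding formulas only, not the actual MergeTopSolutions implementation; the function name, argument names and sample data are hypothetical, and undefined values are represented by `None`.

```python
from typing import List, Optional, Sequence


def merge_top_solutions(
    solutions: Sequence[Sequence[int]],
    scores: Sequence[float],
    k: int,
    theta_0: float,
    theta_1: float,
) -> List[Optional[int]]:
    """Average the k best-scored solutions, then threshold each variable."""
    # Rank solutions by decreasing score; ties keep their original order.
    ranked = sorted(range(len(solutions)), key=lambda i: -scores[i])
    top = [solutions[i] for i in ranked[:k]]
    merged: List[Optional[int]] = []
    for l in range(len(top[0])):
        # Average value assigned to variable l (len(top) == k when at
        # least k solutions are available).
        avg = sum(sol[l] for sol in top) / len(top)
        if avg <= theta_0:
            merged.append(0)
        elif avg >= theta_1:
            merged.append(1)
        else:
            merged.append(None)  # left undefined
    return merged


solutions = [
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 0],
    [0, 1, 1, 1],
]
scores = [0.9, 0.7, 0.8, 0.1]
merged = merge_top_solutions(solutions, scores, k=3, theta_0=0.25, theta_1=0.75)
print(merged)
```

In this example the three best-scored solutions agree on the first two variables, so those are set to 1 and 0 respectively, while the last two variables have an average of 1/3 and are left undefined for the solver to decide.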
The above specification of MemorizingPrimalComponent is meant to be as general as possible. Simpler strategies can be implemented by configuring this component in specific ways. For example, a simpler approach employed in the literature is to collect all optimal solutions, then provide the entire list of solutions to the solver as warm starts, without any filtering or post-processing. This strategy can be implemented with MemorizingPrimalComponent by using a model that returns a constant value for all solutions (e.g. scikit-learn’s DummyClassifier), then selecting the top \(n\) (instead of \(k\)) solutions. Another simple approach is to take the solution to the most similar instance and use it, by itself, as a warm start. This can be implemented by using a model that computes distances between the current instance and the training ones (e.g. scikit-learn’s KNeighborsClassifier), then selecting the solution to the nearest one. Both configurations are illustrated in the examples below. More complex strategies, of course, can also be configured.
Examples
[1]:
from sklearn.dummy import DummyClassifier
from sklearn.neighbors import KNeighborsClassifier
from miplearn.components.primal.actions import (
    SetWarmStart,
    FixVariables,
    EnforceProximity,
)
from miplearn.components.primal.mem import (
    MemorizingPrimalComponent,
    SelectTopSolutions,
    MergeTopSolutions,
)
from miplearn.extractors.dummy import DummyExtractor
from miplearn.extractors.fields import H5FieldsExtractor

# Configures a memorizing primal component that collects
# all distinct solutions seen during training and provides
# them to the solver without any filtering or post-processing.
comp1 = MemorizingPrimalComponent(
    clf=DummyClassifier(),
    extractor=DummyExtractor(),
    constructor=SelectTopSolutions(1_000_000),
    action=SetWarmStart(),
)

# Configures a memorizing primal component that finds the
# training instance with the closest objective function, then
# fixes the decision variables to the values they assumed
# at the optimal solution for that instance.
comp2 = MemorizingPrimalComponent(
    clf=KNeighborsClassifier(n_neighbors=1),
    extractor=H5FieldsExtractor(
        instance_fields=["static_var_obj_coeffs"],
    ),
    constructor=SelectTopSolutions(1),
    action=FixVariables(),
)

# Configures a memorizing primal component that finds the distinct
# solutions to the 10 most similar training problem instances,
# selects the 3 solutions that were most often optimal to these
# training instances, combines them into a single partial solution,
# then enforces proximity, allowing at most 3 variables to deviate
# from the machine learning suggestion.
comp3 = MemorizingPrimalComponent(
    clf=KNeighborsClassifier(n_neighbors=10),
    extractor=H5FieldsExtractor(instance_fields=["static_var_obj_coeffs"]),
    constructor=MergeTopSolutions(k=3, thresholds=[0.25, 0.75]),
    action=EnforceProximity(3),
)
7.3. Independent vars primal component
Instead of memorizing previously-seen primal solutions, it is also natural to use machine learning models to directly predict the values of the decision variables, constructing a solution from scratch. This approach has the benefit of potentially constructing novel high-quality solutions, never observed in the training data. Two variations of this strategy are supported by MIPLearn: (i) predicting the values of the decision variables independently, using multiple ML models; or (ii) predicting the values jointly, with a single model. We describe the first variation in this section, and the second variation in the next section.
Let \(I_1,\ldots,I_n\) be the training instances, and let \(\bar{x}^1,\ldots,\bar{x}^n\) be their respective optimal solutions. For each binary decision variable \(x_j\), the component IndependentVarsPrimalComponent creates a copy of a user-provided binary classifier and trains it to predict the optimal value of \(x_j\), given \(\bar{x}^1_j,\ldots,\bar{x}^n_j\) as training labels. The features provided to the model are the variable features computed by a user-provided extractor. During inference time, the component uses these binary classifiers (one per binary variable) to construct a solution and provides it to the solver using one of the available actions.
Three issues often arise in practice when using this approach:
For certain binary variables \(x_j\), it is frequently the case that the optimal value is either always zero or always one in the training dataset, which poses problems for some standard scikit-learn classifiers, since they do not expect a single class. The wrapper SingleClassFix can be used to fix this issue (see example below).
It is also frequently the case that machine learning classifiers can reliably predict the values of only some variables with high accuracy, not all of them. In this situation, instead of computing a complete primal solution, it may be more beneficial to construct a partial solution containing values only for the variables for which the ML model made a high-confidence prediction. The meta-classifier MinProbabilityClassifier can be used for this purpose. It asks the base classifier for the probability of the value being zero or one (using the predict_proba method) and erases from the primal solution all values whose probabilities are below a given threshold.
To make multiple copies of the provided ML classifier, MIPLearn uses the standard sklearn.base.clone method, which may not be suitable for classifiers from other frameworks. To handle this, it is possible to override the clone function using the clone_fn constructor argument.
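The idea of erasing low-confidence predictions can be sketched as follows. This is not the MinProbabilityClassifier implementation: the function name and data are hypothetical, and the sketch assumes that the two thresholds are the minimum probabilities required to accept a prediction of zero and of one, respectively. Erased values are represented by `None`.

```python
from typing import List, Optional, Sequence, Tuple


def erase_low_confidence(
    proba: Sequence[Tuple[float, float]],
    thresholds: Tuple[float, float],
) -> List[Optional[int]]:
    """Build a partial solution from per-variable class probabilities.

    proba[j] is (P(x_j = 0), P(x_j = 1)), in the column order returned by
    predict_proba for classes [0, 1]. thresholds is (minimum probability
    to accept 0, minimum probability to accept 1); this reading of the
    thresholds is an assumption.
    """
    partial: List[Optional[int]] = []
    for p_zero, p_one in proba:
        if p_one >= thresholds[1]:
            partial.append(1)
        elif p_zero >= thresholds[0]:
            partial.append(0)
        else:
            partial.append(None)  # not confident enough; leave undefined
    return partial


proba = [(0.995, 0.005), (0.40, 0.60), (0.02, 0.98), (0.001, 0.999)]
partial = erase_low_confidence(proba, thresholds=(0.99, 0.99))
print(partial)
```

Only the first and last variables receive a value here; the other two are left for the MIP solver to decide, which tends to be safer when the partial solution is later used to fix variables.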
Examples
[2]:
from sklearn.linear_model import LogisticRegression
from miplearn.classifiers.minprob import MinProbabilityClassifier
from miplearn.classifiers.singleclass import SingleClassFix
from miplearn.components.primal.indep import IndependentVarsPrimalComponent
from miplearn.extractors.AlvLouWeh2017 import AlvLouWeh2017Extractor
from miplearn.components.primal.actions import SetWarmStart

# Configures a primal component that independently predicts the value of each
# binary variable using logistic regression and provides it to the solver as
# warm start. Erases predictions with probability less than 99%; applies
# single-class fix; and uses AlvLouWeh2017 features.
comp = IndependentVarsPrimalComponent(
    base_clf=SingleClassFix(
        MinProbabilityClassifier(
            base_clf=LogisticRegression(),
            thresholds=[0.99, 0.99],
        ),
    ),
    extractor=AlvLouWeh2017Extractor(),
    action=SetWarmStart(),
)
7.4. Joint vars primal component
In the previous subsection, we used multiple machine learning models to independently predict the values of the binary decision variables. When these values are correlated, an alternative approach is to jointly predict the values of all binary variables using a single machine learning model. This strategy is implemented by JointVarsPrimalComponent. Compared to the previous components, this one is much more straightforward. It simply extracts instance features, using the user-provided feature extractor, then directly trains the user-provided binary classifier (using the fit method), without making any copies. The trained classifier is then used to predict entire solutions (using the predict method), which are given to the solver using one of the previously discussed methods. In the example below, we illustrate the usage of this component with a simple feed-forward neural network.
JointVarsPrimalComponent can also be used to implement strategies that use multiple machine learning models, but not independently. For example, a common strategy in multioutput prediction is building a classifier chain. In this approach, the first decision variable is predicted using the instance features alone; but the \(n\)-th decision variable is predicted using the instance features plus the predicted values of the \(n-1\) previous variables. This can be easily implemented using scikit-learn’s ClassifierChain estimator, as shown in the example below.
Examples
[3]:
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain
from sklearn.neural_network import MLPClassifier
from miplearn.classifiers.singleclass import SingleClassFix
from miplearn.components.primal.joint import JointVarsPrimalComponent
from miplearn.extractors.fields import H5FieldsExtractor
from miplearn.components.primal.actions import SetWarmStart

# Configures a primal component that uses a feedforward neural network
# to jointly predict the values of the binary variables, based on the
# objective cost function, and provides the solution to the solver as
# a warm start.
comp = JointVarsPrimalComponent(
    clf=MLPClassifier(),
    extractor=H5FieldsExtractor(
        instance_fields=["static_var_obj_coeffs"],
    ),
    action=SetWarmStart(),
)

# Configures a primal component that uses a chain of logistic regression
# models to jointly predict the values of the binary variables, based on
# the objective function.
comp = JointVarsPrimalComponent(
    clf=ClassifierChain(SingleClassFix(LogisticRegression())),
    extractor=H5FieldsExtractor(
        instance_fields=["static_var_obj_coeffs"],
    ),
    action=SetWarmStart(),
)
7.5. Expert primal component
Before spending time and effort choosing a machine learning strategy and tweaking its parameters, it is usually a good idea to evaluate what the performance impact of the model would be if its predictions were 100% accurate. This is especially important for the prediction of warm starts, since they are not always very beneficial. To simplify this task, MIPLearn provides ExpertPrimalComponent, a component which simply loads the optimal solution from the HDF5 file, assuming that it has already been computed, then directly provides it to the solver using one of the available methods. This component is useful in benchmarks, to evaluate how close the machine learning components are to the best theoretical performance.
Example
[4]:
from miplearn.components.primal.expert import ExpertPrimalComponent
from miplearn.components.primal.actions import SetWarmStart
# Configures an expert primal component, which reads a pre-computed
# optimal solution from the HDF5 file and provides it to the solver
# as warm start.
comp = ExpertPrimalComponent(action=SetWarmStart())