Kaggle is a site best known for hosting machine learning competitions. However, once a year, the Kaggle team runs an optimization competition on a problem Santa Claus could face.

This year's competition is a stochastic optimization problem: we are asked to optimize some outcome when the data is known only with some uncertainty. Many real-world problems are of this form. For instance, optimizing store replenishment and inventory levels takes sales forecasts as input. By definition, future sales are only known up to some prediction uncertainty. In a case like this, one can optimize for the worst case, for instance: find a replenishment plan that minimizes the likelihood of running out of stock. I could go on and expand on this, but let's get back to the Kaggle competition for now.

## The Problem

This year's competition description is the following:

*♫ Bells are ringing, children singing, all is merry and bright. Santa’s elves made a big mistake, now he needs your help tonight ♫*

*All was well in Santa’s workshop. The gifts were made, the route was planned, the naughty and nice list complete. Santa thought this would finally be the year he didn’t need Kaggle’s help with his combinatorial conundrums. At last, the Claus family could take the elves and reindeer on that well deserved vacation to the South Pole.*

*Then, with just days until the big night, Santa received an email from a panicked database admin elf. Attached was a server log with the six least jolly words a jolly old St. Nick could read:*

ALTER TABLE Gifts DROP COLUMN Weight

*One of the North Pole elf interns had mistakenly deleted the weights for all of the inventory in the workshop! Santa didn’t have a backup (remember, this is a guy who makes a list and checks it twice) and, without knowing each present’s weight, he didn’t know how he would safely pack his many gift bags. Gifts were already on their way to the sleigh packing facility and there wasn’t time to re-weigh all the presents. It was once again necessary to summon the holiday talents of Kaggle’s elite.*

*Can you help Santa fill his multiple bags with sets of uncertain gifts? Save the season by turning Santa’s uncertain probabilities into presents for good little boys and girls.*

The data section contains additional information:

*Santa has 1000 bags to fill with 9 types of gifts. Due to regulations at the North Pole workshop, no bag can contain more than 50 pounds of gifts. If a bag is overweight, it is confiscated by regulators from the North Pole Department of Labor without warning! Even Santa has to worry about throwing out his bad back.*

*Each present has a fixed weight, but the individual weights are unknown. The weights for each present type are not identical because the elves make them in many types and sizes.*

It then provides a way to compute the weight distribution of each gift type. Details are available on the Kaggle site; just follow the above link.

It is definitely a stochastic optimization problem: we are asked to optimize the weight of the gifts Santa can distribute, when these weights are only known via probability distributions. I decided to approach this as a stochastic cutting stock problem. All the code used here is available in a notebook on the Kaggle site. The code can be run for free by anyone using the DOcplexcloud service, or with the freely available academic version of CPLEX if you are eligible for it.

## The Model

One way to approach the competition is to look for a solution structure that has a good chance of yielding a good submission. A solution structure is defined by a number of bag types, plus a number of occurrences of each bag type. A bag type is defined by the number of gifts of each type it contains, for instance *3 blocks and 1 train*.

We can focus on bag types because all bags have the same capacity (50 pounds).

There is a finite number of possible bag types. We define one random variable for each bag type.

All we need is an estimate of the expected value and the variance of each possible bag type. Then we use two properties to find a combination of bags that maximizes a combination of expected value and standard deviation:

- the expected value of a sum of random variables is the sum of the expected values of the random variables
- the variance of a sum of independent random variables is the sum of the variances of the random variables

We estimate the mean and variance of each bag type via the law of large numbers: we use Monte Carlo simulation (with 1M samples), and compute the mean and variance of the simulated results.
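As a sketch, here is how such a Monte Carlo estimate can be computed with NumPy. The weight samplers below are placeholders, not the real competition distributions, and I assume an overweight bag scores zero, per the confiscation rule quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder weight samplers for two gift types; the real
# distributions are defined on the competition's data page.
samplers = {
    'train': lambda n: np.maximum(0.0, rng.normal(10, 5, n)),
    'blocks': lambda n: np.maximum(0.0, rng.normal(10, 3, n)),
}

def bag_stats(bag, n_samples=1_000_000, capacity=50):
    """Monte Carlo estimate of a bag type's mean and variance.

    bag maps gift type to count, e.g. {'blocks': 3, 'train': 1}.
    An overweight bag is confiscated, so its score is zero.
    """
    total = np.zeros(n_samples)
    for gift, count in bag.items():
        for _ in range(count):  # one independent draw per gift instance
            total += samplers[gift](n_samples)
    score = np.where(total <= capacity, total, 0.0)
    return score.mean(), score.var()
```

With a million samples per bag type this is accurate but slow, which is why the running time needs the improvements described next.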

While simple, this approach is far too expensive to run directly. We improved the running time in two ways:

- limiting the number of bags to those that are on the Pareto frontier (details below),
- precomputing distributions for bags made of one gift type and reusing them for more complex bag types.
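The second speedup can be sketched as follows: draw the per-gift samples once per gift type, take cumulative sums so that row *k* holds simulated totals for *k* gifts of that type, and assemble any bag from those precomputed rows. The distributions here are again placeholders, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000     # number of simulation samples
MAX_COUNT = 16  # most gifts of one type we ever put in a bag

# Placeholder weight samplers; the real distributions come from the
# competition's data page.
samplers = {
    'train': lambda n: np.maximum(0.0, rng.normal(10, 5, n)),
    'blocks': lambda n: np.maximum(0.0, rng.normal(10, 3, n)),
}

# totals[t][k] holds N simulated weights for k gifts of type t:
# cumulative sums of independent per-gift draws, with row 0 all zeros.
totals = {}
for t, sample in samplers.items():
    draws = np.vstack([np.zeros(N)] + [sample(N) for _ in range(MAX_COUNT)])
    totals[t] = np.cumsum(draws, axis=0)

def bag_scores(bag, capacity=50):
    """Simulated scores of a bag type, reusing the precomputed sums."""
    weight = sum(totals[t][k] for t, k in bag.items())
    return np.where(weight <= capacity, weight, 0.0)
```

Reusing the same draws across bag types correlates the bag types with each other, but each bag type's own mean and variance estimates stay unbiased, which is all we need here.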

Let me expand a bit on the Pareto frontier idea. For the sake of clarity, let's consider two bags:

- Three blocks, one train
- Three blocks, one bike, and one train

The second bag is obtained by adding one gift to the first. We can compute the expected weight of each of these bags. If the expected weight of the second bag is lower than that of the first, then the second bag can be ignored. Why? Because it uses more gifts for a lower value. More generally, if a bag cannot be extended with an additional gift without lowering the expected value, then it is Pareto optimal.

Given this, we start with an empty bag and add one gift at a time in every possible way. We keep extending a bag until its expected value decreases; when that happens, we discard the newly created bag, as it uses more items and yields a lower expected value. This results in about 40,000 bag types.
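The generation loop can be sketched like this. `expected_value` stands for any estimator of a bag type's expected score (for example the Monte Carlo one); both the function below and the deterministic toy estimator used to exercise it are illustrative:

```python
def generate_bags(n_gift_types, expected_value):
    """Enumerate candidate bag types, one added gift at a time.

    A bag type is a tuple of per-gift-type counts. An extension is
    kept only if it strictly increases the expected score; otherwise
    it uses more gifts for a lower value and is discarded.
    """
    empty = (0,) * n_gift_types
    frontier = {empty}
    kept = set()
    while frontier:
        next_frontier = set()
        for bag in frontier:
            for j in range(n_gift_types):
                child = bag[:j] + (bag[j] + 1,) + bag[j + 1:]
                if child not in kept and expected_value(child) > expected_value(bag):
                    kept.add(child)
                    next_frontier.add(child)
        frontier = next_frontier
    return kept
```

With the real estimator this enumeration produces the roughly 40,000 bag types mentioned above.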

## Optimizing Expected Value

The next step is to solve the optimization problem. As said before, it is a cutting stock problem.

The mathematical formulation is as follows:

$$
\begin{aligned}
\text{maximize} \quad & mean \\
\text{subject to} \quad & \sum_{i=1}^{n} g_{ij}\, x_i \le cap_j, \quad j = 1, \dots, m \\
& \sum_{i=1}^{n} x_i \le 1000 \\
& \sum_{i=1}^{n} mean_i\, x_i \ge mean \\
& x_i \ge 0 \text{ and integer}, \quad i = 1, \dots, n
\end{aligned}
$$

where

- *n* is the number of bag types
- *m* is the number of gift types
- *mean_i* is the expected weight of bag type *i*
- *g_ij* is the number of gifts of type *j* in bag type *i*
- *cap_j* is the number of available gifts of type *j*
- *x_i* is an integer decision variable that takes value *v* if bag type *i* is used *v* times
- *mean* is a continuous variable that represents the expected value of the solution structure

This defines a mixed integer problem with linear constraints (MIP).

Solving it is pretty straightforward with a state-of-the-art MIP solver like CPLEX. I used the recent DOcplex package to call it from Python. The code closely follows the above mathematical formulation.

```python
import numpy as np
from docplex.mp.model import Model

def mip_solve(gift_types, bags, allgifts, nbags=1000):
    mdl = Model('Santa')
    rbags = range(bags.shape[0])
    x_names = ['x_%d' % i for i in rbags]
    x = mdl.integer_var_list(rbags, lb=0, name=x_names)
    mean = mdl.continuous_var(lb=0, ub=mdl.infinity, name='mean')
    mdl.maximize(mean)
    # Do not use more gifts of a type than are available.
    for gift in gift_types:
        mdl.add_constraint(mdl.sum(bags[gift][i] * x[i] for i in rbags) <= allgifts[gift])
    # Use at most nbags bags overall.
    mdl.add_constraint(mdl.sum(x[i] for i in rbags) <= nbags)
    # The objective variable is bounded by the total expected weight.
    mdl.add_constraint(mdl.sum(bags['mean'][i] * x[i] for i in rbags) >= mean)
    mdl.parameters.mip.tolerances.mipgap = 0.00001
    s = mdl.solve(log_output=True)
    assert s is not None
    x_val = np.array(s.get_values(x))
    print('mean:%.2f' % s.get_value(mean))
    return bags[x_val > 0]
```

bags is a data frame containing all bag types. We return the portion of it that contains only the bag types that are used.

Solving this MIP yields solution structures with expected value around 35,540 pounds. Results depend on how we estimate the expected value of each bag type. The more simulation runs, the more accurate it is, but the more time it takes to generate all bag types.

I thought finding the optimal expected value would be good enough to win the competition, but I was really wrong. My first submission scored about 35,880 pounds, and as I write this, the top score is close to 37,000 pounds.

How could that happen? Isn't my solution the optimal one? It is, but in a probabilistic way: it is the best one on average. The issue is that the competition isn't about finding the best solution on average. The goal is to find the best solution given the actual (hidden) weights of the gifts.

## Optimizing Mean and Variance

One way to improve the result is to generate many solutions from the same solution structure, albeit using different gifts. For instance, if the solution structure contains one bag made of one train and three blocks, a first solution could include this bag: [train_1, blocks_3, blocks_8, blocks_12]. In a second solution, the same bag could be [train_2, blocks_3, blocks_8, blocks_12]. The expected value for both bags is the same, but the actual values will be different, because the weights of individual gifts are different: the weight of train_1 is not the same as the weight of train_2.

Given that we can generate many solutions from a given solution structure, how can we improve the value of the best possible one? One way to do it is to favor solution structures with larger variance. If two solution structures have the same expected value, then the one with the larger variance is more likely to generate higher value submissions. It is also more likely to generate lower value submissions, but since only the best submission counts, that is not a problem.
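A quick simulation illustrates the effect. The means and standard deviations below are illustrative numbers, and I assume submission scores are roughly normal around the structure's expected value:

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_best(mean, std, n_draws=90, n_trials=10_000):
    """Average best score among n_draws submissions, by simulation."""
    draws = rng.normal(mean, std, size=(n_trials, n_draws))
    return draws.max(axis=1).mean()

# Same expected value, different spread: the wider structure wins
# once we keep only the best of 90 attempts.
best_narrow = expected_best(35525, 100)
best_wide = expected_best(35525, 333)
```

The best of *n* draws sits roughly 2.5 standard deviations above the mean for *n* = 90, so tripling the standard deviation buys several hundred extra pounds on the best attempt.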

The question is: how do we do that with a solver like CPLEX?

Well, the standard deviation of the solution structure is the square root of its variance, and its variance is the sum of the variances of all the bags in it. The mathematical formulation is therefore a slight extension of the previous one:

$$
\begin{aligned}
\text{maximize} \quad & mean + \alpha \cdot std \\
\text{subject to} \quad & \sum_{i=1}^{n} g_{ij}\, x_i \le cap_j, \quad j = 1, \dots, m \\
& \sum_{i=1}^{n} x_i \le 1000 \\
& \sum_{i=1}^{n} mean_i\, x_i = mean \\
& \sum_{i=1}^{n} var_i\, x_i = var \\
& std^2 \le var \\
& x_i \ge 0 \text{ and integer}, \quad i = 1, \dots, n
\end{aligned}
$$

where

- *alpha* is the relative importance of the standard deviation in the objective function
- *n* is the number of bag types
- *m* is the number of gift types
- *mean_i* is the expected weight of bag type *i*
- *var_i* is the variance of the weight of bag type *i*
- *g_ij* is the number of gifts of type *j* in bag type *i*
- *cap_j* is the number of available gifts of type *j*
- *x_i* is an integer decision variable that takes value *v* if bag type *i* is used *v* times
- *mean* is a continuous variable that represents the expected value of the solution structure
- *std* is a continuous variable representing the standard deviation of the solution structure
- *var* is a continuous variable representing the variance of the solution structure

This problem contains a quadratic constraint, which makes it a quadratically constrained mixed integer problem (QCMIP). Again, solving it with CPLEX is rather easy. The code becomes:

```python
def qcpmip_solve(gift_types, bags, allgifts, alpha, nbags=1000):
    mdl = Model('Santa')
    rbags = range(bags.shape[0])
    x_names = ['x_%d' % i for i in rbags]
    x = mdl.integer_var_list(rbags, lb=0, name=x_names)
    var = mdl.continuous_var(lb=0, ub=mdl.infinity, name='var')
    std = mdl.continuous_var(lb=0, ub=mdl.infinity, name='std')
    mean = mdl.continuous_var(lb=0, ub=mdl.infinity, name='mean')
    # Trade expected value against standard deviation.
    mdl.maximize(mean + alpha * std)
    for gift in gift_types:
        mdl.add_constraint(mdl.sum(bags[gift][i] * x[i] for i in rbags) <= allgifts[gift])
    mdl.add_constraint(mdl.sum(x[i] for i in rbags) <= nbags)
    mdl.add_constraint(mdl.sum(bags['mean'][i] * x[i] for i in rbags) == mean)
    mdl.add_constraint(mdl.sum(bags['var'][i] * x[i] for i in rbags) == var)
    # Quadratic constraint linking the standard deviation to the variance.
    mdl.add_constraint(std**2 <= var)
    mdl.parameters.mip.tolerances.mipgap = 0.00001
    s = mdl.solve(log_output=True)
    assert s is not None
    bags['used'] = s.get_values(x)
    print('mean:%.2f' % s.get_value(mean), 'std:%.2f' % s.get_value(std))
    return bags[bags['used'] > 0]
```

Solving this QCMIP with alpha=2 yields solution structures with an expected value around 35,525 pounds and a standard deviation around 333 pounds. The expected value is a bit lower, but the standard deviation is much larger. With a solution structure like this, I had about a 1/4 chance of getting a submission above 36,400 pounds by the end of the competition (about 90 submissions left at that time). This looked much better, but the truth is that people have found ways to generate much higher values, as shown by the current leaderboard. I am also able to generate better solutions, but don't count on me to disclose how before the competition ends.

There is a caveat in the second approach. Can you see it?

The caveat is in how we prune the generation of candidate bags. We need to modify it to take the new objective function into account. When we generate *bag2* by adding one gift to *bag1*, we should compare *mean(bag1) + alpha * std(bag1)* with *mean(bag2) + alpha * std(bag2)*. If the former is higher than the latter, then we can safely drop *bag2* from further consideration.
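In other words, the pruning test becomes a comparison of the combined criterion. A tiny sketch, where the per-bag mean and variance are assumed to come from the simulation:

```python
def criterion(mean, var, alpha):
    """Score used both in the objective and for pruning."""
    return mean + alpha * var ** 0.5

def keep_extension(parent, child, alpha=2.0):
    """parent and child are (mean, variance) pairs; child is parent plus one gift."""
    return criterion(*child, alpha) > criterion(*parent, alpha)
```

Note that with alpha > 0, an extension can now be worth keeping even when it slightly lowers the expected value, as long as it raises the variance enough.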

## Takeaway

What I found interesting is using a QCMIP to optimize the maximum of the values we can get when generating solutions from one solution structure. This is significantly different from usual stochastic optimization problems. Indeed, in my experience, people are usually interested in finding solutions that are good on average (as in my first model), or that minimize the worst case scenario. Here we are asked to maximize the best case scenario, which is very unusual.