Operations research (OR) uses mathematical and analytical tools to arrive at optimal solutions for complex business problems like workforce scheduling. The mathematical techniques used to solve these problems, such as linear programming and mixed-integer programming, require the use of optimization software (solvers). There are several popular and powerful solvers available, ranging from commercial options like IBM CPLEX to open-source packages like ORTools. While these solvers incorporate decades of algorithmic expertise and can solve large and complex problems effectively, they have some scalability limitations.

In this post, we’ll describe three alternatives that you can consider for solving OR problems (see Figure 1). None of these are as general purpose as traditional solvers, but they should be on your “emerging technologies” radar.

These include:

- A traditional solver running on a compute platform
- Reinforcement and machine learning (ML) algorithms running on Amazon SageMaker
- A quantum computing algorithm running on Amazon Braket. Experiments are collected in Amazon DynamoDB and the results are visualized in Amazon Elasticsearch Service.

## A reference problem and solution

Let’s start with a reference problem and solve it with a traditional solver. We’ll tackle an inventory management issue (see Figure 2). We have a sales depot that supplies products for local sales outlets. For the depot’s Region, there are seven weeks of historical sales data for each product. We also know how much each product costs and for how much it can be sold. Finally, we know the overall weekly capacity of the depot. This depends on logistical constraints like the size of the warehouse and transportation availability. This scenario is loosely based on the Grupo Bimbo retailer’s Kaggle competition and dataset.

Our job is to place an inventory order to restock our sales depot each week. We quantify our work through a reward function. We want to maximize our revenue:

revenue = (sale price * number of units sold)

*(Note that the sample dataset does not include cost of goods sold, only sale price.)*

We use these constraints:

total units sold <= depot capacity

0 <= quantity sold of any given item <= forecasted demand for that item

There are many possible solutions to this problem. Using ORTools, we get an average reward (profit) of about $5,700, in about 1,000 simulations.

We can make the scenario slightly more realistic by acknowledging that our sales forecasts are not perfect. After we get the solution from the solver, we can penalize the reward (profit) by subtracting the cost of unsold goods. With this approach, we get a reward of about $2,450.

## Solving OR problems with reinforcement learning

An alternative approach to the traditional solver is reinforcement learning (RL). RL is a field of ML that handles problems where the right answer is not immediately known, like playing a game of chess. RL fits our sales depot scenario, because we don’t know how well we will do until *after* we place the order and are able to view a week of sales activity.

Our sales depot problem resembles a knapsack problem. This is a common OR pattern where we want to fill a container (in this case, our sales depot) with as many items as possible until capacity is reached. Each item has a value (sales price) and a weight (cost). In RL we have to translate this into an observation space, an action space, a state, and a reward (see Figure 3).

The *observation space* is what our purchasing agent sees. This includes our depot capacity, the sales price, and the forecasted demand. The *action space* is what our agent can do. In the simplest case, it’s the number of each item to order for the depot, each week. The *state* is what the agent sees right now, and we model that as the sales results from last week. Finally, the *reward* function is our profit equation.

One important distinction between OR solvers and RL is that we can’t easily enforce hard constraints in RL. We can limit the amount of an individual product we purchase each week, but we can’t enforce an overall limit on the number of items purchased. We may exceed the capacity of our depot. The simplest way to handle that is to enforce a penalty. There are more sophisticated techniques available, such as interpreting our action as the percentage of budget to spend on each item. But let’s illustrate the simple case here.

Using an RL algorithm from the Ray RLLib package, our reward was $7,000 on average, including penalties for ordering too much of any given item.

## Solving OR problems with machine learning

It’s possible to model a knapsack problem using ML rather than RL in some cases, and there are simple reference implementations available. The design assumes that we know, or can accurately estimate the reward for a given week. With our simple scenario, we can compute the reward using estimates of future sales. We can use this in a custom loss function to train a neural network.

## Solving OR problems with quantum computing

Quantum computers are fundamentally different than the computers most of us use. The appeal of quantum computers is that they can tackle some types of problems much more efficiently than standard computers. Quantum computers can, in theory, solve prime number factoring for decryption in orders of magnitude faster than a standard computer. But they are still in their infancy and limited to the size of problem they can handle, due to hardware limitations.

D-Wave Systems, which make some of the types of quantum computers available through Amazon Braket, has a solver called QBSolv. QBSolv works on a specific type of optimization problem called quadratic unconstrained binary optimization (QUBO). It breaks large problems into smaller pieces that a quantum computer can handle. There is a reference pattern for translating a knapsack problem to a QUBO problem.

Running the sales depot problem through QBSolv on Amazon Braket and using a subset of the data, I was able to obtain a reward of $900. When I tried to run on the full dataset, I was not able to complete the decomposition step, likely due to a hardware limitation.

## Conclusion

In this blog post, I review OR problems and traditional OR solvers. I then discussed three alternative approaches, RL, ML, and quantum computing. Each of these alternatives has drawbacks and none is a general-purpose replacement for traditional OR solvers.

However, RL and ML are potentially more scalable because you can train those solutions on a cluster of machines, rather than running an OR solver on a single machine. RL agents can also learn from experience, giving them flexibility to handle scenarios that may be difficult to incorporate into an OR solver. Quantum computing solutions are promising but the current state of the art for quantum computers limits their application to small-scale problems at the moment. All of these alternatives can potentially derive a solution more quickly than an OR solver.

**Further Reading:**

- OR-Gym: A Reinforcement Learning Library for Operations Research Problems
- Combining Machine Learning and Operations Research Methods to Advance the Project Management Practice
- Quantum optimization: The future of operations research