Bin Fitting problems with SQL

“Bin Fitting” or “Bin Packing” means putting the greatest quantity in the smallest number of “bins” or containers. There are some good solutions with the MODEL clause, but the most concise and efficient solutions use the new 12c MATCH_RECOGNIZE clause.

There are two categories of “bin fitting” problems:

  1. The number of bins is fixed and the capacity can vary.
    > We want to equalize the quantities we put into each bin.
  2. The bins have a fixed capacity and the number of bins can vary.
    > We want to minimize the number of bins.

I’ll show an example of each.

Fill a fixed number of bins as equally as possible

Wikipedia calls this the “Multiprocessor scheduling problem“. Brendan Furey, author of a brilliant MODEL solution, calls it “balanced number partitioning“. Let’s take a simple, reproducible example:

CREATE TABLE items AS
SELECT LEVEL item_name, level item_value 
from dual
connect by level <= 10;

The total value of all the items is 55. We want to split the items into 3 “bins” or “buckets” or groups, so that the sums of the values are as equal as possible. Here the best we can do is put 19 in one bin and 18 in each of the other bins.

There are several algorithms that solve this problem more or less accurately. The one I’m going to show is fairly simple:

  • First, order the items by value in descending order
  • Then, assign each item to whatever bin has the smallest sum so far.

Here is Brendan’s MODEL implementation:

SELECT item_name, bin, item_value,
Max (bin_value) OVER (PARTITION BY bin) bin_value
FROM (
  SELECT * FROM items
  MODEL 
  DIMENSION BY (Row_Number() OVER (ORDER BY item_value DESC) rn)
  MEASURES (
    item_name, 
    item_value,
    Row_Number() OVER (ORDER BY item_value DESC) bin,
    item_value bin_value,
    Row_Number() OVER (ORDER BY item_value DESC) rn_m,
    0 min_bin,
    Count(*) OVER () - :N_BINS - 1 n_iters
  )
  RULES ITERATE(100000) UNTIL (ITERATION_NUMBER >= n_iters[1]) (
    min_bin[1] =
      Min(rn_m) KEEP (DENSE_RANK FIRST ORDER BY bin_value)[rn <= :N_BINS],
    bin[ITERATION_NUMBER + :N_BINS + 1] = 
      min_bin[1],
    bin_value[min_bin[1]] =
      bin_value[CV()] + Nvl (item_value[ITERATION_NUMBER + :N_BINS + 1], 0)
  )
)
WHERE item_name IS NOT NULL
ORDER BY item_value DESC;
ITEM_NAME BIN ITEM_VALUE BIN_VALUE
10 1 10 19
9 2 9 18
8 3 8 18
7 3 7 18
6 2 6 18
5 1 5 19
4 1 4 19
3 2 3 18
2 3 2 18
1 3 1 18

Please refer to Brendan’s post for a full explanation, but notice the number of “bins” is actually a bind variable!

Unfortunately, this is not a very scalable solution: with 10,000 records, this query runs in about 30 seconds. My implementation with the 12C MATCH_RECOGNIZE clause has to hard-code the number of buckets, but it does 10,000 records in .2 seconds!

SELECT * from  items
MATCH_RECOGNIZE (
  ORDER BY item_value desc
  MEASURES to_number(substr(classifier(),4)) bin#,
    sum(bin1.item_value) bin1,
    sum(bin2.item_value) bin2,
    sum(bin3.item_value) bin3
  ALL ROWS PER MATCH
  PATTERN ((bin1|bin2|bin3)*)
  DEFINE bin1 AS count(bin1.*) = 1
    OR sum(bin1.item_value)-bin1.item_value
        <= least(sum(bin2.item_value), sum(bin3.item_value)),
    bin2 AS count(bin2.*) = 1
      OR sum(bin2.item_value)-bin2.item_value
        <= sum(bin3.item_value)
);
ITEM_NAME ITEM_VALUE BIN# BIN1_CUM_VAL BIN2_CUM_VAL BIN3_CUM_VAL
10 10 1 10
9 9 2 10 9
8 8 3 10 9 8
7 7 3 10 9 15
6 6 2 10 15 15
5 5 1 15 15 15
4 4 1 19 15 15
3 3 2 19 18 15
2 2 3 19 18 17
1 1 3 19 18 18

I’ve already mentioned the basic syntax of this clause in previous posts, but look at the PATTERN line. The vertical pipe means “OR” – I want a line that matches either “bin1″, “bin2″ or bin3″, and the first match wins.

The DEFINE clause takes some getting used to. The way it works is, you assume that the current row matches the first condition and you test the condition; if the condition is true then there really is a match and you move on to the next row. Otherwise you test the second condition, and so one until the last. In this case, I don’t have to define BIN3 at all: it’s always true if none of the previous matches work.

The BIN1 condition starts with count(bin1.*) = 1: this can only be true if no previous row has been assigned to BIN1. The other part of the condition compares the sums of BIN1, BIN2 and BIN3. If BIN1 has the smallest amount, it gets the current row. Now here’s the tricky part: if I just say sum(bin1.item_value), then that sum will include the row I am testing. What I really want is the sum so far, not including the row I am testing, so I have to subtract that value.

The BIN2 condition is simpler, because I already know that BIN1 contains more that BIN2.

Bins have a fixed capacity: use as few bins as possible

My first post on the MATCH_RECOGNIZE clause showed how to calculate running totals that start over before the total exceeds a certain value. This is a simple version of the “fixed capacity” problem.

I’m going to complicate things a bit by letting each bin have an elastic capacity. The normal capacity is 50, but in some cases I can go over the limit of 50, as long as I never go over 55.

  • If the current bin already has 50 or more, it is full and cannot receive a new item.
  • If the current bin has less than 50, I can add a new item as long as the new total is 55 or less.

I admit that I was hoping to solve this problem by using more advanced syntax in the PATTERN clause, such as “reluctant qualifiers”. I tried 4 or 5 different solutions and they all had bugs. The only solution that seems to work uses a simple PATTERN and a slightly more complex DEFINE. Anyway, here’s the code to create a test table with different boundary conditions.

drop table t cascade constraints purge;
create table t(cat_id number, item_id number, num_units number, incr number);

insert into t
select rownum cat_id, rownum item_id, num_units, incr from (
  select level incr from dual
  connect by level <= 6
) a
, lateral(
  select 51 - level num_units from dual
  connect by level <= a.incr+1
);
insert into t
select cat_id, rownum+100 item_id, incr num_units, incr from t
, lateral(
  select null n from dual
  connect by level <= (56 - t.num_units)/t.incr+1
);
insert into t values(100, 200, 41, 1);

select cat_id, item_id, num_units,
incr "Increment"
from t
where num_units > 40
order by cat_id, item_id;
CAT_ID ITEM_ID NUM_UNITS Increment
1 1 50 1
2 2 49 1
3 3 50 2
4 4 49 2
5 5 48 2
6 6 50 3
7 7 49 3
8 8 48 3
9 9 47 3
10 10 50 4
11 11 49 4
12 12 48 4
13 13 47 4
14 14 46 4
15 15 50 5
16 16 49 5
17 17 48 5
18 18 47 5
19 19 46 5
20 20 45 5
21 21 50 6
22 22 49 6
23 23 48 6
24 24 47 6
25 25 46 6
26 26 45 6
27 27 44 6
100 200 41 1

Each category is a different test case. The first item in each category has a value between 44 and 50. The following items all have the same values, indicated in the “Increment” column. I add a last case at the end where I don’t reach the limit of 50.

Notice the use of the LATERAL clause, which is now documented in 12c. It’s a lot simpler than the old TABLE(CAST(MULTISET syntax. Now here is the solution; I only show the first bin in each category, plus the first item in the second bin. I’ve got nothing more to say, so don’t scroll to the bottom if you don’t feel like it :)

select cat_id, bin#, sum_units, num_units, item_id
from t
match_recognize(
  partition by cat_id
  order by item_id
  measures final sum(num_units) sum_units,
  match_number() bin#, count(*) cnt
  all rows per match
  pattern(A+)
  define A as sum(num_units) - num_units < 50
    and sum(num_units) <= 55
)
where bin# = 1 or cnt = 1;
CAT_ID BIN# SUM_UNITS NUM_UNITS ITEM_ID
1 1 50 50 1
1 2 7 1 101
2 1 50 49 2
2 1 50 1 108
2 2 7 1 109
3 1 50 50 3
3 2 8 2 116
4 1 51 49 4
4 1 51 2 120
4 2 6 2 121
5 1 50 48 5
5 1 50 2 124
5 2 8 2 125
6 1 50 50 6
6 2 9 3 129
7 1 52 49 7
7 1 52 3 132
7 2 6 3 133
8 1 51 48 8
8 1 51 3 135
8 2 6 3 136
9 1 50 47 9
9 1 50 3 138
9 2 9 3 139
10 1 50 50 10
10 2 8 4 142
11 1 53 49 11
11 1 53 4 144
11 2 4 4 145
12 1 52 48 12
12 1 52 4 146
12 2 8 4 147
13 1 51 47 13
13 1 51 4 149
13 2 8 4 150
14 1 50 46 14
14 1 50 4 152
14 2 8 4 153
15 1 50 50 15
15 2 10 5 155
16 1 54 49 16
16 1 54 5 157
16 2 5 5 158
17 1 53 48 17
17 1 53 5 159
17 2 5 5 160
18 1 52 47 18
18 1 52 5 161
18 2 5 5 162
19 1 51 46 19
19 1 51 5 163
19 2 10 5 164
20 1 50 45 20
20 1 50 5 166
20 2 10 5 167
21 1 50 50 21
21 2 12 6 169
22 1 55 49 22
22 1 55 6 171
22 2 6 6 172
23 1 54 48 23
23 1 54 6 173
23 2 6 6 174
24 1 53 47 24
24 1 53 6 175
24 2 6 6 176
25 1 52 46 25
25 1 52 6 177
25 2 6 6 178
26 1 51 45 26
26 1 51 6 179
26 2 6 6 180
27 1 50 44 27
27 1 50 6 181
27 2 12 6 182
100 1 41 41 200
About these ads

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s