Generating multi-subsets using arithmetic

July 11th, 2009 at 7:27 am

In the past I’ve written about how simple arithmetic can be employed to compute a powerset of a given set.

Here I want to show a generalization, that uses n-nary arithmetic. But first, let’s define the problem:

Suppose you have a set of elements and you want to select multi-subsets from it. By multi-subset in this context I mean that an element can appear more than once in it. For example, given the set {0, 1, 2, 3, 4, 5}, then {1, 1, 2} is a multi-subset. So are {5, 5, 5, 5} and {0, 1, 2, 3, 4, 5}. Suppose you want to go over all multi-subsets of a set. How can this be done?

Note that generating a superset is a private case of this problem, restricting each element to appear either 0 or 1 times in the resulting subset.

So the solution is a generalization of the binary-arithmetic solution for the powerset problem.

Intuitive motivation: consider the decimal numbers, for example 25. If we use the position of each digit (starting with the units) to convey information, this leads to an interesting observation. If we have two elements to choose from, 25 may mean 5 times the 1st element, 2 times the second element. Now, going over all numbers from 0 to 99, we are actually generating all multi-subsets of two elements where each can be picked from 0 to 9 times.

Once this is clear, the algorithm is simple. Let’s generalize to a n-ary base system, using position to point to an element and the ‘digit’ at this position to say how many times it appears in a given multi-subset. And the best part – the simple rules of addition with carry can now be used to efficiently generate all multi-subsets, given the amount of elements we have (length) and the maximal amount of times each can be picked (upto), the minimum being assumed 0.

Here’s the code:

def multiselects(upto, length):
  # Arithmetically, we create an array of digits
  # (each in the range 0..upto).
  # It's initialized with '1'
  #
  ar = [1] + [0] * (length - 1)

  while True:
      yield ar

      # The index we're currently trying to
      # advance
      #
      idx = 0

      # Advance the current index. If it reaches
      # the limit (upto), pefrorm a carry to the
      # next index (digits)
      #
      while idx < length:
          ar[idx] += 1
          if ar[idx] <= upto:
              break
          else:
              ar[idx] = 0
              idx += 1

      # We've reached the last number...
      #
      if idx == length:
          break

An an example run of:

for s in multiselects(2, 3):
    print s

Produces:

[1, 0, 0]
[2, 0, 0]
[0, 1, 0]
[1, 1, 0]
[2, 1, 0]
[0, 2, 0]
[1, 2, 0]
[2, 2, 0]
[0, 0, 1]
[1, 0, 1]
[2, 0, 1]
[0, 1, 1]
[1, 1, 1]
[2, 1, 1]
[0, 2, 1]
[1, 2, 1]
[2, 2, 1]
[0, 0, 2]
[1, 0, 2]
[2, 0, 2]
[0, 1, 2]
[1, 1, 2]
[2, 1, 2]
[0, 2, 2]
[1, 2, 2]
[2, 2, 2]

Note that the solution is general, as the lists it returns are lists of indices. These can be employed with any set to generate multi-subsets.

Background and links

I came up with this function while working on Project Euler’s problem 77. I ended up using a different method, but visualizing the possible partitions of primes was very useful.

Here are some interesting mathematical links related to this problem:

Related posts:

  1. application of combinations
  2. Allocating multi-dimensional arrays in C++
  3. Generating random sentences from a context free grammar
  4. Project Euler problem 43
  5. Equivalence classes and group partitions

5 Responses to “Generating multi-subsets using arithmetic”

  1. Larry HastingsNo Gravatar Says:

    Your call to multiselects(2, 3) didn’t generate [0, 0, 0]. Back to the drawing board!

  2. elibenNo Gravatar Says:

    @Larry,

    This is by design, because an empty subset isn’t interesting. If you need it, just change the initalization of ar in the function to:

    ar = [0] + [0] * (length – 1)

  3. SteveNo Gravatar Says:

    For Euler problem 77 (and many other ‘counting’ problems) Dynamic Programming is very useful.
    http://en.wikipedia.org/wiki/Dynamic_programming
    Its worth looking into if you are not already familiar with it.

  4. elibenNo Gravatar Says:

    @Steve,

    Yep, project Euler would be hard without DP, which was what I used for problem 77 (with a similar algorithm to 76 and 31)

  5. MarkNo Gravatar Says:

    You can also use itertools.product for this, at least in Python 2.6 and later:

    for x in itertools.product(range(3), repeat=3): print x

    I’m not sure multi-subset is a good name, since it suggests that order shouldn’t matter: that is, that {1, 1, 2} should be considered the same as {1, 2, 1}.

Leave a Reply

To post code with preserved formatting, enclose it in `backticks` (even multiple lines)