A project I’ve been working on required a random selection of postings to be shown to users. The entries were to be drawn from a larger list : a cache of references to posts held in memory.

My first effort at the selection code simply copied the original list, shuffled it, then selected the first ‘n’ entries (where ‘n’ is the number of required postings). A colleague remarked that this was hardly an optimal use of time or memory. So I decided to write a a general-purpose function to select ‘n’ unique entries from a list of ‘m’ entries – using as little memory and processing time as possible.

## A naive solution

The most obvious solution to the problem is to repeatedly draw random entries from 0.. n of the list. To reduce the repeats, keep a record of the indices of the random entries already selected. Then, as each new item is selected, the existing list is checked to see if the entry was already taken.

The problem with this is that as more entries are selected the likelihood of such a collision increases. The execution time of the function cannot therefore be predicted. In some cases where the number to select was very large this could make the function quite slow. You could speed it by using different containers and such, but a better approach seems to be avoid collisions in the first place.

Consider a case of selecting five random entries from a list of ten. The first entry index can be found by picking a random number between zero and nine. When making the second selection however there are now only nine remaining unselected entries. So if we pick from zero to eight, then find a way of converting this number (the index of *unused entries*) into a full source table index, we have avoided the possibility of collision altogether.

## The method

My solution calls for items chosen using a random number generated between 0 and m-c where ‘c’ is the number of already selected items. This represents the relative index of *unchosen* entries, not the actual entries themselves. This number is then converted into the actual entry index by an algorithm.

This algorithm involves iterating backwards through the list of already picked random numbers. The principal is at each iteration to convert the index from 0… m-c space to 0 .. 1+m-c space. In other words the space of the previously selected element. One single operation can do this, and at the same time avoid any potential collision with the last drawn entry in its target space. The operation says “if the already selected entry’s unchosen entry index is the same or less than the current index number then increment this index by one.”

When all already selected entries have been processed in this way, the resulting index is guaranteed to be in 0.. n space and clear of any collisions with already selected entries.

The method has an order of execution n*m / 2, and temporary storage requirements of n indices. This is stable and predictable however so it is a in improvement on collision avoidance methods that would mean repeating random selections.

## Code

Here is a ready-to-use implementation in Java.

- import java.util.List;
- import java.util.Random;
- import java.util.ArrayList;
- public class ListUtil {
- /**
- * Create a new list which contains the specified number of elements from the source list, in a
- * random order but without repetitions.
- *
- * @param sourceList the list from which to extract the elements.
- * @param itemsToSelect the number of items to select
- * @param random the random number generator to use
- * @return a new list containg the randomly selected elements
- */
- public static <T> List<T> chooseRandomly(List<T> sourceList, int itemsToSelect, Random random) {
- int sourceSize = sourceList.size();
- // Generate an array representing the element to select from 0... number of available
- // elements after previous elements have been selected.
- int[] selections = new int[itemsToSelect];
- // Simultaneously use the select indices table to generate the new result array
- ArrayList<T> resultArray = new ArrayList<T>();
- for (int count = 0; count < itemsToSelect; count++) {
- // An element from the elements *not yet chosen* is selected
- int selection = random.nextInt(sourceSize - count);
- selections[count] = selection;
- // Store original selection in the original range 0.. number of available elements
- // This selection is converted into actual array space by iterating through the elements
- // already chosen.
- for (int scanIdx = count - 1; scanIdx >= 0; scanIdx--) {
- if (selection >= selections[scanIdx]) {
- selection++;
- }
- }
- // When the first selected element record is reached all selections are in the range
- // 0.. number of available elements, and free of collisions with previous entries.
- // Write the actual array entry to the results
- resultArray.add(sourceList.get(selection));
- }
- return resultArray;
- }

## 2 comments

Comments feed for this article

Trackback link: http://jimblackler.net/blog/wp-trackback.php?p=59