« MovableType problems SWT Flash »

November 07, 2003

Much faster array sorting

Need to sort an Array? No sweat, Array's have a built-in "sort" method. You decide to call my_array.sort() only to find out your numbers aren't sorted correctly. You build a custom comparison function to pass into sort only to realize that sorting 3000 numbers takes 261 milliseconds. What the heck?

We all know that the built-in Array sort method isn't the fastest around. The best way around the speed issue is to either build your own sorting method, or use someone else's. Prototype is a good place to start looking for code, and now you've brought that 261 milliseconds down to 127 with the qSort method, and even further down to 103 milliseconds with sortPInt. Still, I can't help but think that we can do better...

So.. thinking a bit about the problem, I've come up with my own custom sort method. At it's heart it resembles the "Bin Sort" algorithm. I was able to reduce the sorting time down to 58 milliseconds (from the original 261). Sweetness.

For starters, this sort method was built for a project I'm working on and thus only functions correctly under a given set of conditions. There are really only 2 conditions that need to be met for this sort to work.

  1. All of the array elements must be integers (no decimals!) - positive and negative integers will both work
  2. The theoretical min and max values of the elements inside the array must be known

If those conditions aren't met, then don't expect correct sorting results using my custom sort method. It's ok if the values you supply aren't the actual min and max values contained in the array - they only need to be the theoretical min and mix. But, the closer they are to the actual values the better performance you will see.

To start off, here is the code I'm using to test. What this does is generate an array of 1000 integers, ranging from -1005 to 1994. Where did these numbers come from? Why, I made them up for this example of course!

range = 3000;
min = -1005;
max = min + range - 1 // 1994

// create the numbers array and populate it
numbers = new Array(1000);
for (var i = 0; i < 1000; i++) {
 	numbers[i] = Math.floor(Math.random() * range) + min;	
}

Note that declaring the array a fixed size and adding elements with the [] operator is faster (in this case, at least) than declaring a dynamic array (numbers = new Array()) and adding elements with "push" (numbers.push(newElement)). How much faster? Well, the dynamic array takes my computer around 51 milliseconds to build, whereas the fixed-length array takes only 45. This isn't a huge difference, but sometimes every little bit counts. Plus - Since we know we're going to have 1000 elements, why would we make the array dynamic anyway? That's a rhetorical question, for the most part.

Next comes testing the built in sort method. Here we go:

// assuming numbers is built like the code snippet above
start_t = getTimer();
numbers.sort();
trace("time: " + (getTimer() - start_t) + " ms");
trace(numbers);  

With the above code snippet, you can see that the numbers aren't sorted correctly. Flash thinks that we're trying to sort strings instead of numbers (for example, 1403 comes before 304 because 1 is less than 3). So, we build a custom comparison function and run the test again:

function numCompare(a, b) {
 	if (a < b) return -1
 	if (a == b) return 0;
 	return 1;
}

start_t = getTimer();
numbers.sort(numCompare);
trace("time: " + (getTimer() - start_t) + " ms");
trace(numbers);

That's better... but by golly that beast is slow. At least we got correct results this time, but let's see what we can do to improve the sort time.

I repeated the same testing process for those 2 prototypes mentioned in the opening paragraph....

Copy the Array.prototype.qSort method to the frame replace "numbers.sort(numCompare)" with "numbers.qSort(0, numbers.length-1)" - this is faster, but not fast enough.

Trying again.. copy the sortPInt method to the frame, and change the sorting line to "numbers.sortPInt()" - again, a little bit faster, but still not quite fast enough. Note that this sort only takes positive integers, so you'll need to change the min value to a positive integer for the sort to work correctly.

Finally, get fed up and write your own custom sort method. Copy the code below, and then replace the sort call with "numbers.customSort(min, max)". Min is the smallest possible value in the array, and max is the largest possible value in the array. Note that these variables were defined right about the for loop that populates the numbers array. Also note that these are the theoretical min and max.. and may not actually be the min and max elements contained in the array.

// Darron Schall
// 11/07/03
// pre: min and max are the theoretical min and max values
//     contained in the array.  the array contains only integers.
// post: the array will be sorted in ascending order
Array.prototype.customSort = function(min, max) {
 	var i;
 	var bins = new Array(max-min+1);
 	var binvalue;
 	
 	// distribute array to bins
 	i = this.length;
 	while (i--) {
  		binvalue = this[i]-min;
  		if (bins[binvalue] == undefined) {
   			bins[binvalue] = 0;
   		}
  		bins[binvalue]++;
  	} 
 	
 	// rebuild - this may see some speed gains
         // by making the array fixed-length instead
         // of dynamic
 	var tmp = new Array();
 	i = bins.length;
 	while (i--) {
  		if (bins[i] != undefined) {
   			while (bins[i]--) {
    				tmp.push(i);
    			}
   		}
  	}
 	
 	// reconstruct this from tmp, reversing the order
         i = this.length;
 	while(i--) {
  		this[this.length-i-1] = tmp[i]+min;
  	}
}

As stated before, my customSort method takes only 58 milliseconds to execute on my computer. Considering the built-in method takes 261 milliseconds, I would call this sort algorithm a success in this scenario. Of course, my custom sort may not always be the fastest depending on the number of elements in the Array. It seems to really outperform when the integer array to sort is large.

Read on for an explanation of my algorithm...

Here's a quck explanation about how the sort works. First, we declare a "bins" array big enough to have 1 position for every element ranging from min to max. In Flash, when you define an array like this without initializing any of the values, they all default to "undefined," which will be important later. Second, we loop over every element in the original array and determine the index it has in the bins array. We then mark that index as being in use (by incrementing it's value). If there are duplicate numbers in the array, then the value at the index representing the number will be the number of elements sharing that number value.

The above is slightly confusing. Hopefully a quick example can illustrate the magic a little clearer:

// if num_array looks like this...
num_array = [6, 1, 0, 1];

// bins would look like this:
bins = [1, 2, undefined, undefined, undefined, undefined, 1];

See how the element in num_array is used as an index in bins, and that the element at the index in bins is the number of times that index number appears in num_array? Probably not... but think about it some more, the above is the trickist part. Since the number 1 is repeated 2 times in num_array, bins[1] = 2. There is no number 2 in num_array, so bin[2] is undefined. There is only one number 6, so bins[6] = 1. Get it now? I hope so...

Ok, so the bins array is populated as described above. Now we just need to loop through it and reconstruct an array out of it. I know I said before that using dynamic arrays was bad if we know the number of elements the array will have, but it was easier (read: less time consuming for me) in this situation to just make a temporary dynamic array to rebuild with the data from the bins array. We loop through the bins array and if the value at the index is not undefined (meaning the original array contained a number equal to the index value), we push the index on the tmp array however many times the value appeared in the original array.

This leaves us with a temporary array in reverse order. The final step is to replace "this" with "tmp" and reverse the order of the array, which is what the last while loop does.

I apologize if my explanation is a little confusing. The algorithm itself isn't too hard to follow once you can get over how the "bins" array works. I hope that in reading this you've been inspired to always look for a better approach to solving problems. By not being satisfied with the built in sort, I was able to shave over 200 milliseconds off of the sorting time. There are probably even faster methods to sort an array of positive integers, but I'm pleased with the results I was able to achieve.

Note that all of these times are specific to my computer. Your execution times may vary.

A possible improvement to my custom sort is to try and reduce the size of the bins array needed in memory, and to try and remove the tmp array at the end completely.

As always, let me know if you find this method useful, or if you use my code as a building block for your own custom sorting methods!

Comments

  • Why bother doing an array.reverse()? Why not just count backwards when you're building your temp array? If you're going for speed, it would be better don't you think? Oh...it looks like you mentioned something like that at the end of your article. Nevermind.

  • Very sweet code - I'll keep it for a rainy day :)

  • Thanks for taking the time to write this up. I needed to solve a display issue where IE was taking forever to sort and display a table. By adding an index array and using a modification of your sort algorithm, it sped it up from 140 seconds to about 80 ms! Very nice, indeed!