Skip navigation.

The Roost

Programming, palaver, puffins!

Posts tagged with "javascript"

Alphanum: Javascript Natural Sorting Algorithm

, , , ...

Today I had a need for a natural sorting function in javascript. I've seen these functions before in other languages, so I thought to myself, no problem, I'll find one in a couple of minutes!. How wrong I was!

First off, a little explanation. Natural sorting, or Lexicographical order, is sorting a list of strings which contain letters and numbers, in such a way that makes sense to humans rather than computers. Given a list of filenames, a computer would sort them this way:

z1.doc, z10.doc, z17.doc, z2.doc, z23.doc, z3.doc
While a human would sort it like so:

z1.doc, z2.doc, z3.doc, z10.doc, z17.doc, z23.doc
Obviously, the second list makes more sense, no? Anyway, I needed an algorithm that would sort any list of strings in that fashion.

Right off the bat, I found two promising candidates, but both failed quite spectacularly on my test array (which included bunches of numbers up to 1000). More searching only seemed to find blog posts which merely linked to these faulty examples.

So, giving up that angle, I decided the best way to get a natural sorting algorithm in javascript was to examine one from another language (preferably one which worked) and port it to javascript.

I ended up coming across David Koelle's Alphanum algorithm which worked like a charm. I took the Perl implementation and ported it to Javascript resulting in the function below:

function alphanum(a, b) {
  function chunkify(t) {
    var tz = [], x = 0, y = -1, n = 0, i, j;

    while (i = (j = t.charAt(x++)).charCodeAt(0)) {
      var m = (i == 46 || (i >=48 && i <= 57));
      if (m !== n) {
        tz[++y] = "";
        n = m;
      }
      tz[y] += j;
    }
    return tz;
  }

  var aa = chunkify(a);
  var bb = chunkify(b);

  for (x = 0; aa[x] && bb[x]; x++) {
    if (aa[x] !== bb[x]) {
      var c = Number(aa[x]), d = Number(bb[x]);
      if (c == aa[x] && d == bb[x]) {
        return c - d;
      } else return (aa[x] > bb[x]) ? 1 : -1;
    }
  }
  return aa.length - bb.length;
}
Note that this code is case sensitive, so "A" will sort before "a". You can turn it into a case insensitive sort by lowercasing the incoming a and b test strings. Just change the chunkify lines to:

  var aa = chunkify(a.toLowerCase());
  var bb = chunkify(b.toLowerCase());

Thanks to David K. for the algorithm! This saved me tons of time. I owe you one. :smile:

An IE Issue Appears
After first posting this script, I came across an issue in IE which caused the script to fail. Don't worry! The lines of code above are the modified - and working - alphanum function. The problem occurred in the script as originally posted; specifically with the two lines:

  a = chunkify(a);
  b = chunkify(b);
Note the variable names, where the assignment tries to place the result back into the original variable. Actually, all that's required for the bug to manifest is to assign any array to the incoming a and b variables. For some strange reason this would fail in IE6 and IE7 while sorting arrays with larger than 23 or 24 elements. In these cases, at seemingly random times during the sort, "undefined" would be passed back to a, causing the next reference to it to trigger an exception.

The following is example code (from a testcase prepared by TarquinWJ) which triggers the bug in IE. This sort will fail in IE6 and IE7.

function customsort(a, b) {
  a = ['a']; // assign an array to a
  b = ['b']; // assign an array to b

  // IE will bail out when either a or b is undefined
  return a.length - b.length;
}
var arr = [
  'a','b','c','d','e','f','g','h','i','j','k','l','m',
  'n','o','p','q','r','s','t','u','v','w','x'
];
arr.sort(customsort);
To solve this, all it took was to assign the result to new variables (aa and bb) and change all further references to a and b to the new set. Definitely a JS bug in IE, and as it involves the creation and destruction of many arrays within a short time, it is likely some kind of memory issue.

Thanks very much for TarquinWJ for helping isolate this bug.

A Faster Prototype Version
For those who value speed over flexibility, I also made this function into an Array method. Because it breaks up all the array strings first, then does the actual sorting, then joins all the sub-arrays back into strings, it is a great deal faster than the version above. That version breaks up two strings for each sort iteration rather than only once for each array element.

It's harder to modify this method to sort multi-dimensional arrays, or arrays of objects with string properties, but the speed gain is hard to ignore. As an added bonus, you can specify case sensitivity on a per-call basis with the caseInsensitive argument (true|false). If you're just going to sort plain arrays of strings, use this version.

Array.prototype.alphanumSort = function(caseInsensitive) {
  for (var z = 0, t; t = this[z]; z++) {
    this[z] = [];
    var x = 0, y = -1, n = 0, i, j;

    while (i = (j = t.charAt(x++)).charCodeAt(0)) {
      var m = (i == 46 || (i >=48 && i <= 57));
      if (m !== n) {
        this[z][++y] = "";
        n = m;
      }
      this[z][y] += j;
    }
  }

  this.sort(function(a, b) {
    for (var x = 0, aa, bb; (aa = a[x]) && (bb = b[x]); x++) {
      if (caseInsensitive) {
        aa = aa.toLowerCase();
        bb = bb.toLowerCase();
      }
      if (aa !== bb) {
        var c = Number(aa), d = Number(bb);
        if (c == aa && d == bb) {
          return c - d;
        } else return (aa > bb) ? 1 : -1;
      }
    }
    return a.length - b.length;
  });

  for (var z = 0; z < this.length; z++)
    this[z] = this[z].join("");
}

Improve (?) Your Javascript Coding Style

, , , ...

I've gotten pretty good at javascript over the years, and I've gotten to the point where I try to do things using the bare minimum number of bytes, while still having the code readable and looking nice. In this short article, I'll outline a few of my own little coding style tips to help keep your code looking tight.

Keep in mind that, for the most part, these aren't optimisations. They won't make your code run significantly faster, but will merely help you accomplish the same thing in fewer bytes. And to me, fewer bytes often means more elegant algorithms.

The By-Value Operators
This is a simple thing. You probably already know that a = a + b can be shortened with the Add-by-value operator to a += b; But you might not realize that almost all discrete math operators can work this way:

a = a + b;   becomes   a += b;   // addition
a = a - b;   becomes   a -= b;   // subtraction
a = a * b;   becomes   a *= b;   // multiplication
a = a / b;   becomes   a /= b;   // division
a = a % b;   becomes   a %= b;   // modulus
a = a << b;  becomes   a <<= b;  // bitwise shift left
a = a >> b;  becomes   a >>= b;  // bitwise shift right
a = a >>> b; becomes   a >>>= b; // zero-fill bitwise shift right
a = a & b;   becomes   a &= b;   // bitwise AND
a = a | b;   becomes   a |= b;   // bitwise OR
a = a ^ b;   becomes   a ^= b;   // bitwise XOR

Merging assignments
A cool thing about javascript, and many other computer languages, is the fact that assignments not only do the task of assignment, but they also return a value. This means that the following code will alert "Hello World!" as well as assigning it to the foo variable for later use.

alert(foo = "Hello World!");

Say we want to prompt the user for some info, check to see if they actually typed something, and if so store the contents into the foo variable. We can do this all in one step!

if (foo = prompt("Type a value for foo:")) {
  alert("You typed: " + foo);
} else alert("You didn't type anything!");

This not only works with the = operator, but with any operator which assigns a value. Consider the following code:

while (i < 30) {
  i++;
  // ... Do something
}

The increment operator assigns a value of i + 1 to i, after it has returned the i value. However, why have it on it's own line? The while loop expression is evaluated every loop, let's put the assignment there instead:

while (i++ < 30) {
  // ... Do something
}

Comma separation of assignments also plays a part in reducing the amount of code used. You can use comma separation to string together multiple assignments in various contexts:

var foo = 1;
var bar = 2;
var baz = 3;

// is equivalent to:

var foo = 1, bar = 2, baz = 3;

This also works in the for loop context:

var y = 0;
for (var x = 0; x < 10; x++) {
  y++;
  // ... Do something
}

// is equivalent to:

for (var x = 0, y = 0; x < 10; x++, y++) {
  // ... Do something
}

That neatens things up nicely, if I do say so myself. Lastly, we can assign the same value to multiple values in one go:

var x = 1, y = 2, z = 3;

x = 4;
y = 4;
z = 4;

// is equivalent to:

var x = 1, y = 2, z = 3;

x = y = z = 4;

This works because the javascript engine first assigns the number 4 to variable z, then that assignment returns the value 4 which gets assigned to y, then that assignment returns 4 which gets assigned to x!

The only caveat with these methods are that they don't all work if you are also using var to define the variable at the same time. For instance, the following example will cause a syntax error:

if (var foo = prompt("Type a value for foo:")) {
  alert("You typed: " + foo);
} else alert("You didn't type anything!");

You need to forego using var (which I don't recommend), or define foo as an in-scope variable beforehand.

The Boolean Flip
Say you have a boolean variable acting as a switch. A user action can turn it on (== true) and the same user action can turn it off (== false). Just like a light-switch! :smile: When I first started using javascript, I was implementing these switches like so:

var foo = false;

if ([user action]) {
  if (foo) {
    foo = false;
  } else foo = true;
}

That's quite a lot of code for something which should be simple. Then I learned the ( ) ? : syntax which allows you to use the if structure directly within assignment statements, so I started using this:

var foo = false;

if ([user action]) {
  foo = (foo) ? false : true;
}

Awesome! All on one line! I used this method for quite some time until I stumbled across an even more compact method within someone else's code. I was using the following method up until very recently:

var foo = false;

if ([user action]) {
  foo = !foo;
}

Here we assign the NOT value of foo to the foo variable. The NOT value is the always the opposite boolean value of the variable. So if foo is true, NOT makes it false, and if foo is false, NOT makes it true! We sure have come a long way from three lines and 45 bytes down to one line and 11 bytes! But there's an even smaller way. :smile:

Consider the scenario that we're dealing with really long variable names, or deeply nested objects. Suppose we have a boolean variable named myObject.property1.subproperty5.foo. Using the method above, we get:

var foo = false;

if ([user action]) {
  myObject.property1.subproperty5.foo = !myObject.property1.subproperty5.foo;
}

Man, we have to repeat the whole variable name; that's a lot of duplicated bytes. Isn't there a way we could do it so we only have to write it once? You could try prototyping a function onto the javascript Boolean object, but it has a funny way of dealing with its own evaluation which means you can't assign it from prototyped functions. So how?

The key lies in how javascript treats other types of data. In javascript, the values false, , null, undefined, NaN and "" (the empty string) all evaluate to false, while anything else evaluates to true. Is there some kind of operator we can apply so that true becomes something that evaluates to false, and false becomes something that evaluates to true? There is! It's the XOR operator: ^. XOR chiefly operates on binary data, taking two bits (0 or 1), comparing them; if they are different return 1 and if they are the same return 0. XOR works on booleans by first casting the true or false value into a 1 or 0 binary digit respectively, then working its magic. Example output follows:

var foo = true, bar = false;

var baz = foo ^ foo;   // false
var baz = foo ^ bar;   // true
var baz = bar ^ foo;   // true
var baz = bar ^ bar;   // false

If we examine the first and third lines, we see an interesting thing. Any boolean XORed with a value of true becomes the opposite of what it was! true ^ true = false and false ^ true = true. Then using the XOR-by-value method mentioned earlier, we can finally boil this code down to:

var foo = false;

if ([user action]) {
  myObject.property1.subproperty5.foo ^= 1;
}

The number 1 is equivalent to true and only takes one byte rather than four. And there you have it, the Boolean Flip using only the variable name and 5 extra bytes, two of which are whitespace! :smile: After this operation, the variable won't be a boolean anymore, but rather either a 1 or a 0. These values can be used in all situations where booleans are used unless you require strict equality (===) which is usually never.

What byte-saving javascript tricks do you know?
December 2009
S M T W T F S
November 2009January 2010
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31