the Programming Pantheon

in servitude of Koios & Mnemosyne

Maybe Java?

, , , , , ,

I'm fairly sure you have come across the following situation from time to time:
Does this null value mean "it is null? or "does not exist?"
Of course any other (magic) constant can/will (eventually) have the same result.

The most prominent (and simplest) example of this occurence is the Map interface as found in Java. The contract of java.util.Map<K, V>#get(K):V specifies that "[it] [r]eturns the value to which the specified key is mapped, or {@code null} if this map contains no mapping for the key." Or rather, if containsKey returns false for a particular key, then it returns null for that key. Not much of a problem, wouldn't you not say? I didn't think so either, but there was this nagging feeling that something was off, not entirely right. It felt as if this was wrong. It is one of those to which you say things like "Meh, I won't be bothered by it", or it "doesn't matter", and you would be quite right about it as well. But what if you wanted to do something about this nagging feeling? Regardless if you could (and with this you can, I suppose) live with for the rest of your life.

Ok, let us solve this problem! But what is the problem anyway? Is it that Map returns null when a key isn't present? No, it's related, but it's a more specialized case, for another example, we need to get down to basics, in Java, there are only a handful of types, a set of what are now called the "primitive" types, with just value semantics (meaning no reference), plus the reference (or may I dare say it, pointer) type. Now, save for the reference primitive, none of the other can be value "null". With that, it is unique in that you can say "no value", with the other primitives, say, integers, you have to specify a certain value as "no value", but what if null suddenly becomes a proper, valid, value? How can you then say "no value"? Yeah, I hear you, still not a problem.

A somewhat related problem, at least in Java anyway, NullPointerException. A null pointer check is one of the cheapest operations on the JVM, HotSpot can even optimize null checks away, and, if triggered later on, deoptimize them back in. So what is the harm in checking it? Nothing at all. So null references aren't all that bad, sure, they are the most common bug/problem, but still, it's not the end of the world. But what if you have a function/method, where null is a valid return, that does not have to mean "no value"/"does not exist" and throwing an exception is too heavy or to crude for that particular spot? Perhaps it's in a tight loop of some kind and the entire try/catch/finally structures just pushes it out of the "fast" margin to the "slow" end, and putting the loop in the try clause isn't an option? Then there might be something for you (it works for me), any Haskeller would, most likely, have recognized the Maybe Monad trying to shout out to you by the second paragraph of this log post.

The Maybe Monad, or the Option in Scala, are incredibly simple constructs, they only force you to deal with the possibility that it doesn't return a "value". A bit of code would be nice to see, in Haskel:
data Maybe a = Just a
             | Nothing


Or the Option type in Scala (simplified! I'll get back to that later):
trait Option[A] {}
sealed case class Some[A](val a:A) {}
case object None {}


Of course, this is also possible in Java (I gave it my own special blend, better readability if you would ask me):

interface Maybe<A> {}
final class Some<A> implements Maybe<A> {
    private final A a;
    public Some(final A a) { this.a = a; }
    public A get()         { return a;   }
}
final class None implements Maybe<Object> {}


To declare a function for all of the above 3 snippets are as follows (signatures only) for an integer division function:

integerDivision :: Integer -> Integer -> Maybe Integer

def integerDivision(a:Int, b:Int):Option[Int]

public Maybe<Integer> integerDivision(Integer a, Integer b)


Why division you ask? Well, you can't divide by zero, now can you? And I wanted an example that wasn't making it unnecessary complex, such as taking the square root of a negative number, which has a proper result! Just not easily modelled in languages such as Scala or Java without breaking everything else. But I'm moving away from my chosen subject for this post.

So, what does this gain us? We are now forced to deal with the possibility that it might return None instead of Some value, it would be fine to return null (well, not in the above example, but the point still stands). Why is it better to make it more explicit? I think it's at least better than having a sudden null pointer exception (or in this case a sudden DivisionByZeroException, or was it ArithmaticException?). An instanceof is also less heavy than a full blown try catch around each time you invoke that function.

In a way, this monad is a light-weight exception mechanism, but to say that it exactly is that would not do it justice. For example, in Scala there is this great parser combinator library (warrents its own post another time!), here they use the Option monad/construct to denote the success or failure of a particular parser (and a combination of 2 parsers yields another parser). Here Some defines the success and returns the state of the stream (and Abstract Syntax Tree till that moment), and while None denotes the failure of a parser, it also returns the location of where it went wrong and what was expected (and what was found). Then in the or parser (or any other binary parser of course) it sees None and returns the application of its right hand side (or it returns the application of the left hand side if it returned Some). In Haskell there is also another variant of this, the Either Monad, which has 4 results, Nothing, Left, Right and Both, with similar implications of course. Alas, this post isn't about Monads, still learning about them, but they are incredibly interesting! So be sure to read about them, but perhaps not the burrito tutorial. :S They appear to be unique to everyone and everyone has to discover them for themselves. But this post is about one particular monad, the Maybe Monad. And it's use outside of Haskell.

Even though the instanceof is fast (similar speed as a null check, according to this), it still has the same elegance as that of a null check. In both Haskell and Scala there is fuss about this, since they have Pattern Matching. So, can we improve this? Can we add some Java idioms to this pattern? Well, of course we can. In fact, the Option in Scala has a few. It is iterable, for None it has no elements, for Some it has 1 element, this enables the use of the Option in the various for constructs of Scala. It has a useful getOrElse function, which, for Some returns the value of Some, or the parameter in case of None. Case classes in Scala also feature automatic hashcode, equals and toString generation (and getters and for vars also setters), and more. And yet, the wonders of case classes in Scala are not in the Scope of this post.

A point should be made about usage of this pattern:

Haskell:
integerDivide 10 2 case of
   Just a -> a
   Nothing -> 0


Scala:
integerDivide(10, 2) match {
  case Some(a) => a
  case None => 0
}


Java:
integerDivide(10, 2).getOrElse(0)

Or without the getOrElse:
Maybe<Integer> result = integerDivide(10, 2);
if (result instanceof Some<Integer>) {
    return ((Some<Integer>)result).get();
} else {
    return 0;
}


Now that we have seen a basic implementation of Maybe in Java, let us come full circle with the Map scenario in the beginning and ask ourselves, can we retrofit the Map to return Maybes? Again, the answer is simply, yes. Albeit not as neatly as we might want to. sad For the purpose of this article, let us keep this "small" (yes, I know the track record about the length of the articles up until now, I aim to please ^_^) and only provide a wrapper to any map and that null values in said map are considered to be "values".

First let us define the interface for the MaybeMap:
public interface MaybeMap<K, V> extends Map<K, Maybe<V>> {
    public Maybe<V> putRaw(V value);
    public void putRawAll(Map<K, V> m);
}

public class MaybeMapWrapper<K, V> implements MaybeMap<K, V> {
    private final Map<K, Maybe<V>> map;
    public MaybeMapWrapper(final Map<K, V> m) {
        map = new HashMap<K, Maybe<V>>(m.size());
        putRawAll(m);
    }
    public Maybe<V> putRaw(K key, V value) {
        return map.put(key, new Some<V>(value));
    }
    public void putRawAll(Map<K, V> m) {
        for(Map.Entry<K, V> entry : m.entrySet()) {
            map.put(entry.getKey(), new Some<V>(entry.getValue()));
        }
    }
    public Maybe<V> get(Object key) {
        if (map.containsKey(key)) {
            return map.get(key);
        }
        return new None();
    }
    public Maybe<V> put(K key, Maybe<V> value) {
        if (value == null) {
            value = (Maybe<V>)new None();
        }
        return map.put(key, value);
    }
    // The rest simply delegates to map
}


You remember when I said it wasn't that neat? It's in the get. We need to do a containsKey to see if it's present or not. But looking at it some more, it stands to reason that no value in the internal map can be null, so whenever we get null from the get, we know it means that the key isn't present. Taking this into account we only need to change the get method to this:
    public Maybe<V> get(Object key) {
        Maybe<V> v = map.get(key);
        if (v == null) {
            return new None();
        }
        return v;
    }


I do admit, I did cheat a bit here, I had not originally thought of the modified put method which replaces null values with Nones. Without it, we couldn't assume that null only means "not present" in the get method.

Even without that, it's not entirely neat, the raw versions of put and putAll are needed here because generic arguments are erased after compilation. Meaning suddenly there would be 2 put methods and 2 putAll methods, but having the same signatures.
The definition of a Maybe<V> put(K key, Maybe<V> value); in the Maybe interface would help for the put (after erasure: put(Object, Maybe) and put(Object, Object)), but not for the putAll (still putAll(Map) and putAll(Map)). Even if we were to overload the putAll for MaybeMap it might still go to the putAll(Map), since MaybeMap is a Map. We could introduce an extra parameter, an enum which defines if null values should be treated as None or not (for example). But this might be something for another post. As it stands now, it is already fairly usable. But let us focus on the Maybe interface and the implementing classes and see what we can do to improve things a bit further.

instanceof is fast, considered to be as fast as a nullcheck, but I suppose an invokeinterface is also incredibly fast, meaning, if we create an isSome (or isNone, doesn't really matter) that is specialized in the Some and None classes, then it could practically be inlined in the generated assembly. And if we do the same for getOrElse, it is the same thing.

So let us do that:
public interface Maybe<A> {
    public boolean isNone();
    public <E extends A> A getOrElse(final E otherwise);
}
public class Some<A> implements Maybe<A> {
    private final A a;
    public Some(final A a)  { this.a = a;   }
    public boolean isNone() { return false; }
    public <E extends A> A getOrElse(final E otherwise) { return a; }
    public A get() { return a; }
}
public class None implements Maybe<Object> {
    public boolean isNone() { return true; }
    public <E extends Object> E getOrElse(final E otherwise) { return otherwise; }
}


But we can do one better; invokevirtual is faster than invokeinterface, and the JVM likes final methods/fields, etc, etc, meaning it can inline stuff. Next, having more than just Some and None as subclasses doesn't add anything (in this case anyway), so we will not allow it.

public abstract class Maybe<A> {
    Maybe(){}
    public abstract boolean isNone();
    public <E extends A> getOrElse(final E otherwise)	{	return a;	}
}


As a side note, we could just have implemented it here, but as mentioned above, it make just as much sense to specialize it all.

public final class Some<A> extends Maybe<A> {
    private final A a;
    public Some(final A a) { this.a = a; }
    @Override public final boolean isNone() { return false; }
    @Override public final <E extends A> A getOrElse(final E otherwise) { return a; }
    public final A get() { return a; }
}

public final class None	extends Maybe<Object> {
    private None() {}
    @Override public final boolean isNone() { return true; }
    @Override public final <E extends Object> E getOrElse(final E otherwise) { return otherwise; }

    public static final None none = new None();
    public static final <E> Maybe<E> getNone() { return (Maybe<E>)none; }
}


That is all there is to it! Seems just another little class that everyone keeps writing because it's easier than just including yet another jar, doesn't it? Just as easy as that Pair (or Tuple) class you just wrote (ok, I keep writing them, surely I am just another average programmer).
Or is it? I should probably explain why I return an E instead of an A with the getOrElse in the None class, well, it is something called "covariance", this has existed since, practically forever I suppose (in Java), it simply means that a subclass may specialize the return type of a method when it overrides said method. In this case, we specialize with E (since it's a subclass of Object). Then there is the issue with not adding a type parameter to None and subclassing directly from Maybe, generics in Java work through a method (to ensure backwards compatibility) called " type erasure" which means that (most) generic type information is lost on compilation. Which means that a List<String> could be equivalent to a List<Integer>, at runtime. So we provide a getter method that simply casts an instance of None to the right Maybe<?> type. This trick is used in the Collections utility class, where it is used for the empty<CollectionType> functions, they store singleton instances for each type, properly specialized for their function and simply return that, type erasure ensures that they have the right type. On the same note, let us add another method that creates Somes. And move them to the Maybe abstract class, seems a little better place for it. This does mean changing the private constructor of the None class to package private. But that isn't such a big deal, it is still private as far as the API is concerned. These helper methods on the Maybe class is also part of the reason for having a base class and not an interface. There are just a few remaining questions for the final api. For instance, should the new some function in Maybe return None on null? It's an interesting question which I can't really give you an answer to. Even if it is because I haven't really made my mind up yet. Null can be a valid value, so it "should" be possible to create a Some with as value null. Perphaps the some function will return None on null, but you can still do new Some(null) and do it that way. But why is this null value keep popping up? And keep being "annoying"? Bugging me about some "code smell" that I can't really put my finger to? What is so unique about null? So different? You can assign it to every reference value. That little fact, I think that might be it for me. It subverts the type system. That would be reason enough for me. While I was writing this post (yeah, other stuff keeps popping up) the very interesting Ola Bini made a post about something called "Bottom Types". I do recommend you to read it, it even mentions the Maybe Monad and null values! I find it uncanny. Well, his post is the better one, that is for sure. p It also mentions an origin of the null reference exception, a Sir Tony Hoare calls it the "Million Dollar Mistake". And yes, this person is an authority on the subject. As an interesting bit on the side, C++ has null pointers, but never null references, and references are "managed", but not in a Garbage Collection kind of way, and has had it for quite a while. The null reference really is an error born of Sloth (laziness). While otherwise a (when properly managed) perfectly good and handy trait among programmers it does seem to have given us a quite a few problems down the line since he invented them 44 years ago, and we are still having this problem, even with the Elvis operators coming in Java7, as the Groovy people call them (among other people). And on that note, I think a little recap of what we have done here is in order before I publish this post, and yes, I shall be coming back to this subject in a future post. Especially after I learn more about Type Systems (currently I only "know" them intuitively, nothing concrete, no names or anything like that, just the 4 basic names and the term "type inference"). Sometimes (always?) it is better to be explicit than to use a magic, possibly having double meanings, constant, such as null. Do tell me, you don't also type in the same String literal time and again in the same function? Surely extract all those literals into one final static field? Or even better, switch to enums! So why not for null? Well, null is compile checked, meaning you can't accidently misspell it. And we have seen (thanks to Ola Bini) that null actually subverts the type system! So how can we make this better? Haskell and Scala show us the way, the Maybe Monad. It is a small collection of three classes that are practically as easy to write as a Pair or (simple) Tuple class. Which everyone always seem to write. It allows us to be explicit when a function returns something or does not return something, without being as "heavy" as an exception while still sufficently distinctive and "present" that you have to deal with it, instead of waiting for a nullpointerexception during runtime. Of course, you can be explicit with null, but it is so easy (too easy) to just "ignore" it, even if the JavaDoc explicity states that you need to deal with null. We also retrofitted a map to return Maybe instances, meaning Some if there is a key and None if there isn't, instead of null for either if the value is present and it happens to be null or if it isn't present. And lastly, a code listing of the Maybe class I had written up for this post, you should put it in some utility package:
public abstract class Maybe<T> extends Iterable<T> {
    Maybe() {}
    public abstract <E extends T> T getOrElse(E otherwise);
    public abstract boolean isSome();
    public abstract boolean isNone();
    public abstract int hashCode();
    public abstract boolean equals(Object o);
    public abstract String toString();

    public static <T> Maybe<T> some(T t) {
        if (t == null) {
            return <T>none();
        }
        return new Some<T>(t);
    }
    private static final None _none = new None();
    @SuppressWarnings("unchecked")
    public static <T> Maybe<T> none() {
        return _none;
    }
}

public final class Some<T> extends Maybe<T> {
    private final T some;
    public Some(T t) { some = t; }
    @Override public final <E extends T> T getOrElse(E otherwise) { return some; }
    @Override public final boolean isSome() { return true;  }
    @Override public final boolean isNone() { return false; }
    @Override public final int hashCode() {
        if (some == null) {
            return 0;
        }
        return some.hashCode();
    }
    @Override public final boolean equals(Object o) {
        if (o instanceof Some) {
            if (some == null) {
                return o == null;
            }
            return some.equals(((Some) o).some);
        }
        return false;
    }
    @Override public final String toString() {
        if (some == null) {
            return "Some(null)";
        }
        return new StringBuffer("Some(").append(some.toString).append(')');
    }
    @Override public final Iterator<T> iterator() {
        return Collections.singletonSet(some).iterator();
    }
}

public final class None extends Maybe<Object> {
    None() {}
    @Override public final <E extends T> E getOrElse(E otherwise) { return otherwise; }
    @Override public final boolean isSome() { return false;  }
    @Override public final boolean isNone() { return true; }
    @Override public final int hashCode() {
        return None.class.hashCode();
    }
    @Override public final boolean equals(Object o) {
        return o instanceof None;
    }
    @Override public final String toString() {
        return "None";
    }
    @Override public final Iterator<Object> iterator() {
        return Collections.emptySet().iterator();
    }
}
PS. Fixed the code to implement the Iterable interface, I shouldn't write this at night.

Now a really (truely) small post, a tiny remark about a tiny bit of language...Diamond Bridge Commander - Projectiles

Comments

Nguyen The Trungdeulamco Tuesday, March 16, 2010 3:50:58 AM

Nice effort...

Write a comment

New comments have been disabled for this post.

June 2012
S M T W T F S
May 2012July 2012
1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30