Skip navigation.

Software Development

Correcting The Future

November 2009

( Monthly archive )

Climategate

,

If you only watch CNN, MSNBC, CBS, ABC, BBC or CBC for your news, you probably didn't hear about one of the biggest scientific, economic and global scandals ever. I'm a big fan of conspiracy theories for their entertainment values, but this has real consequences. Still, it does have a lot of comedic value in some parts. Specifically, the coding part. It's in FORTRAN, but the comment file is the best part.

Here is a link to some parts of it. I'm sure you can find the rest of the emails and code on your own unless Google restricts searches again (yes, it's true). I think there's a torrent with all of it somewhere.

I've heard plenty of people try to dismiss this. Not sure why. In case you haven't heard the story, the IPCC (the agency that advises the UN on climate change) had thousands of emails, code and data released a week ago. It is believed that either a hacker or someone on the inside released them. A BBC reporter found some of his own correspondence within the set of released emails and confirmed that the ones he was included in were authentic, but could not say anything about the others. Apparently, there were scientists at the University of East Anglia who were responsible for much of the data modeling. CRU has since admitted that the original raw data has been dumped and deleted.

Quoted from that article:


Professor Phil Jones, the activist-scientist who maintains the data set, has cited various reasons for refusing to release the raw data. Most famously, Jones told an Australian climate scientist in 2004:

Even if WMO agrees, I will still not pass on the data. We have 25 or so years invested in the work. Why should I make the data available to you, when your aim is to try and find something wrong with it.




This is from the head guy at the IPCC back in 2004!

Read more...

Generic List

I don't know if this is possible, but it'd be cool if it was. In Project V, I use a generic list with the objective that no operation takes more than O(log n). However, it has one drawback which causes it to be just short of that objective. Before going into that, a description of the list is required.

I will simplify the scenario since Project V has functionality that would only serve to complicate what I need.

Suppose we have a generic list and in it, you can only insert entries. An entry has the following three members.

1. ID (Guaranteed to be globally unique)
2. Name (not necessarily unique)
3. Value (not used within context of the generic list).

An entry has a fourth "member" which is a list of named generic lists. These named lists are actually used as named sets.

We can retrieve an entry from a set by ID or by Name. This is no problem. Just use a map, hash, or whatever else that is sorted.

I use three internal lists. One is an associative array (otherwise known as STL::map in C++) for sorting by ID, another associative array where they key need not be unique (otherwise known as STL::multimap in C++) for sorting by name and an indexed list. That last one is a custom list I've built by using a skiplist except that I use the index as the key. The index isn't stored. Instead, I store the amount of items that is skipped along with each forward pointer. When inserting and deleting, I simply update the amounts when I update the pointers. So it comes at no extra cost in runtime complexity.

Inserting at the beginning of this custom skip list is O(1) which is just fine by me (details about inserting at the end at the bottom of this article). Inserting in at a random location requires O(log n). Also fine by me since a vector takes O(n) by comparison. Yes, this is the worst case for a skip list as well, but expected runtime is O(log n) for the skip list. If you get O(n) for large sets, something went horribly wrong. Searching is O(log n) which is also fine because it has the same runtime as an associative array (map) though perhaps a little slower in some cases, but hardly noticeable. And deletion has expected runtime of O(log n) as well.

An associative array takes O(log n) for each of insertion, deletion and search. The data structure I'm using for my numerically ordered list (an indexed skip list) likewise takes O(log n) insertion, deletion and search.

Separately, everything looks great. But when used together and without an index, deletion takes O(n) as we have to scan all entries. Is there a way around this? Can we do O(log n) or better?

Something tells me it's possible, but I don't see it right now. If anyone has any suggestions, or knows that it's impossible, please let me know.

        Insert  Delete Search
map     log(n)  log(n) log(n)  by name or ID
indexed log(n)  log(n) log(n)  by index
map     N/A     N/A    N/A     by index
indexed 1       n      N/A     by name or ID
----------------------------------------------
overall log(n)  n      log(n)


Note how insert, delete and search is never done by index on the map. Since we must do the operation on the indexed list first, we will obtain the entry and this will give us the name and ID needed by the the map (so the index is not used at all with the map). But the opposite is not true. If we insert by name or ID, there is NO index stored at all. This helps in the case of inserting by index since we default to inserting it at the end of the list. This results in O(1) (if we tweak the skip list). But it has disastrous effects when trying to find the item in the list so as to delete it. Although N/A is listed under Search for indexing by name or ID, it is actually used when deleting. In all other cases, searching by name or ID is always done with the map instead.

I had once tried to associate an index with each element, but when you delete an item, you need to update the rest. This is O(n).

I could create another associative array where I map the entry to the data structure that wraps said entry to its forward pointers used by the skip list. I could locate the element in O(log n) inside my skip list, but the problem is that I need all of the previous pointers. Currently, I'm only using forward pointers and not a doubly linked list. If I modify the skip list to use a doubly linked list, I think I can achieve O(log n) for deletion without having the index (only the element). This would also allow me to insert at the end in O(1).

This generic list is quite the memory hog needing 4 times the required memory for references on top of the memory used for the keys. I'll have to look if there isn't a way to use an STL map where the key is in the value. That would save a huge amount of storage. Right now, I'm using a pointer. This takes less space than using the key proper, but it's still taking up space for no reason.

Oh, and if anyone is wondering, I'm not going to implement undo in Project V right now as it's too much of a pain. I can implement it later anyhow since the framework is there to support it. So I'm going to concentrate on getting something up and running and then worry about adding other features later.

P =/!= NP? Travelling Salesman Problem (Updated Nov 7, 2009)

The travelling salesman problem (TSP) is where you try to find the shortest route when visiting N number of towns. Apparently, the decision version of this problem, if solved in polynomial time, would be a solution to an NP-complete problem and would also solve all other NP-complete problems in polynomial time. Not only that, but since you can use binary search in O(log(n)) time when used with the decision version of TSP, finding the shortest distance up to X digits of precision would likewise be done in polynomial time.

My first attempt was to first build a convex hull around all the cities (points). A nice property of this is that if all points are part of the convex hull, then you have the shortest path.

Read more...

Easy Convex Hull Construction

If you've ever tried to look at P=NP and related problems, you've probably encountered the traveling salesman problem. And if you've ever tried to find a solution to this, even if just for fun, you may have thought about building a convex hull to start off. Well, I thought of building one. But I couldn't find an easy algorithm that I could understand. Sure, there are plenty of them out there and I know how they work, but they all seemed a little convoluted. So I created my own algorithm only to find out it's a variation of the Graham Scan (you can see the one liner at the bottom of the article describing it under "Notes"). Still, the version described here is super simple to understand and I thought it deserves a mention.

A Convex Hull is essentially a list of points that completely contains all other points where all angles between any three points of this convex hull is convex (hence its name). You can think of it as an elastic band surrounding all points.

Read more...

December 2009
S M T W T F S
November 2009January 2010
1 2 3 4 5
6 7 8 9 10 11 12
13 14 15 16 17 18 19
20 21 22 23 24 25 26
27 28 29 30 31