Generic List
Sunday, 15. November 2009, 22:17:13
I will simplify the scenario since Project V has functionality that would only serve to complicate what I need.
Suppose we have a generic list and in it, you can only insert entries. An entry has the following three members.
1. ID (Guaranteed to be globally unique)
2. Name (not necessarily unique)
3. Value (not used within context of the generic list).
An entry has a fourth "member" which is a list of named generic lists. These named lists are actually used as named sets.
We can retrieve an entry from a set by ID or by Name. This is no problem. Just use a map, hash, or whatever else that is sorted.
I use three internal lists. One is an associative array (otherwise known as STL::map in C++) for sorting by ID, another associative array where they key need not be unique (otherwise known as STL::multimap in C++) for sorting by name and an indexed list. That last one is a custom list I've built by using a skiplist except that I use the index as the key. The index isn't stored. Instead, I store the amount of items that is skipped along with each forward pointer. When inserting and deleting, I simply update the amounts when I update the pointers. So it comes at no extra cost in runtime complexity.
Inserting at the beginning of this custom skip list is O(1) which is just fine by me (details about inserting at the end at the bottom of this article). Inserting in at a random location requires O(log n). Also fine by me since a vector takes O(n) by comparison. Yes, this is the worst case for a skip list as well, but expected runtime is O(log n) for the skip list. If you get O(n) for large sets, something went horribly wrong. Searching is O(log n) which is also fine because it has the same runtime as an associative array (map) though perhaps a little slower in some cases, but hardly noticeable. And deletion has expected runtime of O(log n) as well.
An associative array takes O(log n) for each of insertion, deletion and search. The data structure I'm using for my numerically ordered list (an indexed skip list) likewise takes O(log n) insertion, deletion and search.
Separately, everything looks great. But when used together and without an index, deletion takes O(n) as we have to scan all entries. Is there a way around this? Can we do O(log n) or better?
Something tells me it's possible, but I don't see it right now. If anyone has any suggestions, or knows that it's impossible, please let me know.
Insert Delete Search
map log(n) log(n) log(n) by name or ID
indexed log(n) log(n) log(n) by index
map N/A N/A N/A by index
indexed 1 n N/A by name or ID
----------------------------------------------
overall log(n) n log(n)
Note how insert, delete and search is never done by index on the map. Since we must do the operation on the indexed list first, we will obtain the entry and this will give us the name and ID needed by the the map (so the index is not used at all with the map). But the opposite is not true. If we insert by name or ID, there is NO index stored at all. This helps in the case of inserting by index since we default to inserting it at the end of the list. This results in O(1) (if we tweak the skip list). But it has disastrous effects when trying to find the item in the list so as to delete it. Although N/A is listed under Search for indexing by name or ID, it is actually used when deleting. In all other cases, searching by name or ID is always done with the map instead.
I had once tried to associate an index with each element, but when you delete an item, you need to update the rest. This is O(n).
I could create another associative array where I map the entry to the data structure that wraps said entry to its forward pointers used by the skip list. I could locate the element in O(log n) inside my skip list, but the problem is that I need all of the previous pointers. Currently, I'm only using forward pointers and not a doubly linked list. If I modify the skip list to use a doubly linked list, I think I can achieve O(log n) for deletion without having the index (only the element). This would also allow me to insert at the end in O(1).
This generic list is quite the memory hog needing 4 times the required memory for references on top of the memory used for the keys. I'll have to look if there isn't a way to use an STL map where the key is in the value. That would save a huge amount of storage. Right now, I'm using a pointer. This takes less space than using the key proper, but it's still taking up space for no reason.
Oh, and if anyone is wondering, I'm not going to implement undo in Project V right now as it's too much of a pain. I can implement it later anyhow since the framework is there to support it. So I'm going to concentrate on getting something up and running and then worry about adding other features later.



