Software Development

Correcting The Future

Generic List

I don't know if this is possible, but it'd be cool if it was. In Project V, I use a generic list with the objective that no operation takes more than O(log n). However, it has one drawback which causes it to be just short of that objective. Before going into that, a description of the list is required.

I will simplify the scenario since Project V has functionality that would only serve to complicate what I need.

Suppose we have a generic list and in it, you can only insert entries. An entry has the following three members.

1. ID (Guaranteed to be globally unique)
2. Name (not necessarily unique)
3. Value (not used within context of the generic list).

An entry has a fourth "member" which is a list of named generic lists. These named lists are actually used as named sets.

We can retrieve an entry from a set by ID or by Name. This is no problem. Just use a map, hash, or whatever else that is sorted.

I use three internal lists. One is an associative array (otherwise known as STL::map in C++) for sorting by ID, another associative array where they key need not be unique (otherwise known as STL::multimap in C++) for sorting by name and an indexed list. That last one is a custom list I've built by using a skiplist except that I use the index as the key. The index isn't stored. Instead, I store the amount of items that is skipped along with each forward pointer. When inserting and deleting, I simply update the amounts when I update the pointers. So it comes at no extra cost in runtime complexity.

Inserting at the beginning of this custom skip list is O(1) which is just fine by me (details about inserting at the end at the bottom of this article). Inserting in at a random location requires O(log n). Also fine by me since a vector takes O(n) by comparison. Yes, this is the worst case for a skip list as well, but expected runtime is O(log n) for the skip list. If you get O(n) for large sets, something went horribly wrong. Searching is O(log n) which is also fine because it has the same runtime as an associative array (map) though perhaps a little slower in some cases, but hardly noticeable. And deletion has expected runtime of O(log n) as well.

An associative array takes O(log n) for each of insertion, deletion and search. The data structure I'm using for my numerically ordered list (an indexed skip list) likewise takes O(log n) insertion, deletion and search.

Separately, everything looks great. But when used together and without an index, deletion takes O(n) as we have to scan all entries. Is there a way around this? Can we do O(log n) or better?

Something tells me it's possible, but I don't see it right now. If anyone has any suggestions, or knows that it's impossible, please let me know.

        Insert  Delete Search
map     log(n)  log(n) log(n)  by name or ID
indexed log(n)  log(n) log(n)  by index
map     N/A     N/A    N/A     by index
indexed 1       n      N/A     by name or ID
----------------------------------------------
overall log(n)  n      log(n)


Note how insert, delete and search is never done by index on the map. Since we must do the operation on the indexed list first, we will obtain the entry and this will give us the name and ID needed by the the map (so the index is not used at all with the map). But the opposite is not true. If we insert by name or ID, there is NO index stored at all. This helps in the case of inserting by index since we default to inserting it at the end of the list. This results in O(1) (if we tweak the skip list). But it has disastrous effects when trying to find the item in the list so as to delete it. Although N/A is listed under Search for indexing by name or ID, it is actually used when deleting. In all other cases, searching by name or ID is always done with the map instead.

I had once tried to associate an index with each element, but when you delete an item, you need to update the rest. This is O(n).

I could create another associative array where I map the entry to the data structure that wraps said entry to its forward pointers used by the skip list. I could locate the element in O(log n) inside my skip list, but the problem is that I need all of the previous pointers. Currently, I'm only using forward pointers and not a doubly linked list. If I modify the skip list to use a doubly linked list, I think I can achieve O(log n) for deletion without having the index (only the element). This would also allow me to insert at the end in O(1).

This generic list is quite the memory hog needing 4 times the required memory for references on top of the memory used for the keys. I'll have to look if there isn't a way to use an STL map where the key is in the value. That would save a huge amount of storage. Right now, I'm using a pointer. This takes less space than using the key proper, but it's still taking up space for no reason.

Oh, and if anyone is wondering, I'm not going to implement undo in Project V right now as it's too much of a pain. I can implement it later anyhow since the framework is there to support it. So I'm going to concentrate on getting something up and running and then worry about adding other features later.

P =/!= NP? Travelling Salesman Problem (Updated Nov 7, 2009)Climategate

Comments

Robbert van Dalenrapido Tuesday, November 17, 2009 7:42:47 AM

You should try to factor out your problem into a single (data structure) problem.

May be this will help. What about an array that has the following API?
(scala language)

trait Array[K]
{
def get(index: Int): Option[K]
def head(index: Int): Array[K]
def tail(index: Int): Array[K]
def concat(other: Array[K]): Array[K]
}

With this API you can insert and combine Arrays every which way.
Let's assume ordinary Strings implement the Array API. Then we can insert ' beautiful' between 'Hello' and ' World!' as follows

val w = "Hello World!"
val b = "beautiful "

val r =w.head(5).concat(b).concat(w.tail(5))

r now holds "Hello beautiful World!" and each of its characters can be indexed with get(index).

The question is: can we make all the Array operations log(n)? It turns out we can: search for 'Finger Trees'.
Amazingly, such magical data structure is made possible through immutability!

Vorlath Tuesday, November 17, 2009 11:24:39 PM

Thank you for your response.

Unfortunately, I have to question your assertion that this is made possible through immutability. My issue is wanting to delete in O(log n). Your finger tree cannot do that, but I believe I have enough information there to know that I can solve my problem with exactly the method I mentioned at the end of the article.

My skip list IS analogous to a finger tree as it exists now. My question relates to fixing a finger tree so that it can do what I want.

I already have access to the first and last elements (required by skip list). And each node in the "tree" (forward pointers in the skip list) are annotated with a count (with the amount of entries that can be skipped). In every respect, I unknowingly created a finger tree with the use of a skip list.

The problem is doing deletions. Say you don't know the index. You travel through the tree in O(log n) and find the node no problem. But now you have a problem. Your cons list only has forward pointers. So how do you delete the item and only keep the rest of the list? Even though you have the node, it's useless because you need to know what the previous item is so that you can create a NEW list. Without bi-directional pointers, you're screwed. You'd need to scan the whole list again. That's the issue at hand since this is O(n).

Being able to either go up the tree (requiring bi-directional pointers) or going backwards in your list (doubly linked list), you need bi-directional pointers in both cases. And since the tree and the list are one and the same in a skiplist, I need to implement bi-directional pointers at all levels of the skip list.

Using finger trees, I would likewise need to change it so as to support my requirements. Thanks for the suggestion though. It allowed me to read up on the topic and I learned something new today. For that, you have my thanks!

Robbert van Dalenrapido Thursday, November 19, 2009 10:36:51 AM

| Unfortunately, I have to question your assertion that this is made possible through immutability. My issue is wanting to delete in O(log n).

Deletion is easy with the Array[K] interface.

val w = "Hello world!"
w.head(5).concat(w.tail(6)) -> "HelloWorld!"

This operation takes O(log(n)) + O(log(n)) + O(log(n)) with Finger Trees.
For 'deletions' or 'modifications' to work in O(log(n)), a lot of structure needs to be shared from the (unmodified) version. Of course, unshared structures can be safely deleted or garbage collected.

To (savely) share structures they have to be immutable: that's why I claim that immutability is the basis of fast array 'deletions' (that aren't actually deletions).

Vorlath Thursday, November 19, 2009 4:02:50 PM

Deletion is easy with the Array[K] interface.

val w = "Hello world!"
w.head(5).concat(w.tail(6)) -> "HelloWorld!"



But you don't have 5 or 6. That's the problem.

This operation takes O(log(n)) + O(log(n)) + O(log(n)) with Finger Trees.



Actually, the finger tree cannot perform the operation at all.


The issue isn't deleting by index. It's deleting by element. Suppose I want to delete "W" (you can assume there is only one for now). Do that in O(log n).

Robbert van Dalenrapido Thursday, November 19, 2009 7:17:40 PM

OK. I think I understand the problem.

deleting 'W' can be done in O(log(n)), assuming there is only one occurrence.

The idea is that every (sub)string is annotated with a set of (all the) characters it contains.

Interestingly, you can annotate Finger Trees via a Measured that's essentially a Monoid.
In this case, the identity of the (Set) Monoid will be the empty set, and the binary function of the Monoid is the union of two sets.

The union of two sets can be made O(log(n)), so Finger Tree operations still run in O(log(n))

To delete "W" in a string, you just need to query the set that is associated with the string:
If the set holds "W", you split the String in the middle (on index) to a left and right part.
Because the left and right part also have a set annotation, so you can search "W" recursively until you find it.

When "W" is found you will know its index.
Hence, you can 'delete' "W" based on index.

Vorlath Thursday, November 19, 2009 10:05:01 PM

you split the String in the middle (on index) to a left and right part



But how do you split without having a reference to the previous element?

When "W" is found you will know its index.



How so? Please explain. I'm REALLY interested in knowing how this is accomplished without bi-directional references.

Note that if you're indexing into an array, then you can indeed find the index, but the new array will have to be built in O(n) because you have to copy all the elements. If you don't copy the array and use wrappers with references, it can get messy and I don't want the overhead/fragmentation.

I don't see how you can get O(log n) all around without bi-directional references.

Anyways, thanks again for your help. I appreciate it! It's pretty clear that bi-directional references will do what I want, so I'll go with that for now. If you see a simpler way, please let me know.

From what I understand, you're taking advantage of the fact that you're using an array in order to get the previous element, no? (In order to split the list) If so, then this requires a list (array or container) with bi-directional iterators as I've said. This would be the only change I need to implement.

Write a comment

New comments have been disabled for this post.

June 2012
S M T W T F S
May 2012July 2012
1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30