Software Development

Correcting The Future

Containers with Multiple Lists

Suppose you wanted the same elements ordered 4 different ways. In my case, I have elements that are shared between different containers, so my elements are referenced by pointers. Not normally necessary, but it suits this example. Suppose I want my elements ordered by index of my choosing. These elements also have a name and a unique ID. So I want my elements sorted by name (not unique) and by ID. I also want my elements sorted by pointer to the element itself.

This would require 4 different lists within the same container. I originally had a class that had 4 STL containers inside. But it was getting tedious, cumbersome and error prone. If I delete an iterator, I want that element gone from all lists. I also wanted a way to convert from one iterator type to another. My current list has no iterators at all. I duplicate the list and return a deque. From that, you can do what you want. It's also always ordered by index. You can search for an element by ID, name, pointer or index. But it returns the element itself. You can't traverse the list as you would an STL container.

It needed replacing. It grew and grew over time into a monstrosity that just wasn't good enough. And don't mistake this for shoddy programming. I could not find anything anywhere that does what I wanted. This is why I had to write my own container.

I've finally figured out how to have multiple lists within the same container and still have it work well with STL.

First off, you can no longer use maps. Instead, you use a set (the SkipList equivalent that I wrote) with a template parameter for the key type used in the search. You also use another template parameter to specify the comparison functor. When you call find(key_type key), it will use your functor during the search with the key argument. This means that the key must be held within the element itself.

That leaves only indexing. You can use autoindexing with a set. Or you can use manual indexing. The first version will automatically index your elements as you insert them. They will be sorted by the key and the order they appear in the list will determine the indexes. This is how autoindexing is done. The other version is where you insert an element at a specific location that you specify directly. You can use either one. Not only that, but you can use as many different indexed lists as you want.

The issues I had were as follows.

1. Insertion (into all lists)
2. Deletion (from all lists)
3. Converting between iterator types.
4. Creating the nodes.

The way this compound container works is that you derive from a template class that I wrote. It will contain basic stuff like the list itself, the number of items and other things related to skiplists. Everything works with skiplists, so you have log(n) performance on everything.

When you derive the template class, you must specify how many different lists in total you want and how many of those lists will be indexed. This will create the node type for you. It will make sure to create one set of forward and backward pointers for each list. It will also create the skip count storage for indexed lists. All done through templates.

The next step involves inserting all the different container types you want to use for each list. These are predefined templates that will create the container and iterators for each specific type of container you want to use. The last thing you need to do is include each container in a predefined vector. This is because each container needs a pointer to the primary container and this will be handled automatically by the constructor in the template. However, you must include each container in the list manually in a virtual init method. I'm hoping to find an easier way, but for now this will do.

At this point, you can start using your compound container. To use it as an indexed container, simply use the methods in the template that you declared for that purpose. And for other kinds of container, then use the respective template that was declared.

I thought I'd have a problem with deletion, but since I now have access to all the lists, I can delete the node from all lists at once. Insertion was very troublesome at first. But I think I have solved how to do it. You can use any internal container to insert an element. It will order them each according to the key found within for each of the comparisor functors. Since all internal containers are in a vector, I can make sure that all keys are handled properly. For manually indexed lists, the element will be added at the end unless you used the insert method of an indexed skiplist container. This way, you can specify the index you want to use for that list. However, you can only specify the index for ONE and only ONE indexed list when inserting an element.

This leaves one other problem. Normally, you remove an element and then insert it into its new location. But with multiple lists, removing an item removes it from all lists. So for manually indexed lists, I will provide functionality to MOVE an item from one location to another. It will also support moving ranges. And this is the drawback with this compound container. You should not be assigning values to the elements. You should not swap elements or alter the elements in any way. Especially not any keys found within the element. So any algorithm that re-orders the list cannot be used. I will try to write a sort algorithm that can work with this new compound container. It's the forward and backward pointers that need to be updated, not the element itself. This way, I can update individual lists without altering any of the other lists.

About converting iterators, I will provide a base class that will be common amongst them all. I only need the node pointer to be able to convert the iterator. So I'll provide a constructor as well as a new function to convert iterators in each of the internal containers.

If I can get all this to work, and I see no reason why not since I've already got the nodes and iterators done, then I'm really looking forward to the next step. I consider this quite an accomplishment. It's not that it's difficult to have multiple lists within a compound container. It's that it's generic and can still work with STL algorithms that don't alter the ordering of the lists. Some features will have to go unfortunately. Constructors that built default elements will no longer be available. And methods that insert x copies of the same element will no longer be available. Everything else should still be there.

I've heard a lot of people say that when they see someone using SkipLists, it's because they don't know what they're doing and that other containers are always faster. In this case, I don't see any other containers being able to do what I want. I think SkipLists are VERY beneficial here because it allows me to use the same node type for all the different container types.

Here's an example of how I want this to go.

class MyNode
{
public:
  UnicodeString name;
  NID ID; // Unique ID
  int value; // Some data.
};

// I want four lists.
// 1. Sorted by name (non-unique).
// 2. Sorted by ID (unique).
// 3. Sorted by index.
// 4. Sorted by pointer to node (MyNode*).

// Indexed list must come first when putting them in the array.

// I have 4 list and 1 of them is indexed.
class MyContainer : public AggContainer<MyNode*,4,1>
{
public:
  typedef MyNode* T;
  // Declare the internal containers.
  AggIndexedSkipList<T> IndexedContainer;
  AggMultiSkipList<T> NamedContainer;
  AggSkipList<T> IDContainer;
  AggSkipList<T> PtrContainer;

  virtual void Init()
  {
    Containers.push_back(IndexedContainer);
    Containers.push_back(NamedContainer);
    Containers.push_back(IDContainer);
    Containers.push_back(PtrContainer);
  }
};


There's one more thing that's missing. For the three keyed lists, we need a comparison functor for each one as well as a functor to retrieve the key from the element. Those functors are passed as template parameters when declaring the internal containers. I left it out to show the basic setup and not clutter the code. However, the default less<UnicodeString> functor can be used for NamedContainer. less<NID> for IDContainer and less<MyNode*> for PtrContainer. As for retrieval of the keys, it's simply a matter of returning the member such as "return node->ID;". For PtrContainer, the key is the node, so no functor needed there.

Here is the full code.

class MyNode
{
public:
  UnicodeString name;
  NID ID; // Unique ID
  int value; // Some data.
};

// I want four lists.
// 1. Sorted by name (non-unique).
// 2. Sorted by ID (unique).
// 3. Sorted by index.
// 4. Sorted by pointer to node (MyNode*).

// Indexed list must come first when putting them in the array.

class NamedFunctor
{
public:
  UnicodeString operator()(MyNode *node) const {return node->name;}
};

class IDFunctor
{
public:
  NID operator()(MyNode *node) const {return node->ID;}
}

// I have 4 list and 1 of them is indexed.
class MyContainer : public AggContainer<MyNode*,4,1>
{
public:
  typedef MyNode* T;
  // Declare the internal containers.
  AggIndexedSkipList<T> IndexedContainer;
  AggMultiSkipList<T,less<UnicodeString>,NamedFunctor> NamedContainer;
  AggSkipList<T,less<NID>,IDFunctor> IDContainer;
  AggSkipList<T,less<MyNode*> > PtrContainer;

  virtual void Init()
  {
    Containers.push_back(IndexedContainer);
    Containers.push_back(NamedContainer);
    Containers.push_back(IDContainer);
    Containers.push_back(PtrContainer);
  }

  MyContainer() : AggContainer() {}
};


From this point on, you can use any of the internal containers to manipulate the list and obtain iterators. A special convert() method will be included in each container for the purpose of converting iterators. And since there is only one indexed list, I should use the IndexedContainer to insert any elements as this will allow me to specify where to insert each node in the indexed list. All other lists will be automatically updated.

It takes a little doing to get it all setup, but for what I want to do, this is a million times better than what I have now.

WritingAnother C++ Link (Updated with yet another link!)

Comments

Unregistered user Monday, July 19, 2010 11:22:59 PM

sired22 writes: Another step towards the horizon and another piece of your puzzle. It always aggravates me when i cant do something with a computer, we humans invented literally everything about computers and so they should do anything within the limits of hardware. If i cant do something because of a self imposed limitation in the general design that just vexes me. I'm looking forward to working with what you are creating. I find myself imaging what the first public build will be like and look forward to a forming community. Also when you look at the real world and nature doesn't that fit more in line with your data flow approach?

Vorlath Tuesday, July 20, 2010 12:43:15 AM

I'm hoping to work some more today in finishing this compound list. I could write a custom list and be done with it. But I find a generic compound list to be too useful to not complete it now while I have the chance. I will release all my SkipLists including the compound list as Open Source when I'm done. I'm sure it'll come in handy to some people.

I find myself imaging what the first public build will be like and look forward to a forming community.



I released a demo a while back that had the basic environment on display. No GUI for Project V though. I then released another version that let you place new interfaces for components on the screen, move them around and resize them. So some people have had a quick taste already. Visually, I'm not that far ahead from that point. I have some lists and a few more buttons. But I'm concentrating on the command line for now and will add in GUI manipulation via the mouse and keyboard (on the network itself like naming an input by clicking on it and typing it in) later on. Right now, you'll use the command line to set the name of the input and connect components together. I think the command line will be useful for scripts later on that generate networks on the fly.

Also when you look at the real world and nature doesn't that fit more in line with your data flow approach?



Pretty much everything, yeah. Even in non-dataflow environments like my compound list in C++, I used techniques that are used in the real world. For example, I have nodes that have forward and backward pointers for each list. I did not want to create a set of methods for each list. And neither did I want to specify an index since each internal container will already know which set of forward and backward pointers to use. How to deal with the problem?

I solved it like it's solved in the real world. In carpentry, you often create a jig that will let you operate on materials in a new and unique way. The idea here is creating a tool that will do most of the work for you. So the internal containers produce a templated helper class that will work only on the forward and backward pointers required for that container. It's typesafe. Not only that, but it will not access forward and backward pointers from other lists. And the container has no idea that it's using a helper class rather than the real thing.

This technique often goes unused. It's a shame. It's the difference between asking paper to cut itself (OO) and using a pair a scissors to cut the paper (helper class). Sometimes the first method works. Sometimes it's the second method that's best. Dataflow is more toward the second option which is in line with the real world.

Vorlath Tuesday, July 20, 2010 12:00:18 PM

Got a lot done today. Got the framework for indexed list set up. Really tedious. I need to delegate most of the functionality to the primary container because that's where all the data is now. Very annoying. I'm using properties to automatically redirect things. I'm also using a template that implements properties in case someone wants to use standard C++ instead of Borland C++ Builder which has properties built in. The template property class is somewhat annoying in that you need to put a few method calls into the constructor to set the container, getter and setter.

I think this will take me the rest of the week since I can't work on it all the time. A few more hours here and there and I should be mostly done with this.

Unregistered user Wednesday, July 21, 2010 12:54:46 AM

Simon Schlee writes: Take a look at Boost.MultiIndex (http://www.boost.org/doc/libs/1_43_0/libs/multi_index/doc/index.html) it was designed with most of your goals in mind, all four points you listed explicitly. Removeing from only one "view" could be a challenge, but maybe it could be extended to do this too. It supports range searching and conversion of iterators. Anyway it could be a source of inspiration for some implementation tricks you maybe didn't thought of... I used it for a while to have both ID and name mapping, but at some point it doesn't seem reasonable anymore, because I found another solution which avoids multi-indexing. I would say I have similar goals like you with Project V and I agree with you that VM's neither are necessary nor THE solution. The ongoing discussions about the next 'big' programming language, misses the point, because no programming language can solve the problems which are still remaining. From my point of view programming languages are just a bunch of ugly representation of state which gets transformed more than it should(Because it goes all the way down to the machine, with todays languages going up to a more abstract description and than getting compiled from there down to the machine would be more reasonable (to make room for improvement)). Yes there is more, there are parsers, compilers, interpreters and whatever, but this all shouldn't be part of the language. The language shouldn't be more than a lisp-like macro, which maps pure unique data, to some representation which is editable by the programmer. I think one important aspect of this is, that this is a one to many relation, you can map the data to every representation you can think of, but every representation should map to the same data (if the representations are equivalent). And this is what some language designers don't get, if you have a unique data representation underneath anything you can provide easily all views you like, but if you don't you only have representations and to extend the functionality of the language you have to map your new representation to some old representation. You get a many to many relationship and a big compatibility problem. So we really need a system, which supports different languages (text-representations) and SmallTalk-like IDE's, I'm sick of IDE's which operate only on text-representations. This system would naturally support all kinds of computer-related development, by including the language designer as user of the system, we can overcome the limitedness of programming languages and by including the source and the build process of the system into the system we can program it to automatically crosscompile itself, if needed. LISP is able to do many of these things and I see that what I'm working on is in some twisted sense a LISP, but I want something what is even more powerful and I hope to find some even cooler superset than LISP is compared to other languages. This is something Paul Graham seems to work on too. And his arc lisp dialect, is closer to what I want than e.g. CommonLISP. But I want to build my twisted reimplementation of more than half of CommonLisp around my brain, instead of wrapping my brain around Lisp ;) I'm reading your articles from beginning and don't know very much about Project V yet, but i like the direction in which you seem to go. I think nobody knows what he wants completely until he found it and I think the people who are telling "you want that, take this" don't get the greater picture, that it is a search for more understanding, expressiveness and power that drives people to build such systems.

Vorlath Wednesday, July 21, 2010 2:28:05 AM

I'm reading your articles from beginning and don't know very much about Project V yet, but i like the direction in which you seem to go. I think nobody knows what he wants completely until he found it and I think the people who are telling "you want that, take this" don't get the greater picture, that it is a search for more understanding, expressiveness and power that drives people to build such systems.



Project V is a dataflow environment. You will eventually be able to use existing languages for the implementation of different components. Some people have mistaken this for a polyglot tool. It isn't (though you can certainly use it that way if you wish, but that's a very narrow view). If you read some of my blog entries about my discoveries, you'll see that I go into great detail about the different features and advantages of dataflow.

And you're quite right that this is about the search for more understanding. There had to be a better way. And you're also right that too often people will see what they know and project this onto my work. Without going through the process, they cannot see what I see (like the previous example where they assume I'm working on yet another polyglot tool). Thankfully, many have dealt with dataflow and if not, they have followed the progress here. In fact, I've had a few extremely good discussions in private about the inner details of not just Project V, but overall implementation details of any dataflow system. The type system is a unique beast that could fill a book on its own.

Write a comment

New comments have been disabled for this post.

June 2012
S M T W T F S
May 2012July 2012
1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30