Containers with Multiple Lists
Monday, July 19, 2010 11:52:59 AM
This would require 4 different lists within the same container. I originally had a class that had 4 STL containers inside. But it was getting tedious, cumbersome and error prone. If I delete an iterator, I want that element gone from all lists. I also wanted a way to convert from one iterator type to another. My current list has no iterators at all. I duplicate the list and return a deque. From that, you can do what you want. It's also always ordered by index. You can search for an element by ID, name, pointer or index. But it returns the element itself. You can't traverse the list as you would an STL container.
It needed replacing. It grew and grew over time into a monstrosity that just wasn't good enough. And don't mistake this for shoddy programming. I could not find anything anywhere that does what I wanted. This is why I had to write my own container.
I've finally figured out how to have multiple lists within the same container and still have it work well with STL.
First off, you can no longer use maps. Instead, you use a set (the SkipList equivalent that I wrote) with a template parameter for the key type used in the search. You also use another template parameter to specify the comparison functor. When you call find(key_type key), it will use your functor during the search with the key argument. This means that the key must be held within the element itself.
That leaves only indexing. You can use autoindexing with a set. Or you can use manual indexing. The first version will automatically index your elements as you insert them. They will be sorted by the key and the order they appear in the list will determine the indexes. This is how autoindexing is done. The other version is where you insert an element at a specific location that you specify directly. You can use either one. Not only that, but you can use as many different indexed lists as you want.
The issues I had were as follows.
1. Insertion (into all lists)
2. Deletion (from all lists)
3. Converting between iterator types.
4. Creating the nodes.
The way this compound container works is that you derive from a template class that I wrote. It will contain basic stuff like the list itself, the number of items and other things related to skiplists. Everything works with skiplists, so you have log(n) performance on everything.
When you derive the template class, you must specify how many different lists in total you want and how many of those lists will be indexed. This will create the node type for you. It will make sure to create one set of forward and backward pointers for each list. It will also create the skip count storage for indexed lists. All done through templates.
The next step involves inserting all the different container types you want to use for each list. These are predefined templates that will create the container and iterators for each specific type of container you want to use. The last thing you need to do is include each container in a predefined vector. This is because each container needs a pointer to the primary container and this will be handled automatically by the constructor in the template. However, you must include each container in the list manually in a virtual init method. I'm hoping to find an easier way, but for now this will do.
At this point, you can start using your compound container. To use it as an indexed container, simply use the methods in the template that you declared for that purpose. And for other kinds of container, then use the respective template that was declared.
I thought I'd have a problem with deletion, but since I now have access to all the lists, I can delete the node from all lists at once. Insertion was very troublesome at first. But I think I have solved how to do it. You can use any internal container to insert an element. It will order them each according to the key found within for each of the comparisor functors. Since all internal containers are in a vector, I can make sure that all keys are handled properly. For manually indexed lists, the element will be added at the end unless you used the insert method of an indexed skiplist container. This way, you can specify the index you want to use for that list. However, you can only specify the index for ONE and only ONE indexed list when inserting an element.
This leaves one other problem. Normally, you remove an element and then insert it into its new location. But with multiple lists, removing an item removes it from all lists. So for manually indexed lists, I will provide functionality to MOVE an item from one location to another. It will also support moving ranges. And this is the drawback with this compound container. You should not be assigning values to the elements. You should not swap elements or alter the elements in any way. Especially not any keys found within the element. So any algorithm that re-orders the list cannot be used. I will try to write a sort algorithm that can work with this new compound container. It's the forward and backward pointers that need to be updated, not the element itself. This way, I can update individual lists without altering any of the other lists.
About converting iterators, I will provide a base class that will be common amongst them all. I only need the node pointer to be able to convert the iterator. So I'll provide a constructor as well as a new function to convert iterators in each of the internal containers.
If I can get all this to work, and I see no reason why not since I've already got the nodes and iterators done, then I'm really looking forward to the next step. I consider this quite an accomplishment. It's not that it's difficult to have multiple lists within a compound container. It's that it's generic and can still work with STL algorithms that don't alter the ordering of the lists. Some features will have to go unfortunately. Constructors that built default elements will no longer be available. And methods that insert x copies of the same element will no longer be available. Everything else should still be there.
I've heard a lot of people say that when they see someone using SkipLists, it's because they don't know what they're doing and that other containers are always faster. In this case, I don't see any other containers being able to do what I want. I think SkipLists are VERY beneficial here because it allows me to use the same node type for all the different container types.
Here's an example of how I want this to go.
class MyNode
{
public:
UnicodeString name;
NID ID; // Unique ID
int value; // Some data.
};
// I want four lists.
// 1. Sorted by name (non-unique).
// 2. Sorted by ID (unique).
// 3. Sorted by index.
// 4. Sorted by pointer to node (MyNode*).
// Indexed list must come first when putting them in the array.
// I have 4 list and 1 of them is indexed.
class MyContainer : public AggContainer<MyNode*,4,1>
{
public:
typedef MyNode* T;
// Declare the internal containers.
AggIndexedSkipList<T> IndexedContainer;
AggMultiSkipList<T> NamedContainer;
AggSkipList<T> IDContainer;
AggSkipList<T> PtrContainer;
virtual void Init()
{
Containers.push_back(IndexedContainer);
Containers.push_back(NamedContainer);
Containers.push_back(IDContainer);
Containers.push_back(PtrContainer);
}
};
There's one more thing that's missing. For the three keyed lists, we need a comparison functor for each one as well as a functor to retrieve the key from the element. Those functors are passed as template parameters when declaring the internal containers. I left it out to show the basic setup and not clutter the code. However, the default less<UnicodeString> functor can be used for NamedContainer. less<NID> for IDContainer and less<MyNode*> for PtrContainer. As for retrieval of the keys, it's simply a matter of returning the member such as "return node->ID;". For PtrContainer, the key is the node, so no functor needed there.
Here is the full code.
class MyNode
{
public:
UnicodeString name;
NID ID; // Unique ID
int value; // Some data.
};
// I want four lists.
// 1. Sorted by name (non-unique).
// 2. Sorted by ID (unique).
// 3. Sorted by index.
// 4. Sorted by pointer to node (MyNode*).
// Indexed list must come first when putting them in the array.
class NamedFunctor
{
public:
UnicodeString operator()(MyNode *node) const {return node->name;}
};
class IDFunctor
{
public:
NID operator()(MyNode *node) const {return node->ID;}
}
// I have 4 list and 1 of them is indexed.
class MyContainer : public AggContainer<MyNode*,4,1>
{
public:
typedef MyNode* T;
// Declare the internal containers.
AggIndexedSkipList<T> IndexedContainer;
AggMultiSkipList<T,less<UnicodeString>,NamedFunctor> NamedContainer;
AggSkipList<T,less<NID>,IDFunctor> IDContainer;
AggSkipList<T,less<MyNode*> > PtrContainer;
virtual void Init()
{
Containers.push_back(IndexedContainer);
Containers.push_back(NamedContainer);
Containers.push_back(IDContainer);
Containers.push_back(PtrContainer);
}
MyContainer() : AggContainer() {}
};
From this point on, you can use any of the internal containers to manipulate the list and obtain iterators. A special convert() method will be included in each container for the purpose of converting iterators. And since there is only one indexed list, I should use the IndexedContainer to insert any elements as this will allow me to specify where to insert each node in the indexed list. All other lists will be automatically updated.
It takes a little doing to get it all setup, but for what I want to do, this is a million times better than what I have now.


Unregistered user # Monday, July 19, 2010 11:22:59 PM
Vorlath # Tuesday, July 20, 2010 12:43:15 AM
I released a demo a while back that had the basic environment on display. No GUI for Project V though. I then released another version that let you place new interfaces for components on the screen, move them around and resize them. So some people have had a quick taste already. Visually, I'm not that far ahead from that point. I have some lists and a few more buttons. But I'm concentrating on the command line for now and will add in GUI manipulation via the mouse and keyboard (on the network itself like naming an input by clicking on it and typing it in) later on. Right now, you'll use the command line to set the name of the input and connect components together. I think the command line will be useful for scripts later on that generate networks on the fly.
Pretty much everything, yeah. Even in non-dataflow environments like my compound list in C++, I used techniques that are used in the real world. For example, I have nodes that have forward and backward pointers for each list. I did not want to create a set of methods for each list. And neither did I want to specify an index since each internal container will already know which set of forward and backward pointers to use. How to deal with the problem?
I solved it like it's solved in the real world. In carpentry, you often create a jig that will let you operate on materials in a new and unique way. The idea here is creating a tool that will do most of the work for you. So the internal containers produce a templated helper class that will work only on the forward and backward pointers required for that container. It's typesafe. Not only that, but it will not access forward and backward pointers from other lists. And the container has no idea that it's using a helper class rather than the real thing.
This technique often goes unused. It's a shame. It's the difference between asking paper to cut itself (OO) and using a pair a scissors to cut the paper (helper class). Sometimes the first method works. Sometimes it's the second method that's best. Dataflow is more toward the second option which is in line with the real world.
Vorlath # Tuesday, July 20, 2010 12:00:18 PM
I think this will take me the rest of the week since I can't work on it all the time. A few more hours here and there and I should be mostly done with this.
Unregistered user # Wednesday, July 21, 2010 12:54:46 AM
Vorlath # Wednesday, July 21, 2010 2:28:05 AM
Project V is a dataflow environment. You will eventually be able to use existing languages for the implementation of different components. Some people have mistaken this for a polyglot tool. It isn't (though you can certainly use it that way if you wish, but that's a very narrow view). If you read some of my blog entries about my discoveries, you'll see that I go into great detail about the different features and advantages of dataflow.
And you're quite right that this is about the search for more understanding. There had to be a better way. And you're also right that too often people will see what they know and project this onto my work. Without going through the process, they cannot see what I see (like the previous example where they assume I'm working on yet another polyglot tool). Thankfully, many have dealt with dataflow and if not, they have followed the progress here. In fact, I've had a few extremely good discussions in private about the inner details of not just Project V, but overall implementation details of any dataflow system. The type system is a unique beast that could fill a book on its own.