Message Passing
Thursday, 27. July 2006, 03:08:00
When you pass any kind of message, isn't the destination expected to understand the message? Unless you're trying to talk to aliens or something, you usually transact with people or machines that understand your messages. When you talk to people, you expect them to speak the same language as you in most cases, unless you're in Vancouver or a Toronto corner store. But the main point is that the message doesn't contain functionality. Not even one iota, whatever the heck that is.
Let's take something even more commercially viable: CDs. When you buy one, they don't give you a CD player with it. Everyone is expected to have their own player. In this way, a CD can be considered a message that goes from player to player. The CD only contains data.
This goes on everywhere. You don't see flying radios when you listen to the morning traffic report. Telephone wires don't carry phones in them. So why do we program that way? Why is the message an object? Should it not be raw data? I understand providing basic functionality like inserting and removing data for convenience's sake. But everything else should already be available to the object sending or receiving the message. Messages in software should not be encapsulated. They should be raw.
The cool thing about this is that messages can be expanded to contain lots of different information. The receiver can use dispatching or use corresponding helper classes to actually do things with the message.
And this is just common sense. If two entities are to communicate, they must have some common understanding of basic messages. The message itself cannot provide this understanding because it is the source and destination that have to USE the information. It cannot be delegated. So providing functionality within a message is contrary to common use. Yet Object Oriented practices claim that everything should be an object. I don't think this is possible. A duality must always be present.
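As a rough illustration of that idea, here is a minimal Java sketch of a message that is nothing but a command name and public name/value data, with the receiver dispatching on the command. The names RawMessage and Receiver, and the "chat" and "move" commands, are made up for this example and do not come from any real code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical raw message: a command plus public name/value data.
// The only behaviour is inserting data, for convenience.
class RawMessage {
    public final String command;
    public final Map<String, String> fields = new LinkedHashMap<>();

    RawMessage(String command) {
        this.command = command;
    }

    RawMessage put(String name, String value) {
        fields.put(name, value);
        return this;
    }
}

// The receiver owns all the functionality and dispatches on the command.
class Receiver {
    private final Map<String, Function<RawMessage, String>> handlers = new HashMap<>();

    Receiver() {
        handlers.put("chat", m -> m.fields.get("from") + " says: " + m.fields.get("text"));
        handlers.put("move", m -> "move to " + m.fields.get("point"));
    }

    String handle(RawMessage m) {
        Function<RawMessage, String> handler = handlers.get(m.command);
        return handler == null ? "unknown command: " + m.command : handler.apply(m);
    }
}
```

Adding a new kind of message here means adding fields and a handler on the receiving side; the message type itself does not change.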


Anonymous # 28. July 2006, 08:14
Well, let's think about what having a CD-like message entails: All CDs must conform to a single format (one that every player knows how to play). All players must know the format, and every company which produces players has its own implementation of the electronic devices which read and decode the information from the CD. If a new format is introduced - like DVD - a new kind of player has to be purchased by everyone in order to play it. This is an example of tight coupling.
But what if instead of a CD you could purchase a CD+player which has a minimal interface to the outside world in the form of an Audio Out connection which you could connect directly to your speaker or amplifier? This would solve the problem of DVD (the DVD+player will come with a Video Out connection as well) because of reduced coupling, and this is what OO is trying to achieve.
An object receiving a message in the form of another object is not meant to understand or process the message but rather send the received object (or a collaborator) a new message based on some internal logic. Ideally the last message in this chain of messages would be a simple, parameterless, non-value-returning method which operates on an object's state. So most of the time an object is actually "talking to itself" and simply using other objects to decide what to do with itself...
Vorlath # 28. July 2006, 15:14
About your tight coupling, what do you say about interfaces? With a new format, you need to decide if you will replace or enhance the existing format. I've almost never seen a case where you want to remove an old format unless you're basically rewriting the application. Usually, you want to add things to the existing format. What better way than to just add it to the existing data structure and still have the old code work the way it used to? For objects to use this enhanced format, use helper classes. That way, each system can have only the helper classes it needs without affecting any other part of the message. This is a real boon when you're trying to separate different functionality affecting the same message.
I disagree with the notion that an object isn't meant to understand or process the message. This is EXACTLY what it's supposed to do. Otherwise, it is nothing more than a filter that directs messages to one place or another. While this does have uses, it falls under the "moving" category I mentioned above. It's a message only in the sense that a delivery person sees a message. The real message comes into effect when it arrives at its destination to be used by the receiver. Unless it's to be placed in a list of some sort, the receiver MUST understand the message.
The idea behind encapsulating the message is a tempting one because we think that hiding the details will help us, but I've NEVER, not once, found this to be true. I use helper classes. For example, my backgammon server uses global helper classes for message passing between the server and clients. It's great because anyone that needs to read or write messages can use the helper class. If you want to add functionality, you can extend the helper class and then everyone has access to it, or you can have parts of the message that only certain objects can use by using two separate helper classes. It has much better "safety" than this private, protected BS, and it's impossible with methods alone without a lot of setup with access schemes.
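To make the "two separate helper classes" idea concrete, here is a small Java sketch. It is not the backgammon server's code; GameMessage, ChatHelper, ServerHelper and the field names are all assumptions chosen for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical raw message: public data only, no behaviour.
class GameMessage {
    public final Map<String, String> data = new LinkedHashMap<>();
}

// Helper for the part of the message that any client or server code may use.
class ChatHelper {
    void setText(GameMessage m, String text) {
        m.data.put("text", text);
    }

    String getText(GameMessage m) {
        return m.data.get("text");
    }
}

// Separate helper for the part that only server-side code is meant to touch.
// A module that is never given this class has no code for these fields.
class ServerHelper {
    void setKickReason(GameMessage m, String reason) {
        m.data.put("kick.reason", reason);
    }

    boolean isKick(GameMessage m) {
        return m.data.containsKey("kick.reason");
    }
}
```

The "safety" in this sketch comes from which helper a module is linked against, not from access modifiers: the data itself stays reachable, so the separation is by convention and packaging.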
I kept wondering why messages where the data was public were so much easier to program for. It's because encapsulation is a lie when it comes to message passing. If I am to use the message, I must understand it. I can't send methods over the Internet. I can only send data. This is the way it works everywhere. When we start to take advantage of the fact that, on a single computer, we can provide functionality together with the data, we get caught when it isn't possible to send both. This will become more of a factor when multi-core and distributed programming start to take off. In these scenarios, you don't want to group methods or functionality with the data, because then you start shipping methods that you may not need to the destination, if shipping them is possible at all. Helper classes have the advantage that they act like independent modules. You can request only the functionality you need, and it is not coupled to anything else, not even the data it acts on, other than its format.
About your last paragraph, how is an object meant to do some "logic" if it can't understand the message (other than routing)? I understand the cleverness of using polymorphism so that you can retrieve information about the message no matter what it is, but I'd rather see polymorphism with the helper classes. Unfortunately, most (all?) static OO languages have no support for this. I also REALLY dislike the fact that messages that are objects can themselves act on other objects. I've seen this all too often, where messages end up registering themselves for receiving events and then this very message would initiate the message passing mechanism. This is very dangerous. But it's done everywhere.
I can find nothing good about passing off encapsulated messages. In fact, I've seen situations where using them makes the application impossible to complete.
Anonymous # 28. July 2006, 17:44
Firstly, I think we might be using the term "message" in a slightly different context. I'm using the term message in its OO sense, which means the passing of control from one location (in the code) to another - mostly done by a method call. The passing of data over a network is a different form of message which may or may not be compared to an OO message. I agree that methods cannot be sent over the network, but if you have a class which is deployed both on the client and on the server, then it can actually send itself messages over the network and still keep its internal format hidden. I guess this is what your global helper classes do, only they violate encapsulation because you're working with data-structure objects.
Coupling to an interface and not to a data-structure is what OO is all about... An interface method which accepts an interface as a parameter can be extended in many ways, through polymorphism or the addition of methods. Extending a data-structure usually entails making _specific_ changes in more than one place in the code - which is exactly what encapsulation helps avoid. Maintaining an encapsulated data-structure localizes specific changes to one place, while additional changes, which may or may not be required, are decoupled and so can be maintained separately.
The way objects can operate when receiving a message containing other objects is to either change their internal state, route the incoming object or objects to a collaborator, send a new message to an incoming object or perform some static behaviour. Understanding the message in the form of extracting data from received objects should be avoided.
I ask you, what keeps you from encapsulating your messages inside an object?
Vorlath # 28. July 2006, 18:53
I already explained why this encapsulation MUST fail in the case of messages. Both the sender and the receiver MUST understand the message. Having get and setters is not the issue. But most people go beyond this and add functionality. If there is an interaction to be done between the receiver and the object, then the receiver must know the internals of the message (or with get/set), OR the message must know the internals of the receiver. There can be no interaction of data unless one of these objects has access to both at the same time and understands both. I think this is generally accepted in the real world. I don't know why we think programming works differently. And my experience has been that this precise problem is the root of most software headaches.
That's why helper classes are cool, because you can extend them in whatever system needs to interact with the actual message (or, as you correctly point out, the arguments to the command/method). In any case, helper objects usually take the place of get/setters. The information extracted is raw data, or in an equivalent form that the receiver knows how to manipulate. That's my main point.
I'd just like to mention something about your network message passing because it's especially common in the programming field. The distribution of the class on both machines is indeed possible beforehand (or even during execution by shipping the code), but this only goes to show that both the source and the destination must understand the raw data. In this case, it is a self-contained solution. But more to the point would be data from two or more sources that need to interact. When it's one to one, you are correct that you can produce some "nice" or "well designed" code for your architecture. However, encapsulation is not there, as both the sender and receiver understand the data. Take C++ for example. It is notorious for being ambiguous about making two different objects interact, especially when it comes to operators. Java has taken this out almost completely except in special cases.
There is no scenario that can achieve encapsulation when it comes to making different types of data interact. One object must understand how to manipulate all of this data together. Just knowing the methods that you can use to manipulate this data is breaking encapsulation because you are only deferring the action by one level. What this level of indirection gets you is the inability to expand on the functionality available to this data without physically changing the object and possibly causing adverse effects in other parts of your application. That's why encapsulation often makes applications impossible to complete. There is too much coupling.
BTW, helper classes avoid making _specific_ changes everywhere and also help avoid coupling in areas where it's not needed, leading to much easier extensibility. I understand your point of view as it is the common view (from what I can tell). But I think there should be more discussion in the programming community about the adverse effects of encapsulating messages. There seems to be some general avoidance of the reality of this problem.
Anonymous # 28. July 2006, 22:11
The solution I offered for maintaining encapsulation in a distributed environment is not limited to one-to-one situations as a single class can easily be deployed in various locations and communicate with itself to collect all required data. This does not break encapsulation because the format of messages is defined once. You are correct that the raw data sent over the network is understood by both sender and receiver but the important fact is that they are one - changes remain local.
I can see your problem when it comes to operations which act on data encapsulated in two objects or more. To tell you the truth I don't see many algorithms requiring calculations such as "a = b / c + d * e ^ f" - all coming from different objects. Data transfer can never be totally avoided, but it should be kept to a minimum. There are several ways to collect the data from different objects - one of which is to pass around a Command object (GoF) that all objects recognize and provide their data to, and which is then able to perform the required operation. Another way is to set up the sequence of message passing so that the object with most data receives the rest of the data as parameters provided by the other objects.
As I mentioned, these cases should be extremely rare; execution of most algorithms can be distributed between decoupled objects in a way which doesn't require access to much data in a single code block.
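One possible reading of the Command suggestion, sketched in Java. The classes AverageRatingCommand, RatingSource and Player are invented for this example; the point is only that each object hands its own data to the command, and the command performs the calculation once it has everything.

```java
// Hypothetical command that every participant recognizes and feeds data to.
class AverageRatingCommand {
    private int sum;
    private int count;

    // Called by each object that has a rating to contribute.
    void accept(int rating) {
        sum += rating;
        count++;
    }

    // Performs the operation once all data has been collected.
    double execute() {
        return count == 0 ? 0.0 : (double) sum / count;
    }
}

interface RatingSource {
    void contributeTo(AverageRatingCommand cmd);
}

class Player implements RatingSource {
    private final int rating; // never exposed through a getter

    Player(int rating) {
        this.rating = rating;
    }

    @Override
    public void contributeTo(AverageRatingCommand cmd) {
        cmd.accept(rating);
    }
}
```

Note that the command still ends up holding raw values from several objects at once, which is the situation the surrounding discussion is arguing about.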
I'm not sure I understand how you use helper classes in your design. Can you provide an example?
Vorlath # 29. July 2006, 20:37
Let's say I have a multi-user game room. I need a list of users and lists of rooms. Each room has its own list of users and its own rules for the game being played. For a simple example, take the socket mechanism. It needs the socket number, its state, input and output buffers. But many things can use sockets, not just users. There are database objects and relay objects that use sockets. Should I make the socket engine recognise them all? Of course not.
So I wrote a helper class that can extract commands from the buffers as well as insert them. Remember that the socket engine cares not how the information is formatted. I cannot put this in the socket engine. Yet, I cannot put it in the socket info class either, because this would lock the buffer into a specific format for the data stored. I can't subclass it because then I can't place the socket engine in a DLL. I want to make each system as independent as possible.
When you open a file, you get a handle. When you open a window, you get a handle. When you allocate memory, you get a pointer. The same should happen with any module in your application. The socket engine mentioned here should provide a handle or data structure that any object can use. What I did was provide helper classes that could read input buffers and create message info (command + name/value pairs). If I want to use a different format for messages (say for the database only), I am free to do so, yet still use the same socket system.
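A minimal Java sketch of such a helper follows. The wire format used here ("command name=value name=value" terminated by a newline) is purely an assumption for the example, as are the class names SocketBuffers, MessageInfo and LineFormatHelper. The buffers class stands in for the handle the socket engine hands out and knows nothing about the format.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the socket engine's handle: buffers only, no format knowledge.
class SocketBuffers {
    public final StringBuilder in = new StringBuilder();
    public final StringBuilder out = new StringBuilder();
}

// Message info as described above: a command plus name/value pairs.
class MessageInfo {
    public String command;
    public final Map<String, String> pairs = new LinkedHashMap<>();
}

// Helper that understands one assumed format: "command name=value ...\n".
class LineFormatHelper {
    // Removes and parses the next complete line; returns null if none is complete yet.
    MessageInfo read(SocketBuffers s) {
        int end = s.in.indexOf("\n");
        if (end < 0) {
            return null;
        }
        String line = s.in.substring(0, end);
        s.in.delete(0, end + 1);
        String[] parts = line.trim().split(" ");
        MessageInfo info = new MessageInfo();
        info.command = parts[0];
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq > 0) {
                info.pairs.put(parts[i].substring(0, eq), parts[i].substring(eq + 1));
            }
        }
        return info;
    }

    // Appends the message to the output buffer in the same format.
    void write(SocketBuffers s, MessageInfo info) {
        s.out.append(info.command);
        for (Map.Entry<String, String> e : info.pairs.entrySet()) {
            s.out.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        s.out.append('\n');
    }
}
```

A database connection wanting a different format would get its own helper over the same SocketBuffers, leaving this one untouched.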
At this point, you can add delegation to the objects (via subclassing) that use these helper classes if you wish so that it looks more OO-like. For example, instead of
"SocketSystem->SendMessage(user->socket_info,msg);"
you can use the more traditional
"user->SendMessage(msg);"
This is but one very simple example. But the uses are everywhere. Game rooms can have different games for example. Each game requires to store some info about each player. Do I make each game recognise my custom user class? Of course not. I store in the user class the data the game needs (usually allocated by the game itself) and the game logic is stored in a helper class.
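The two call styles shown earlier (going through the socket system with the user's handle, versus calling the user directly) can be sketched in Java as plain delegation. SocketHandle, SocketSystem and User are hypothetical stand-ins, and the "send" here only records the message in a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handle owned by the socket system.
class SocketHandle {
    public final List<String> outbox = new ArrayList<>();
}

// The socket system only knows about its own handles, not about users.
class SocketSystem {
    void sendMessage(SocketHandle handle, String msg) {
        handle.outbox.add(msg);
    }
}

// The user holds a handle and delegates, so callers can write user.sendMessage(msg).
class User {
    public final SocketHandle socketInfo = new SocketHandle();
    private final SocketSystem sockets;

    User(SocketSystem sockets) {
        this.sockets = sockets;
    }

    void sendMessage(String msg) {
        sockets.sendMessage(socketInfo, msg);
    }
}
```

Both styles end up in the same place; the delegating method is only a convenience on top of the handle-based call.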
One thing I found frustrating was that C++ has no mechanism for creating true virtual objects. I would like to be able to associate methods from a helper class as if it were included directly in the object itself. For example, if I have user object A and helper class B, I'd like to be able to use methods of B as if it was in A. This would remove the need for delegation. As the example above shows, I could do something like
"C = MakeVirtual(A,B,A->B_data)"
Now if B has a method
"SendMessage(B_data bdat, char *msg)"
you could just go
"C->SendMessage(mymsg);"
A problem is that if you send a handle, there is usually no backlink to the overall object that uses this object. For example, a user class that contains a socket handle. By passing the socket handle, it is difficult to obtain the user class again unless you do a search or store it in some fashion for retrieval.
But I find these kinds of templates a little odd because it doesn't help when you need more than one piece of data or multiple helper classes. I usually end up subclassing the user class and delegating to the helper object(s) so that it looks like standard OO and can add more specialised usage. What I like is the ability to swap helper classes without affecting anything else.
Anyways, this is where most of my ideas come from. This problem plagues most large projects I've seen. Things get so coupled that they can't change anything. With streaming, I don't have any of these problems. But I think a written language could benefit from something that alleviates these problems.
Anonymous # 30. July 2006, 00:09
I'm having a little trouble understanding what you're trying to do here.
Firstly, I don't see why socket_info and SocketHelper are not combined to a single class. But more important, I don't see the need for either the 'user' or the 'DB' class in this context, the messages being sent are not being formatted by these classes (they are bypassing them altogether) and so should be called directly upon a SocketHelper object...
If messages sent through the user object should be formatted differently from those sent through the DB object then delegation is required. (It would be more appropriate if messages sent to a user object were in a higher abstraction level - like user->Hello() ).
But I fear I may be missing your point... Handles and pointers are not data structures and only provide functionality - which means they don't violate encapsulation. I think the usage of handles is unnecessary in OO languages. If your boundaries are based on DLLs which only export functions, you are not doing Object Oriented design at all.
Vorlath # 30. July 2006, 05:03
socket_info is what the socket system uses and the socket system has no use for SocketHelper. SocketHelper enables outside objects to store data such as the "Hello" text message and then get the socket system to send it out. If another object wants to use different formatting for the message, this would be impossible. Well, you could subclass it, but then this class would be locked to one kind of functionality. I often use different sets of functionality on the same data. In fact, this HAS to happen. All computers do is transform data.
I do use delegation for the DB and user objects. They are both delegated to the SocketHelper classes along with other more specialised functionality before and after using the helper class. For example, getting rows from the DB.
I have specialised methods like user->SendStats() depending on high frequency usage. But there's too many client/server messages to each have their own method. You can build your own message (like chat and server commands) and send it. That's where helper classes really shine.
By the DLL example, I only meant that I wanted a certain group of classes (what I call a system, like the socket handling mechanism) to be completely independent. I should be able to completely separate it and still keep working. It should not care what is using it or what objects need to use sockets. It does require a handle to a certain data structure and that's what it uses. Wherever you put this data structure is up to you. The unfortunate thing is that sometimes the objects that contain these handles need to be passed along from system to system. If the socket system only cares about its own handles, how can you pass along the user (or other object) that was using the handle if the reference to the user is no longer available? Having backlinks is nice, but requiring it on all handles is a waste of memory.
I tried using templates and it does alleviate the problem somewhat. But it's a hack more than anything.
Here's the deal. I have a user class that needs functionality that can make X and Y interact as well as Y and Z. My DB class needs (X,Y) functionality too, but not (Y,Z). It needs (Y,K) instead. It also needs (L,K,Y) for output. What I like about helper classes is that you can mix and match functionality. I can't do that if the functionality is locked in with the data structure itself. If I separate functionality from the data it acts on, I can mix and match what I need.
You need to use input/output buffers with name/value pair lists from messages, we have that. You need to use name/value pairs for dispatching which may lead to timer events or to the DB, we have helper classes for timers as well as for DB handling. These systems in turn (like DB) will need to use input/output buffers with name/value pairs, but from the database. We can use the same helper class as before. Game system needs to be able to send messages to users, but custom game related messages. Ok, we have that helper class already that lets you build your own messages. Already, this helper class is used three times for three completely unrelated purposes.
Now, let's say I use someone else's database and it doesn't work with my messages anymore. Well, if I were using encapsulation, I'd be stuck or I'd have to manually replace methods in the DB class. With helper classes, you simply build your new functionality for these new messages and replace the old one. The boundary for this functionality is very well defined. Since these are new messages, you're more than likely to have special handling, so you'll have different (and more) methods available to you. Yet, the same socket mechanism is used. Only the format and handling of the messages has changed. And you don't affect objects using the old message formatting.
I couldn't have used interfaces because of new features added. I can't use subclassing because it needs to use multiple data structures (or objects) at once. I can't use encapsulation for the same reasons. But what I am doing is achieving homogeneous grouping of functionality.
Let's break it down to something really simple in OO fashion. A SocketInfo class and a Message class. The Message class is used for constructing or querying info from a client/server message.
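A minimal sketch of what two such classes could look like, written in Java to match the interface example later in the thread. Only the class names SocketInfo and Message come from the discussion; every field and method name is an assumption, not the original code.

```java
import java.io.ByteArrayOutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

// What the socket system works with: socket number, state, and two buffers.
// It has no idea what the bytes in the buffers mean.
class SocketInfo {
    public int socketNumber;
    public String state = "closed";
    public final ByteArrayOutputStream inBuffer = new ByteArrayOutputStream();
    public final ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
}

// A client/server message: a command plus name/value pairs.
// It only supports constructing and querying; it knows nothing about sockets.
class Message {
    public String command;
    public final Map<String, String> pairs = new LinkedHashMap<>();

    void set(String name, String value) {
        pairs.put(name, value);
    }

    String get(String name) {
        return pairs.get(name);
    }
}
```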
Not very exciting and incomplete. Message and SocketInfo are the two main classes. Here's where my "problem" comes in. I want to be able to convert my Message class so that it can be sent out over a socket. To do this, I have to place the data in the out buffer (we can assume the SocketInfo knows how to send this data).
So how do I do that? In OOP, we usually send OO messages/methods to the object so that it may act on itself. This means that I should have a method in SocketInfo that takes a Message object as a parameter. The problem is that I can't do that for the object itself because it's already defined and the socket system has no need of this extra functionality. We don't want to contaminate a perfectly separable module by adding a new class that it doesn't need into this module. So let's try to subclass it instead. Well, what if I want to send and receive not only Messages, but other data as well over the Internet on this very same socket? A subclass locks me into one specific data format. What would be nice is if I could change the class at runtime to use a different set of methods. But I also don't like that SocketInfo knows about the Message class. It shouldn't care about the Message class or any other class. It's not its job. Its real job is handling the buffers and nothing else. It should not know or care what the meaning of the data stored in its buffers is or how it's formatted.
So instead of using switchable vtables (not sure how that would be done anyhow), I use different helper classes along with some delegation when needed. Or I use the same helper classes within different composite objects. But to use this technique, I must not use encapsulation other than get/setters and even then, that gets in the way much of the time.
So for the above example, I would have a third (fifth?) class that is the helper class which understands both of the SocketInfo and Message classes and both their internals if necessary. This helper class can even add its own data structures to the internals of these two classes to keep state or other data. One such use for Messages is that you don't want to keep converting messages that go out to a few thousand client computers. So the helper class actually takes over the Message class' functionality by keeping track if the message data has changed. If so, it will rebuild a cached version of its output. Otherwise, it will just use the cached version ready to be stored directly into the output buffer. In this case, it's only about efficiency, but there could be other uses.
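Here is a hedged Java sketch of a helper along those lines: it understands both the message and the socket buffers, and keeps a cached encoding so a message going out to many clients is only converted when its data has changed. OutMessage, OutSocket, CachingSendHelper, the rebuilds counter and the text format are all assumptions for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical raw message and socket buffer, both with public data.
class OutMessage {
    public String command;
    public final Map<String, String> pairs = new LinkedHashMap<>();
}

class OutSocket {
    public final ByteArrayOutputStream outBuffer = new ByteArrayOutputStream();
}

// Helper that knows both classes and keeps its own state (the cache) on the side.
class CachingSendHelper {
    private final Map<OutMessage, byte[]> cache = new IdentityHashMap<>();
    int rebuilds = 0; // exposed only so the sketch can show when conversion happens

    // Edits go through the helper so it can drop the stale cached bytes.
    void set(OutMessage m, String name, String value) {
        m.pairs.put(name, value);
        cache.remove(m);
    }

    // Copies the cached encoding into the output buffer, rebuilding it if needed.
    void send(OutSocket s, OutMessage m) {
        byte[] bytes = cache.get(m);
        if (bytes == null) {
            bytes = encode(m);
            cache.put(m, bytes);
            rebuilds++;
        }
        s.outBuffer.write(bytes, 0, bytes.length);
    }

    private byte[] encode(OutMessage m) {
        StringBuilder sb = new StringBuilder(m.command);
        for (Map.Entry<String, String> e : m.pairs.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.append('\n').toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

One limitation of this sketch: code that writes to the message's pairs directly, bypassing the helper, will not invalidate the cache. Neither OutMessage nor OutSocket needed to change to get the caching behaviour.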
Note too that I don't create more than one or two instances of these helper classes per module. I reuse the same ones. The reason is simple. Other modules using the same data will have their own helper classes. So when these modules communicate, they send the raw data only and each module is free to use the functionality they want on this data. Like I said, there are minor complications to this (backlinks). But I rather like it. It's the cleanest and most extensible I've ever designed an architecture.
This also adds to what I said in the past about having each module see the data through its own eyes. To sum up, I need to be able to add functionality without injecting coupling to the existing modules or systems.
Anonymous # 30. July 2006, 08:11
I think you are absolutely right about the kind of separation you're trying to achieve. SocketInfo should not be aware of Message and most likely Message should not be aware of SocketInfo either. What I'm suggesting is using interfaces to decouple these classes, instead of a helper class.
Pardon my Java...
interface CanBeWrittenToASocket
{
    void writeToSocket(Socket s);
}
interface Socket
{
    void write(byte[] data);
}
class SocketInfo
{
    //...
    public void send(CanBeWrittenToASocket me)
    {
        // call writeToSocket to have the object format itself to a raw byte array
    }
}
class Message implements CanBeWrittenToASocket
{
    public void writeToSocket(Socket s)
    {
        //.. format the name-value pairs to a byte array for sending to the socket
    }
}
Caching can be used by introducing some kind of intermediate object. New kinds of messages are easy to add without affecting the socket system. A new kind of socket system is easy to add without affecting the Message. If you really need the same Message to be formatted differently on different occasions you can either deploy the same Message class and have it "recognize" the environment and change the format accordingly or introduce a Formatter interface the Message class can use for different formats (whatever that can be...).
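The Formatter idea could look roughly like this in Java. This is only a sketch: the interface is named MessageFormatter here to avoid confusion with java.util.Formatter, and the two formats and the class FormattedMessage are invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical formatter interface: turns a command and its pairs into text.
interface MessageFormatter {
    String format(String command, Map<String, String> pairs);
}

// Assumed format 1: "command name=value name=value"
class SpaceFormatter implements MessageFormatter {
    @Override
    public String format(String command, Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder(command);
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            sb.append(' ').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}

// Assumed format 2: "command?name=value&name=value"
class QueryFormatter implements MessageFormatter {
    @Override
    public String format(String command, Map<String, String> pairs) {
        StringBuilder sb = new StringBuilder(command);
        char sep = '?';
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            sb.append(sep).append(e.getKey()).append('=').append(e.getValue());
            sep = '&';
        }
        return sb.toString();
    }
}

// The message keeps its data private and hands it to whichever formatter it is given.
class FormattedMessage {
    private final String command;
    private final Map<String, String> pairs = new LinkedHashMap<>();

    FormattedMessage(String command) {
        this.command = command;
    }

    FormattedMessage put(String name, String value) {
        pairs.put(name, value);
        return this;
    }

    String writeWith(MessageFormatter formatter) {
        return formatter.format(command, pairs);
    }
}
```

The formatter still receives the raw command and pairs, so the format knowledge lives outside the message even though the fields themselves are private.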
Vorlath # 31. July 2006, 23:46
The way you solve the Message class by using an interface is exactly what I'm talking about. I've tried something similar to what you're doing and many other ways, as can be seen above. I wish there was a way that I didn't have to link together Message and Socket that directly. Subclassing is tempting, but that limits the class to one particular expansion of functionality. I like the previous comments. They're very reflective of current programming trends. Not that it's bad. Quite the contrary. I simply think there is a limitation to these languages. I haven't looked too much at others, but if they have a solution, I would likely start using it.
We need ways to introduce extra functionality to existing objects without modifying them in such a way that they now require external objects to function. Yes, for this specific use, it will require another object for the extra functionality. But it should be optional. When I use the original object, the source should be exactly the same as it always was and be completely independent. Added "plugins" would allow it to achieve this extra functionality, but without having to change the source of the original object. I think that explains it better.
Anonymous # 1. August 2006, 22:55
As far as passing basic data types as method parameters my rule of thumb (which I learned from much more experienced professionals) is to keep it to a minimum. Avoiding it completely is not reasonable especially when it comes to low-level abstractions (like the Message-Socket one). As far as breaking encapsulation I disagree - data is transformed into raw bytes and then transformed back by the same class...
I also think you can use interfaces in C++ in the form of classes with pure virtual functions.
Object Orientation has a lot to do with objects requiring external objects to function. When objects are decoupled extension is usually done by re-implementation of these external objects, without changing the source.
I understand you are trying to achieve this by having external objects extract the data they need from the object and act upon it - but why do you prefer this to simply adding functionality inside the class? All I see is an added dependency, not an improved independence...
Vorlath # 2. August 2006, 00:33
BTW, the dependency you see is within one project. It is but an illusion. I'm thinking larger scale, across multiple projects. If I have class A, I want to be able to use it in multiple projects as is. But if I want to add functionality to class A in ONE of my projects, I should be able to do that without touching class A (because I don't want to couple its definition to anything else). My problem is that this added functionality should be applicable to different classes. There are very real uses for this. I have a system for this and it's the most flexible and extensible system I've ever used. I only program this way now because it's SO much easier.
I'd like to pick at your external objects and high level objects argument for a second. High level objects can't act on anything. They are simply wrappers. For the programmer, this may be fine to a certain extent, but it locks you into that one specific use. Even with interfaces, it locks you into a request mechanism that doesn't reflect many things in the real world. In the real world, most requests happen asynchronously and the receiver usually opens the package and deals with the low level parts him or herself. Even if you delegate the job, someone must eventually understand the information in the package as well as the information in another package. There's no getting around it. And that's why encapsulation is a lie.