Decoupling, Abstraction and Atoms
Saturday, January 14, 2006 3:41:29 AM
Decoupling. What's it about? Basically, it's about interdependence between systems. In simple terms, if you have class A that uses class B, then that's coupling. Class A can't work without class B. If you change how class B works, you also have to change class A, unless there is some standard protocol between them. And that's where decoupling comes in. If there is a common interface between the two, then you can, in theory, change class B and still have it work with class A, as long as the interface stays the same. The coupling is now on the interface and not on class B itself.
There is another, more serious kind of coupling: when a class uses another object of its own accord, i.e. via a singleton or some global mechanism. In that case there is no way to accurately determine which objects are coupled to which other objects, and no interface to fall back on. A redesign must be done (which may include introducing interfaces).
Decoupling is related to abstraction in that both terms usually deal with objects, although they apply to functions as well. Abstraction is simply a way to organise your data and code in a more 'usable' format. The 'object' in Object Oriented languages is that abstraction. Instead of using primitive types and functions, now almost everything is an object with its own way of doing things. In an ideal world, each object would be independent of the others (decoupled). But in practice, an object that doesn't rely on anything else isn't very useful.
Ok, oversimplified discussions aside, what does all this have to do with anything? My suggestion is that code and data need to be decoupled. Also, objects and primitive types should be decoupled. Primitive types are atoms. Atoms hold a special place in programming languages and computing because primitive types are all the computer can actually operate on. Objects should still be able to hold state information, mind you. So what exactly do I mean by decoupling code and data?
First, we have to accept the fact that data on its own has its uses. It doesn't always have to be placed in an object with methods. An object that only contains data should be acceptable. Then there should be other objects that operate on this data. So we need two categories of objects: one for data and one for computations. In this way, we can pass data along to different objects for processing. If you've used Linux, or Unix in general, then you're familiar with command pipes. After one program is done processing the data, it passes it on to the next in line. Why this obviously useful methodology has all but been abandoned is beyond me. It has proved itself for over 20 years. And realistically, in the world of the Web, much of the more common server-side code is passing session information from one request to the next. It just isn't seen in that fashion for the most part.
Actually, if you look around in specific niches, the idea of pipelining is coming back in full swing. It lends itself well to multiprocessing when there is a lot of data to process: each program can run at the same time, as long as they stream the data instead of processing it all before passing the results along.
There is a very good reason why this works so well. It is simply an extension of how hardware processors work. Instructions basically read data in, process it and then pass it along to the next instruction. CPU manufacturers have taken advantage of this pipelining in their hardware architecture to provide major speed improvements.
Notice, in all these cases, the decoupling of code and data? While each 'program' does hold state information as it processes the data, the data itself is separate from the software. How many times do we pass along the entire object instead of decoupling the data? The idea was that the object was supposed to protect the data against corruption and mishandling. I think this is a backwards view. If the object is the only thing that can 'handle' the data, then only the object can provide functionality to manipulate it. So we subclass the object and add our extra functionality. But what if that functionality could be used elsewhere? See the problem with coupling data and code? I see this all the time, and it is one of the major problems with the OO paradigm, IMNSHO.
Take a list, for example. The list data structure should be separate from what acts on it. A real-world example is the user class in my backgammon server. Everything revolves around this class. In fact, I can't think of anything the server does that doesn't act on or require a user. Because everything needs this class, making its members private seems like useless overhead. If C++ had provided properties, I could have specified certain properties as read-only and given the socket mechanism direct access to the user's input and output buffers and socket handles, for example. But there are no properties in the C++ standard, so I ended up doing the worst of all possible options in an OO world: I made all members public. GASP! Other classes, such as the game rules class and the RNG classes, were better handled because they can be mostly self-contained. But there were no appropriate constructs for what I wanted to do overall. Writing get and set methods is unacceptable. I'm not sure who thought of that, but it's just awful.
So what should I have done? I've given this a lot of thought and I don't know that there is a solution. Even properties would not have been enough. The socket handling mechanism needs access to the input and output buffers of each user, as well as updating its state, which can then lead to processing user messages (again, pipelining). Do I make the socket mechanism a friend? What about the message processing facilities? EVERYTHING updates a user. There seems to be no consistent, clean way to handle what is essentially coupling. No, what would have solved all my problems would have been to separate all the data from the user class. The user class would provide basic initialisation and functionality, but the data part of the user could be passed along to whatever part of the system needed to update it. The only thing missing is a standard way of doing these updates. Sometimes I think the worry over corrupting or mishandling the data causes more trouble than it's worth. Primitive types don't have the same restrictions as objects, so are we intentionally giving ourselves more trouble than necessary?
Perhaps having a validation mechanism in place for raw data would be useful. I don't know. What I do know is that coupling is a very serious issue that the OO paradigm fails to resolve. I'll even go further and say that it makes it worse.
In fact, I'll go further again and say that most software processes data in this pipelining fashion. Video players are an extreme form of this, where the video data is passed through many codecs before finally being displayed. This could not be accomplished if the data were locked inside objects. Web browsers are evidently pipelined just by the nature of sockets and streams. Word processors are pipelined when doing searches or any kind of formatting. I'm hard pressed to find something that is NOT pipelined. If you think about it, in overall programming practice, it would seem logical to pass this data along and have each 'station' process it before passing it on. But is that what we do in OO? No, we extend the class. To me, that's backwards.
I think this is one of the main reasons why OO is giving so many people problems. It doesn't adequately represent what they are trying to do.
Also, objects that contain both data and code are contrary to any real-world system. For example, sure, a car has tons of parts. But these parts pass fluids, electricity, air or force from one part to another. This is how things get done. When fuel goes into the valves of an engine before ignition, that fuel is not protected by a capsule. And the spark plug does not ask this capsule to please take some of this 'spark' and light the fuel for me because I obviously have no clue what I am doing. No, the fuel is passed along in raw format. The spark plug knows what to do.
This is what I don't like about the current way operator overloading works. For primitive types, at the most basic level, it's the CPU instructions that act on the values. But when you extend this to more complex types, it's the types themselves that handle the operators. If you look at it, the operator function acts on the data in both of its arguments. That's fine in this case, because both arguments are the same type. In real situations, though, we often pass data along to incompatible classes. So instead of passing along the data, we pass along the object, and this results in unnecessary coupling.
While objects do work great in certain situations, I think they bring their own set of problems. I propose decoupling data from classes when the need arises. If you look at objects that handle XML and SQL, you can see what I'm talking about. While that encapsulation is in accord with the OO paradigm, it sure is a lot of hassle. I think all this relates to much of what I mentioned in the last blog entry about requesting "resources" and being able to act on them. Only in this case, you pass the resource along to the next "process" in the chain as well.
Are these common problems? This coupling of data is my main problem when I use OO. How pervasive is it? And should the pipeline methodology make a comeback? I think we have to take a step back and really look at what's happening when we code. What are the real objectives, and do they really fit the OO paradigm? Is bare data too much of a risk? Unix has been using it for years. Was I wrong to make my user class public? All that's in it is data. I find it weird that I ended up using the pipeline methodology even though I had to jump through hoops to do it. Interestingly, the system works great because of the decision to make the user class public, not in spite of it. But it should have been easier overall, and I don't like using things that seem to contradict the accepted way of doing things (as far as the spirit of the language goes). What do you say about OO and coupling? And although I haven't discussed C++ streams, I think they fell out of favour a long time ago. Is that related to this coupling issue? I think so. Also, by making my user class public, I admitted to something that the programming world would look at very negatively. Have others had to create ugly solutions? In my view, it isn't ugly. It's just incompatible with OO. But it's perfectly acceptable, it has been used for ages, and it works great.