Fail Drastically or Quietly Produce Incorrect Results?
Sunday, January 10, 2010 6:51:26 AM
Yes, it's an older topic from March 2009, but I must have missed it. Here is the specific portion of the comment.
To my way of thinking, a program that fails in a dramatic way, such as an assertion failure or infinite loop, is less harmful than one that quietly produces incorrect results.
I am utterly SHOCKED by this quote. Not because it's out there, but because I agree with it, despite countless people who have devoted their entire careers to opposing this principle.
We should back up a little because there are many issues at hand here.
Let's go back to the original issue about goto statements. Andrew talks about two things that Dijkstra talked about. First, goto's make it more difficult to ascertain the program's state at a particular point, especially at the destination label, because you now have to inspect the entire program to make sure you know of everything that could jump to that label. Second, goto's make it more difficult to ascertain how far the program's execution has gone.
There is some other talk about continue and break statements. I agree with Andrew that these don't add much complexity. Then again, I was never big on being against goto because I still code in assembly, where avoiding goto's is simply not an option.
So goto's are bad because they make debugging and finding problems with your code much more difficult. They create spaghetti code and all that. At the risk of another "X is considered harmful" article, what else makes it that much more difficult to find problems in code?
There are plenty of things we could talk about, but the title of this article says exactly what's on my mind. Is it better for a program to stop abruptly than it is to continue with incorrect data? This is where the quote at the beginning of this article comes into play. It also fits in perfectly with what's wrong with gotos because they both have the same issues.
Java has taken the opposite view where continuing with corrupt data is better than crashing. In fact, crashing is seen as the worst possible outcome. This was not done because the designers actually believed this (well, maybe they did). It was because newcomers were found to go away if their programs stopped abruptly. It was a shock and a slap in the face. Their program was wrong. The abruptness of it all was too much to ignore. Many people would rather do something else than face that all the time.
This has created what I'm calling the leaky roof syndrome. A leaky roof almost never leaks where the real problem is because most people have something called tar paper installed between the shingles and the sheathing. Tar paper is a product meant to absorb moisture. Aside from the questionable practice of placing a moisture-absorbing material directly on a wooden surface (inducing rot), what happens is that your shingles will leak at a certain spot and the tar paper will carry the water down to a lower spot where it will eventually leak into your home. If you have trouble getting someone to fix your roof, this is why. It's a world of hurt and you may never find where the real leak is actually located. In other words, no matter who you get to fix your roof, they will never know how long it will take. It can actually be cheaper to just redo your entire roof with new shingles. That's how bad it can get.
If your software keeps running with faulty data, then you have the exact same problem as the leaky roof. By the time you notice the faulty data, you know that the problem lies earlier. But where? I've heard people say that they never have a problem with this. I've also seen people who just put the equivalent of buckets in their code, hoping to catch the spill. It turns into a big ball of patches.
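To make the leak concrete, here is a minimal sketch in Python. The function names and the sample readings are made up for illustration. One parser swallows the error and substitutes a default (the tar paper), the other lets the failure happen at the spot where the bad data first appears.

```python
def parse_reading_quiet(text):
    """Swallow the error and substitute a default; the bad value travels on."""
    try:
        return float(text)
    except ValueError:
        return 0.0  # the "tar paper": hides where the leak started


def parse_reading_strict(text):
    """Let the conversion fail at the exact spot where the data is bad."""
    return float(text)  # raises ValueError on malformed input


def average(readings, parse):
    """Average the readings using whichever parser is passed in."""
    values = [parse(r) for r in readings]
    return sum(values) / len(values)


# Hypothetical input: the third entry contains a letter O, not a zero.
readings = ["10.0", "20.0", "3O.0"]

# Keeps running and returns a plausible-looking but wrong average.
quiet_result = average(readings, parse_reading_quiet)

# Stops immediately, and the error message names the offending value.
try:
    average(readings, parse_reading_strict)
    strict_failed = False
except ValueError as err:
    strict_failed = True
    strict_message = str(err)
```

With the quiet version, the wrong average only shows up wherever the number is eventually used, far from the parse. With the strict version, the traceback points at the conversion itself.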
Java isn't the only language that has this problem. Anywhere that forces you to use exceptions will give you the same scenario. C++ can have this problem even though many claim the opposite. I don't actually mind exceptions, though I prefer old style error checking. An old complaint of mine was that you couldn't have multiple return values (without going to pointers): one for the actual return value and the other for the error. With pass-by-reference arguments, you can indeed have multiple return values with little effort. In fact, many of the best APIs use some form of this. DirectX uses the return value only for error codes. The actual return values are delivered via pointers or pointers to pointers (instead of by reference). Despite the extensive use of pointers, it's an incredibly good API. I recently also used the NetCDF API for reading in scientific data and it uses old style error checking as well, even though they have C++ wrappers. It too works extremely well. The documentation is actually very good, with plenty of examples.
Note that this isn't an argument against exceptions. I like using them in many languages such as Perl or VB when doing database handling, for example. Localized use of exceptions is fine, even beneficial. What I'm saying is bad is anything that will send control to who knows where, effectively reproducing a goto. This is what happens when exceptions must be used everywhere, because people tend to not handle all exceptions.
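The old style of error checking can be sketched roughly as follows. This is Python, which has no pass-by-reference output arguments, so a returned (status, value) pair stands in for them. The Status codes and read_setting are hypothetical names for illustration, not part of DirectX or NetCDF.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical error codes, in the spirit of a C-style API."""
    OK = 0
    NOT_FOUND = 1
    BAD_VALUE = 2


def read_setting(settings, name):
    """Return (status, value); value is None unless status is Status.OK."""
    if name not in settings:
        return Status.NOT_FOUND, None
    try:
        return Status.OK, int(settings[name])
    except ValueError:
        return Status.BAD_VALUE, None


# Made-up configuration data for the example.
settings = {"width": "640", "height": "tall"}

# The caller checks the status right at the call site, before using the value.
status, width = read_setting(settings, "width")
if status is not Status.OK:
    raise SystemExit(f"cannot read width: {status.name}")
```

The point of the pattern is that the error is inspected where the call is made, so control never jumps somewhere the reader cannot see.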
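As a rough illustration of localized exception use in database handling, here is a sketch using Python's standard sqlite3 module. The users table and the insert_user helper are assumptions made up for the example. Only the one expected failure is caught, right next to the call that can raise it. Anything unexpected still propagates and stops the program.

```python
import sqlite3


def insert_user(conn, name):
    """Insert a row; handle only the one failure we expect, right here."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate name: a known, local, recoverable case


# In-memory database with a hypothetical schema for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")

first = insert_user(conn, "alice")   # succeeds
second = insert_user(conn, "alice")  # violates the primary key, handled locally
```

The handler sits a few lines from the statement that raises, so control never travels far, which is the difference from a blanket catch higher up the call stack.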
So is getting a core dump better than blank exceptions that keep your program running with corrupt data? Hell yeah! When I get a core dump, it's usually on the exact spot where the error was produced. I very rarely have to go looking around.
Andrew kept saying in his article that there are FACTS about gotos that make them bad. That these facts are not up for negotiation. Only their severity. Well, what *I* am saying is that it doesn't matter what something is called. If it has the same FACTS that make it bad, then you're going to have the same problems. No amount of opinion is going to convince me that one thing is gonna be different than another when they both share the same problems.
And one thing I can say is that core dumps do NOT have the same problems as gotos. There is no leaky roof syndrome there. What's more, multiple things that don't inherently have a leaky roof syndrome can produce it when combined in certain circumstances. That's when you start to see patches applied and your program slowly turning into a big ball of mud. So it's not just specific things you have to be concerned about. It's how the whole thing interacts together. And this is one thing I've been talking about for ages with respect to programming with the execution point. Functional programming is not immune either. Neither is dataflow, for that matter, if one uses it the way monads are implemented, where only one data item may use the network grid at a time. That forces components to execute sequentially, reproducing what is effectively imperative programming.
The leaky roof syndrome can happen anywhere. When you know what causes it (or are simply aware of it), you can be better prepared to avoid it.


Unregistered user # Sunday, January 10, 2010 10:45:47 AM
Sean Connerspc476 # Tuesday, January 12, 2010 5:17:24 AM
Conversely, a long-lived Unix daemon. Same friend is using one I wrote ( http://www.x-grey.com/ ) and when it fails, it just stops. No core dump, nothing. It took me a bit of time to figure out exactly what was going on (thankfully, the logging he did get pointed me in the right direction), but the fact that it affected his email was troubling (the daemon implements a type of spam filter and when it wasn't running, the default action was to accept all emails). It's hard to say what the proper thing to do is in that case.
Vorlath # Tuesday, January 12, 2010 7:45:55 PM
With email, the same thing applies. Imagine an email daemon that keeps running, but ends up discarding valid messages. You would never know it until later when you start wondering why you're not getting email anymore.
But yeah, it depends on the error. The severity is always up for debate. But I'm hard pressed to find a situation where running with corrupt data is better than failing. Maybe in situations where a little bit is better than nothing at all. An email spam daemon that filters some messages and fails to work properly with a lot of others is still better than nothing. But then you have to wonder whether it will corrupt valid emails or filter out valid emails.
No, I'd still rather have the software terminate, even without a core dump, than keep running. I understand if you see this differently, but the reasoning escapes me. It always has. From my experience, the view of keeping the program running is the common one, much to my dismay.
Sean Connerspc476 # Tuesday, January 12, 2010 10:41:39 PM
Sean Connerspc476 # Tuesday, January 12, 2010 10:43:42 PM
Unregistered user # Wednesday, January 13, 2010 11:59:23 AM
Vorlath # Thursday, January 14, 2010 3:05:54 AM
And yeah, a really bad scenario is when a program terminates and no one knows and no info is given. A segmentation fault without a core dump is frustrating.