To Compiler and Debugger Writers
Monday, 7. July 2008, 22:20:39
Ok, so if you're on a Windows XP box and you're developing something like a compiler or debugger, you're gonna need some tools, right? Tools like an API for loading and unloading modules. Tools that let you inspect symbols and line information stored in the debug section of your object and exe files. Tools that let you query which module, which source file and which line number a function was in when it hit a breakpoint or made a bad memory allocation. Tools that let you walk the stack and display a stack trace. These are things that compiler and debugger writers need to deal with. Even people who just want self-inspecting code will need this stuff.
So I'm going along doing all of this. And as a good little boy, I didn't make my own tools, though I nearly ended up doing so trying to work around the problems I was getting. I'm using the Borland C++ Builder compiler because it has properties and closures and I don't feel like writing extra code to enable a feature I've been using for ages. So BCB it was. Only problem is that it uses a different name mangling scheme to MS. Not only that, but Borland doesn't output the CodeView debug format that the Windows debugging API wants. So I wrote some code that converts the Borland debug format to the CodeView format. I got some of this code online, but it worked badly and didn't work for the newer versions of Borland. I also output line numbers, which weren't available in the original online code.
So I'm good to go, right? I can add this to my code to automatically check for memory leaks and see where each and every allocation and deallocation is made. All this isn't so much for the final product of Project V as it is a standard set of tools I've used for ages on larger projects for personal use to make life easier. It's basically like having a debugger within your application that tells you right away if anything is wrong. And for those who dislike debuggers, this is useful to find out WHERE the error is. Not to fix it.
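Here is a minimal sketch of the "debugger within your application" idea, not the actual tool from this post. AllocTracker, AllocInfo and TRACKED_ALLOC are made-up names, and the sketch records __FILE__ and __LINE__ at the call site rather than resolving addresses through the debug info, which is the approach the rest of this post deals with.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <string>

// Hypothetical record of where a live allocation came from.
struct AllocInfo {
    std::size_t size;
    std::string file;
    int line;
};

// Hypothetical tracker: remembers every live allocation until it is released.
class AllocTracker {
public:
    void* allocate(std::size_t size, const char* file, int line) {
        void* p = std::malloc(size);
        if (p)
            live_[p] = AllocInfo{size, file, line};
        return p;
    }

    // Returns false for a pointer that was never tracked or was already freed.
    bool release(void* p) {
        std::map<void*, AllocInfo>::iterator it = live_.find(p);
        if (it == live_.end())
            return false;
        live_.erase(it);
        std::free(p);
        return true;
    }

    // Anything still in the map when the program ends is a leak.
    std::size_t leakCount() const { return live_.size(); }
    const std::map<void*, AllocInfo>& live() const { return live_; }

private:
    std::map<void*, AllocInfo> live_;
};

// Captures the call site so a leak report can say WHERE the allocation was made.
#define TRACKED_ALLOC(tracker, size) (tracker).allocate((size), __FILE__, __LINE__)
```

At shutdown you would walk live() and print the file and line of whatever is left. A double free shows up as release() returning false.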
The stack trace was working fine. The name demangling was also working fine except when it ran longer than 255 characters (a limitation based on the fact that debug formats require a ONE byte length field, and one byte has a maximum value of 255). So I had to update the demangler to bail out if it reached the end of the string. No big deal. Just put in some extra error checking.
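To show the kind of extra error checking involved, here is a small sketch. It uses a toy grammar that I'm assuming purely for illustration (it is NOT Borland's mangling scheme): a name is a run of <decimal length><identifier> parts. The point is only that the parser checks for the end of the string at every step, so a name cut off at 255 characters is rejected instead of being read past its end. demangleToy and truncateTo255 are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Toy grammar (an assumption, NOT Borland's real scheme):
// "3Foo3bar" stands for Foo::bar.
// Returns false as soon as the input ends before a part is complete.
bool demangleToy(const std::string& mangled, std::string& out)
{
    std::vector<std::string> parts;
    std::size_t pos = 0;
    while (pos < mangled.size()) {
        // Read the decimal length, checking for end of input at every step.
        std::size_t len = 0;
        std::size_t digits = 0;
        while (pos < mangled.size() &&
               std::isdigit(static_cast<unsigned char>(mangled[pos]))) {
            len = len * 10 + static_cast<std::size_t>(mangled[pos] - '0');
            if (len > mangled.size())
                return false;             // length can never fit: bail out
            ++pos;
            ++digits;
        }
        if (digits == 0 || len == 0)
            return false;                 // malformed: no usable length
        if (len > mangled.size() - pos)
            return false;                 // truncated: part runs past the end
        parts.push_back(mangled.substr(pos, len));
        pos += len;
    }
    if (parts.empty())
        return false;

    out.clear();
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0)
            out += "::";
        out += parts[i];
    }
    return true;
}

// Simulates a debug record whose one-byte length field caps names at 255.
std::string truncateTo255(const std::string& name)
{
    return name.substr(0, 255);
}
```

When demangleToy returns false, the caller can fall back to printing the raw mangled name.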
But the line numbers would not show up correctly. I simply could not get the line numbers and the source filename to show up correctly except in a few odd cases. I looked over my code for several days and could not find anything wrong. In fact, I looked at my code so much and refactored it so much that I actually found a few errors in the CodeView specs (unless every app that uses CodeView is wrong instead, but I doubt it). I found code that wasn't exactly right, but still worked. Fixed those too. For code that has existed for years and has seen extensive use, I could not figure out what I had broken.
And there is almost ZERO information online about CodeView, so not much luck there either. Finally, I decided to output every piece of data I had. All symbols, their offsets, line numbers, object filename, source filename, module name, module offset and length, everything. I noticed (pure luck really) that on the odd occasion when an address ended up exactly on a line offset, it would print the source file and line number as expected. But the API says you don't need an exact match. And I've used it in the past and it worked fine. Why now? What is causing this?
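To make the difference concrete, here is a sketch of the two lookup behaviours over a sorted line table. This is not how dbghelp is implemented (I have no idea what it does internally); LineEntry, findLine and findLineExact are hypothetical names. A real lookup would also stop at the end of the function or segment, which this sketch ignores.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical line-table entry: the code offset where a source line starts.
struct LineEntry {
    std::uint32_t offset;
    std::uint32_t line;
};

// Expected behaviour: return the last entry whose offset is <= addr.
// Returns nullptr when addr lies before the first entry.
// `table` must be sorted by offset.
const LineEntry* findLine(const std::vector<LineEntry>& table, std::uint32_t addr)
{
    std::vector<LineEntry>::const_iterator it = std::upper_bound(
        table.begin(), table.end(), addr,
        [](std::uint32_t a, const LineEntry& e) { return a < e.offset; });
    if (it == table.begin())
        return nullptr;
    return &*(it - 1);
}

// The behaviour I was seeing: only an exact offset match returns a line.
const LineEntry* findLineExact(const std::vector<LineEntry>& table, std::uint32_t addr)
{
    for (const LineEntry& e : table)
        if (e.offset == addr)
            return &e;
    return nullptr;
}
```

With entries at offsets 0x10, 0x18 and 0x30, an address of 0x1C falls inside the code for the line that starts at 0x18. findLine reports that line, while findLineExact finds nothing, which matches the symptom of line info showing up only on the odd occasion.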
So I used Google and entered "SymGetLineFromAddr64 only gets exact matches" and the first result had the answer. Note that the word CodeView, which is where the line info comes from, appears nowhere in that query.
I reproduce the answer here:
I used http://support.microsoft.com/dllhelp to figure out the exact products
we've got the good vs. bad dbghelp.dll from:
BAD dbghelp.dll: version 5.1.2600.1106
* shipped in XP SP1 -and also what we found in XPE
GOOD dbghelp.dll: version 6.0.7.0
* no match in the DLL database! Whare did we get this? XP Pro, but no SP?
To answer your question - we (would) find the same issue on XP SP1, since
they use the same dll. Our version matches the version shipped with XP SP1.
Guess what version had found its way onto my machine?
Dbghelp.dll from XP SP1 is defective. Version 5.1 is BAD! It does not work. I'm guessing it must have come back when I did some reverting after installing some new video drivers (had some driver issues where the machine would no longer boot and had to roll back some changes).
Now get this. I install all sorts of tools all the time. I have the Debugging tools from MS. But the dll's don't install in the system directory. So I copy the one from the debugging tools into my system directory and everything works great. PROBLEM SOLVED!
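A check like the following could flag the problem up front. It is only a sketch: parseVersion and isSuspectDbghelp are hypothetical names, and the threshold comes from the forum post quoted above (5.1.2600.1106 bad, 6.0.7.0 good), not from any official list. On Windows you would get the version string from the file's version resource (GetFileVersionInfo and VerQueryValue). The sketch only covers the comparison.

```cpp
#include <array>
#include <cassert>
#include <cstdio>
#include <string>

typedef std::array<unsigned, 4> Version;

// Parses "major.minor.build.revision". Returns false on malformed input.
bool parseVersion(const std::string& text, Version& out)
{
    unsigned a = 0, b = 0, c = 0, d = 0;
    char extra = 0;
    // The trailing %c catches junk after the fourth number.
    if (std::sscanf(text.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &extra) != 4)
        return false;
    out[0] = a;
    out[1] = b;
    out[2] = c;
    out[3] = d;
    return true;
}

// Assumption based on the quoted forum post, not on Microsoft documentation:
// anything older than the known-good 6.0.7.0 is treated as suspect.
bool isSuspectDbghelp(const Version& v)
{
    const Version knownGood = {{6, 0, 7, 0}};
    return v < knownGood;   // std::array compares lexicographically
}
```

Running this at startup against whichever dbghelp.dll actually got loaded would have pointed at the XP SP1 copy right away.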
This is what kills me about third party products. Get this... I now have my own code to read and write OMF object files (Borland obj format and debugging info). I also have my own code to read and write CodeView (NB09) .dbg files. So what is it that dbghelp is doing for me? It's retrieving data that my own code put there in the first place. It would have been easier and faster to just read the original exe and obj files and write a few routines to query that data (which is mostly available anyhow). But it wouldn't be "standard". I wouldn't be able to use the OS with all its available tools.
And we wonder why people write from scratch. Also, I was forced to learn about things I'd rather not know about. I don't care about OMF, COFF and CV4 file formats. I'd rather use tools made by people who know this stuff better than I do.
Anyways, now it works fine. My debug code actually works a million times better because of this. It has more error checking than any piece of code I've ever written in the past. It includes code from at least three different sources (people, not files). I've got executables, object files and mangled C++ names all working together from both MS and Borland at the same time. And I couldn't give a damn!
If you want an example of what happens when you have bad code, this is it. Writing stuff at the beginning of a project is easy. Maintaining and updating a project when all those early people are gone is where the real work is at. And this isn't about dll hell. It's about defective code. That dll should have worked. It's a topic I've mentioned before. There is real risk in going with a third party tool. If it gets you there 99% of the way, but that last 1% is critical, then you're in worse shape than if you had written everything from scratch to start off. This is what happened here. I got MOST of the functionality. I could load and unload modules. I could query for symbols. I could do stack traces. But I needed line numbers. That was the whole point of the exercise.
The real agony is having everything work so great up to that 99%. So all of your code is nice, clean and probably very extensively integrated by this point. And although this is OT, this is exactly why I don't use functional languages. They'll get you quite far. They might prove powerful and productive at times, until you hit that last 1%.
I'm not sure what can be done to avoid these kinds of problems. I suppose libraries are tougher to write than standalone code. And debugging code (as well as other tools for programmers and for maintaining the code itself) tends to attract little interest. Still, that dll found its way into multiple versions of an OS. The more I look around, the more I notice how truly awful the source code is out there. It's never fazed me before. I understand how changes can mess up the best thought out projects.
Entropy in action. If software development is going to go against entropy, it needs a system where that can be accomplished. Where it can evolve. I don't see that happening. And just to bring the point home, notice how I wanted my code to be self aware if there are any problems. If there was a system that took that idea to greater use, dll's could be automatically checked against other versions to see if they're defective or different in any significant way. But that would require your machine to learn. To retain information on what it is capable of doing and also of being able to adapt and change itself when needed. I'm not seeing ANYTHING anywhere close to these ideas. All I'm seeing are things like "what's the next popular feature in PLs?"
I think adaptation is the way of the future. It must be. It has to be about being able to adapt. The continuous cycle of hardcoding for DOS, Windows, VM's, the web, single cores, etc. must stop. I think many people believe that they aren't hardcoding at all. I think they believe they are writing reusable code. Actually, I have no idea. I just hope that people will start seriously talking about this. About how to make the machine retain what it has done. For example, a machine can do the same operation a million times and when that software terminates, it forgets everything it just did. Turn that around and I think you'll see 99% of our problems disappear, and that will in turn let us tackle the last 1%. The important stuff.