C++: Some Techniques
Wednesday, November 26, 2008 8:08:46 PM
auto_ptr and ++j
The first one is RAII. This one should be used whenever you have local pointers that need to be deleted when done.
std::auto_ptr<NEList> nelst(lst->GetList());
for(NEList::iterator j = nelst->begin(); j!=nelst->end(); ++j)
{
NEntryLink *link = *j;
chain.insert(link->dest);
}
GetList() returns a newly allocated list, and the auto_ptr deletes it automatically when it goes out of scope. I've heard that auto_ptr has problems in some special cases (for example, copying an auto_ptr transfers ownership, so you can't store them in STL containers), but for most cases, it makes life a LOT simpler. However, you should try to create normal local instances whenever possible and have functions that return by value. This seems like it would cause a lot of duplication, but thanks to the return value optimization (see the next section) it actually uses about the same amount of memory as using pointers, without the hassle of pointers.
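Since C++11, std::auto_ptr has been deprecated, and C++17 removed it entirely; std::unique_ptr gives you the same scope-bound cleanup. Here's a minimal sketch of the loop above using it. NEntryLink, NEList, GetList() and CollectDestinations() are simplified stand-ins made up for illustration, assuming GetList() hands the caller a heap-allocated list that the caller must delete:

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <set>

// Hypothetical stand-ins for the types used in the snippet above.
struct NEntryLink { int dest; };
typedef std::list<NEntryLink*> NEList;

// Like GetList() above: returns a newly allocated list the caller must delete.
NEList *GetList(NEntryLink *links, int count)
{
    NEList *lst = new NEList;
    for (int i = 0; i < count; ++i)
        lst->push_back(&links[i]);
    return lst;
}

std::set<int> CollectDestinations(NEntryLink *links, int count)
{
    std::set<int> chain;
    // nelst owns the list and deletes it when it goes out of scope,
    // including when an exception is thrown inside the loop.
    std::unique_ptr<NEList> nelst(GetList(links, count));
    for (NEList::iterator j = nelst->begin(); j != nelst->end(); ++j)
    {
        NEntryLink *link = *j;
        chain.insert(link->dest);
    }
    return chain;
}
```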
Also notice ++j. I HATE putting ++ in front of the variable. I think it looks ugly. But I do it because it is always as fast as or faster than putting it behind the variable; it can never be slower than j++. In most cases, there is no difference. But j++ has to return the old value of j, so for class-type iterators it creates a copy of the iterator every single iteration. Most of the time, the compiler can optimize that copy away, but not always. Best not to risk it. Use ++j.
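To see why j++ can cost more, here's a toy iterator (CountingIter is a made-up name, not a library type) that counts its own copy constructions. Prefix ++ advances in place and returns a reference; postfix ++ has to hand back the old value, so it must copy first:

```cpp
#include <cassert>

// Toy forward iterator over an int array that counts how often it is copied.
struct CountingIter
{
    const int *p;
    static int copies;

    explicit CountingIter(const int *ptr) : p(ptr) {}
    CountingIter(const CountingIter &other) : p(other.p) { ++copies; }
    CountingIter &operator=(const CountingIter &other) { p = other.p; return *this; }

    // Prefix: advance in place and return *this. No copy.
    CountingIter &operator++() { ++p; return *this; }

    // Postfix: must return the OLD value, so it has to copy before advancing.
    CountingIter operator++(int)
    {
        CountingIter old(*this);   // the copy that ++j never makes
        ++p;
        return old;
    }

    int operator*() const { return *p; }
    bool operator!=(const CountingIter &other) const { return p != other.p; }
};

int CountingIter::copies = 0;
```

With a raw pointer or a trivial iterator the optimizer usually erases that copy, but here the copy constructor has a visible side effect, so the copy inside operator++(int) can't be optimized away.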
Return by value
class Region
{
public:
Rect __fastcall RBoundingBox() const;
};
void some_func()
{
Region region; // Create region here...
// Get bounding box of that region.
Rect r = region.RBoundingBox();
}
This code will not create more than one Rect with most compilers. Internally, the compiler passes a pointer to r's storage into RBoundingBox(), and RBoundingBox() constructs the Rect directly in that storage, using whatever constructor it has decided to use. This is known as the return value optimization (RVO). So it's just as good as using pointers, without all of the mess.
One thing to watch out for is how many return statements RBoundingBox() has. If there are multiple exit points in the function, they should all return the same named local object. If different return statements return different named local objects depending on some choice (an if statement), most compilers can't apply this optimization (the named return value optimization). In that case, a local object is created inside your function, and then the copy constructor is used to build r from it, so you end up with an extra copy, at least temporarily. But if you create a Rect inside your function and ALL exit points return that SAME Rect, there's no problem: the Rect created inside your function is r. Both names refer to the same storage, so nothing is duplicated. IOW, if you create ONE and only ONE object inside your function, you can usually return it by value without any duplication, as long as the return value is used to initialize a newly declared object.
So more often than not, using pointers for these kinds of operations is pointless and only creates needless memory management.
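Here's a small sketch you can use to watch this happen. This Rect is a simplified stand-in (not the Rect from my code) whose copy constructor bumps a counter, and both functions are hypothetical: one returns the same named local from every exit point, the other returns one of two different locals. Note that the standard permits the named return value optimization but doesn't require it, so the exact counts depend on your compiler and flags:

```cpp
#include <cassert>

// Simplified Rect that counts copy constructions so elision is observable.
struct Rect
{
    int left, top, right, bottom;
    static int copies;

    Rect(int l, int t, int r, int b) : left(l), top(t), right(r), bottom(b) {}
    Rect(const Rect &o) : left(o.left), top(o.top), right(o.right), bottom(o.bottom) { ++copies; }

    int Area() const { return (right - left) * (bottom - top); }
};

int Rect::copies = 0;

// One named local returned from every exit point: eligible for NRVO,
// so it is typically built directly in the caller's storage.
Rect BoundingBoxOneObject(bool empty)
{
    Rect box(0, 0, 10, 5);
    if (empty)
    {
        box.right = box.left;
        return box;
    }
    return box;
}

// Two different named locals: only one of them could live in the caller's
// storage, so compilers often fall back to a copy here.
Rect BoundingBoxTwoObjects(bool empty)
{
    Rect full(0, 0, 10, 5);
    Rect none(0, 0, 0, 0);
    if (empty)
        return none;
    return full;
}
```

Print Rect::copies after each call. On mainstream compilers with default settings, the first function typically shows no copies, while the second often shows one.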
Events
When writing code that triggers events, you need to be extremely careful. The first issue is that any code that triggers events must have a corresponding function that does not trigger events. You need to duplicate ALL functionality that involves events, which gives you two sets of functions: one set triggers events, the other does not.
Why is this important? Because internal code will need to update state without triggering events along the way. You need to update the state of your object, collect information, and THEN trigger the event. So write your code so that event-active functions first call non-event-active functions, and then fire whatever events are required. Otherwise, you will trigger an insane number of events, trigger events you don't want, or cause infinite recursion.
Just remember to write two sets of functions, where one set triggers events and the other does not, but where both sets are IDENTICAL in functionality except for the triggering of events. Also remember that functions that trigger events can and should call functions that do not trigger events, but not the other way around.
The other issue with writing code that triggers events is when to actually fire the event. While inside an event, all other events should be disabled and queued when possible. So if you can, delay firing events until the latest possible time. This has several benefits. First, you avoid event-triggered recursion; it's simply impossible. Second, you avoid state corruption, because different events aren't changing state in random order. In any case, you'll find that your code is much cleaner and much easier to maintain when you don't have runtime spaghetti code. One thing to watch out for is deleted objects. If you queue an event, you must make sure that the associated objects are not deleted before the event is triggered, or, if they are deleted, that it's safe to discard the queued event along with them.
But make sure that delayed events make sense for your task. For example, take events triggered before an action. These cannot be delayed, because they usually let the handler modify the object or allow/deny the action. Delaying those events would be a paradox. The same goes for events that happen after the action: if the handler can modify the object after the action has taken place, do NOT delay that event or you'll get data corruption.
Remember that delaying is useful for displaying updates. Notifications like this are used all the time: stock prices, GUI updates, race car positions, scores, leaderboards, etc. In all of these, the event doesn't modify the object itself; it provides feedback to the user. In these cases, you don't care WHEN or WHERE in your code the event happens, as long as you get the most recent state. Delaying events simply makes your code cleaner and easier to maintain. Oh yeah, it's also great for batching updates (processing all queued events in one go) so you don't have to continuously update the screen for nothing.
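Here's a minimal sketch of both rules in one class. Score, OnChanged(), Flush() and the *NoEvent() naming are all hypothetical. Each event-active function calls its non-event-active twin and then only queues a notification; Flush() delivers the queued notification in one batch, so a handler that changes the score gets queued instead of recursing:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical observable score illustrating the two-set pattern plus delayed events.
class Score
{
public:
    typedef std::function<void(int)> Handler;

    void OnChanged(Handler h) { handlers.push_back(h); }

    // Non-event-active: state change only. Safe to call from anywhere, including handlers.
    void SetValueNoEvent(int v) { value = v; }
    void AddNoEvent(int delta) { value += delta; }

    // Event-active: do the work via the non-event versions, then queue a notification.
    void SetValue(int v) { SetValueNoEvent(v); pending = true; }
    void Add(int delta) { AddNoEvent(delta); pending = true; }

    // Deliver queued changes in one batch; handlers see only the latest value.
    void Flush()
    {
        if (!pending || dispatching) return;
        dispatching = true;                    // changes made by handlers are queued, not re-entered
        pending = false;
        std::vector<Handler> snapshot = handlers;  // handlers may register more handlers safely
        for (const Handler &h : snapshot)
            h(value);
        dispatching = false;
    }

    int Value() const { return value; }

private:
    int value = 0;
    bool pending = false;
    bool dispatching = false;
    std::vector<Handler> handlers;
};
```

Flush() would typically be called once per frame or from an idle handler. Because notifications carry only the latest value, this fits the display-style events above; it is not suitable for before-the-action events that need to allow or deny something.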
Properties
C++ doesn't have properties. But Borland C++ Builder does, as a language extension (__property). I hope other compilers include this feature. It's just too good to pass up. You can specify an optional read function and an optional write function, and you can name a member variable instead of a function for either one. This avoids redundant getters and setters. It also retains the popular concept of encapsulation even though it looks like you're using a member variable. Once you use properties, you will simply never go back. I can't explain in enough detail the experience associated with them, so I'll just show one example for now.
Here is my original code.
class NEntry
{
public:
UnicodeString Name;
NID ID;
};
What I wanted to do recently is add string and ID interning, where I cache these values to avoid duplicates. This is very useful because there are places that reference NEntry objects by name or by ID. (Note that there is more data in NEntry than shown.)
Here is what I did to add interning.
class NEntry
{
protected:
NID *FID;
UnicodeString *FName;
public:
void __fastcall SetName(const UnicodeString &name);
UnicodeString& __fastcall GetName() const;
void __fastcall SetID(const NID &ID);
NID& __fastcall GetID() const;
__property UnicodeString Name = {read = GetName, write = SetName};
__property NID ID = {read = GetID, write = SetID};
};
You still use Name and ID as usual, but now they call functions of my choosing. With properties, NONE of your existing code has to change. SetName() and SetID() look into a cache to make sure the values aren't duplicated. GetID() and GetName() are only needed for dereferencing, since Name and ID are now stored internally as pointers. I could have put the getters and setters in the protected section, and normally one would. That way, the interface would remain identical, and only member functions could change state without triggering the interning functionality.
In my own personal code, FID and FName are actually both public. The reason is what I explained in the Events section. Interning isn't exactly an event in the public eye, but it has the same features: it triggers an action unrelated to the specific task of assigning and retrieving the name and ID. So if we consider it an event, we MUST have access to both event-active and non-event-active members (or properties).
So we see the duplicate set of functionality. FName is non-event-active while Name is event-active; the same goes for FID and ID respectively. Normally, you would put the non-event-active members in a protected section (or private if you really like the pain). But the above object is a communication object, and helper objects need direct access to the internals. Also, using friends is not very extensible. That's another discussion altogether, though (see the "Separate primary objects from communication objects" section below for more details).
So consider properties as a way to retain encapsulation for public members if you're into that kind of thing. If you don't have properties, you're stuck with getters and setters and I wish you well in your adventures. It's best if you don't know what you're missing.
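Without __property you're back to explicit accessors. For comparison, here's a minimal portable sketch of the interned name in standard C++. It assumes std::string in place of UnicodeString and a single global pool; NameEntry, Intern() and NamePtr() are made-up names, and unlike the StringIntern class further down, this pool never frees anything:

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical standard-C++ version of NEntry's interned Name.
// Without __property, callers must use GetName()/SetName() explicitly.
class NameEntry
{
public:
    NameEntry() : FName(Intern(std::string())) {}

    // Event-active setter: interns the value so equal names share storage.
    void SetName(const std::string &name) { FName = Intern(name); }
    const std::string &GetName() const { return *FName; }

    // Equal names yield identical pointers, which makes comparisons cheap.
    const std::string *NamePtr() const { return FName; }

private:
    static const std::string *Intern(const std::string &s)
    {
        static std::set<std::string> pool;   // set elements never move, so pointers stay valid
        return &*pool.insert(s).first;       // existing element if already present
    }

    const std::string *FName;
};
```

The difference shows up at every call site: a.Name = x becomes a.SetName(x), which is exactly the churn properties spare you.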
Mutable
This one is more of a hack. Ever derive from a class in a third-party library where a virtual method you must override is const? Usually, it's not a problem. But what if you want to add caching to speed up your code? Can't do it, because a const method can't modify members. A cache is pretty useless if you can't update it.
What I do is declare the cache as mutable. That lets your code update your data from within a const function.
class WinFontInstance : public LEFontInstance
{
private:
mutable std::map<LETag,void*> tables;
mutable std::map<LEGlyphID,LEPoint> glyphAdvance;
WinFontInstance();
public:
virtual const void *getFontTable(LETag tableTag) const;
virtual void getGlyphAdvance(LEGlyphID glyph, LEPoint &advance) const;
};
Note how the tables and glyphAdvance members are mutable? That's so the two const functions can update them if need be. Yeah, it's like a hack, but I don't care. I always support the get-shit-done principle.
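Here's a self-contained sketch of the same trick. GlyphSource is a made-up stand-in for a third-party base class like LEFontInstance (it is not ICU's actual interface), and the advance formula is a placeholder for real glyph metrics:

```cpp
#include <cassert>
#include <map>

// Stand-in for a third-party base class whose virtual interface is const.
class GlyphSource
{
public:
    virtual ~GlyphSource() {}
    virtual int getGlyphAdvance(int glyph) const = 0;
};

// Derived class that caches an expensive computation inside a const method.
class CachedGlyphSource : public GlyphSource
{
public:
    CachedGlyphSource() : computeCalls(0) {}

    virtual int getGlyphAdvance(int glyph) const
    {
        std::map<int, int>::const_iterator it = cache.find(glyph);
        if (it != cache.end())
            return it->second;          // cache hit: no recomputation
        int advance = ExpensiveAdvance(glyph);
        cache[glyph] = advance;         // compiles only because cache is mutable
        return advance;
    }

    int ComputeCalls() const { return computeCalls; }

private:
    int ExpensiveAdvance(int glyph) const
    {
        ++computeCalls;                 // also mutable: bookkeeping, not logical state
        return glyph * 2 + 1;           // placeholder for real glyph metrics
    }

    mutable std::map<int, int> cache;
    mutable int computeCalls;
};
```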
Private default constructor
Let's take a look at the WinFontInstance example from the previous section again.
class WinFontInstance : public LEFontInstance
{
private:
mutable std::map<LETag,void*> tables;
mutable std::map<LEGlyphID,LEPoint> glyphAdvance;
WinFontInstance();
public:
virtual const void *getFontTable(LETag tableTag) const;
virtual void getGlyphAdvance(LEGlyphID glyph, LEPoint &advance) const;
};
What's a constructor doing in a private section???
If you know me, you know I HATE the private keyword and have said many times that it should be removed from the language. Well, you caught me. I actually do use it whenever I want the class to be final. Guilty as charged.
But there's another use for private. What if you don't want temporaries of your object to be created? Or what if you want your object to only be created if you specify some options (within constructor arguments)? How would you stop people from creating your object using the default constructor?
Put those invalid constructors inside the private section. This way, you don't need exceptions or any of that crap. You get a compiler error instead. I use this all the time and life has been simpler since. Simple, but effective. If you use a class factory, the friend keyword may be of interest.
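A minimal sketch of the idea, using a hypothetical Connection class that only makes sense with a host and a port. The first class uses the private-declaration trick described above; the second shows the C++11 spelling, = delete, which produces the same compile error with a clearer message. The static_asserts confirm that neither can be default-constructed from outside:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Default constructor declared private and never defined (pre-C++11 style):
// `Connection c;` outside the class is a compile error.
class Connection
{
private:
    Connection();
public:
    Connection(const std::string &h, int p) : host(h), port(p) {}
    std::string host;
    int port;
};

// C++11 spelling of the same idea: explicitly deleted.
class Connection11
{
public:
    Connection11() = delete;
    Connection11(const std::string &h, int p) : host(h), port(p) {}
    std::string host;
    int port;
};

// Compile-time checks: neither class can be default-constructed from outside.
static_assert(!std::is_default_constructible<Connection>::value, "default ctor must be inaccessible");
static_assert(!std::is_default_constructible<Connection11>::value, "default ctor must be deleted");
```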
Separate primary objects from communication objects
This one is controversial, so I'll try to keep it as short as possible although there is a lot of information to go over.
There are two types of objects. Primary objects and communication objects. Primary objects are objects that do things (classes). This is what most people are comfortable with, so I won't bother explaining it.
Communication objects are things used to communicate between primary objects. Things like primitive types and argument lists. But it doesn't need to end there. You can create your own communication objects. Frankly, C++ has sweet fuck all in this area. Here are the guidelines for creating a communication object and I guarantee you won't see this tip anywhere else.
1. ALL data must be public! (This is a MUST!)
2. Any SIMPLE methods that can be done without other primary objects should be included directly in the class.
3. Everything else should be in another object.
Rule #2 needs clarification. You may include any simple methods whose arguments use only the same types as the members. IOW, don't use argument types that aren't also found as members. Primitive (built-in) types are always OK, though, because they're universal. The reason for this rule is that such methods create zero additional coupling to other primary objects.
Rule #3 is to simulate what you can do with primitive types. With primitive types, you can manipulate that data any way you want. So you want to have the same flexibility with communication objects, but without having to access the data directly (to keep code maintainable). The way you do this is to have other objects (I call them helper objects) that manipulate the communication object(s). Each primary object may have its own custom unique helper object. This means that each primary object may continue to manipulate the communication objects in any way it chooses and not be bound to a single interface.
Helper objects can be considered as reusable tools for future primary objects. A buffer object could be considered a communication object. Leave it raw and create helper objects that have different functionality. One helper object would be used by the socket engine to insert and retrieve raw data. Usually, you would not need helper objects because this functionality can be accomplished without outside help. But if you need to encrypt or compress the data, then replacing the helper object with one that does encryption would be a snap compared to the alternative. Other helper objects could be used to convert the raw data in the buffer to messages and commands (and vice versa).
With this concept, you achieve separation of concerns while expanding the versatility of communication between primary objects. You're expanding its vocabulary in a meaningful way rather than only relying on function names.
These communication objects should be created for application-wide use or between different systems in your software. Don't create these things for small localized tasks.
And if you're nervous about having exposed data in your communication objects, consider that argument lists are always bare. You're simply shifting some of those arguments into a data structure and providing better tools to manipulate that data. So if argument lists can remain bare, there's no reason objects used for the EXACT same purpose can't also remain bare. The advantages are simply too numerous to mention. Try it out. I guarantee you'll like it.
BTW, if you want a simple example, auto_ptr is a form of helper object. However, it lacks the protection from the actual data that I'm encouraging. Helper objects usually behave much like CD players where you can change the CD (data) that it acts upon. The interface should always be the CD player (helper object) and never the CD (data) itself.
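Here's a minimal sketch of the buffer example. Buffer, BufferHelper, RawHelper, XorHelper and RoundTrip() are all names invented for illustration. Buffer follows the three rules (public data, only trivial methods on its own types), and the helpers are interchangeable. XOR stands in for real encryption only to keep the sketch short; it provides no security:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Communication object: all data public, only simple methods on its own types.
struct Buffer
{
    std::vector<unsigned char> data;

    std::size_t Size() const { return data.size(); }
    void Clear() { data.clear(); }
};

// Helper interface: how a primary object reads and writes a Buffer.
class BufferHelper
{
public:
    virtual ~BufferHelper() {}
    virtual void Write(Buffer &buf, const std::string &text) const = 0;
    virtual std::string Read(const Buffer &buf) const = 0;
};

// Raw helper: bytes in, bytes out.
class RawHelper : public BufferHelper
{
public:
    virtual void Write(Buffer &buf, const std::string &text) const
    {
        buf.data.insert(buf.data.end(), text.begin(), text.end());
    }
    virtual std::string Read(const Buffer &buf) const
    {
        return std::string(buf.data.begin(), buf.data.end());
    }
};

// Drop-in replacement that scrambles bytes with a fixed XOR key (placeholder for encryption).
class XorHelper : public BufferHelper
{
public:
    explicit XorHelper(unsigned char k) : key(k) {}
    virtual void Write(Buffer &buf, const std::string &text) const
    {
        for (std::size_t i = 0; i < text.size(); ++i)
            buf.data.push_back(static_cast<unsigned char>(static_cast<unsigned char>(text[i]) ^ key));
    }
    virtual std::string Read(const Buffer &buf) const
    {
        std::string out;
        for (std::size_t i = 0; i < buf.data.size(); ++i)
            out += static_cast<char>(buf.data[i] ^ key);
        return out;
    }
private:
    unsigned char key;
};

// A primary object (e.g. a socket engine) depends only on Buffer and BufferHelper.
std::string RoundTrip(const BufferHelper &helper, const std::string &msg)
{
    Buffer buf;
    helper.Write(buf, msg);
    return helper.Read(buf);
}
```

Swapping RawHelper for XorHelper changes what ends up in the buffer without touching Buffer itself or the code that calls RoundTrip().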
Create interfaces for third party interfaces
This one is weird, and I've only recently become a proponent of it. I learned it the hard way. If you use a third-party library, create your own internal interface, and make sure calls into the third-party library appear only inside the implementations of that interface.
You shouldn't normally create a one-to-one interface; that would be quite pointless. Rather, come up with an interface that suits your application's needs. In the implementation, you can use whatever third-party libraries you wish. Just make sure you NEVER EVER use third-party library calls in the main parts of your application.
Why do this? Because third-party libraries come and go. If you switch from one third-party library to another, you only need to change ONE implementation. It may still be a colossal amount of work to replace these libraries, but it's doable. You've gone from project failure to a situation that requires a fair amount of grunt work. It's really a way to make sure your worst-case scenario isn't complete failure.
Note that this is why Java programmers get infuriated when the official Java library changes. One of the advantages of Java is that you don't need to worry about changing library requirements. That is until it does change. And then they're screwed. Anyone that's worked in Java and had a library change on them knows the pain I'm talking about.
So if you're in C++, you don't have the luxury of never (or almost never) changing libraries. This tip is entirely up to you to use or not. I'm not even going to advocate for it. The pain that changing libraries brings should be enough for you to decide on your own.
If you have a better way, let me know!
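A minimal sketch under made-up names: ContentHasher is the application's own interface, StdHashHasher wraps the "third-party" code (std::hash stands in for an external library here), and Fnv1aHasher is a replacement implementation (64-bit FNV-1a). Application code like SameContent() only ever sees ContentHasher:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>

// Application-level interface, phrased in the application's terms,
// not as a one-to-one mirror of any library.
class ContentHasher
{
public:
    virtual ~ContentHasher() {}
    // Fingerprint used to detect duplicate documents.
    virtual std::uint64_t Fingerprint(const std::string &content) const = 0;
};

// The ONLY place that touches the "third-party" API (std::hash as a stand-in).
class StdHashHasher : public ContentHasher
{
public:
    virtual std::uint64_t Fingerprint(const std::string &content) const
    {
        return static_cast<std::uint64_t>(std::hash<std::string>()(content));
    }
};

// Switching libraries means writing another implementation like this one
// (hand-written 64-bit FNV-1a), not touching application code.
class Fnv1aHasher : public ContentHasher
{
public:
    virtual std::uint64_t Fingerprint(const std::string &content) const
    {
        std::uint64_t h = 14695981039346656037ULL;   // FNV offset basis
        for (std::size_t i = 0; i < content.size(); ++i)
        {
            h ^= static_cast<unsigned char>(content[i]);
            h *= 1099511628211ULL;                     // FNV prime
        }
        return h;
    }
};

// Application code depends only on the internal interface.
bool SameContent(const ContentHasher &hasher, const std::string &a, const std::string &b)
{
    return hasher.Fingerprint(a) == hasher.Fingerprint(b);
}
```

Fingerprints from different implementations won't match each other, so anything you've persisted needs migrating when you swap. That's part of the grunt work, but it's confined to one place.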
Initialization Lists
This one isn't a tip, but a common C++ feature that beginners overlook. You can initialize everything in constructors, right? Just use assignment operators, right?
What if you have an object member where you can't use the default constructor? What then?
Initialization lists come to the rescue. This is how you choose which constructor you want for your members. Again, this is actual code.
class Region
{
private:
Rect *frame;
bool bRect;
Rect rect;
CRegion region;
public:
__fastcall Region(const Rect &r);
... // more stuff continues.
};
__fastcall Region::Region(const Rect &r) :
frame(new Rect()), bRect(true), rect(r)
{
frame->Fill();
}
Note the "Rect rect" member. It is initialized by using the copy constructor "rect(r)". If I had used an assignment, the default constructor would get called first and then the assignment operator. That's wasted computation.
Also, if Rect did not allow its default constructor to be called (by putting it in its private section), how would you initialize the rect member? Without an initialization list, Region's constructor wouldn't even compile.
So use initialization lists. They're great!
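Here's a trimmed-down, self-contained version of the same situation. This Rect is a simplified stand-in with no default constructor at all, so the only way to give Region a rect member is through the initialization list:

```cpp
#include <cassert>

// Simplified Rect whose only constructor takes coordinates: no default constructor.
struct Rect
{
    int left, top, right, bottom;
    Rect(int l, int t, int r, int b) : left(l), top(t), right(r), bottom(b) {}
};

class Region
{
public:
    // rect must be initialized here: there is no Rect() to fall back on.
    // Members are initialized in declaration order (bRect, then rect),
    // regardless of the order written in the list.
    explicit Region(const Rect &r) : bRect(true), rect(r) {}

    // The assignment version would not compile:
    // Region(const Rect &r) { bRect = true; rect = r; }   // error: no Rect::Rect()

    bool IsRect() const { return bRect; }
    const Rect &Bounds() const { return rect; }

private:
    bool bRect;
    Rect rect;
};
```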
STL lists with pointers
Ever have a list where you want to store pointers, but where the comparisons should be done on the object itself and not the pointer?
Just use a different comparison routine.
// Functor for comparing dereferenced values instead of pointers.
template <class T>
class DerefCompare
{
public:
bool operator()(const T *a, const T *b) const { return ((*a)<(*b)); }
};
class StringIntern
{
protected:
std::map<UnicodeString*,unsigned int,DerefCompare<UnicodeString> > strings;
public:
__fastcall StringIntern();
__fastcall ~StringIntern();
UnicodeString* __fastcall AddItem(const UnicodeString &str);
void __fastcall DeleteItem(const UnicodeString &str);
};
Note that the strings map will now sort by the actual string rather than by pointer, even though we store pointers in the map. There's probably an easier way, but I haven't gone looking, and this is rather simple in and of itself. And yes, this is the string interning class used by NEntry in the Properties section.
Make sure you handle the memory management of individual strings. Since you're using pointers, they must be manually allocated and deleted. That's what the AddItem() and DeleteItem() methods do. The value in the map is an integer reference count. When it reaches zero, the string key is deleted. In AddItem(), if the string can't be found, a new string is allocated and its reference count is set to one; otherwise, no allocation happens and the reference count is incremented.
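Here's a minimal sketch of what AddItem() and DeleteItem() could look like. It assumes std::string instead of UnicodeString, drops __fastcall, and stores const pointers as keys so lookups can pass the address of the caller's string directly while DerefCompare compares contents. Count() is an extra helper added for testing:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

template <class T>
struct DerefCompare
{
    bool operator()(const T *a, const T *b) const { return *a < *b; }
};

// Sketch of a reference-counted intern pool keyed by string contents.
class StringIntern
{
private:
    typedef std::map<const std::string *, unsigned int, DerefCompare<std::string> > Map;
    Map strings;   // key: owned heap string, value: reference count

public:
    StringIntern() {}
    StringIntern(const StringIntern &) = delete;             // owns raw pointers: no copying
    StringIntern &operator=(const StringIntern &) = delete;

    ~StringIntern()
    {
        for (Map::iterator it = strings.begin(); it != strings.end(); ++it)
            delete it->first;
    }

    // Returns the shared copy of str, allocating it on first use.
    const std::string *AddItem(const std::string &str)
    {
        Map::iterator it = strings.find(&str);   // compares contents, not addresses
        if (it != strings.end())
        {
            ++it->second;                         // already interned: bump the count
            return it->first;
        }
        std::string *copy = new std::string(str);
        strings.insert(Map::value_type(copy, 1));
        return copy;
    }

    // Drops one reference; frees the string when the count reaches zero.
    void DeleteItem(const std::string &str)
    {
        Map::iterator it = strings.find(&str);
        if (it == strings.end())
            return;
        if (--it->second == 0)
        {
            const std::string *victim = it->first;
            strings.erase(it);
            delete victim;
        }
    }

    std::size_t Count() const { return strings.size(); }
};
```

Copying is deleted because the pool owns raw pointers; a copied pool would delete each string twice.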
I guess that's it for now. I know there's more stuff out there that I use, but they escape me at the moment. Share your own tips if you wish.


Luke McCarthy (shaurz) # Wednesday, November 26, 2008 8:53:32 PM
Even if you don't like using boost you can use the smart pointers from the headers without linking any code (and you can link in the parts of boost you use piecemeal). Some of boost is kinda bloated, for example boost::format added 40K, so I just wrote a wrapper for vsnprintf instead.
Unregistered user # Friday, November 28, 2008 7:05:27 PM
Vorlath # Friday, November 28, 2008 9:06:43 PM
I will try to put up an example about communication objects in the near future. Unfortunately, it's difficult to do because you normally require a rather large codebase to show you how it works. I could write up the socket engine example I guess.
Here are some of my articles that mention the private keyword.
http://my.opera.com/Vorlath/blog/show.dml/58528
http://my.opera.com/Vorlath/blog/2008/03/27/oo-dumb
If there's a specific quote or idea you wish to talk about, please do so. However, referencing someone else without so much as mentioning a reason why *you* like the keyword doesn't really motivate me to check out the book. If the book supported an argument you want to make, then fine. But statements like "How long have you been programming?" only serve to reinforce that you've been indoctrinated to a certain way of doing things without knowing the WHYs. Go read the book again and try to find the REASONS behind why they say the private keyword is good, and whether they make sense. I would wager they do not make sense. They may sound good. It may appear logical on the surface. But really look at it. You'll be surprised.
I'll say it again. If anyone uses the private keyword, they should be banned from the computing industry forever.