What If... C++ Pointer Notation Was Different?
Thursday, 22. October 2009, 01:55:58
(Note: I had touched on the subject a while back, but I can't find it. So I'm posting about it again.)
Currently in C and C++, you use the star notation. Each star you use is a dereference, and each ampersand gets the address of the variable in question. But each of those operators works on whatever type has been declared for the variable. So taking the address of a pointer variable gets you the address of that pointer (which in turn points to a variable). In essence, you have to keep track of where you are with dereferences.
What if it worked slightly differently? Declaring pointer variables would still be the same. But what if no star always meant accessing the variable directly, regardless of the pointer type? Using one star would get the address of the variable. Two stars would get the address of a pointer and so on.
What this would mean is that you would not need two operators. You would need only one. Dereferencing would be done as required by the compiler, all the way down to the variable itself. Getting an address could be done up to one level higher than the declared pointer type.
int a,b;
int *var; // pointer variable.
int **ptr2; // pointer to pointer variable.
// Note the same amount of *'s in the line below.
*var = *a; // set var pointer to address of a.
var = 5;
a = 10; // Overwrites 5.
*b = *a; // error
**ptr2 = **var;
ptr2 = 15; // overwrites 10 in a.
Since a and b aren't pointers, you can only go one level up. IOW, you can use one star, but not two. With var, you can use zero, one or two stars (one higher than it was declared). With ptr2, you can go up to three stars. Also, when you use the maximum allowed stars, it is read only. That's why writing to "*b" would cause an error.
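For comparison, here is roughly what the same sequence looks like in today's C++, with the proposed spelling in the comments. This is only a sketch: the function name alias_through_pointers is made up for illustration, and the "*b = *a" error line has no direct counterpart so it is left out.

```cpp
#include <cassert>

// Present-day C++ version of the snippet above. Returns the final value of a.
int alias_through_pointers() {
    int a = 0;
    int *var = &a;       // proposed: *var = *a;
    int **ptr2 = &var;   // proposed: **ptr2 = **var;

    *var = 5;            // proposed: var = 5;
    a = 10;              // overwrites the 5
    assert(*var == 10);  // var still refers to a

    **ptr2 = 15;         // proposed: ptr2 = 15;
    return a;            // a now holds 15
}
```

Notice that the current version needs both & and *, and the number of stars changes between declaration and use. That mismatch is what the proposal tries to get rid of.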
I think this kind of code would be much easier to deal with. When a function needs a pointer, you know that the variable needs one star. No questions asked. Something else that would happen is that arrays would simply be accessed by index into the list of variables. The compiler probably shouldn't allow a[1] for example. It would technically be valid syntax. But the declaration would tell the compiler that the extra items haven't been allocated. To get access to a pointer in an array of pointers, you would do *ptr2[index]. Again, no star means data and one star means a pointer, etc.
The real reason behind all this is to keep things consistent. With pointers, you would now essentially be creating aliases. And you don't need to know all the weird details of pointer arithmetic for most tasks. "*var = *a" would become the standard way to set an alias. Now you can use 'var' as if it was 'a' itself, except it's done through a pointer. In fact, you can create aliases at any level such as is the case with "**ptr2 = **var". Since ptr2 aliases to the pointer of 'var', it also aliases to the variable of 'a'. This is why you can do "ptr2 = 15;" which is the same as "a = 15" by the compiler doing a double dereference.
What's more, the ampersand isn't used to get the address of a variable anymore. So the current way that C++ handles aliasing could be kept as is. In fact, the ampersand would only be used for this style of aliasing. When combined with the new star notation, you would have to use the same amount of stars as found in the declaration.
int& *var2 = *var; // OK
int& var3 = var; // Error, 'var' was declared as '*var' with one star.
Passing arguments by reference would use this same restriction, though temporaries can be created by the compiler as is currently the case.
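For reference, this is how aliasing with the ampersand works in C++ right now. It's a small sketch with made-up names (alias_with_references, ref, ptr_ref). One thing to note is that today a reference to a pointer is spelled "int *&"; the "int& *" order used above would mean a pointer to a reference, which current C++ does not allow.

```cpp
#include <cassert>

// Aliases (references) in present-day C++, for an int and for a pointer.
int alias_with_references() {
    int a = 1;
    int *var = &a;

    int &ref = a;          // alias for the int itself
    int *&ptr_ref = var;   // alias for the pointer variable var

    ref = 7;               // writes to a
    assert(*ptr_ref == 7); // var points at a, so it sees the 7

    int b = 3;
    ptr_ref = &b;          // changes var itself, since ptr_ref is var
    assert(var == &b);
    return *var;           // reads b
}
```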
An example using arrays.
int array[1024];
int *ptr = *array[0];
int total=0;
for(int i=0;i<1024;i++,(*ptr)++)
{
total+=ptr;
}
cout << total;
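And the same loop as it would be written in C++ today, so the two can be read side by side. This is a sketch under a couple of assumptions: it is wrapped in a function with a made-up name (sum_with_pointer), and the array is filled with 0 to 1023 first so that the total is well defined.

```cpp
// Present-day C++ version of the array example above.
int sum_with_pointer() {
    int array[1024];
    for (int i = 0; i < 1024; i++) {
        array[i] = i;                        // known values, so the sum is defined
    }

    int *ptr = &array[0];                    // proposed: int *ptr = *array[0];
    int total = 0;
    for (int i = 0; i < 1024; i++, ptr++) {  // proposed: (*ptr)++
        total += *ptr;                       // proposed: total += ptr;
    }
    return total;                            // 0 + 1 + ... + 1023
}
```

Here the star shows up when reading the value and disappears when moving the pointer, which is the reverse of the proposed notation.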
You could also do funky stuff like set up a triangular array using a vertical pointer to pointer array and then allocate each horizontal array independently. When the triangular (or irregular) array is set up, you access it like a normal array "total+=triarray[row][column]". Present-day C and C++ already allow that indexing when each row is a plain int pointer, but once the rows are pointers to arrays you need pointer syntax such as "(*triarray[row])[column]".
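As a point of comparison, here is a sketch of the triangular array in today's C++ using a plain int** for the vertical array. With rows declared as int pointers, the compiler accepts triarray[row][column] directly. The function name triangular_sum and the choice to fill every item with 1 are assumptions made for the example.

```cpp
// Builds a triangular array (row r has r + 1 items), sums it, and frees it.
int triangular_sum(int rows) {
    int **triarray = new int*[rows];       // the vertical array of row pointers
    for (int row = 0; row < rows; row++) {
        triarray[row] = new int[row + 1];  // each horizontal array allocated on its own
        for (int column = 0; column <= row; column++) {
            triarray[row][column] = 1;
        }
    }

    int total = 0;
    for (int row = 0; row < rows; row++) {
        for (int column = 0; column <= row; column++) {
            total += triarray[row][column];
        }
        delete[] triarray[row];
    }
    delete[] triarray;
    return total;                          // equals the number of items stored
}
```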
If someone were to implement everything described here, would there be any glaring disadvantages or perhaps something that wouldn't work? I wonder what the C and C++ world would be like if this syntax were used instead. Maybe it would be minor. I don't know. But I can't help but believe that it would have made pointers much easier to deal with.
(edit: Wanted to mention that declarations would also make more sense. The way it is now in C and C++, you can assign a pointer in the declaration even though there is a star which normally indicates a dereference.)
(edit2: One problem I've noticed is that you wouldn't be able to dereference a pointer returned from a function. You'd have to store it and then use it. Unless the programmer can decide what pointer level he wants to use. So func() would try to access the variable no matter what. *func() would try to access the pointer. That would work I guess.)
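For contrast, this is what dereferencing a returned pointer looks like in current C++, both directly and by storing it first. It's a sketch only; storage and write_through_returned_pointer are hypothetical names, and func() here just hands back the address of one static variable.

```cpp
#include <cassert>

static int storage = 0;   // hypothetical variable for func() to point at

int *func() {
    return &storage;
}

int write_through_returned_pointer() {
    *func() = 42;         // dereference the returned pointer directly
    int *p = func();      // or store the pointer first, then use it
    assert(*p == 42);
    return storage;
}
```

Under the proposed notation the first line would presumably become "func() = 42;" and the second "*p = *func();", following the rule that no star means the data and one star means the pointer.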

