
Software Development

Correcting The Future

What If... C++ Pointer Notation Was Different?

A long time ago... on this very blog, I was trying to create a new programming language that fit a certain set of requirements. As we all know, that effort ended up as Project V. But one thing from back then still keeps me wondering "what if?" In my theoretical programming language that never was, I had a different notation for pointers.

(Note: I touched on this subject a while back, but I can't find the post, so I'm writing about it again.)

In C and C++ today, you use the star notation: each star is a dereference, and the ampersand takes the address of the variable in question. But both operators work relative to whatever type was declared for the variable. So taking the address of a pointer variable gives you the address of that pointer (which in turn points to a variable). In essence, you have to keep track of how many levels of dereferencing you are at.
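For comparison, here is a minimal sketch of how today's C++ behaves, where & and * are always applied relative to the declared type. The function name current_semantics_demo is hypothetical and only exists to make the snippet testable.

```cpp
#include <type_traits>

// In today's C++, each & adds one level of indirection to the declared
// type and each * removes one, so you must count levels yourself.
int current_semantics_demo() {
    int a = 1;
    int *p = &a;      // &a has type int*: the address of a
    int **pp = &p;    // &p has type int**: the address of the pointer, not of a

    static_assert(std::is_same<decltype(&a), int *>::value, "&a is int*");
    static_assert(std::is_same<decltype(&p), int **>::value, "&p is int**");

    **pp = 7;         // two stars are needed to reach a through pp
    return a;         // a was changed through pp
}
```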

What if it worked slightly differently? Declaring pointer variables would stay the same. But what if no star always meant accessing the underlying variable directly, regardless of the pointer type? One star would get the address of the variable, two stars would get the address of a pointer, and so on.

This means you would not need two operators, only one. The compiler would dereference as required, all the way down to the variable itself. Taking an address could go up to one level higher than the declared pointer type.
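One way to get a feel for "dereference all the way down to the variable" in today's C++ is a small template that keeps following pointers until it reaches a non-pointer object. This is only a rough library-level model under my own assumptions (the names Deref and deref_all are hypothetical, and const-qualified pointers aren't handled); the proposal itself would do this in the compiler, with no call at all.

```cpp
// Hypothetical helper modelling "no star means the variable itself":
// follow every pointer level until a non-pointer object is reached.
template <typename T>
struct Deref {                          // base case: T is not a pointer
    using type = T;
    static T &get(T &value) { return value; }
};

template <typename T>
struct Deref<T *> {                     // pointer case: peel off one level
    using type = typename Deref<T>::type;
    static type &get(T *ptr) { return Deref<T>::get(*ptr); }
};

// Returns a reference to the innermost (non-pointer) variable.
template <typename T>
typename Deref<T>::type &deref_all(T &value) {
    return Deref<T>::get(value);
}
```

With int x; int *px = &x; int **ppx = &px;, the statement deref_all(ppx) = 9; writes 9 into x, much like the proposed ppx = 9;.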

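// Proposed notation (not valid C or C++ today).
int a,b;
int *var; // pointer variable.
int **ptr2; // pointer to pointer variable.

// Note the same number of *'s on both sides below.
*var = *a; // set var pointer to address of a.
var = 5; // writes 5 into a through var.
a = 10; // Overwrites 5.

*b = *a; // error: one star is b's maximum, which is read-only.

**ptr2 = **var; // ptr2 now aliases var, and through it, a.
ptr2 = 15; // overwrites 10 in a.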
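For readers who want to run it, here is a sketch of the same example in today's C++, with each statement annotated with the proposed-notation line it stands in for. The function name translated_example is hypothetical.

```cpp
// Today's C++ spelling of the proposed-notation example.
int translated_example() {
    int a = 0, b = 0;
    int *var = nullptr;    // pointer variable
    int **ptr2 = nullptr;  // pointer to pointer variable

    var = &a;      // proposed: *var = *a;   (point var at a)
    *var = 5;      // proposed: var = 5;     (writes 5 into a)
    a = 10;        // proposed: a = 10;      (overwrites the 5)

    // proposed: *b = *a; is an error. Today's "&b = &a;" is rejected too,
    // because &b is not an assignable lvalue.

    ptr2 = &var;   // proposed: **ptr2 = **var;
    **ptr2 = 15;   // proposed: ptr2 = 15;   (overwrites the 10 in a)

    (void)b;       // b only appears in the commented-out error case
    return a;
}
```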


Since a and b aren't pointers, you can only go one level up: you can use one star, but not two. With var, you can use zero, one or two stars (one more than its declaration). With ptr2, you can go up to three stars. Also, the maximum number of stars is read-only, because at that level you are naming where the variable itself lives in memory, and that can't be changed. That's why writing to "*b" would cause an error.
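The star limits can be modelled at compile time. The sketch below assumes hypothetical traits pointer_depth, max_stars and star_level_writable (none of these exist in the standard library): max_stars gives the declared pointer depth plus one, and only star counts below that maximum are writable.

```cpp
#include <cstddef>

// Hypothetical trait: number of pointer levels in a declared type.
template <typename T>
struct pointer_depth {
    static constexpr std::size_t value = 0;  // not a pointer
};

template <typename T>
struct pointer_depth<T *> {
    static constexpr std::size_t value = 1 + pointer_depth<T>::value;
};

// Under the proposed rule a variable may carry 0 .. depth + 1 stars.
template <typename T>
constexpr std::size_t max_stars() {
    return pointer_depth<T>::value + 1;
}

// The highest level (the variable's own address) is read-only.
// Assumes stars <= max_stars<T>(); larger counts are invalid anyway.
template <typename T>
constexpr bool star_level_writable(std::size_t stars) {
    return stars < max_stars<T>();
}
```

For example, max_stars<int>() is 1 and star_level_writable<int>(1) is false, which is the "*b = *a" error; max_stars<int**>() is 3, matching the three stars allowed on ptr2.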

I think this kind of code would be much easier to deal with. When a function needs a pointer, you know the variable needs one star. No questions asked. Arrays would also simply be accessed by indexing into the list of variables. The compiler probably shouldn't allow a[1] for a plain int a, for example: it would technically be valid syntax, but the declaration tells the compiler that no extra items were allocated. To get at a pointer in an array of pointers, you would write *ptr2[index]. Again, no star means data, one star means a pointer, and so on.
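Today's C++ uses the opposite convention for arrays of pointers: the bare subscript is the pointer and one star gets the data. A minimal sketch (the function name array_of_pointers_demo is hypothetical):

```cpp
// In today's C++, for an array of pointers the bare subscript yields the
// pointer and a star yields the data: the reverse of the proposal.
int array_of_pointers_demo() {
    int x = 1, y = 2, z = 3;
    int *ptrs[3] = {&x, &y, &z};

    int *second = ptrs[1];   // no star: the pointer (proposed: *ptrs[1])
    int value = *ptrs[1];    // one star: the data   (proposed: ptrs[1])

    *second += 10;           // changes y through the pointer
    return value + y;        // 2 + 12
}
```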

The real reason behind all this is consistency. With pointers, you would essentially be creating aliases, and for most tasks you wouldn't need to know all the weird details of pointer arithmetic. "*var = *a" would become the standard way to set up an alias: from then on you can use 'var' as if it were 'a' itself, except that access goes through a pointer. In fact, you can create aliases at any level, as with "**ptr2 = **var". Since ptr2 aliases the pointer 'var', it also aliases the variable 'a'. That is why "ptr2 = 15;" is the same as "a = 15;": the compiler does a double dereference.
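The layered aliasing can be imitated in today's C++ with explicit stars. In this sketch (hypothetical function name repoint_demo), ptr2 points at var, so repointing var from a to b changes which int the same **ptr2 statement writes to:

```cpp
// ptr2 aliases var, so repointing var also changes what ptr2 reaches.
int repoint_demo() {
    int a = 1, b = 2;
    int *var = &a;
    int **ptr2 = &var;   // proposed: **ptr2 = **var;
    **ptr2 = 15;         // proposed: ptr2 = 15;   (writes a)
    var = &b;            // proposed: *var = *b;   (var now aliases b)
    **ptr2 = 20;         // the same statement now writes b
    return a * 100 + b;  // 15 * 100 + 20
}
```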

What's more, the ampersand would no longer be needed to get the address of a variable, so the way C++ currently handles references could be kept as is. In fact, the ampersand would then be used only for this style of aliasing. When combined with the new star notation, a reference would have to use the same number of stars as found in the declaration of the variable it binds to.

int& *var2 = *var; // OK
int& var3 = var; // Error, 'var' was declared as '*var' with one star.
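In today's C++, the closest existing construct to the OK line is a reference to a pointer, spelled int*&; the error line has a direct counterpart as well, since an int& cannot bind to an int*. A sketch with a hypothetical function name:

```cpp
// A reference to a pointer (int*&) is another name for the pointer itself,
// so assigning through it repoints the original pointer.
int reference_to_pointer_demo() {
    int a = 1, b = 2;
    int *var = &a;
    int *&var2 = var;    // roughly the proposed "int& *var2 = *var;"
    // int &var3 = var;  // does not compile: int& cannot bind to an int*
    var2 = &b;           // repoints var itself
    *var = 30;           // so this writes b
    return a * 100 + b;  // 1 * 100 + 30
}
```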


Passing arguments by reference would follow the same restriction, though the compiler could still create temporaries, as it does today.

Here is an example using arrays:

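int array[1024]; // assume the array has been filled in elsewhere.

int *ptr = *array[0]; // ptr aliases the first element.
int total=0;
for(int i=0;i<1024;i++,(*ptr)++) // (*ptr)++ advances the pointer itself.
{
  total+=ptr; // no star: reads the element ptr currently points to.
}
cout << total;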
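The same loop in today's C++ reverses the star usage: ptr++ advances the pointer and *ptr reads the element. This sketch wraps it in a hypothetical function sum_with_pointer that takes the array and its length, so it doesn't depend on uninitialized data:

```cpp
#include <cstddef>

// Sums `count` ints starting at `first` by walking a pointer, as the
// original loop does: advance with ptr++, read with *ptr.
long sum_with_pointer(const int *first, std::size_t count) {
    long total = 0;
    const int *ptr = first;                           // proposed: int *ptr = *array[0];
    for (std::size_t i = 0; i < count; i++, ptr++) {  // proposed: (*ptr)++
        total += *ptr;                                // proposed: total += ptr;
    }
    return total;
}
```

Calling sum_with_pointer(array, 1024) and printing the result with std::cout reproduces the original snippet.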


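You could also do funky stuff like set up a triangular array using a vertical pointer-to-pointer array and then allocate each horizontal array independently. Once the triangular (or irregular) array is set up, you access it like a normal array: "total+=triarray[row][column]". In current C and C++, that indexing already works when the rows are reached through an int ** (since triarray[row] yields a row pointer that can itself be indexed), but if each row is declared as a pointer to a whole array, you are stuck with pointer syntax such as "(*triarray[row])[column]".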
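Here is a sketch of the triangular array in today's C++, assuming row r holds r + 1 elements and filling each cell with row + column purely for illustration (triangular_total and pointer_to_array_demo are hypothetical names). Reached through an int **, the rows already index as tri[row][column]; the explicit (*rows[row])[column] form shows up when each entry points at a whole fixed-size array.

```cpp
#include <cstddef>

// Builds a triangular array through an int ** (a "vertical" array of row
// pointers, each "horizontal" row allocated separately), fills cell
// (r, c) with r + c, and sums it with ordinary tri[row][column] indexing.
long triangular_total(std::size_t rows) {
    int **tri = new int *[rows];
    for (std::size_t r = 0; r < rows; ++r) {
        tri[r] = new int[r + 1];            // row r has r + 1 columns
        for (std::size_t c = 0; c <= r; ++c)
            tri[r][c] = static_cast<int>(r + c);
    }

    long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c <= r; ++c)
            total += tri[r][c];             // no extra pointer syntax needed

    for (std::size_t r = 0; r < rows; ++r)
        delete[] tri[r];                    // free each row, then the row table
    delete[] tri;
    return total;
}

// When each entry points at a whole fixed-size array (type int (*)[3]),
// the row pointer must be dereferenced before the column index.
int pointer_to_array_demo() {
    int storage[2][3] = {{1, 0, 0}, {2, 3, 0}};
    int (*rows[2])[3] = {&storage[0], &storage[1]};
    return (*rows[1])[1];                   // row 1, column 1
}
```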

If someone were to implement everything described here, would there be any glaring disadvantages, or something that wouldn't work? I wonder what the C and C++ world would look like if this syntax had been used instead. Maybe the difference would be minor; I don't know. But I can't help believing that it would have made pointers much easier to deal with.

(edit: I also wanted to mention that declarations would make more sense. In C and C++ today, you can assign the pointer itself in its declaration, as in "int *p = &a;", even though the star normally indicates a dereference.)
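A short illustration in today's C++ of that declaration/expression mismatch (the function name declaration_star_demo is hypothetical):

```cpp
// The star in a declaration is part of the type, so "int *p = &a;" sets
// the pointer p. The same star in an expression dereferences p.
int declaration_star_demo() {
    int a = 4;
    int *p = &a;   // declaration: initializes the pointer itself
    *p = 8;        // expression: writes through p into a
    return a;
}
```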

(edit2: One problem I've noticed is that you wouldn't be able to dereference a pointer returned from a function; you'd have to store it first and then use it. That is, unless the programmer can choose which pointer level to use: func() would always try to access the variable, and *func() would try to access the pointer. I guess that would work.)
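In today's C++ the returned pointer can be used directly, because func() is the pointer and *func() is the data, which is exactly the reverse of the levels proposed here. A sketch with a hypothetical func that returns the address of a static int so the pointer stays valid after the call:

```cpp
// Hypothetical function returning a pointer to long-lived storage.
int *func() {
    static int stored = 42;   // static, so the returned pointer stays valid
    return &stored;
}

// No temporary variable is needed: *func() reads the data directly.
int read_through_returned_pointer() {
    return *func();           // proposed notation: func() would read the data
}
```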


Comments

vladas 22. October 2009, 07:19

Huh. I've always wondered why C needs the '&' operator. What you propose is what I always thought it should be instead...

Maybe it's because I came to C from plain Assembler?

Vorlath 22. October 2009, 14:13

Actually, I believe C gets the way it works from how assembly does it. But since you can usually only grab the address or memory location in assembly, it actually makes sense to do it this way there, because assembly has no concept of types. In C, however, it's as if they never bothered to use type declarations for pointers at all.

However, you do have a point that assembly only has one operator. You only ever dereference if you need to. Getting the address is the default operation, so there's no need to ever use a separate addressing mode (though LEA does come in handy).

Anonymous 23. October 2009, 18:39

Anonymous writes:

Perhaps use a macro program (Emacs Lisp, Rexx, etc.) to convert C/C++ code written your way into regular code. Or even use the preprocessor. I have no clue how difficult that would be; it might require multiple passes if something references something above it in the code.

Vorlath 23. October 2009, 21:18

I don't think it's possible to do it in macros or whatnot since you need to know the types of the variables. I was simply wondering "what if".

Anonymous 11. November 2009, 14:17

phookdk writes:

I developed more or less the same scheme for my programming language, although I don't know if it will make it into the final version, due to other considerations.

The beauty of it really shines in C++ code with deep object hierarchies, where you get rid of the ugly "->", e.g.:

a.b.c.d = 2 // any of these could be a pointer

Been lurking - reading your blog for some time now - we share a lot of thoughts about FBP.
Looking forward to progress on Project V.
