Can I expect that any “data” pointer in user space programs lies in a safe distance from the addresses 0 and 0xffffffff…, so that I can safely add a small offset to the pointer without checking for an overflow? What is the largest positive n for which I can safely assume that p + n doesn’t overflow when p is a char pointer into a constant character buffer or a dynamically allocated string (on a modern >= 32-bit operating system)?
To avoid confusion: I’m talking about overflow checking, not about bounds checking. For example: If you have a pointer p to the beginning of a string with m chars and you want to access the char at the positive offset i, then you either need to check that i < m or you can check indirectly p + i < p + m. However, in the latter case you also have to make sure that p + i doesn’t overflow, i.e. you have to make sure that p + i >= p.
Update: Ok, p + i is not valid standard C if i > m, regardless of whether p + i is actually dereferenced or whether it overflows. However, the question I’m really interested in is whether there is a small n for which p + n won’t overflow in practice. Answering this question obviously requires some knowledge about how modern operating systems organize the address space.
Update2: It would already be very interesting to hear about any one particular platform, even if it’s not generalizable. Preferably not some obscure embedded one. x86 or Power-based 32-bit Win, Linux and Mac would be most interesting.
The only offset you can safely add to the pointer (and then dereference it) is one that would position the pointer in the block of memory you are working with. This block must have been allocated with new or malloc, or exist on the stack.
Either way, it is guaranteed that the memory exists (and that one past the end of the memory is a sensible address), otherwise you would have got an exception from new, a NULL pointer from malloc or undefined behaviour if you tried to mis-allocate on the stack. In no case do you have to check for overflow.
Strictly speaking the answer is 0, because p could already be pointing one past the end of the array, and that is the furthest the standard says is valid.

In a specific implementation you might be able to get away with some amount, but that is completely implementation defined. There has been hardware, and maybe still is, that checks operations on pointers within the CPU instructions: if p points to an array of 2 ints, computing p + 3 will cause the CPU to fault the instruction. On the other hand, on most current hardware you can get away with a lot.
Given the information you’ve provided, the answer is 0.
0 is the only answer that is valid according to the C++ standard.
If you are willing to venture into undefined behavior (hint: don’t), you’ll have to provide us some platform-specific information and wave goodbye to any guarantees about the validity of the state of your application. Your application may still run, but you’re relying on the arbitrary, and possibly changing, decisions made by the OS and compiler writers.
If we know the exact details of your platform (which CPU, OS, which compiler, primarily), then it may be possible to give you an answer that will usually work, as long as nothing changes in either compiler or OS.
But it seems like you're going about this the wrong way. If this is as performance-critical as you keep saying, arrange it so that pointer overflows are not an issue.
Strictly speaking, adding anything to a pointer that makes it point past the same block of memory it pointed at before, is undefined. It may work, or it may act really really funky on some architectures. It may overflow at unexpected times, it may cause hard crashes on some. That is why the language simply says “it’s not allowed”, and why we can’t say what’d happen in practice when you haven’t told us the platform it’s running on.
A C++ pointer is not a memory address. That is how compilers typically implement it, yes, but they obey different rules. Some things are legal with memory addresses, according to the CPU instruction set, which are not legal with pointers.
But seriously, my best advice: Take a step back and examine how you can avoid needing to check for overflows.
In C, pointer arithmetic and relative comparisons are defined only within “objects” (not to be confused with C++ objects). An “object” is a variable (of any type) or a memory area allocated with malloc/calloc/realloc. You can compute a pointer to “one past” the object, and this will always work in a standard conforming implementation.
Looking at things at a lower level, a pointer is typically implemented as an (unsigned) integer. The size of the integer is large enough to hold the address of any memory location. The only way for a pointer to overflow is if you exceed the address space, and that is impossible to do while conforming to the C standard.
However, if you are writing low-level code, and ignoring the restriction that pointer arithmetic is only valid within an object, the best way to do this is to take advantage of the knowledge of how pointers are represented. In most modern environments, this then becomes the same thing as checking for overflow with unsigned integer arithmetic.
(Exceptions to that will be things like segmented memory architectures, of the 8086 or Multics kinds, and probably other things I may have suppressed from my memory to preserve my sanity.)
Off the top of my head, it seems like this is very dependent on:
- the OS
- the underlying HW
- the compiler
As a rule, and only as far as I remember, the stack is allocated way at the top of the linear address space. Considering you have all sorts of runtime libraries loaded, your actual data most likely isn't sitting at the top of that space. Either way, if you exceed the area allocated by malloc you'll run into other trouble, and if you exceed your stack frame, you'll also run into trouble. The bottom line is that I don't think you have to worry about wraparound from 0xffffffffffffffff to 0x0, but you still have to ensure you don't go over the bounds of your static, automatic or manually allocated memory.
This depends on two things:
- The architecture
- The compiler
So the only way to be certain is to consult the reference documentation for your compiler and arch.
Memory allocation is usually done in blocks: even if you only need 1 byte, allocators typically round the request up to a power of 2 (MFC's CString, for example, grows its buffer in chunks of 64 or 128 bytes). This wastes some space but reduces the amount of bookkeeping. If your allocations are block-based, you can use the block size and the current pointer value to compute the maximum safe offset and so avoid overflows.
For address overflow, a trick commonly used for timestamp comparison should also work in your situation.

If your platform is 32-bit, each address is 32 bits wide (an unsigned long), and you can try the following macros:

#define address_after(a,b)  (((long)(b) - (long)(a)) < 0)
#define address_before(a,b) address_after(b,a)

You can then safely compare addresses with address_after(p+i, p+m), provided that |m - i| < 0x7FFFFFFF (the common situation). These macros handle the wraparound problem well.