Search

ReCode /

Possible Loss Of Data

If static type checking is useful for detecting bugs, it is an even more powerful refactoring tool. In a well designed program, it is often possible to make a change in an interface, then rely on the compiler to report errors for all places in the code that need to be altered to match. In general, it is A Good Thing to design code that relies on static type checking as much as possible.

But one of the most common problems programmers inflict on themselves is when they try to use the static type checking built in C and C++ compilers to detect bugs due to integer conversions that result in loss of precision.

Let's say in a class called point, we have reasons to store X and Y as shorts rather than ints:

class point
{
  short x_, y_;
public:
  point( short x, short y ):
    x_(x),
    y_(y)
  {
  }
};

void bar( int x, int y )
{
  point p(x,y); //oops?
}

Right, we might have a problem. On most platforms, the int and short types are 32- and 16-bits in size, respectively. This means that the call to the point constructor may result in loss of data.

It would be very nice if we could get the compiler to point out all instances where an int is implicitly converted to a short, so that we can verify that the loss of precision is desirable or at least that it isn't a problem.

So we enable the appropriate warning:

warning C4242: 'argument' : conversion from 'int' to 'short', possible loss of data

What makes this especially desirable is that function (and constructor) declarations are usually separated in a header file. The thinking is, if a careless programmer calls the constructor with int arguments, the compiler will point out the possible problem, the programmer will inspect the code carefully to confirm that the loss of precision is OK, and then... Well,

Then what?

What do you say? Casting? Yes, this situation is a textbook example for using casts:

void bar( int x, int y )
{
  point p((short)x,(short)y); //Trust me, it's OK
}

The trouble is, in the original program, the compiler would have truncated x and y only if they needed to be truncated. The program that uses casts to avoid the warning will truncate x and y even if they don't need to be truncated.

We've changed the semantics of the program in a way that's not just dangerous, it's more dangerous than the potential bug the warning was intended to detect. If during later refactoring we realize that we do need the extra precision of int, the point class will have to change accordingly:

class point
{
  int x_, y_;
public:
  point( int x, int y ):
    x_(x),
    y_(y)
  {
  }
};

void bar( int x, int y )
{
  point p((short)x,(short)y); //Trust me, it's OK
}

And now we don't have a warning but we have a nasty bug, all because we explicitly instructed the compiler to do something bad (cast), something that it's carefully designed not to do!

The source of the bug is the ill-advised attempt to use static type checking to detect what is inherently a run-time error. Converting an int to a short is dangerous only if the run-time value of x exceeds the valid range for short integers. A better approach is to ditch the warning and the cast it comes with, and use assert:

void bar( int x, int y )
{
  assert(x>=std::numeric_limits<short>::min());
  assert(x<=std::numeric_limits<short>::max());
  assert(y>=std::numeric_limits<short>::min());
  assert(y<=std::numeric_limits<short>::max());
  point p(x,y); //Now we *know* we're OK
}

But now... even if we assume that programmers will always assert before calling foo (quite an assumption!) the resulting code is pretty ugly to look at.

To solve the problem, we need to take a step back and look at the original reasons why the point constructor takes shorts instead of ints. Clearly, this is because the values are copied to the members of class point, which are of type short to save memory. Note, however, that the savings come from the point members, not from the constructor itself taking shorts -- the latter was done for "consistency", which A) doesn't make it correct, and B) as we'll see later isn't even consistent.

First, we should make the constructor take int parameters regardless of the type of the (private) members:

class point
{
  short x_, y_;
public:
  point( int x, int y ):
    x_(x),
    y_(y)
  {
    assert(x>=std::numeric_limits<short>::min());
    assert(x<=std::numeric_limits<short>::max());
    assert(y>=std::numeric_limits<short>::min());
    assert(y<=std::numeric_limits<short>::max());
  }
};

This promotes consistency throughout the program, regardless of what type the members of class point are this week. The constructor is also the perfect place to put our asserts, thus completely relieving the caller from the burden of truncating ints to shorts.

Second,

C++ and C are obsessed with ints

Using short or char in any kind of integer expression ends up promoting them to int anyway. This can be illustrated by the following program:

#include <iostream>

int main()
{
  unsigned short x=65535;
  std::cout << sizeof(x) << ", " << sizeof(x+x) << ", " << x+x;
}

The typical output of this program is:

2, 4, 131070

Note that it prints 131070 for x+x, even though x is of type short and (for 16-bit shorts anyway) wouldn't be able to hold that value. That's because x+x is of type int.

As another example, if we pass something like x+1 (which would be pretty common) to a function that takes short, the program will first produce an int which is then truncated to a short as part of the function call (and I'm not aware of a way to make compilers issue a warning in this case, not that it would be desirable anyway.)

The best way to be consistent is to use the consistency of C and C++ to always promote small integer types to int. This means to forget about char and short as function arguments; they are (almost) never justified. The net effect of using them is that the program will contain a lot more places where data gets truncated, which makes it more prone to bugs.

Avoiding char and short function arguments is not ideal, but it is better than using dangerous casts to suppress a retarded warning that should never have been enabled to begin with.

Gregory — 02 June 2012, 04:46

I disagree.

Having Point's constructor take ints then assert then cast just hides information.

Now, your code breaks *eventually* at runtime while the user of the class could have been aware of coordinates range restrictions.

Also, your point about refactoring is contrived as many tutorials, blog posts or text books examples are :) The false promise of OOP: make a class, hide data in private members, provide public accessors. It's just piling up layers of useless code, coding for refactoring scenari that would never happen in real life imho

Emil — 05 June 2012, 23:45

What casts are you talking about, my suggestion doesn't have casts. Removing the asserts won't catch a possible runtime error, but even that is safer than the casts people commonly use to avoid the warning that they think protects them from something.

By the way, notice that standard functions that take or return characters (like getch, tolower, etc.) use int and not char. What'd be the point of returning char anyway? So that foo(getch()) could resolve to a different overload than foo(getch()+1)? :)

Arseny Kapoulkine — 11 June 2012, 14:52

getch() can return EOF. tolower can receive EOF. Oh, and C did not have overloading.

You can make a safe_cast utility function to avoid a hidden bug (assert) and to keep the natural type information.

Emil — 18 June 2012, 13:10

The real reason why getch() et al. communicate in terms of ints is that there is no static type checking in K&R C.

Anyway, do I understand correctly that you're proposing to use some kind of safe_cast every time you call a function that takes a short? Like:

void foo( short x );

void bar( int x )
{
  foo(safe_cast<short>(x));
}

?

Add comment: 
Sign as author: 
Add 1 to 607 and enter it here: 

Formatting hint: when posting comments, surround code blocks in [@ and @].