XNU, the MacOS Kernel

BUG OF THE MONTH | Undefined behavior

int
utf8_encodestr(....)
{
  u_int16_t ucs_ch;
  int swapbytes = ....;
  ....
  ucs_ch = swapbytes ? OSSwapInt16(*ucsp++) : *ucsp++;
  ....
}

PVS-Studio warning: V567 Undefined behavior. The ‘ucsp’ variable is modified while being used twice between sequence points. vfs_utfconv.c 298

Macros are very tricky. Perhaps, you’ve already seen our article “Macro Evil in C++ Code“. We usually do not write about warnings on macros. It’s difficult to work with them without knowing the project’s codebase.

However, this case turned out to be a little bit easier. Although to find the reason for this error and expand the macros chain, we had to fall down the rabbit hole. Actually, the chain begins with the OSSwapInt16(*ucsp++) expression.

Then, we realized that there was an easier way. We just opened the .i file that remained after the project’s check. So, the line with this macro unfolded as follows:

ucs_ch = swapbytes
? ( (__uint16_t)(__builtin_constant_p(*ucsp++)
   ? ((__uint16_t)(  (((__uint16_t)(*ucsp++) & 0xff00U) >> 8)
                   | (((__uint16_t)(*ucsp++) & 0x00ffU) << 8)))
   : _OSSwapInt16(*ucsp++)))
: *ucsp++;

Above all, this section of the expression draws our attention:

  (((__uint16_t)(*ucsp++) & 0xff00U) >> 8)
| (((__uint16_t)(*ucsp++) & 0x00ffU) << 8)

None of the operators in the expression is a sequence point. As we don’t know exactly which of the arguments of the | operator will be evaluated first, the value of *uscp is undefined.

For V567 diagnostic, PVS-Studio provides highly detailed documentation. If you wonder why such code can lead to undefined behavior, start with the documentation to explore the problem.

It’s not over yet! There’s a curious and important point. We bet that the programmer planned to increase the value of *ucsp only once. In fact, the value will increase twice. This process is invisible and not clear. That’s why macros are extremely dangerous. In many cases, it’s better to write an ordinary function. The compiler is most likely to perform the substitution automatically. So, no performance degradation will occur.

Please click here to see more bugs from this project.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.