Protocol Buffers, a brutal protocol from Google, vs. PVS-Studio, a static code analyzer

Protocol Buffers is a very popular, cool, and high-quality product that is mostly developed by Google. This is a good challenge for the PVS-Studio static code analyzer. Finding at least something is already an achievement. Let’s give it a shot.

I am writing about Protocol Buffers (protobuf) as part of a long-term series of articles about checking open-source projects. The library implements a protocol for structured data serialization. This is an effective binary alternative to the XML text format.

The project seemed like an intriguing challenge for the PVS-Studio analyzer, because Google takes a very serious approach to the quality of the C++ code it produces. Take, for example, the “Safer Usage of C++” document that has been actively discussed recently. Additionally, many developers use protobuf in their projects – which means the protobuf product is well-tested. Finding at least a couple of mistakes in this project is a challenge that we have taken upon us. So what are we waiting for? Time to find out what PVS-Studio can do!

We have never checked this project on purpose before. Once, three years ago, we examined it when writing a series of articles about checking Chromium. We found an interesting error in a data-checking function and described it in a standalone article – “February 31“.

To be honest, when I was writing my article this time, I had a specific plan. I wanted to demonstrate the analyzer’s new feature – the intermodular analysis mechanism for C++ projects – and what it can do. Unfortunately, this time, intermodular analysis didn’t produce any new interesting results. With or without it – it was all the same, no new interesting analyzer triggers in code. Although this was not surprising. It’s hard to find anything in this project, at all :).

So let’s see what mistakes evaded the eye of developers and assisting tools.

Copy-paste

void SetPrimitiveVariables(....) {
  ....
  if (HasHasbit(descriptor)) {
    (*variables)["get_has_field_bit_message"] = ....;
    (*variables)["set_has_field_bit_message"] = ....;
    (*variables)["clear_has_field_bit_message"] = ....;
    ....
  } else {
    (*variables)["set_has_field_bit_message"] = "";      // <=
    (*variables)["set_has_field_bit_message"] = "";      // <=
    (*variables)["clear_has_field_bit_message"] = "";
  ....
}

PVS-Studio warns: V519 [CWE-563] The variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 163, 164. java_primitive_field_lite.cc 164

This is a classic error that occurred when a developer was copying code lines. The developer fixed some code lines, but missed the other ones. As a result, the code sets the same key – “set_has_field_bit_message” – twice.

If you look at the code above, it becomes clear that, in the else code block, the developer intended to write the following:

(*variables)["get_has_field_bit_message"] = "";
(*variables)["set_has_field_bit_message"] = "";
(*variables)["clear_has_field_bit_message"] = "";

File descriptor leak

ExpandWildcardsResult ExpandWildcards(
    const string& path, std::function<void(const string&)> consume) {
  ....
  HANDLE handle = ::FindFirstFileW(wpath.c_str(), &metadata);
  ....
  do {
    // Ignore ".", "..", and directories.
    if ((metadata.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) == 0 &&
        kDot != metadata.cFileName && kDotDot != metadata.cFileName) {
      matched = ExpandWildcardsResult::kSuccess;
      string filename;
      if (!strings::wcs_to_utf8(metadata.cFileName, &filename)) {
        return ExpandWildcardsResult::kErrorOutputPathConversion;       // <=
      }
    ....
  } while (::FindNextFileW(handle, &metadata));
  FindClose(handle);
  return matched;
}

PVS-Studio warns: V773 [CWE-401] The function was exited without releasing the ‘handle’ handle. A resource leak is possible. io_win32.cc 400

Before the function exits, the FindClose(handle) method call must close the handle file descriptor. However, this does not happen if UTF-8-encoded text fails to convert to UTF-8. In this case, the function exits with an error.

Potential overflow

uint32_t GetFieldOffset(const FieldDescriptor* field) const {
  if (InRealOneof(field)) {
    size_t offset =
        static_cast<size_t>(field->containing_type()->field_count() +
                            field->containing_oneof()->index());
    return OffsetValue(offsets_[offset], field->type());
  } else {
    return GetFieldOffsetNonOneof(field);
  }
}

PVS-Studio warns: V1028 [CWE-190] Possible overflow. Consider casting operands, not the result. generated_message_reflection.h 140

Two int type values are added and placed into the size_t variable:

size_t offset = static_cast<size_t>(int_var_1 + int_var_2);

It is assumed that in case of a 64-bit build, the sum of two 32-bit variables can exceed the INT_MAX value. This is why the result is written to the size_t type variable that will be a 64-bit variable in a 64-bit application. Moreover, since adding two int values can result in overflow, the developer uses an explicit cast.

However, this explicit cast is used incorrectly. And it does not protect from anything. The implicit cast from int to size_t would have worked without it. So the code does not differ in any way from the following:

size_t offset = int_var_1 + int_var_2;

I assume that, by accident, the developer placed a parenthesis in the wrong spot. Here’s the correct code:

size_t offset = static_cast<size_t>(int_var_1) + int_var_2;

Null pointer dereferencing

bool KotlinGenerator::Generate(....)
{
  ....
  std::unique_ptr<FileGenerator> file_generator;
  if (file_options.generate_immutable_code) {
    file_generator.reset(
        new FileGenerator(file, file_options, /* immutable_api = */ true));
  }

  if (!file_generator->Validate(error)) {
    return false;
  }
  ....
}

PVS-Studio warns: V614 [CWE-457] Potentially null smart pointer ‘file_generator’ used. java_kotlin_generator.cc 100

If the generate_immutable_code variable equals false, then the smart file_generator pointer remains equal to nullptr. Consequently, the null pointer will be dereferenced.

Apparently the generate_immutable_code variable is always true – otherwise the mistake would have been detected already. It can be called insignificant. As soon as someone edits the code and its logic, the null pointer will be dereferenced, someone will notice and fix the problem. On the other hand, this code contains, so-to-say, a mine. And it’s better to find it early than to sit and wait till someone blows themselves up on it in the future. The point of static analysis is to find errors before they get dangerous.

Is the parenthesis in the right spot?

AlphaNum::AlphaNum(strings::Hex hex) {
  char *const end = &digits[kFastToBufferSize];
  char *writer = end;
  uint64 value = hex.value;
  uint64 width = hex.spec;
  // We accomplish minimum width by OR'ing in 0x10000 to the user's value,
  // where 0x10000 is the smallest hex number that is as wide as the user
  // asked for.
  uint64 mask = ((static_cast<uint64>(1) << (width - 1) * 4)) | value;
  ....
}

Let’s take a look at this subexpression:

((static_cast<uint64>(1) << (width - 1) * 4))

The analyzer does not like this code for two reasons:

  • V634 [CWE-783] The priority of the ‘*’ operation is higher than that of the ‘<<‘ operation. It’s possible that parentheses should be used in the expression. strutil.cc 1408
  • V592 The expression was enclosed by parentheses twice: ((expression)). One pair of parentheses is unnecessary or misprint is present. strutil.cc 1408

You probably agree that these warnings compliment each other. The shift and multiplication operators are used together. It’s easy to forget which one has a higher priority. And the recurring parentheses hint that the author knew about the ambiguity and wanted to avoid it. But that didn’t work.

There are two ways to understand this code. Version one: the code is correct. In this case, additional parentheses only make it easier to read the code and do not affect anything:

uint64 mask = (static_cast<uint64>(1) << ((width - 1) * 4)) | value;

Version two: the expression contains a mistake. If this is so, the additional parentheses must change the order of the operations performed:

uint64 mask = ((static_cast<uint64>(1) << (width - 1)) * 4) | value;

Conclusion

It’s a good feeling to be able to find flaws in a well-known and quality product – such as protobuf. On the other hand, it’s probably not the best idea to use protobuf to demonstrate static code analysis capabilities :). It’s difficult to show off the tool’s features if the tool can find only a couple of errors :).

Let me remind you that the static analyzer is the most beneficial when used regularly to check new code – and not for one-time checks of already tested projects.

However, you need to start somewhere. So I recommend downloading PVS-Studio, checking your project, and taking a look at the Best Warnings. Most likely, you will see many things that require your attention :).

If your code is of the highest quality – like that of protobuf – I recommend you start using the analyzer as intended. Try integrating PVS-Studio into the development process and see what it can find every day. Wondering how you can do this if yours is a large project? Click here.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.