Checking 7-Zip with PVS-Studio analyzer

One of the programs, which allows you to solve the problem of data compression, is a popular file archiver 7-Zip, which I often use myself. Our readers have long asked us to check the code of this application. Well, it’s time to look at its source code, and see what PVS-Studio is able to detect in this application.image1

Introduction

A couple of words about the project. 7-Zip is a free file archiver with a high data compression ratio, written in C, and C++. The size of this project is 235,000 lines of code. It supports several compression algorithms and a variety of data formats, including its own 7z format, with a highly effective LZMA compression algorithm. It is in development since 1999, free, and open source. 7-Zip is the winner of the SourceForge.net Community Choice Awards of the year 2007 in the categories “Best project” and “Best technical design”. We checked the 16.00 version, whose source code can be downloaded at this link – http://www.7-zip.org/download.html

Analysis results.

To do the analysis of 7-Zip we used the static code analyzer, PVS-Studio v6.04. In this article we provide the most interesting analyzer warnings. Let’s have a look at them.

Typos in conditional statements

We see typos in conditional operators quite often. They can cause a lot of pain if there is a large number of checks. Then static analyzer comes to our aid.

Here are some examples of this error.

V501 There are identical sub-expressions ‘Id == k_PPC’ to the left and to the right of the ‘||’ operator. 7zupdate.cpp 41

void SetDelta()
{
  if (Id == k_IA64)
    Delta = 16;
  else if (Id == k_ARM || Id == k_PPC || Id == k_PPC)    //<==
    Delta = 4;
  else if (Id == k_ARMT)
    Delta = 2;
  else
    Delta = 0;
}

The analyzer detected similar conditional expressions. At best, one of the conditions for Id == k_PPC is redundant and does not affect the logic of the program. To fix this typo we should just remove this condition, then the correct expression will be:

if (Id == k_IA64)
  Delta = 16;
else if (Id == k_ARM || Id == k_PPC)
  Delta = 4;

But there may be more serious consequences from such typos, if instead of a k_PPC constant, there should be another in one of the repeated conditions. In this case, the program logic may be broken.

Here’s another example of a typo in a conditional statement:

V501 There are identical sub-expressions to the left and to the right of the ‘||’ operator: offs >= nodeSize || offs >= nodeSize hfshandler.cpp 915

HRESULT CDatabase::LoadCatalog(....)
{
  ....
  UInt32 nodeSize = (1 << hr.NodeSizeLog);
  UInt32 offs = Get16(p + nodeOffset + nodeSize - (i + 1) * 2);
  UInt32 offsNext = Get16(p + nodeOffset + nodeSize - (i + 2) * 2);
  UInt32 recSize = offsNext - offs;
  if (offs >= nodeSize
           || offs >= nodeSize    //<==
           || offsNext < offs
           || recSize < 6)
    return S_FALSE;
  ....
}

The problem is in the repeating condition offs >= nodeSize.

The typos most likely appeared because of using Copy-Paste to duplicate the code. It wouldn’t make sense to recommend not using the copy-paste method. It’s too convenient and useful to reject such functionality in the editor. We should just check the result we get more thoroughly.

Identical comparisons

The analyzer detected a potential error in a construction that consists of two conditional statements. Here is an example.

V517 The use of ‘if (A) {…} else if (A) {…}’ pattern was detected. There is a probability of logical error presence. Check lines: 388, 390. archivecommandline.cpp 388

static void AddRenamePair(...., NRecursedType::EEnum type, ....)
{
  ....
  if (type == NRecursedType::kRecursed)
    val.AddAscii("-r");
  else if (type == NRecursedType::kRecursed)    //<==
    val.AddAscii("-r0");
  ....
}

NRecursedType is defined in the following way in the code:

namespace NRecursedType { 
  enum EEnum {
    kRecursed,
    kWildcardOnlyRecursed,
    kNonRecursed
  };
}

As a result the second condition will never be fulfilled. Let’s try to sort out this problem in detail. Based on the description of the command-line parameters, the -r parameter signals usage of recursion for subdirectories. But in the case of the -r0 parameter, the recursion is used only for the template names. Comparing this with the definition NRecursedType we can draw the conclusion, that in the second case we should use the type NRecursedType::kWildcardOnlyRecursed. Then the correct code will be like this:

static void AddRenamePair(...., NRecursedType::EEnum type, ....)
{
  ....
  if (type == NRecursedType::kRecursed)
    val.AddAscii("-r");
  else if (type == NRecursedType::kWildcardOnlyRecursed)    //<==
    val.AddAscii("-r0");
  ....
}

Conditions that are always either true or false

You should always take into account the variable type – if it is signed or unsigned. Ignoring these peculiarities can lead to unpleasant consequences.

V547 Expression ‘newSize < 0’ is always false. Unsigned type value is never < 0. update.cpp 254

Here is an example of where this language feature was ignored:

STDMETHODIMP COutMultiVolStream::SetSize(UInt64 newSize)
{
  if (newSize < 0)    //<==
    return E_INVALIDARG;
  ....
}

The thing is that newSize has unsigned type, and the condition will never be true. If a negative value gets to the SetSize function, then this error will be ignored and the function will start using an incorrect size. There were two more conditions in 7-Zip that are always either true or false because of the confusion with signed/unsigned types.

  • V547 Expression ‘rec.SiAttr.SecurityId >= 0’ is always true. Unsigned type value is always >= 0. ntfshandler.cpp 2142
  • V547 Expression ‘s.Len() >= 0’ is always true. Unsigned type value is always >= 0. xarhandler.cpp 258

The same condition is checked twice.

The analyzer detected a potential bug, related to the fact that the same condition is checked twice.

V571 Recurring check. The ‘if (Result != ((HRESULT) 0L))’ condition was already verified in line 56. extractengine.cpp 58

Here is a code fragment:

void Process2()
{
  ....
  if (Result != S_OK)
  {
    if (Result != S_OK)    //<==
      ErrorMessage = kCantOpenArchive;
    return;
  }
  ....
}

Most likely, in this situation the second check is redundant, but there is also a possibility that a programmer didn’t change the second condition, and it turned out to be erroneous.

Another similar fragment in 7-Zip code:

  • V571 Recurring check. The ‘!quoteMode’ condition was already verified in line 18. stringutils.cpp 20
  • V571 Recurring check. The ‘IsVarStr(params[1], 22)’ condition was already verified in line 3377. nsisin.cpp 3381

Suspicious pointer handling

There were such bugs in 7-Zip code, where a pointer first gets dereferenced, and only then it is verified against null.

V595 The ‘outStreamSpec’ pointer was utilized before it was verified against nullptr. Check lines: 753, 755. lzmaalone.cpp 753

It is a very common error in all programs. It usually appears because of negligence during the process of refactoring. Accessing by a null pointer will result in undefined behavior. Let’s look at a code fragment of an application containing an error of this type:

static int main2(int numArgs, const char *args[])
{
  ....
  if (!stdOutMode)
    Print_Size("Output size: ", outStreamSpec->ProcessedSize);   //<==

  if (outStreamSpec)    //<==
  {
    if (outStreamSpec->Close() != S_OK)
      throw "File closing error";
  }
  .... 
}

The pointer outStreamSpec is dereferenced in the expression outStreamSpec->ProcessedSize. Then it is verified against null. The check below in the code is either meaningless, or we should verify the pointer in the code above against null. Here is a list of potentially buggy fragments in the program code:

  • V595 The ‘_file’ pointer was utilized before it was verified against nullptr. Check lines: 2099, 2112. bench.cpp 2099
  • V595 The ‘ai’ pointer was utilized before it was verified against nullptr. Check lines: 204, 214. updatepair.cpp 204
  • V595 The ‘options’ pointer was utilized before it was verified against nullptr. Check lines: 631, 636. zipupdate.cpp 631
  • V595 The ‘volStreamSpec’ pointer was utilized before it was verified against nullptr. Check lines: 856, 863. update.cpp 856

An exception inside a destructor

When an exception is thrown in a program, the stack beings to unwind, and objects get destroyed by calling the destructors. If the destructor of an object being destroyed during the stack folding throws another exception which leaves the destructor, the C++ library will immediately terminate the program by calling the terminate() function. Therefore, the destructors should never throw exceptions. An exception thrown inside a destructor must be handled inside the same destructor.

The analyzer issued the following message:

V509 The ‘throw’ operator inside the destructor should be placed within the try..catch block. Raising exception inside the destructor is illegal. consoleclose.cpp 62

Here is the destructor that throws an exception:

CCtrlHandlerSetter::~CCtrlHandlerSetter()
{
  #if !defined(UNDER_CE) && defined(_WIN32)
  if (!SetConsoleCtrlHandler(HandlerRoutine, FALSE))
    throw "SetConsoleCtrlHandler fails";    //<==
  #endif
}

V509 message warns that if the CCtrlHandlerSetter object is destroyed during processing of the exception handling, the new exception will cause an immediate crash of the program. This code should be written in such a way, so as to report an error in the destructor without using the exception mechanism. If the error is not critical, then it can be ignored.

CCtrlHandlerSetter::~CCtrlHandlerSetter()
{
  #if !defined(UNDER_CE) && defined(_WIN32)
  try
  {
    if (!SetConsoleCtrlHandler(HandlerRoutine, FALSE))
      throw "SetConsoleCtrlHandler fails";    //<==
  }
  catch(...)
  {
    assert(false);
  }
  #endif
}

Increment of a bool type variable

Historically, the increment operation is possible for variable of bool type; the operation sets the value of the variable to true. This feature is related to the fact that previously integer values were used to represent boolean variables. Later this feature remained to support backwards compatibility. Starting with the C++98 standard, it is marked as deprecated, and not recommended for use. In the upcoming C++17 standard this possibility to use an increment for a boolean value is marked for deletion.

We found a couple of fragments where this obsolete feature is still used.

  • V552 A bool type variable is being incremented: numMethods ++. Perhaps another variable should be incremented instead. wimhandler.cpp 308
  • V552 A bool type variable is being incremented: numMethods ++. Perhaps another variable should be incremented instead. wimhandler.cpp 318
STDMETHODIMP CHandler::GetArchiveProperty(....)
{
  ....
  bool numMethods = 0;
  for (unsigned i = 0; i < ARRAY_SIZE(k_Methods); i++)
  {
    if (methodMask & ((UInt32)1 << i))
    {
      res.Add_Space_if_NotEmpty();
      res += k_Methods[i];
      numMethods++;    //<==
    }
  }
  if (methodUnknown != 0)
  {
    char temp[32];
    ConvertUInt32ToString(methodUnknown, temp);
    res.Add_Space_if_NotEmpty();
    res += temp;
    numMethods++;    //<==
  }
  if (numMethods == 1 && chunkSizeBits != 0)
  {
    ....
  }
  ....
}

There are two possible variants in this situation. Either the numMethods is a flag, and it’s better to use initialization by a boolean value numMethods = true in this case. Or, judging by the variable, it is a counter which should be an integer.

Checking incorrect memory allocation

The analyzer detected a situation, where the pointer value, returned by the new operator is compared with zero. This usually means that the program won’t behave in the way the programmer expects in the case of it not being possible to allocate the memory.

V668 There is no sense in testing the ‘plugin’ pointer against null, as the memory was allocated using the ‘new’ operator. The exception will be generated in the case of memory allocation error. far.cpp 399

Here’s how it looks in the code:

static HANDLE MyOpenFilePluginW(const wchar_t *name)
{
  ....
  CPlugin *plugin = new CPlugin(
    fullName,
    // defaultName,
    agent,
    (const wchar_t *)archiveType
    );
    if (!plugin)
      return INVALID_HANDLE_VALUE;
    ....
  }

If the new operator was unable to allocate the memory, then according to a C++ standard, an exception std::bad_alloc() is generated. Then the verification against null is pointless. The plugin pointer will never be null. The function will never return a constant value INVALID_HANDLE_VALUE. If it is impossible to allocate the memory, then we have an exception which should be handled on a higher level, and the verification against null may be deleted. In case it’s not desirable to have exceptions in the application, we can use new operator which doesn’t generate exceptions, and thus, the return value can be verified against null. There were three more similar cheks:

  • V668 There is no sense in testing the ‘m_Formats’ pointer against null, as the memory was allocated using the ‘new’ operator. The exception will be generated in the case of memory allocation error. enumformatetc.cpp 46
  • V668 There is no sense in testing the ‘m_States’ pointer against null, as the memory was allocated using the ‘new’ operator. The exception will be generated in the case of memory allocation error. bzip2decoder.cpp 445
  • V668 There is no sense in testing the ‘ThreadsInfo’ pointer against null, as the memory was allocated using the ‘new’ operator. The exception will be generated in the case of memory allocation error. bzip2encoder.cpp 170

Constructions requiring optimization

Now let’s talk about some spots that can potentially be optimized. An object is passed to the function. This object is passed by value, but doesn’t get modified, because of an const keyword. Perhaps it would be sensible to pass it with a constant reference in the C++ language, or with the help of a pointer in C.

Here is an example for the vector:

V801 Decreased performance. It is better to redefine the first function argument as a reference. Consider replacing ‘const .. pathParts’ with ‘const .. &pathParts’. wildcard.cpp 487

static unsigned GetNumPrefixParts(const UStringVector pathParts)
{
  ....
}

During the call of this function we’ll have a call of a copy constructor for the UStringVector class. This can significantly reduce the performance of an application if such object copying happens quite often. This code can be easily optimized by adding a reference:

static unsigned GetNumPrefixParts(const UStringVector& pathParts)
{
  ....
}

Here are another similar fragments:

  • V801 Decreased performance. It is better to redefine the first function argument as a reference. Consider replacing ‘const .. props’ with ‘const .. &props’. benchmarkdialog.cpp 766
  • V801 Instantiate CRecordVector < CAttribIconPair >: Decreased performance. It is better to redefine the first function argument as a reference. Consider replacing ‘const .. item’ with ‘const .. &item’. myvector.h 199

Conclusion

7-Zip is a small project, which has been developing for quite a while, so there wasn’t much chance of finding a large number of serious bugs. But still, there are some fragments that are worth reviewing, and static code analyzer PVS-Studio can be of great help. If you develop a project in C, C++ or C#, I suggest downloading PVS-Studio and checking your project.

By Kirill Yudintsev

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s