Analysis of PHP7

Sometimes checking a project one more time can be quite amusing. It helps to see which errors were fixed, and which ones got into the code since the time it was last checked. Our colleague has already written an article about PHP analysis. As there was a new version released, we decided to check the source code of the interpreter once again, and we weren’t disappointed – the project had a lot of interesting fragments to look at.

image4

The project to be analyzed

PHP – is a general-purpose scripting language that is intensively used in web development. The language and its interpreter are developed within the open source project.

image2

The new version release – PHP v.7.0.0. was announced on the 3rd of December, 2015. It is based on the experimental branch of PHP which was initially called phpng (PHP next generation), and was designed with a focus on increased productivity and reduced memory consumption.

The analyzed project is the PHP interpreter, the source code of which is available in the repository on GitHub. We checked the master branch.

The analysis tool – PVS-Studio static code analyzer. To do the analysis we also used the compiler monitoring system, which allows doing the project analysis no matter what system is used to build this project. The trial version of the analyzer can be downloaded here.

You can also read the previous article written by Sviatoslav Razmyslov “A Post About Analyzing PHP”.

Bugs found

It is worth noting, that a lot of bugs found by the analyzer are located in the PHP libraries. But if we describe all of them here, the article will become too lengthy. On the other hand, the errors in the libraries will show up during the project usage. That’s why some of them are still given here.

One more thing to point out – during the analysis, there was an impression that the code was almost entirely written with the help of macros. They are just everywhere. It makes the analysis much more complicated, not to mention the debugging process. By the way, their widespread usage did more harm than good, and caused a lot of trouble – the errors in the macros were found in lots of fragments throughout the code. So here is the proof of it.

static void spl_fixedarray_object_write_dimension(zval *object, 
                                                  zval *offset, 
                                                  zval *value) 
{
  ....
  if (intern->fptr_offset_set) {
    zval tmp;
    if (!offset) {
      ZVAL_NULL(&tmp);
      offset = &tmp;
    } else {
      SEPARATE_ARG_IF_REF(offset);
  }
  ....
  spl_fixedarray_object_write_dimension_helper(intern, offset, value)
}

Analyzer warning: V506 Pointer to local variable ‘tmp’ is stored outside the scope of this variable. Such a pointer will become invalid. spl_fixedarray.c 420

In case the condition of the if operator is true, the offset pointer can be assigned with the address of the tmp variable. The lifespan of the tmp variable is limited by its scope, i.e. by the body of if operator. Further in the code we see a call of a function that takes offset pointer as one of the parameters, which references to the variable that was already destroyed; this can lead to an error during the work with this pointer.

Another strange code fragment:

#define MIN(a, b)  (((a)<(b))?(a):(b))
#define MAX(a, b)  (((a)>(b))?(a):(b))
SPL_METHOD(SplFileObject, fwrite)
{
  ....
  size_t str_len;
  zend_long length = 0;
  ....
  str_len = MAX(0, MIN((size_t)length, str_len));
  ....
}

Analyzer warning: V547 Expression is always false. Unsigned type value is never < 0. spl_directory.c 2886

The code logic is simple – firstly, two values are compared, then the smallest of them is compared with zero, and then the greatest of them is written to the str_len variable. The problem is that size_t is unsigned type, and its value is always nonnegative. As a result, using the second MAX macro makes no sense. Only the developer can say for sure, if it is just an extra operation or some serious bug.

It’s not the only strange comparison, there were many others.

static size_t sapi_cli_ub_write(const char *str, size_t str_length)
{
  ....
  size_t ub_wrote;
  ub_wrote = cli_shell_callbacks.cli_shell_ub_write(str, str_length);
  if (ub_wrote > -1) {
    return ub_wrote;
  }
}

Analyzer warning: V605 Consider verifying the expression: ub_wrote > – 1. An unsigned value is compared to the number -1. php_cli.c 307

The variable ub_wrote has size_t type, which is unsigned. However, further in the code we see a check ub_wrote > -1. At first glance it might seem that this expression will always be true, because ub_wrote can store only nonnegative values. In reality, the situation is more interesting.

The type of literal -1 (int) will be converted to the variable type ub_wrote (size_t), so during the comparison of ub_wrote with the variable we’ll have the converted value. In the 32-bit program, it will be an unsigned value 0xFFFFFFFF, while in the 64-bit – 0xFFFFFFFFFFFFFFFF. Thus, the variable ub_wrote will be compared with the maximum value of unsigned long type. Therefore, the result of this comparison will always be false, and the return statement will never be executed.

We came across a similar code fragment once more. The issued message: V605 Consider verifying the expression: shell_wrote > – 1. An unsigned value is compared to the number -1. php_cli.c 272

The next code fragment, that got a warning from the analyzer, is also related to a macro.

PHPAPI void php_print_info(int flag)
{
  ....
  if (!sapi_module.phpinfo_as_text) {
    php_info_print("<h1>Configuration</h1>\n");
  } else {
    SECTION("Configuration");
  }
  ....
}

Analyzer warning: V571 Recurring check. The ‘if (!sapi_module.phpinfo_as_text)’ condition was already verified in line 975. info.c 978

At first sight it might seem that everything is fine and there is no error. Nevertheless, let us take a look at what the SECTION macro is here.

#define SECTION(name) if (!sapi_module.phpinfo_as_text) { \
                        php_info_print("<h2>" name "</h2>\n"); \
                      } else { \
                        php_info_print_table_start(); \
                        php_info_print_table_header(1, name); \
                        php_info_print_table_end(); \
                      } \

Thus, after preprocessing in the *.i-file we’ll have the following code:

PHPAPI void php_print_info(int flag)
{
  ....
  if (!sapi_module.phpinfo_as_text) {
    php_info_print("<h1>Configuration</h1>\n");
  } else {
    if (!sapi_module.phpinfo_as_text) { 
      php_info_print("<h2>Configuration</h2>\n"); 
    } else { 
      php_info_print_table_start(); 
      php_info_print_table_header(1, "Configuration"); 
      php_info_print_table_end(); 
    } 
  }
  ....
}

Now it’s much easier to spot the problem. A condition (!sapi_module.phpinfo_as_text) gets checked, and if it is false, it is checked again (and of course, it will never be true). You would probably agree that it looks strange, to say the least.

A similar situation involving the use of this macro occurred once more in the same function:

PHPAPI void php_print_info(int flag)
{
  ....
  if (!sapi_module.phpinfo_as_text) {
    SECTION("PHP License");
    ....
  }
  ....
}

Analyzer warning: V571 Recurring check. The ‘if (!sapi_module.phpinfo_as_text)’ condition was already verified in line 1058. info.c 1059

A similar situation – the same condition, the same macro. We expand the macro and get the following:

PHPAPI void php_print_info(int flag)
{
  ....
  if (!sapi_module.phpinfo_as_text) {
    if (!sapi_module.phpinfo_as_text) { 
      php_info_print("<h2>PHP License</h2>\n"); 
    } else { 
      php_info_print_table_start(); 
      php_info_print_table_header(1, "PHP License"); 
      php_info_print_table_end(); 
    }
    ....
  }
  ....
}

Again, the same condition is checked twice. The second condition will be checked in case the first one is true. Then, if the first condition (!sapi_module.phpinfo_as_text) is true, the second one will always be true too. In such a case, the code in the else branch of the second if operator will never be executed.

Let’s move on.

static int preg_get_backref(char **str, int *backref)
{
  ....
  register char *walk = *str;
  ....
  if (*walk == 0 || *walk != '}')
  ....
}

Analyzer warning: V590 Consider inspecting the ‘* walk == 0 || * walk != ‘}” expression. The expression is excessive or contains a misprint. php_pcre.c 1033

In this code the pointer is dereferenced, and its value is compared with some literals. This code is redundant. Let’s simplify and rewrite this expression to make it more demonstrative:

if (a == 0 || a != 125)

As you can see, the condition can be simplified to a! = 125.

This may indicate both code redundancy, and a more serious error.

image3

The cause of some issues was Zend Engine:

static zend_mm_heap *zend_mm_init(void)
{
  ....
  heap->limit = (Z_L(-1) >> Z_L(1));
  ....
}

Analyzer warning: V610 Unspecified behavior. Check the shift operator ‘>>’. The left operand ‘(- 1)’ is negative. zend_alloc.c 1865

In this code we have an operation of right shift of a negative value. This is a case of unspecified behavior. Although in terms of the language such a case is not an erroneous one, as opposed to undefined behavior, it is best to avoid such cases, because the behavior of such code may vary depending on the platform and the compiler.

Another interesting bug was found in the PCRE library:

const pcre_uint32 PRIV(ucp_gbtable[]) = {
  ....
  (1<<ucp_gbExtend)|(1<<ucp_gbSpacingMark)|(1<<ucp_gbL)|   /*  6 L */
  (1<<ucp_gbL)|(1<<ucp_gbV)|(1<<ucp_gbLV)|(1<<ucp_gbLVT),
  ....
};

Analyzer warning: V501 There are identical sub-expressions ‘(1 << ucp_gbL)’ to the left and to the right of the ‘|’ operator. pcre_tables.c 161

Errors of this kind are classic. They were and still are, in C++ projects, some C# projects have them and perhaps other languages too. The programmer made a typo and duplicated a subexpression (1<<ucp_gbL) in the expression. Most likely (judging by the rest of the source code), a subexpression (1<<ucp_gbT) was meant to be here. Such errors aren’t really evident in a separately taken code fragment, and in a general mass they are even harder to detect.

By the way, my colleague wrote about this error in the previous article, but nothing has changed in the code.

Another fragment from the same library:

....
firstchar = mcbuffer[0] | req_caseopt;
firstchar = mcbuffer[0];
firstcharflags = req_caseopt;
....

Analyzer warning: V519 The ‘firstchar’ variable is assigned values twice successively. Perhaps this is a mistake. Check lines: 8163, 8164. pcre_compile.c 8164

Well, the code looks strange. The programmer writes the result of the ‘|’ operation to the firstchar variable, and then rewrites it, ignoring the result of the previous operation. Perhaps in the second case, another variable was meant instead of firstchar, but it’s hard to say for sure.

There were redundant conditions as well. For example:

PHPAPI php_stream *_php_stream_fopen_with_path(.... const char *path, 
                                               ....)
{
  ....
  if (!path || (path && !*path)) {
  ....
}

Analyzer warning: V728 V728 An excessive check can be simplified. The ‘||’ operator is surrounded by opposite expressions ‘!path’ and ‘path’. plain_wrapper.c 1487

This expression is redundant: in the second subexpression we can remove the verification of path pointer against nullptr. Then, the simplified expression will be like this:

if (!path || !*path)) {

Don’t underestimate such errors. Something else was probably supposed to be there instead of the path variable, and then such expression would be erroneous, not redundant. By the way, this is not the only fragment. There were several more:

  • V728 An excessive check can be simplified. The ‘||’ operator is surrounded by opposite expressions ‘!path’ and ‘path’. fopen_wrappers.c 643
  • V728 An excessive check can be simplified. The ‘||’ operator is surrounded by opposite expressions ‘!headers_lc’ and ‘headers_lc’. sendmail.c 728

Third-party libraries

We have already written about this in the beginning of the article, but I would like to stress it once more. PHP uses several third-party libraries, which, alas, are not perfect and contain errors. Quite a number of warnings were issued for the code from these libraries. We could have brought them all here, but then the article would be too long.

It’s not hard to detect if the error is in the source code of the PHP interpreter or a third party library – there is a comment in the beginning of all source files that describes the license, project, and the authors. Based on these comments, it is easy to track in a project file where the error was hiding.

On the other hand some of the fragments were still worth looking at. In any case, if you use any third-party libraries, you also take the responsibility towards the users for the errors in these projects, because the error can reveal itself during the use of your project. This is why you should carefully consider those dependencies which you pull into your project.

Conclusion

image1
The results of the analysis came out quite interesting. In fact, there were many other bugs found, in this article we had a look at a small amount of medium and high severity warnings. A considerable amount of these errors were found in the PHP libraries, and thus, implicitly, they got into its code. In the PHP code itself, we found some entertaining bugs, which we presented in this article.

To sum up, we would emphasize that it is necessary to use different tools to improve the productivity and the quality of your code. You shouldn’t confine yourself to tests and code review. A static analyzer is one of those tools that might help the programmer write better code, allowing him to use his time more productively instead of looking for bugs.

P.S. Zend Engine developers contacted us and said that the problems, described in the article were already fixed. Good job!

One thought on “Analysis of PHP7

  1. Pingback: Logical Expressions in C/C++. Mistakes Made by Professionals | How Not To Code

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s