Guileful BSTR strings

Let’s talk about one more nasty data type – BSTR (Basic string or binary string).

The fragment is taken from the VirtualBox project. The code contains an error that the analyzer diagnoses as follows: V745 A ‘wchar_t *’ type string is incorrectly converted to ‘BSTR’ type string. Consider using ‘SysAllocString’ function.

....
HRESULT EventClassID(BSTR bstrEventClassID);
....
hr = pIEventSubscription->put_EventClassID(
                    L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}");

Explanation

Here’s how a BSTR type is declared:

typedef wchar_t OLECHAR;
typedef OLECHAR * BSTR;

At first glance it seems that “wchar_t *” and BSTR are one and the same thing. They are not, and this causes a lot of confusion and errors.
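The reason the compiler stays silent can be seen from the declarations alone: a typedef introduces an alias, not a new type. Here is a minimal sketch of that point. The names `OLECHAR_sketch`, `BSTR_sketch`, `takes_bstr` and `plain_buffer_is_accepted` are hypothetical stand-ins, chosen so that the code builds without the Windows headers.

```cpp
#include <cassert>
#include <type_traits>

// Stand-ins mirroring the declarations above; not the real Windows typedefs.
typedef wchar_t OLECHAR_sketch;
typedef OLECHAR_sketch* BSTR_sketch;

// To the compiler, the alias and the raw pointer are the very same type.
static_assert(std::is_same<BSTR_sketch, wchar_t*>::value,
              "a typedef is an alias, not a distinct type");

// Hypothetical function with a BSTR-style parameter.
// It sees only a pointer and cannot tell how the buffer was created.
bool takes_bstr(BSTR_sketch s) {
    assert(s != nullptr);  // this sketch simply requires a non-null pointer
    return true;
}

// A plain wide-character buffer is accepted without any cast.
bool plain_buffer_is_accepted() {
    wchar_t plain[] = L"no length prefix here";
    return takes_bstr(plain);
}
```

Since the type system cannot tell the two apart, catching this kind of mistake is left to code review or to a static analyzer.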

Let’s look at the BSTR type more closely to get a better idea of what is going on.

Here is what MSDN says. Reading MSDN documentation isn’t much fun, but it has to be done.

A BSTR (Basic string or binary string) is a string data type that is used by COM, Automation, and Interop functions. Use the BSTR data type in all interfaces that will be accessed from script.

 

BSTR description:

  1. Length prefix. A four-byte integer that contains the number of bytes in the following data string. It appears immediately before the first character of the data string. This value does not include the terminating null character.
  2. Data string. A string of Unicode characters. May contain multiple embedded null characters.
  3. Terminator. Two null characters.
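The three parts listed above can be sketched as a small portable model. This is not the Windows implementation: the helper names (`sketch_alloc`, `sketch_byte_len`, `sketch_free`) are made up for illustration, `char16_t` stands in for the two-byte `OLECHAR`, and `malloc` replaces the COM allocator.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Portable model of the layout: [4-byte length][data][2-byte terminator].
// `chars` is the number of 16-bit characters to copy from `src`.
char16_t* sketch_alloc(const char16_t* src, std::uint32_t chars) {
    const std::uint32_t bytes =
        static_cast<std::uint32_t>(chars * sizeof(char16_t));
    unsigned char* block = static_cast<unsigned char*>(
        std::malloc(sizeof(std::uint32_t) + bytes + sizeof(char16_t)));
    if (block == nullptr) return nullptr;

    std::memcpy(block, &bytes, sizeof bytes);        // 1. length prefix
    std::memcpy(block + sizeof bytes, src, bytes);   // 2. data string
    std::memset(block + sizeof bytes + bytes, 0,
                sizeof(char16_t));                   // 3. terminator

    // The returned pointer addresses the first character, not the prefix.
    return reinterpret_cast<char16_t*>(block + sizeof bytes);
}

// Reads the byte count stored in the four bytes before the first character.
std::uint32_t sketch_byte_len(const char16_t* s) {
    assert(s != nullptr);
    std::uint32_t bytes = 0;
    std::memcpy(&bytes,
                reinterpret_cast<const unsigned char*>(s) - sizeof bytes,
                sizeof bytes);
    return bytes;
}

// Releases the whole block, starting at the hidden prefix.
void sketch_free(char16_t* s) {
    if (s != nullptr)
        std::free(reinterpret_cast<unsigned char*>(s) - sizeof(std::uint32_t));
}
```

Because the length lives in the prefix, a string such as `u"a\0b"` copied with a count of 3 keeps its embedded null, and `sketch_byte_len` reports 6 bytes instead of stopping at the first null character.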

A BSTR is a pointer. The pointer points to the first character of the data string, not to the length prefix. BSTRs are allocated using COM memory allocation functions, so they can be returned from methods without concern for memory allocation. The following code is incorrect:

BSTR MyBstr = L"I am a happy BSTR";

This code builds (compiles and links) correctly, but it will not function properly because the string does not have a length prefix. If you use a debugger to examine the memory location of this variable, you will not see a four-byte length prefix preceding the data string. Instead, use the following code:

BSTR MyBstr = SysAllocString(L"I am a happy BSTR");

A debugger that examines the memory location of this variable will now reveal a length prefix containing the value 34. This is the expected value for a 17-byte single-character string that is converted to a wide-character string through the inclusion of the “L” string modifier. The debugger will also show a two-byte terminating null character (0x0000) that appears after the data string.
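The value 34 follows from simple arithmetic: the text has 17 characters, each wide character in a BSTR occupies two bytes, and the terminator is not counted. Below is a small compile-time check of that arithmetic. It assumes `char16_t` as a portable stand-in for the two-byte Windows `wchar_t`, and the constant names are illustrative only.

```cpp
#include <cstddef>

// Same text as in the example. The u"" prefix gives 16-bit characters,
// matching the two-byte wchar_t used on Windows.
constexpr char16_t kText[] = u"I am a happy BSTR";

// The array size includes the terminating null, so subtract one element.
constexpr std::size_t kChars = sizeof(kText) / sizeof(kText[0]) - 1;

// Byte count that a BSTR length prefix would hold for this text.
constexpr std::size_t kPrefixValue = kChars * sizeof(char16_t);

static_assert(kChars == 17, "17 characters, terminator excluded");
static_assert(kPrefixValue == 34, "17 characters * 2 bytes each");
```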

If you pass a simple Unicode string as an argument to a COM function that is expecting a BSTR, the COM function will fail.

We hope this is enough to understand why we should distinguish between BSTR and plain strings of the “wchar_t *” type.

Additional links:

  1. MSDN. BSTR.
  2. StackOverflow. Static code analysis for detecting passing a wchar_t* to BSTR.
  3. StackOverflow. BSTR to std::string (std::wstring) and vice versa.
  4. Robert Pittenger. Guide to BSTR and CString Conversions.
  5. Eric Lippert. Eric’s Complete Guide To BSTR Semantics.

Correct code

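BSTR bstrEventClassID =
  SysAllocString(L"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}");
hr = pIEventSubscription->put_EventClassID(bstrEventClassID);
SysFreeString(bstrEventClassID);  // the caller still owns the string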
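When a BSTR is passed as an input argument, COM conventions leave ownership with the caller, so a string obtained from SysAllocString should eventually be released with SysFreeString. On Windows, wrappers such as `_bstr_t` or ATL’s `CComBSTR` take care of this. The idea behind them can be sketched portably. Everything below (`fake_alloc`, `fake_free`, `fake_put_EventClassID`, `g_live`, `FakeBstrPtr`, `set_event_class_id`) is a hypothetical stand-in for the real allocator and COM method, used only so that the sketch runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Counts outstanding allocations so that a leak becomes visible.
static int g_live = 0;

// Stand-in for SysAllocString: copies a null-terminated 16-bit string.
// The length prefix is omitted here; only ownership is being modeled.
char16_t* fake_alloc(const char16_t* src) {
    std::size_t n = 0;
    while (src[n] != 0) ++n;
    char16_t* p =
        static_cast<char16_t*>(std::malloc((n + 1) * sizeof(char16_t)));
    if (p == nullptr) return nullptr;
    std::memcpy(p, src, (n + 1) * sizeof(char16_t));
    ++g_live;
    return p;
}

// Stand-in for SysFreeString; like the original, it accepts a null pointer.
void fake_free(char16_t* p) {
    if (p != nullptr) {
        std::free(p);
        --g_live;
    }
}

// Stand-in for a COM setter: reads its argument but does not take ownership.
int fake_put_EventClassID(const char16_t* id) {
    assert(id != nullptr);
    return id[0] == u'{' ? 0 : -1;  // 0 plays the role of S_OK
}

// Deleter that lets std::unique_ptr release the string automatically.
struct FakeBstrDeleter {
    void operator()(char16_t* p) const { fake_free(p); }
};
using FakeBstrPtr = std::unique_ptr<char16_t, FakeBstrDeleter>;

// Allocates, calls the setter, and frees the string when `id` leaves scope.
int set_event_class_id() {
    FakeBstrPtr id(fake_alloc(u"{d5978630-5b9f-11d1-8dd2-00aa004abd5e}"));
    if (!id) return -1;
    return fake_put_EventClassID(id.get());
}
```

Tying the release to scope exit means that early returns and error paths cannot skip it, which is the main reason to prefer a wrapper over a hand-written free call.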

Recommendation

If you come across an unfamiliar type, don’t rush: look it up in the documentation first. This is important enough that it’s worth repeating the tip once more.

Written by Andrey Karpov.
This error was found with PVS-Studio static analysis tool.
