Coauthored with Thierry Joubert
In the code examples in the following chapters of this section of the book, the types and functions used for string manipulation include WCHAR, TCHAR, _T(...), _tprintf(...) and so on. These differ from the usual C-style char and printf(...). The reason is character encoding: to support international alphabets, all NT-based Windows operating systems since NT 3.1 in 1993, as well as Windows CE and Windows Embedded Compact, internally use a Unicode character representation rather than the older ASCII representation. As a consequence, in the native C WIN32 API all strings are defined as arrays of 16-bit values, and they must end with the value 0x0000 as a termination marker (just as C ASCII strings must end with a final 0x00). It is the developer's responsibility to provide a correct memory layout for these Unicode strings; because we use C-style arrays, no automatic or easy translation is available.
Warning: A cast such as (short*)my_char_string or (char*)my_short_string will compile and link, but it performs no translation, and the receiver will interpret a different string from the one you encoded.
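To illustrate why a translation step is needed, here is a minimal sketch in portable C. The helper name widen_ascii is hypothetical and the code assumes plain 7-bit ASCII input; a real project would more likely rely on the standard mbstowcs function or, on Windows, on MultiByteToWideChar.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: converts a 7-bit ASCII string to a wide string,
   one element at a time. dst_count is the capacity of dst in characters.
   Returns the number of characters written (terminator excluded), or
   (size_t)-1 if dst is too small or a non-ASCII byte is found. */
size_t widen_ascii(const char *src, wchar_t *dst, size_t dst_count)
{
    size_t i = 0;

    if (dst_count == 0)
        return (size_t)-1;

    for (; src[i] != '\0'; ++i) {
        unsigned char c = (unsigned char)src[i];
        if (i + 1 >= dst_count || c > 0x7F) {
            dst[0] = L'\0';        /* leave a valid empty string behind */
            return (size_t)-1;
        }
        dst[i] = (wchar_t)c;       /* each char becomes one wide element */
    }
    dst[i] = L'\0';                /* wide termination marker */
    return i;
}
```

Each source byte is widened into its own element and a wide terminator is appended. A pointer cast would instead leave the bytes untouched and let the receiver read them in groups, which is not the string that was encoded.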
The Unicode character representation would not be a problem if C developers had not spent more than 20 years with ASCII strings deeply embedded in their language. As a consequence, a large quantity of legacy code uses ASCII strings, and when the WIN32 API was introduced Microsoft took care to provide both a Unicode and an ASCII version of each function that takes at least one string argument. Most desktop developers ignore this because the choice is driven by a wizard-defined pre-processor macro named _UNICODE, and they pass ASCII strings to Windows with impunity. There is still a price to pay, because the ASCII version of a WIN32 function allocates Unicode strings on the heap to do the translation before the call and cleans up on return.
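The cost of the ASCII entry points can be pictured with the following sketch. It is not the actual Windows implementation: DemoSetTitleA, DemoSetTitleW, DemoGetTitleW and g_title are hypothetical names, and the widening loop assumes ASCII-only input.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

static wchar_t g_title[64];   /* stand-in for state owned by the system */

/* Hypothetical "W" function: the real work, wide strings only. */
int DemoSetTitleW(const wchar_t *title)
{
    if (wcslen(title) >= sizeof g_title / sizeof g_title[0])
        return 0;                          /* would not fit */
    wcscpy(g_title, title);
    return 1;
}

/* Hypothetical "A" function: allocates a temporary wide copy on the heap,
   forwards to the W version, then releases the copy. */
int DemoSetTitleA(const char *title)
{
    size_t n = strlen(title) + 1;          /* terminator included */
    wchar_t *wide = malloc(n * sizeof *wide);
    size_t i;
    int ok;

    if (wide == NULL)
        return 0;
    for (i = 0; i < n; ++i)                /* ASCII-only widening */
        wide[i] = (wchar_t)(unsigned char)title[i];

    ok = DemoSetTitleW(wide);
    free(wide);                            /* clean up on return */
    return ok;
}

const wchar_t *DemoGetTitleW(void)
{
    return g_title;
}
```

Every call through the A variant pays for one allocation, one copy and one release that a direct call to the W variant avoids.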
Table 1.3: Examples of WIN32 API functions

| Generic name | ASCII (#undef _UNICODE), Windows desktop ONLY!! | Unicode (#define _UNICODE) |
| --- | --- | --- |
| CreateFile | CreateFileA | CreateFileW |
| CreateThread | - | - |
| CreateMutex | CreateMutexA | CreateMutexW |
| GetConsoleMode | - | - |
| GetConsoleTitle | GetConsoleTitleA | GetConsoleTitleW |
Warning: The WIN32 API for Windows Embedded Compact is strictly Unicode; it does not contain the XxxxA functions. Native code may therefore not migrate instantly from a Windows desktop project to a Windows Embedded Compact project.
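The way a generic name resolves to its A or W variant can be sketched with the pre-processor alone. The functions DemoLengthA and DemoLengthW and the DEMO_STR helper macros are hypothetical; the sketch tests UNICODE, whereas a real project should follow whichever macro its headers actually check.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical pair of functions following the XxxxA / XxxxW convention. */
size_t DemoLengthA(const char *s)    { return strlen(s); }
size_t DemoLengthW(const wchar_t *s) { return wcslen(s); }

/* The generic name is only a macro that selects one of the two. */
#ifdef UNICODE
#define DemoLength DemoLengthW
#else
#define DemoLength DemoLengthA
#endif

/* Two-level stringizing so that the macro is expanded before it is quoted:
   DEMO_STR(DemoLength) yields "DemoLengthA" or "DemoLengthW". */
#define DEMO_STR2(x) #x
#define DEMO_STR(x)  DEMO_STR2(x)
```

Source code that only ever uses the generic name can be rebuilt for a Unicode-only platform; source code that calls an XxxxA function explicitly cannot, because that symbol is simply absent.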
In order to incorporate the Unicode character representation into the C syntax and runtime, both Microsoft and ANSI implemented extended types and libraries. Everybody agreed on the term "wide" to qualify the newcomers, so all Unicode types and functions have a 'w' prefix, as in ANSI wchar_t or wprintf, or a 'W' prefix, as in Windows WCHAR or WSTR.
Note: One exception to the 'w' rule is the compiler syntax for a constant Unicode string, where an 'L' prefix is required before the string literal ( L"Hello World" ).
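As a small illustration of the 'L' prefix, the sketch below declares the same text as a narrow and as a wide literal and reports their sizes. The function names are hypothetical. The size of wchar_t is platform dependent: it is 16 bits on Windows, while many other compilers use 32 bits, so the code expresses the wide size in terms of sizeof(wchar_t).

```c
#include <stddef.h>
#include <wchar.h>

const char    narrow_hello[] = "Hello World";    /* array of char    */
const wchar_t wide_hello[]   = L"Hello World";   /* array of wchar_t */

/* Bytes occupied by each array, terminator included. */
size_t narrow_bytes(void) { return sizeof narrow_hello; }
size_t wide_bytes(void)   { return sizeof wide_hello; }

/* Number of elements in the wide array, terminator included. */
size_t wide_count(void)
{
    return sizeof wide_hello / sizeof wide_hello[0];
}
```

Both arrays hold 12 elements (11 characters plus the terminator), but each element of the wide array occupies sizeof(wchar_t) bytes instead of one.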
Another initiative, taken by Microsoft in its runtime and SDK headers, abstracts the Unicode vs. ASCII differences in order to allow for source code portability. This abstraction uses "generic types" selected by the UNICODE pre-processor macro (and by _UNICODE for the C runtime names). The generic types and functions have a '_t' or 'T' prefix, as in _tprintf or TCHAR. One result of these combined efforts is some redundancy at a syntactic level.
Note: In Windows, wchar_t and WCHAR both name the same 16-bit wide character data type. The generic type TCHAR is also a wide character if UNICODE is defined.
The next table shows some corresponding generic, ASCII and Unicode types and functions.
Table 1.4: Generic types equivalence

| Generic type, function or macro name | #undef UNICODE | #define UNICODE |
| --- | --- | --- |
| TCHAR | char | wchar_t |
| _T( ) or TEXT( ) | Does nothing | Adds 'L' prefix |
| _tmain( ) | main( ) | wmain( ) |
| _tprintf( ) | printf( ) | wprintf( ) |
| _tcslen( ) | strlen( ) | wcslen( ) |
| _tcscat( ) | strcat( ) | wcscat( ) |
| _ttoi( ) | atoi( ) | _wtoi( ) |
| _getts( ) | gets( ) | _getws( ) |
| _putts( ) | puts( ) | _putws( ) |
| (string) TCHAR* or TCHAR[ ] | char* or char[ ] | wchar_t* or wchar_t[ ] |
Note 1: Windows Embedded Compact projects always define _UNICODE and UNICODE by default.
Note 2: A catalog component provides "String Safe Utility Functions"; these functions, whose names end with _s, should be preferred because they guard against buffer overruns.
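The idea behind such bounded functions can be sketched in portable C. The helper demo_copy_bounded below is hypothetical and is not the API of the catalog component; it only shows the principle of passing the destination capacity and always terminating the result.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical bounded copy. dst_count is the capacity of dst in
   characters, not bytes. Returns 0 on success, 1 if the source was
   truncated, -1 on invalid arguments. dst is always terminated when
   the arguments are valid. */
int demo_copy_bounded(wchar_t *dst, size_t dst_count, const wchar_t *src)
{
    size_t i;

    if (dst == NULL || src == NULL || dst_count == 0)
        return -1;

    /* Copy at most dst_count - 1 characters, keeping room for the zero. */
    for (i = 0; i + 1 < dst_count && src[i] != L'\0'; ++i)
        dst[i] = src[i];
    dst[i] = L'\0';

    return src[i] == L'\0' ? 0 : 1;
}
```

With a four-element buffer, copying L"Compact2013" stores L"Com" and reports truncation instead of writing past the end of the array.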
The ASCII (or C string) literal "Hello World!" has the Unicode equivalent L"Hello World!". The _T or TEXT macro resolves this by producing an ASCII string literal or a Unicode string literal, depending on whether the UNICODE macro is defined.
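A simplified picture of how such generic mappings can be built is given below. The names DEMO_TCHAR, DEMO_T, demo_tcslen, demo_tcscat and demo_build_name are hypothetical stand-ins, not the definitions found in the Microsoft headers, and the sketch assumes the caller supplies a buffer of at least 12 elements.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

#ifdef UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_T(s)    L##s        /* paste the 'L' prefix onto the literal */
#define demo_tcslen  wcslen
#define demo_tcscat  wcscat
#else
typedef char DEMO_TCHAR;
#define DEMO_T(s)    s           /* does nothing */
#define demo_tcslen  strlen
#define demo_tcscat  strcat
#endif

/* Generic code: written once, compiled as narrow or wide.
   dst must have room for at least 12 elements. */
size_t demo_build_name(DEMO_TCHAR *dst)
{
    dst[0] = 0;                              /* start from an empty string */
    demo_tcscat(dst, DEMO_T("Compact"));
    demo_tcscat(dst, DEMO_T("2013"));
    return demo_tcslen(dst);
}
```

The function body never mentions char or wchar_t; the build setting alone decides which family of types, literals and library functions is used.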
Last but not least, in an effort towards syntax abstraction, the Microsoft WIN32 API redefines all C base types and provides its own types in windows.h. These "Windows" types are uppercase and use a field composition technique, as in LPVOID, which you should read as Long Pointer[1] to void (and which ends up as void*).
This syntax translation, combined with the ASCII vs. Unicode option, has created a lot of confusion since the early days of the WIN32 API. If you apply the field composition technique of Windows types and everything you just learned about Unicode types, plus the fact that STR stands for what is usually called sz, a "zero-terminated string", deciphering the names becomes fairly straightforward.
Table 1.5: Windows string types

| Type | Description | String type | Base type |
| --- | --- | --- | --- |
| LPSTR | Long pointer to zero string | ASCII | char * |
| LPWSTR | Long pointer to wide zero string | Unicode | WCHAR * |
| LPCSTR | Long pointer to constant zero string | ASCII | const char * |
| LPCWSTR | Long pointer to constant wide zero string | Unicode | const WCHAR * |
| LPTSTR | Long pointer to generic zero string | Either | TCHAR * |
| LPCTSTR | Long pointer to constant generic zero string | Either | const TCHAR * |
Note: Zero strings have an implicit length. The WIN32 API offers other types of strings, such as the BSTR (a pointer to OLECHAR), which carries an explicit length. Never copy a BSTR to an STR directly; you must do a translation.
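The difference between an implicit and an explicit length can be sketched as follows. DemoCountedString is a hypothetical structure used only for illustration; it is not the in-memory layout of a real BSTR.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical counted string: the length travels with the data.
   This is NOT the in-memory layout of a real BSTR. */
typedef struct {
    size_t         length;   /* number of characters, embedded zeros included */
    const wchar_t *data;
} DemoCountedString;

/* Length as a zero-string consumer would see it: the scan stops at the
   first zero, even if the counted string continues after it. */
size_t demo_zero_string_length(const DemoCountedString *s)
{
    size_t n = 0;

    while (n < s->length && s->data[n] != L'\0')
        ++n;
    return n;
}
```

For a counted string holding the five characters 'a', 'b', zero, 'c', 'd', a zero-string consumer sees only two of them, which is one reason a plain copy between the two kinds of strings is unsafe.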
Table 1.6: Windows string code samples

| Type Definition | Example |
| --- | --- |
| typedef char* LPSTR; | char name[ ] = "Compact2013"; |
| typedef WCHAR* LPWSTR; | wchar_t name[ ] = L"Compact2013"; |
| typedef const char* LPCSTR; | const char name[ ] = "Compact2013"; |
| typedef const WCHAR* LPCWSTR; | const wchar_t name[ ] = L"Compact2013"; |
| typedef TCHAR* LPTSTR; | TCHAR name[ ] = _T("Compact2013"); |
| typedef const TCHAR* LPCTSTR; | const TCHAR name[ ] = _T("Compact2013"); |
Time for some assessment.
Question: I'm working on a native Compact 2013 console project and I need string input from the user. How should I declare my string?
Answer: You may declare your string as ASCII, Unicode or Generic; it depends only on how you will use it inside your program.
// ASCII version
char strA[MAXSTR];
scanf("%s", strA);
printf("you typed: %s", strA);
// UNICODE version
wchar_t strW[MAXSTR];
wscanf(L"%s", strW);
wprintf(L"you typed: %s", strW);
// generic version
TCHAR strG[MAXSTR];
_tscanf(_T("%s"), strG);
_tprintf(_T("%s"), strG);
Listing 1.3: Various versions of string code usage: ASCII, Unicode and Generic
All these implementations are valid in a native Compact 2013 project. Remember that Unicode strings are only required at the WIN32 API level: as long as your strings are never passed as parameters to a WIN32 function, there is no reason to force Unicode or to use the Generic types. The choice has an impact on memory consumption, because a wide string uses twice the memory of the equivalent ASCII string, which may be a concern in an embedded application. The choice may also depend on text serialization/deserialization when the format is imposed by specifications.
Note: In the three samples provided, the macros, variable types and function types are always consistent. You must avoid cross-typing in your code, as it may only compile or work in some restricted cases.
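Consistency also applies to format strings. The listing uses %s with wprintf, which the Microsoft runtime interprets as a wide string in the wide printf family; ISO C instead specifies %ls for a wchar_t string. The sketch below uses hypothetical helper names and the ISO C spelling so that it behaves the same on other compilers.

```c
#include <stdio.h>
#include <stddef.h>
#include <wchar.h>

/* Narrow family: char buffer, narrow format, char argument. */
int demo_format_narrow(char *dst, size_t dst_count, const char *typed)
{
    return snprintf(dst, dst_count, "you typed: %s", typed);
}

/* Wide family: wchar_t buffer, wide format, wchar_t argument.
   %ls is the ISO C conversion for a wchar_t string. */
int demo_format_wide(wchar_t *dst, size_t dst_count, const wchar_t *typed)
{
    return swprintf(dst, dst_count, L"you typed: %ls", typed);
}
```

In each function the buffer type, the format literal and the argument all belong to the same family; mixing them, for example passing a char string where a wide format expects a wide one, is the cross-typing the note warns against.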
[1] Long here stands for bigger than 16-bit; remember there was (primitive) life before the WIN32 API.