Last week we looked at Unicode escape sequences in C string and NSString literals. Today we'll take a quick overview of wide character strings and talk about where they fit into iOS development.
When the C language was developed in the early 1970s, little thought was given to representing non-English languages. By default, most C compilers assumed that both code files and application output used 7-bit ASCII encoding and that each logical character in a string fit into a single 8-bit byte or char value. By the time C was first standardized by ANSI in 1989 (and by ISO in 1990), the need to handle many more characters than ASCII was obvious, but the Unicode standard was still nascent. So the ANSI C committee included a wide character type and a small set of wide character conversion functions in the C89 standard (the fuller <wchar.h> library followed in a 1995 amendment), but didn't tie wide character support to any specific character encoding scheme.
C89 introduced a new integer type, wchar_t. This is similar to a char, but typically "wider". On many systems, including Windows, a wchar_t is 16 bits. This is typical of systems that implemented their Unicode support using earlier versions of the Unicode standard, which originally had room for only 65,536 characters. Unicode was later expanded to support historical and special purpose character sets, so on some systems, including Mac OS X and iOS, the wchar_t type is 32 bits in size. This is often poorly documented, but you can use a simple test like this to find out:
    // how big is wchar_t?
    NSLog(@"wchar_t is %zu bits wide", 8 * sizeof(wchar_t));
On a Mac or iPhone, this will print "wchar_t is 32 bits wide". Additionally, wchar_t is a typedef for another integer type in C. In C++, wchar_t is a built-in integer type. In practice, this means you need to #include <wchar.h> in C when using wide characters.
signed or unsigned?
The char integer type is usually a signed integer with a range from -128 to 127, though the C standard leaves the choice to the compiler. You can use the CHAR_MIN and CHAR_MAX constants defined in <limits.h> to find out the range for a particular compiler:
    NSLog(@"CHAR_MIN = %.0f", (double)CHAR_MIN);
    NSLog(@"CHAR_MAX = %.0f", (double)CHAR_MAX);
Similarly, the wchar_t type can be signed or unsigned. The WCHAR_MIN and WCHAR_MAX constants hold the range of a wchar_t and are defined in both <wchar.h> and <stdint.h>:
    NSLog(@"WCHAR_MIN = %.0f", (double)WCHAR_MIN);
    NSLog(@"WCHAR_MAX = %.0f", (double)WCHAR_MAX);
On Windows, wchar_t is an unsigned 16-bit integer. On Mac and iPhone, wchar_t is a signed 32-bit integer, so the code above will print out "WCHAR_MIN = -2147483648" and "WCHAR_MAX = 2147483647". For the most part you don't need to worry about whether wchar_t is signed or unsigned; it only becomes important if you need to do comparisons and operations that mix wchar_t with other integer types (a rarity).
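For the rare case where the signedness does matter, here is a minimal sketch in plain C (which also compiles as Objective-C). The helper names wchar_is_signed and wchar_less_than are hypothetical, made up for this illustration. The first detects signedness at run time; the second shows one way to compare a wchar_t against an unsigned value without surprises, assuming long long is wider than wchar_t and unsigned int on your platform (true on Mac, iPhone and Windows).

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper: true when wchar_t is a signed type here.
   Converting -1 to an unsigned type wraps around to the largest value,
   so the comparison below can only be true for a signed wchar_t. */
static bool wchar_is_signed(void)
{
    return (wchar_t)-1 < (wchar_t)0;
}

/* Hypothetical helper: compares a wide character with an unsigned limit.
   Both operands are widened to long long first, so a negative wchar_t
   is never silently reinterpreted as a huge unsigned number. */
static bool wchar_less_than(wchar_t wc, unsigned int limit)
{
    return (long long)wc < (long long)limit;
}
```

Calling wchar_less_than(L'A', 128u) is a safe way to ask "is this character in the ASCII range?" regardless of how the compiler defines wchar_t.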
wide character literals
We looked at C string literals in previous entries. Wide character string literals are very similar, but are prefixed with 'L':
    // example of a wide character string literal
    wchar_t const *s = L"foobarf!";
Like C string literals, wide strings separated by only whitespace are considered one logical string:
    // wide strings written in segments
    wchar_t const *s1 = L"foo" "bar";
    wchar_t const *s2 = L"Hello, " L"world!";
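To make the "one logical string" point concrete, here is a small sketch in plain C. The names greeting, greeting_length and first_letter are hypothetical and exist only for this example. It also shows that the same L prefix works on single character literals such as L'H'.

```c
#include <stddef.h>
#include <wchar.h>

/* Two adjacent wide literals: the compiler joins them into one string. */
static const wchar_t greeting[] = L"Hello, " L"world!";

/* Number of characters in greeting, not counting the terminating L'\0'.
   sizeof counts bytes, so divide by the size of one wchar_t element. */
static size_t greeting_length(void)
{
    return sizeof(greeting) / sizeof(greeting[0]) - 1;
}

/* Each element of a wide string is a wchar_t, comparable with a
   wide character literal such as L'H'. */
static wchar_t first_letter(void)
{
    return greeting[0];
}
```

Note that sizeof(greeting) by itself depends on the platform, since it is the element count multiplied by the size of a wchar_t, which is why the division is needed.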
wide character functions
Most string functions in the standard C library are defined in the <string.h> header. A very similar set of functions for wide character strings is defined in <wchar.h>. The functions follow a similar naming convention. Where string functions are prefixed with str, the wide character equivalents are prefixed with wcs (for wide character string). So the strlen() function calculates the length of a string and the corresponding wcslen() function calculates the length of a wide character string.
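The naming convention carries over to the rest of the family. The sketch below wraps three standard <wchar.h> functions (wcslen, wcsncmp and wcschr, the counterparts of strlen, strncmp and strchr) in small helpers. The helper names wide_length, wide_has_prefix and wide_index_of are hypothetical and only there to make the example easy to call.

```c
#include <stddef.h>
#include <wchar.h>

/* Length of a wide string in characters: wcslen() mirrors strlen(). */
static size_t wide_length(const wchar_t *s)
{
    return wcslen(s);
}

/* Returns 1 when s starts with prefix. wcsncmp() mirrors strncmp():
   it compares at most the given number of wide characters. */
static int wide_has_prefix(const wchar_t *s, const wchar_t *prefix)
{
    return wcsncmp(s, prefix, wcslen(prefix)) == 0;
}

/* Index of the first occurrence of wc in s, or -1 if it is absent.
   wcschr() mirrors strchr() and returns a pointer into s (or NULL). */
static long wide_index_of(const wchar_t *s, wchar_t wc)
{
    const wchar_t *hit = wcschr(s, wc);
    return hit ? (long)(hit - s) : -1L;
}
```

All lengths and indexes here count wchar_t elements, not bytes.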
not used much
In practice, you won't use wide character strings very often in Objective-C since the NSString class does just about everything wide character strings are meant to do, but you may occasionally run across them in other C libraries.
Next time, we'll begin looking at common string operations using C strings and NSStrings, starting with string concatenation.