Strings in Objective-C
Last time, we looked at C string and NSString comparison and equality. Today we'll examine functions and methods for creating substrings of C strings and NSStrings.

Substrings of C strings
Creating a C string requires you to explicitly manage the memory the string lives in. Depending on how long you need to keep the C string around, you can use either a fixed buffer or a dynamically allocated one. As always with C strings, you need to be careful not to write past the end of the buffer.
Creating a substring that starts at the beginning of the source string is straightforward: use the strncpy() function. There's a big gotcha when using strncpy() to copy a substring: if the source is at least as long as the number of characters you ask for, it doesn't add a null terminator to the destination. Here's an example of copying the first three characters of a C string into a fixed buffer:

    // copy substring from start of source
    // using a fixed buffer
    char const *source = "foobar";
    char buffer[4];             // make sure buffer includes
                                // space for null terminator
    strncpy(buffer, source, 3); // copy first 3 chars from source
    buffer[3] = '\0';           // remember to add null terminator

Using a dynamic buffer is similar, but requires explicit memory management.

    // copy substring from start of source
    // using a dynamic buffer
    char const *source = "foobar";
    char *buffer = malloc(4 * sizeof(char)); // make sure buffer includes
                                             // space for null terminator
    if ( ! buffer) {
        // must handle allocation failure (and not fall through)
    }
    strncpy(buffer, source, 3); // copy first 3 chars from source
    buffer[3] = '\0';           // remember to add null terminator
    // use buffer ...
    // don't forget to free() buffer when done
    free(buffer);

You can make this a little more compact by using calloc() instead of malloc(). The calloc() function allocates memory as malloc() does, then clears all the bytes to zero. As long as you make sure to include an extra byte at the end, your new substring will be null terminated:

    // copy substring from start of source
    // using a dynamic buffer
    // allocated with calloc()
    char const *source = "foobar";
    char *buffer = calloc(4, sizeof(char)); // make sure buffer includes
                                            // space for null terminator
    if ( ! buffer) {
        // handle allocation failure (and don't fall through)
    }
    strncpy(buffer, source, 3); // copy first 3 chars from source
    // last char in buffer is already '\0'
    // use buffer ...
    // don't forget to free() buffer when done
    free(buffer);

There's not a huge difference between malloc() and calloc(), so choose whichever one you're more used to, or use calloc() if you don't have a strong preference. The cost of clearing a range of memory to zeros is so tiny as to not be worth considering in most circumstances, and knowing that your buffer is initialized to zeros can be handy.

There's no standard C function for getting a substring that starts somewhere in the middle of the source string, because one isn't needed: you simply move the pointer from the start of the string. Here's an illustration:

    // C strings are pointers
    char const *string = "foobar";
    NSLog(@"'%s'", string);    // prints out 'foobar'
    char const *substring = string + 3;
    NSLog(@"'%s'", substring); // prints out 'bar'

You can add an integer value to the C string pointer to get a pointer to the middle of the source string. Just be careful not to go off the end of the string! If you only need the substring for a short period of time, or if you know that the source string will live longer than the substring and never change, it's safe to simply create a substring this way. However, if the source string is freed or modified while the substring pointer is still in use, you'll get bugs that are hard to track down. When in doubt, copy the substring to a new buffer:

    // create a substring from the middle of a string
    char const *source = "foobar";
    char const *substringSource = source + 3;
    size_t charCount = strlen(substringSource) + 1;
    char *buffer = calloc(charCount, sizeof(char));
    if ( ! buffer) {
        // handle allocation failure (and don't fall through)
    }
    strcpy(buffer, substringSource);
    // use buffer ...
    free(buffer);

Here we calculate the starting point by simply adding 3 to the string pointer source. Then we figure out the number of chars we need to allocate using the strlen() function, remembering to add 1 for the null terminator character. After allocating memory, the strcpy() function copies all the characters from substringSource into buffer. Unlike strncpy(), strcpy() always copies the null terminator, so this code behaves the same whether we use calloc() or malloc() to allocate the buffer.

If you need to grab a substring that falls between the beginning and end of a longer string, you combine these two techniques: use pointer arithmetic to get a pointer to the start of the substring, then use strncpy() to copy just the characters you need.

Warning: beware encoding issues!
Slicing and dicing C strings is easy when you're using a single byte encoding like ASCII. If you're using a multibyte encoding like UTF-8, you need to be aware that one logical character may require two or more bytes. If you want to omit the first three logical characters in a string, you need to examine each byte from the start of the string to determine if it's part of a multibyte sequence, and adjust your string pointer accordingly. If you need to work with multibyte encodings, I recommend finding an appropriate library for the encoding, such as the International Components for Unicode for working with Unicode encodings. Or better yet, transform your C strings into NSStrings.

Substrings of NSStrings
There are three ways to get a substring of an NSString. First we'll look at taking a substring from the start of an NSString:

    // create a substring from the start of source
    NSString *source = @"foobar";
    NSString *substring = [source substringToIndex:3]; // substring is "foo"

The substring returned by -substringToIndex: is autoreleased. Under manual reference counting, you should -retain or -copy it if you need to hold on to it.

Similarly, to get a substring that starts in the middle of an NSString and goes to the end:

    // create a substring to the end of source
    NSString *source = @"foobar";
    NSString *substring = [source substringFromIndex:3]; // substring is "bar"

Finally, the general purpose way to create a substring of an
NSString is the -substringWithRange: method, which uses an NSRange structure, defined something like this:

    // NSRange structure
    typedef struct _NSRange {
        NSUInteger location;
        NSUInteger length;
    } NSRange;

When used with the -substringWithRange: method, the NSRange's location field is the zero-based index of the first character to be included in the substring, and the length field is the number of characters to include in the substring. Here are some examples:

    // -substringWithRange: examples
    NSString *source = @"foobar";
    NSRange range;

    range.location = 0;
    range.length = 3;
    NSString *frontHalf = [source substringWithRange:range]; // frontHalf is "foo"

    range.location = 3;
    range.length = 3;
    NSString *backHalf = [source substringWithRange:range]; // backHalf is "bar"

    range.location = 2;
    range.length = 2;
    NSString *middle = [source substringWithRange:range]; // middle is "ob"

One word of caution: if any part of the range you give lies beyond the end of the receiver (the source string), this method will raise an NSRangeException.

Setting the fields of NSRange is fairly verbose; it's generally more convenient to use the NSMakeRange() function to create the NSRange structure instead.

    // NSMakeRange() example
    NSString *source = @"foobar";
    NSString *frontHalf = [source substringWithRange:NSMakeRange(0, 3)]; // frontHalf is "foo"
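
For comparison, the same location-and-length idea can be applied to a C string by combining the two C techniques from earlier in this post: pointer arithmetic to find the start, then strncpy() to copy just the characters you need. The following is only a sketch. The function name copy_substring() and its parameters are made up for this illustration (it is not a standard C function), and it assumes a single byte encoding such as ASCII.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not a standard C function): returns a newly
// allocated copy of `length` bytes of `source`, starting at byte
// offset `location`. Returns NULL if the range falls outside the
// string or if allocation fails. The caller must free() the result.
// Assumes a single byte encoding such as ASCII.
char *copy_substring(char const *source, size_t location, size_t length)
{
    assert(source != NULL);

    size_t sourceLength = strlen(source);
    if (location > sourceLength || length > sourceLength - location) {
        return NULL; // range runs off the end of the source string
    }

    char *buffer = calloc(length + 1, sizeof(char)); // +1 for terminator
    if ( ! buffer) {
        return NULL; // allocation failure
    }

    // pointer arithmetic finds the start; strncpy() copies the chars
    strncpy(buffer, source + location, length);
    // calloc() already zeroed the last byte, so buffer is terminated
    return buffer;
}
```

With this sketch, copy_substring("foobar", 2, 2) yields "ob", while copy_substring("foobar", 4, 3) returns NULL rather than reading past the end of the source. Returning NULL is one possible design choice here; it stands in for the exception that -substringWithRange: raises.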
NSString encoding mostly not a worry
NSString presents its contents as a sequence of UTF-16 code units. Although UTF-16 is a variable length encoding like UTF-8, characters from the basic multilingual plane are all two bytes (one 16-bit word) in length. If you're certain that your NSString contains only basic multilingual plane characters, then methods like -length and -substringWithRange: will work exactly as you expect. However, if your NSString includes characters outside the basic multilingual plane, it will contain surrogate pairs, which are two-word sequences that represent a single character. You'll find that -length tells you the number of words rather than logical characters, and if you're not careful, methods like -substringWithRange: can split a surrogate pair in half, leaving you with an invalidly encoded string.

Unless your application needs to work with characters outside the basic multilingual plane, the easiest solution is to filter out such characters when you accept data from a source outside your app. Since the basic multilingual plane contains all the characters in common use in most modern languages, this is sufficient for many applications. The standard iOS input keyboards mostly limit the user to characters in the basic multilingual plane (emoji are a notable exception), but if your app reads data from the network, such as an RSS feed you don't control, you need to watch out for this.
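
To make the difference between words and logical characters concrete, here is a small C sketch that works on a raw buffer of UTF-16 words. The helper names are invented for this illustration and are not part of any Apple API. It counts Unicode code points, which is closer to "logical characters" than the word count is, though still not the same as user-perceived characters when combining marks are involved.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// High (leading) surrogates occupy the range 0xD800-0xDBFF.
static bool is_high_surrogate(uint16_t word)
{
    return word >= 0xD800 && word <= 0xDBFF;
}

// Low (trailing) surrogates occupy the range 0xDC00-0xDFFF.
static bool is_low_surrogate(uint16_t word)
{
    return word >= 0xDC00 && word <= 0xDFFF;
}

// Hypothetical helper: counts code points in a buffer of wordCount
// UTF-16 words. A well-formed surrogate pair counts as one; an
// unpaired surrogate is counted as one on its own.
size_t utf16_character_count(uint16_t const *words, size_t wordCount)
{
    size_t count = 0;
    for (size_t i = 0; i < wordCount; i++) {
        if (is_high_surrogate(words[i])
            && i + 1 < wordCount
            && is_low_surrogate(words[i + 1])) {
            i++; // skip the second half of the pair
        }
        count++;
    }
    return count;
}

// Hypothetical helper: true if cutting the buffer just before
// `index` would separate the two halves of a surrogate pair.
bool utf16_index_splits_pair(uint16_t const *words, size_t wordCount,
                             size_t index)
{
    return index > 0 && index < wordCount
        && is_high_surrogate(words[index - 1])
        && is_low_surrogate(words[index]);
}
```

For the four words 'a', 0xD83D, 0xDE00, 'b' (the middle two form the surrogate pair for U+1F600), the count is 3 rather than 4, and index 2 is reported as an unsafe place to cut. In Objective-C itself, NSString's -rangeOfComposedCharacterSequenceAtIndex: method can be used to move an index to a safe boundary before calling -substringWithRange:.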
Next time, we'll look at searching in C strings and NSStrings.