Tuesday, August 3, 2010

Objective-C Tuesdays: concatenating strings

Last week we reviewed wide character strings. Today we'll begin looking at common string operations using C strings and NSStrings, starting with string concatenation.

Many languages have built-in support for string concatenation, but C and Objective-C aren't among them. Instead, joining strings is accomplished using library functions in C and instance methods of the NSString class in Objective-C.

C strings
Concatenating two C strings is particularly error-prone, since it typically requires manually calculating the required buffer size and allocating the buffer yourself.
// concatenating two C strings
char const *s1 = "foo";
char const *s2 = "bar";
size_t size = (strlen(s1) * sizeof(char)) + (strlen(s2) * sizeof(char)) + sizeof('\0');
char *s3 = malloc(size);
if (s3) {
  strcpy(s3, s1);
  strcat(s3, s2);
} else {
  // handle memory allocation failure
}
Exploring this code in logical chunks, the first two lines are specific to this example: they define the two strings we're going to join, s1 and s2. The next line calculates the number of bytes required to hold the new string.
// calculate size of new string
size_t size = (strlen(s1) * sizeof(char)) + (strlen(s2) * sizeof(char)) + sizeof('\0');
The strlen() function counts the number of chars in a string, not including the null terminator. To be pedantically correct, we multiply the length of each string by the size of a char, but since sizeof(char) is 1 by definition, we can write the size calculation this way instead:
size_t size = strlen(s1) + strlen(s2) + sizeof('\0');
If we were concatenating two wide character strings instead, we wouldn't be able to take that shortcut:
size_t size = (wcslen(ws1) * sizeof(wchar_t)) + (wcslen(ws2) * sizeof(wchar_t)) + sizeof(L'\0');
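As a rough sketch of the wide character case, the helper below computes the size in bytes and then uses wcscpy() and wcscat(), the wide counterparts of strcpy() and strcat(). The name concat_wide is made up for this example; it isn't a standard function.

```c
#include <stdlib.h>
#include <wchar.h>

// Hypothetical helper: returns a newly allocated concatenation of
// ws1 and ws2, or NULL if malloc() fails. The caller must free() it.
wchar_t *concat_wide(wchar_t const *ws1, wchar_t const *ws2) {
  // size in bytes: each wide character occupies sizeof(wchar_t) bytes
  size_t size = (wcslen(ws1) * sizeof(wchar_t))
              + (wcslen(ws2) * sizeof(wchar_t))
              + sizeof(L'\0');
  wchar_t *ws3 = malloc(size);
  if (ws3) {
    wcscpy(ws3, ws1);  // copy the first string, including its terminator
    wcscat(ws3, ws2);  // append the second string
  }
  return ws3;
}
```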
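As a matter of style, I like to use the expression sizeof('\0') to account for the size of the null terminator. Be aware, though, that in C and Objective-C a character constant has type int, so sizeof('\0') evaluates to sizeof(int) (typically 4) and the buffer ends up a few bytes larger than necessary. It's more common, and exact, to simply add one: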
size_t size = strlen(s1) + strlen(s2) + 1;
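If you want to see what your own compiler does with the terminator expressions, a small check like the following can help. It relies on the C rule that a character constant such as '\0' has type int (C++ treats it as a char, so the result differs there). Both function names are invented for this sketch.

```c
#include <stddef.h>

// Illustrative helper (not a library function): returns 1 when
// sizeof('\0') equals sizeof(int), as C specifies for character constants.
int char_constant_is_int_sized(void) {
  return sizeof('\0') == sizeof(int);
}

// sizeof(char) is 1 by definition, so this is the exact number of
// bytes the null terminator of a C string needs.
size_t terminator_size(void) {
  return sizeof(char);
}
```

Either expression reserves enough room for the terminator; the difference is only in how many bytes get requested.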
The malloc() function allocates a block of memory. If malloc() succeeds, it returns a non-NULL pointer to the memory you requested.
char *s3 = malloc(size);
After checking that the value of pointer s3 is not NULL, we first call strcpy() ("string copy") to copy the string pointed to by s1 into the memory pointed to by s3.
if (s3) {
  strcpy(s3, s1);
The strcpy() function always places a null terminator at the end of the destination string. When strcpy() returns, s1 and s3 point to identical C strings at different locations in memory.

Finally, we call strcat() ("string catenate") to append s2 to the end of s3. ("Catenate" is a synonym for "concatenate". Isn't English strange?)
strcat(s3, s2);
The strcat() function first walks down the destination string until it finds the null terminator, then it copies the source string there, overwriting the original null terminator and putting a new null terminator at the end of the appended string. When using strcat() you need to be sure that the destination memory block contains enough space to hold the concatenated strings. If it's too small, you will overwrite some other memory block, causing data corruption or a program crash.
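To make that walk-and-copy behavior concrete, here is a rough sketch of how a function like strcat() might be written. The name my_strcat is made up for illustration, and real library implementations are usually more optimized.

```c
// Illustrative re-implementation; real code should call strcat().
// Assumes dest has room for both strings plus the terminator.
char *my_strcat(char *dest, char const *src) {
  char *p = dest;
  // walk down the destination string to its null terminator
  while (*p != '\0') {
    p++;
  }
  // copy the source there, overwriting the old terminator
  while (*src != '\0') {
    *p++ = *src++;
  }
  // put a new terminator at the end of the appended string
  *p = '\0';
  return dest;
}
```

Notice that nothing in the loop knows how big the destination block is, which is why the caller has to get the size right.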

If there isn't enough memory available, malloc() returns NULL.
if (s3) {
  // ...
} else {
  // handle memory allocation failure
}
Checking this return value is important; dereferencing a NULL pointer is undefined behavior and will typically cause your program to be killed by the system. Unfortunately, handling errors like this deep in your code is generally a pain in the butt; frequently there's no good option except to abort the current operation.
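One way to keep the size calculation, allocation and error check in one place is to wrap the steps in a small function. This is only a sketch: concat_strings is a hypothetical name, and it pushes the failure decision up to the caller by returning NULL.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: returns a newly allocated string holding s1
// followed by s2, or NULL if malloc() fails. The caller must free() it.
char *concat_strings(char const *s1, char const *s2) {
  // room for both strings plus one byte for the null terminator
  size_t size = strlen(s1) + strlen(s2) + 1;
  char *s3 = malloc(size);
  if (s3) {
    strcpy(s3, s1);
    strcat(s3, s2);
  }
  return s3;
}
```

A caller would check the result for NULL before using it and pass it to free() when done.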

using a fixed buffer
If you know the maximum size of the strings beforehand and the concatenated string is an intermediate value, you can often use a fixed buffer instead of a call to malloc():
// concatenating two C strings
// using a fixed buffer
char const *s1 = "foo";
char const *s2 = "bar";
char buffer[80];

strcpy(buffer, s1);
strcat(buffer, s2);
// buffer now holds concatenated strings
This greatly simplifies C string concatenation, but if your input strings are too big, you'll overflow your buffer, corrupting memory or crashing the program.
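If you can't guarantee the input sizes, one possible alternative (not the approach used above) is the standard snprintf() function, which never writes more than the buffer size you give it. The helper name concat_into is invented for this sketch.

```c
#include <stdio.h>

// Hypothetical helper: writes s1 followed by s2 into buffer.
// Returns 1 on success, 0 if the result did not fit (or on error).
int concat_into(char *buffer, size_t buffer_size,
                char const *s1, char const *s2) {
  int needed = snprintf(buffer, buffer_size, "%s%s", s1, s2);
  // snprintf() returns the length the full result would have had,
  // not counting the terminator; a negative value signals an error
  return needed >= 0 && (size_t)needed < buffer_size;
}
```

With a char buffer[80], you would call concat_into(buffer, sizeof buffer, s1, s2) and treat a zero return as "the strings were too long" rather than risking an overflow. On truncation the buffer still holds a null-terminated prefix of the result.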

NSStrings
Appending one NSString to another is pretty straightforward. The -stringByAppendingString: instance method performs string concatenation, returning a new NSString instance:
// concatenating two NSStrings
NSString *s1 = @"foo";
NSString *s2 = @"bar";
NSString *s3 = [s1 stringByAppendingString:s2];
The resulting NSString (s3) is autoreleased and contains "foobar".

appending a formatted string
There's an alternate way to do NSString concatenation by using -stringByAppendingFormat:
// concatenating two NSStrings using a format
NSString *s1 = @"foo";
NSString *s2 = @"bar";
NSString *s3 = [s1 stringByAppendingFormat:@"%@", s2];
Here, we specify a format string that contains only an object format specifier (%@). Additional arguments after the format string must match the format specifiers in the format string. This method first generates the formatted string, then appends it to the receiver (s1). It's not as efficient as using -stringByAppendingString: directly, but it's more flexible. You can just as easily append an integer or a C string:
NSString *s1 = @"foo";

// appending a number
NSString *s2 = [s1 stringByAppendingFormat:@"%i", 1234];
// s2 is "foo1234"

// appending a C string
char const *s3 = "bar";
NSString *s4 = [s1 stringByAppendingFormat:@"%s", s3];
// s4 is "foobar"

NSString preferred
It should be apparent that NSString concatenation is much easier to deal with than the multi-step procedure required for C strings. In iOS programs, you should generally use NSString whenever possible.

Next time, we'll look at comparison operations and equality of C strings and NSStrings.
