Tuesday, October 5, 2010

Objective-C Tuesdays: replacing in strings

Welcome back to Objective-C Tuesdays! Today we follow closely on last week's topic of searching in strings with its sibling, replacing in strings.

It's a nightmare in C
In our series on strings in Objective-C, we've usually started by looking at C strings then moved on to NSStrings. Today is no different. In most cases, using NSString is easier than doing the equivalent operation on C strings. When it comes to replacing characters in a string, using NSString is significantly easier and safer. The standard C library doesn't provide much support for doing common string replacement operations, so you have to implement them yourself. Because of all the manual memory management required when working with C strings, this code is very error prone -- writing off the end of a buffer and forgetting to add the null terminator are two very common types of errors you have to watch out for when working with C strings.

Replacing a character
The only replacement operation that's fairly straightforward on C strings is replacing a single character with another character. Since C strings are just pointers to arrays of chars, you simply calculate the pointer to the char you want to change, dereference the pointer and assign the new char value.

There are two variations of this: the first uses array notation and the second uses pointer arithmetic. In both examples below, we use the strdup() function to make a copy of our original C string, because modifying a string literal like "foobar" directly is undefined behavior. The strdup() function isn't part of the C99 standard library (it's defined by POSIX), but most systems have one available (possibly named _strdup()), including iOS, and it's easy to write one if it's missing on your system. You own the string returned by strdup() and are responsible for calling free() when you're done with it.
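If your system lacks strdup(), a replacement only takes a few lines. Here's a minimal sketch in plain C; the name my_strdup() is hypothetical, chosen so it won't clash with a system strdup(). It allocates room for the characters plus the null terminator and copies both.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fallback for systems without strdup(): allocate
   strlen(s) + 1 bytes and copy s, including its null terminator.
   Returns NULL if the allocation fails; the caller must free() it. */
char *my_strdup(char const *s)
{
    assert(s != NULL);              /* strdup(NULL) is not meaningful */
    size_t size = strlen(s) + 1;    /* +1 for the null terminator */
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, s, size);      /* copies the terminator too */
    }
    return copy;
}
```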

Here's how you change a character in a C string by treating it as an array of chars:
char const *source = "foobar";

char *copy = strdup(source); // make a non-const copy of source

copy[3] = 'B';               // change char at index 3

NSLog(@"copy = %s", copy);   // copy is a C string, so use %s
// prints "copy = fooBar"

free(copy);                  // free copy when done
The alternative way uses pointer arithmetic:
char const *source = "foobar";

char *copy = strdup(source); // make a non-const copy of source

char *c3 = copy + 3;         // get pointer to char at index 3

*c3 = 'B';                   // change char at address of c3

NSLog(@"copy = %s", copy);   // copy is a C string, so use %s
// prints "copy = fooBar"

free(copy);                  // free copy when done
As far as the compiler is concerned, these two versions are essentially the same code, so use whichever one makes the most sense. If you know the index of the char you want to change, use array notation. If you already have a pointer to the char, perhaps from calling strchr(), use the pointer directly.
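To make the strchr() case concrete, here's a small sketch that replaces every occurrence of one char in a writable C string by repeatedly calling strchr() and assigning through the pointer it returns. The replace_char() helper is hypothetical, not part of any library, and the string passed to it must be writable (a char array or a strdup() copy, not a string literal).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: replace every occurrence of `from` in the
   writable string `s` with `to`, using strchr() to find each char.
   Returns the number of chars replaced. */
size_t replace_char(char *s, char from, char to)
{
    /* searching for '\0' would find the terminator and run off the end */
    assert(s != NULL && from != '\0');
    size_t count = 0;
    char *p = strchr(s, from);
    while (p != NULL) {
        *p = to;                  /* change the char at address p */
        ++count;
        p = strchr(p + 1, from);  /* keep searching after this one */
    }
    return count;
}
```

For example, calling replace_char() on a char array holding "banana" with 'a' and 'o' turns it into "bonono" and returns 3.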

Replacing a substring
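Replacing a substring of a C string is harder. In the case where the original and the replacement have the same number of chars, you can call strncpy() to copy the replacement chars over the original ones. Because we pass exactly the number of chars in the replacement, strncpy() doesn't write a null terminator, so the rest of the string is left intact.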
// replacing a substring of equal length
char const *source = "foobar";

char *copy = strdup(source); // make a non-const copy of source

char *c2 = copy + 2;         // get pointer to char at index 2

strncpy(c2, "OBA", 3);       // copy 3 chars

NSLog(@"copy = %s", copy);
// prints "copy = foOBAr"

free(copy);                  // free copy when done
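Nothing in the strncpy() example above checks that the replacement actually fits inside the string: copy 3 chars at index 5 of the "foobar" copy and you overwrite the null terminator and write past the end of the buffer. Here's a minimal sketch of a bounds-checked version using a hypothetical overwrite_at() helper. It uses memcpy(), which for an exact count of chars does the same job as strncpy() does here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: overwrite strlen(replacement) chars of `s`,
   starting at `index`, without changing the length of `s`.
   Returns 0 on success, or -1 (leaving `s` untouched) if the
   replacement would run past the end of the string. */
int overwrite_at(char *s, size_t index, char const *replacement)
{
    assert(s != NULL && replacement != NULL);
    size_t length = strlen(s);
    size_t count = strlen(replacement);
    if (index > length || count > length - index) {
        return -1;                          /* would clobber the terminator */
    }
    memcpy(s + index, replacement, count);  /* no terminator copied */
    return 0;
}
```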
Replacing a substring with one of a different length is even more complex. There are three cases that need to be handled: the substring to replace is at the start of the original, in the middle, or at the end. When the replacement substring is shorter than the original, there are some shortcuts you can take to make the code a little simpler, but we'll only show the general case.

We'll look at the second case, replacing a substring in the middle of the original. With a little extra logic, this code can be adapted to handle all three of our cases.
char const *source = "The rain in Spain";
char const *original = "rain";     // substring to find
char const *replacement = "plane"; // substring to replace

// calculate the required buffer size
// including space for the null terminator
size_t size = strlen(source) - strlen(original) 
            + strlen(replacement) + sizeof(char);

// allocate buffer
char *buffer = calloc(size, sizeof(char));
if ( ! buffer) {
  // handle allocation failure
}

// find original substring in source and
// calculate the length of the unchanged prefix
char const *originalInSource = strstr(source, original);
if ( ! originalInSource) {
  // handle original substring not found
}
size_t prefixLength = originalInSource - source;

// copy prefix "The " into buffer
strncpy(buffer, source, prefixLength);

// calculate where the replacement substring goes in the buffer
char *replacementInBuffer = buffer + prefixLength;

// copy replacement "plane" into buffer
strcpy(replacementInBuffer, replacement);

// find position of unchanged suffix in source and
// calculate where it goes in the buffer
char const *suffixInSource = originalInSource + strlen(original);
char *suffixInBuffer = replacementInBuffer + strlen(replacement);

// copy suffix " in Spain" into buffer
strcpy(suffixInBuffer, suffixInSource);

NSLog(@"buffer = %s", buffer);
// prints "buffer = The plane in Spain"

free(buffer); // free buffer when done
I won't even waste your time explaining this in detail. No one programming in a modern computer language should have to write this code! It's extremely error prone, and buffer overruns in hand-written string code like this are one of the main causes of security vulnerabilities. If you find yourself doing this, stop immediately and seek out one of the many managed string libraries available for C. If you're writing code for iOS, you should be using NSString to do this.
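If you're curious what the "little extra logic" for handling all three cases looks like, it mostly comes down to not treating the prefix or suffix as special: a match at the start means an empty prefix, and a match at the end means an empty suffix, and both just copy zero chars. Here's a sketch of that generalization; replace_first() is a hypothetical helper that uses memcpy() with explicit lengths instead of strncpy() and strcpy(), and checks whether strstr() found anything at all.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `source` with the
   first occurrence of `original` replaced by `replacement`. Handles a match
   at the start, middle or end. Returns NULL if `original` isn't found or
   allocation fails; the caller must free() the result. */
char *replace_first(char const *source, char const *original,
                    char const *replacement)
{
    assert(source && original && replacement && *original != '\0');
    char const *match = strstr(source, original);
    if (match == NULL) {
        return NULL;
    }
    size_t prefixLength = (size_t)(match - source);
    size_t originalLength = strlen(original);
    size_t replacementLength = strlen(replacement);
    size_t suffixLength = strlen(match + originalLength);

    /* +1 for the null terminator */
    char *buffer = malloc(prefixLength + replacementLength + suffixLength + 1);
    if (buffer == NULL) {
        return NULL;
    }
    memcpy(buffer, source, prefixLength);                  /* may be 0 chars */
    memcpy(buffer + prefixLength, replacement, replacementLength);
    memcpy(buffer + prefixLength + replacementLength,      /* suffix + '\0' */
           match + originalLength, suffixLength + 1);
    return buffer;
}
```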

Replacing using NSString
The NSString class has a number of useful methods for replacing characters and substrings in an NSString. Because NSString is immutable, these methods all return a new NSString instance containing the replacements, leaving the source NSString unchanged.

When you know the exact area of the string you want to replace, you can use the -stringByReplacingCharactersInRange:withString: method with an NSRange structure, which has fields for location (the zero-based index to start at) and length (the number of characters in the source string to replace). Because NSString does all the memory management for you and returns a new autoreleased NSString, it's child's play compared to doing this with C strings.
// replace a range in an NSString
NSString *source = @"The rain in Spain";
NSRange range;

range.location = 4; // starting index in source
range.length = 3;   // number of characters to replace in source

NSString *copy = [source stringByReplacingCharactersInRange:range 
                                                 withString:@"trai"];

NSLog(@"copy = %@", copy);
// prints "copy = The train in Spain"

// no need to release anything
// copy is autoreleased
This is a definite improvement over working with C strings. You might actually do this in real code without tearing your hair out or causing a buffer overrun bug. We can make this code even more compact by using the NSMakeRange() function to create the NSRange structure.
// replace a range in an NSString
NSString *source = @"The rain in Spain";

// create the range inline
NSString *copy = [source stringByReplacingCharactersInRange:NSMakeRange(4, 3)
                                                 withString:@"trai"];

NSLog(@"copy = %@", copy);
// prints "copy = The train in Spain"

// no need to release anything
// copy is autoreleased
If you don't know ahead of time what part of the string you want to replace, you can do a find and replace in one method. The -stringByReplacingOccurrencesOfString:withString: method will find all occurrences of one NSString in another and replace them, returning a new autoreleased NSString.
// find and replace one substring with another
NSString *source = @"The rain in Spain";

NSString *copy = [source stringByReplacingOccurrencesOfString:@"ain"
                                                   withString:@"oof"];

NSLog(@"copy = %@", copy);
// prints "copy = The roof in Spoof"
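To get a feel for how much work that single method call saves, here's a rough C sketch of replacing every occurrence, again using a hypothetical helper name, replace_all(). It makes two passes: one to count the matches so the buffer can be sized exactly, and one to build the result. Matches are found left to right and don't overlap, so replacing "aa" in "aaa" gives "ba". This is a byte-oriented sketch, not a description of how NSString does it internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of `source` in which
   every non-overlapping occurrence of `original` (scanning left to right)
   is replaced by `replacement`. Returns NULL on allocation failure; the
   caller must free() the result. */
char *replace_all(char const *source, char const *original,
                  char const *replacement)
{
    assert(source && original && replacement && *original != '\0');
    size_t originalLength = strlen(original);
    size_t replacementLength = strlen(replacement);

    /* first pass: count matches so the buffer can be sized exactly */
    size_t count = 0;
    for (char const *p = strstr(source, original); p != NULL;
         p = strstr(p + originalLength, original)) {
        ++count;
    }

    size_t size = strlen(source) - count * originalLength
                + count * replacementLength + 1;   /* +1 for '\0' */
    char *buffer = malloc(size);
    if (buffer == NULL) {
        return NULL;
    }

    /* second pass: copy unchanged text and replacements into buffer */
    char *out = buffer;
    char const *in = source;
    char const *match;
    while ((match = strstr(in, original)) != NULL) {
        size_t prefixLength = (size_t)(match - in);
        memcpy(out, in, prefixLength);
        out += prefixLength;
        memcpy(out, replacement, replacementLength);
        out += replacementLength;
        in = match + originalLength;
    }
    strcpy(out, in);   /* remaining text plus the null terminator */
    return buffer;
}
```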
There is another variation of this method that gives you more control over how substrings are found and replaced. The -stringByReplacingOccurrencesOfString:withString:options:range: method allows you to specify a mask containing one or more options and an NSRange structure allowing you to restrict the operation to a section of the string. The most common option is NSCaseInsensitiveSearch, which matches the substring without regard to case.
// case insensitive replace
NSString *source = @"<BR>The rain<BR>in Spain";

NSString *copy = [source stringByReplacingOccurrencesOfString:@"<br>"
                                                   withString:@"<p>"
                                                      options:NSCaseInsensitiveSearch
                                                        range:NSMakeRange(0, [source length])];

NSLog(@"copy = %@", copy);
// prints "copy = <p>The rain<p>in Spain"
Another handy search option is NSAnchoredSearch, which only matches at the start of the search range (here, the whole string). Notice that you use the bitwise OR operator (|) to combine multiple options.
// anchored, case insensitive replace
NSString *source = @"<BR>The rain<BR>in Spain";

NSString *copy = [source stringByReplacingOccurrencesOfString:@"<br>"
                                                   withString:@"<p>"
                                                      options:NSAnchoredSearch | NSCaseInsensitiveSearch
                                                        range:NSMakeRange(0, [source length])];

NSLog(@"copy = %@", copy);
// prints "copy = <p>The rain<BR>in Spain"
You can combine NSBackwardsSearch with NSAnchoredSearch to replace the substring only if it occurs at the end of the source string instead of at the beginning.
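In plain C, the same "replace only at the end" idea is a suffix check. Here's a minimal sketch using a hypothetical replace_suffix() helper: it compares the last strlen(suffix) chars of the source with strcmp() and builds the new string only from pieces whose lengths it has measured. Unlike the case-insensitive Objective-C examples above, it is case-sensitive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: if `source` ends with `suffix`, return a newly
   allocated copy with that suffix replaced by `replacement`; otherwise
   return an unchanged copy. Returns NULL on allocation failure; the
   caller must free() the result. */
char *replace_suffix(char const *source, char const *suffix,
                     char const *replacement)
{
    assert(source && suffix && replacement);
    size_t sourceLength = strlen(source);
    size_t suffixLength = strlen(suffix);
    size_t keepLength = sourceLength;   /* chars of source to keep */
    char const *tail = "";              /* text appended after them */

    if (suffixLength <= sourceLength &&
        strcmp(source + sourceLength - suffixLength, suffix) == 0) {
        keepLength = sourceLength - suffixLength;
        tail = replacement;
    }

    size_t tailLength = strlen(tail);
    char *buffer = malloc(keepLength + tailLength + 1);  /* +1 for '\0' */
    if (buffer == NULL) {
        return NULL;
    }
    memcpy(buffer, source, keepLength);
    memcpy(buffer + keepLength, tail, tailLength + 1);   /* includes '\0' */
    return buffer;
}
```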

Replacing in NSMutableString
If you're working with an NSMutableString, you can still call any of the -stringByReplacing... methods to produce a new NSString, but you also have the option of making the replacements in the NSMutableString directly. The method -replaceCharactersInRange:withString: is very similar to the -stringByReplacingCharactersInRange:withString: method:
// replace a range in an NSMutableString
NSMutableString *source = [NSMutableString stringWithString:@"The rain in Spain"];

[source replaceCharactersInRange:NSMakeRange(4, 3)
                      withString:@"trai"];

NSLog(@"source = %@", source);
// prints "source = The train in Spain"
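The method -replaceOccurrencesOfString:withString:options:range: works similarly: it makes the replacements in the NSMutableString itself and returns the number of replacements it made.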

In most cases, there's not much of an advantage to replacing in place in an NSMutableString versus creating a new NSString containing the replacement. Use whichever operation is most convenient. If you need to make many replacements on a very long string, there may be an advantage to replacing in place rather than creating many large temporary NSString instances that live in the autorelease pool.

So far, the searching and replacing methods we've seen have done only simple string matching. Next week, we'll look at more powerful string matching using regular expressions.
