Welcome back after an end-of-summer hiatus. Last time we looked at concatenating strings in Objective-C. Today we look at another common string operation: comparison.
Identity
Comparing two variables or objects can sometimes be a tricky proposition. There are several different senses of equality. The most fundamental type of equality is identity: do two variables represent the same thing in memory? Identity only makes sense for reference types, like C strings, NSStrings and other pointer types. Value types like ints always designate separate things in memory. In C and Objective-C, identity equality is determined by comparing pointer values using the == operator.
    // comparing two strings for identity
    char const *s1 = "foo";
    char const *s2 = s1;
    if (s1 == s2) {
        NSLog(@"s1 is identical to s2");
    }
    NSString *s3 = @"foo";
    NSString *s4 = @"bar";
    if (s3 != s4) {
        NSLog(@"s3 is not identical to s4");
    }
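To make the identity idea concrete for C strings, here is a minimal sketch in plain C. The names first_copy, second_copy and is_identical are made up for illustration. Arrays are used instead of two string literals because a compiler is allowed to store identical literals at the same address, which would muddy the example.

```c
#include <stdbool.h>

/* Two separate arrays that happen to hold the same characters.
   (Hypothetical names, used only for this illustration.) */
char first_copy[] = "foo";
char second_copy[] = "foo";

/* Identity: true only when both pointers refer to the same memory. */
bool is_identical(char const *a, char const *b) {
    return a == b;
}
```

With these definitions, is_identical(first_copy, first_copy) is true, while is_identical(first_copy, second_copy) is false even though both arrays contain "foo".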
Equivalence
A more useful type of equality is equivalence of value: do two variables represent equivalent data? Equivalence is useful when comparing value types as well as reference types, and is usually what programmers think of when comparing two strings.
For C strings, the primary equivalence test is done with the strcmp() function. The strcmp() function compares the data of two C strings char by char; if two C strings represent the same sequence of char values in memory, they are equivalent and strcmp() returns zero.
    // checking two C strings for equal value
    char const *s1 = "foo";
    char const *s2 = "bar";
    if (strcmp(s1, s2) == 0) {
        NSLog(@"s1 is equivalent to s2");
    } else {
        NSLog(@"s1 is not equivalent to s2");
    }
In addition to checking for equivalence, strcmp() also categorizes the sort order of the two C strings. If the first argument comes before the second, a negative value is returned; if the first argument comes after the second, a positive value is returned. The strcmp() function uses a lexicographic comparison, which means that the comparison is strictly on the basis of the integer values of the chars in the C strings. For ASCII strings, the string "2" (ASCII code 50) comes before "A" (ASCII code 65), which precedes "a" (ASCII code 97). Many sorting algorithms, including the qsort() function in the C standard library, require a function like strcmp().
    // using strcmp() result
    int compareResult = strcmp(s1, s2);
    if (compareResult < 0) {
        NSLog(@"s1 comes before s2");
    } else if (compareResult > 0) {
        NSLog(@"s1 comes after s2");
    }
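As a rough sketch of how strcmp() plugs into qsort(): when sorting an array of C strings, qsort() passes the comparator pointers to the array elements rather than the strings themselves, so a small wrapper is usually needed. The names compare_c_strings, sort_c_strings and sample_words below are hypothetical, chosen just for this example.

```c
#include <stdlib.h>
#include <string.h>

/* qsort() hands the comparator pointers to the array elements, so for an
   array of char const * each argument is really a char const * const *. */
int compare_c_strings(void const *left, void const *right) {
    char const *a = *(char const * const *)left;
    char const *b = *(char const * const *)right;
    return strcmp(a, b);
}

/* Sorts an array of C strings in place using strcmp() ordering. */
void sort_c_strings(char const **strings, size_t count) {
    qsort(strings, count, sizeof strings[0], compare_c_strings);
}

/* Sample data for illustration only. */
char const *sample_words[] = { "a", "A", "2" };
```

Calling sort_c_strings(sample_words, 3) reorders the sample array to "2", "A", "a", matching the ASCII ordering described above.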
Sometimes you only want to see if two strings have a common prefix, or you're working with character buffers that aren't null terminated. The strncmp() function will compare a limited number of characters, stopping early if it encounters a null terminator in either string. Thus these two strings are equivalent when the first three characters are compared:
    if (strncmp("foo", "fooey", 3) == 0) {
        NSLog(@"both start with foo");
    }
    // prints "both start with foo"
When sorting with strncmp(), short strings come first:
    if (strncmp("foo", "fooey", 5) < 0) {
        NSLog(@"foo comes before fooey");
    }
    // prints "foo comes before fooey"
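The prefix test above can be wrapped in a small helper so the length doesn't have to be hard coded. This is only a sketch, and has_prefix is a made-up name rather than a standard library function; it assumes both arguments are null terminated.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when string begins with prefix.
   Both arguments are assumed to be null terminated C strings. */
bool has_prefix(char const *string, char const *prefix) {
    return strncmp(string, prefix, strlen(prefix)) == 0;
}
```

Here has_prefix("fooey", "foo") is true, while has_prefix("fo", "foo") is false because strncmp() stops at the null terminator of the shorter string and reports a difference.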
Case Insensitive
In languages that have upper and lower case letters, you often need to do a case insensitive comparison. The C standard library doesn't define a case insensitive string comparison function, but one is part of the POSIX standard, and most compiler vendors and operating systems include one. The POSIX version is called strcasecmp(). Most modern Unix and Linux systems (including iOS and Mac OS X) have strcasecmp() available in the standard library. Older Unix systems and other operating systems may call this function stricmp() or strcmpi(). There is usually also a length limited version called strncasecmp() or strnicmp().
The case insensitive comparison functions usually compare only ASCII characters, which limits their usefulness.
    // case insensitive comparison
    #include <strings.h>  // POSIX header that declares strcasecmp() and strncasecmp()
    char const *s = "<HTML><HEAD>...";
    if (strncasecmp(s, "<html>", 6) == 0) {
        NSLog(@"looks like HTML");
    }
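If you're on a platform where none of these functions is available, a hand-rolled version is short. The following is a sketch only: ascii_casecmp is a hypothetical name, and it relies on tolower(), which in the default "C" locale folds just the ASCII letters, so it shares the ASCII-only limitation mentioned above.

```c
#include <ctype.h>

/* Case insensitive comparison in the spirit of strcasecmp().
   Returns zero for a match, otherwise a negative or positive value
   giving the order of the first differing (lower cased) characters. */
int ascii_casecmp(char const *a, char const *b) {
    for (;; a++, b++) {
        /* Cast to unsigned char first: tolower() is undefined for
           negative values other than EOF. */
        int ca = tolower((unsigned char)*a);
        int cb = tolower((unsigned char)*b);
        if (ca != cb || ca == '\0') {
            return ca - cb;
        }
    }
}
```

With this helper, ascii_casecmp("<HTML>", "<html>") returns zero, and ascii_casecmp("apple", "Banana") is negative because 'a' sorts before 'b' once case is ignored.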
Encoding Issues
The strcmp() function was created in the era when most computers used ASCII or other simple single byte encodings. In ASCII, there is only one byte sequence that represents any particular character sequence. This isn't true of many modern encodings, including Unicode. The Unicode character set contains both accented characters such as "é" as well as a combining accent character "´", so there are two ways to represent "é" in UTF-8 encoding:

Address | 64 | 65 | 66 |
---|---|---|---|
Character | 'é' | | |
Value | 195 | 169 | |
Character | 'e' | '´' | |
Value | 101 | 204 | 129 |
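The two byte sequences from the table can be written out directly in C to see how a byte-by-byte comparison treats them. This is an illustrative sketch; the names e_acute_precomposed, e_acute_decomposed and same_bytes are invented for the example.

```c
#include <stdbool.h>
#include <string.h>

/* "é" as the single code point U+00E9: bytes 195, 169. */
char const e_acute_precomposed[] = "\xC3\xA9";

/* "é" as 'e' followed by the combining acute accent U+0301:
   bytes 101, 204, 129. */
char const e_acute_decomposed[] = "e\xCC\x81";

/* Byte-by-byte equivalence, which is all strcmp() knows about. */
bool same_bytes(char const *a, char const *b) {
    return strcmp(a, b) == 0;
}
```

The first string is two bytes long and the second is three, so same_bytes(e_acute_precomposed, e_acute_decomposed) is false even though both display as "é".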
strcmp() will not see these two strings as equivalent. Accounting for this requires performing normalization on the Unicode characters in the string before doing the comparison. Unicode has several different types of normalization, which we won't dive into here. If you need to do a lot of low level processing of UTF-8 or other Unicode encoded text, you should look at the International Components for Unicode, a library of C functions for Unicode processing that is included as part of iOS. Better yet, in most cases you should use NSStrings when working with text.
NSString equality
The NSString class defines the -isEqualToString: instance method for testing if an NSString is equivalent to another NSString:
    // compare two NSStrings
    NSString *s1 = @"foo";
    NSString *s2 = @"bar";
    if ( [s1 isEqualToString:s2] ) {
        NSLog(@"The strings are equivalent.");
    }
You can also use the -isEqual: instance method defined by NSObject to compare two NSStrings, or to compare an NSString with any other object:
    // compare two NSStrings using -isEqual:
    NSString *s1 = @"foo";
    NSString *s2 = @"bar";
    if ( [s1 isEqual:s2] ) {
        NSLog(@"The strings are equivalent.");
    }
The difference between the two methods is in their declarations. The -isEqualToString: method is only for comparing one NSString to another; its declaration looks like:
    // declaration of -isEqualToString:
    - (BOOL)isEqualToString:(NSString *)aString
The -isEqual: method is for comparing any kind of NSObject to another object; its declaration looks like:
    // declaration of -isEqual:
    - (BOOL)isEqual:(id)anObject
It's possible to use -isEqual: to compare an NSString with an object of a different type, such as an NSNumber:
    NSString *fiveString = @"5";
    NSNumber *fiveNumber = [NSNumber numberWithInt:5];
    if ( [fiveString isEqual:fiveNumber] ) {
        NSLog(@"fiveString equals fiveNumber");
    } else {
        NSLog(@"Strings aren't equivalent to numbers, silly!");
    }
You might hope that the NSString "5" is equivalent to the NSNumber "5" but unfortunately they are not; the code above will print out "Strings aren't equivalent to numbers, silly!". In general, objects of different classes aren't considered to be equivalent, with one common exception: immutable classes like NSString can be equivalent to their mutable subclasses (NSMutableString in this case) and vice versa.
    NSString *fiveString = @"5";
    NSMutableString *fiveMutableString = [NSMutableString stringWithString:@"5"];
    if ( [fiveString isEqual:fiveMutableString] ) {
        NSLog(@"immutable and mutable strings can be equivalent");
    }
And since NSMutableString is a subclass of NSString, you can also use -isEqualToString: to compare them:
    if ( [fiveString isEqualToString:fiveMutableString] ) {
        NSLog(@"immutable and mutable strings can be equivalent");
    }
-compare:
In addition to testing for equivalence using -isEqual: or -isEqualToString:, you can also discover the relative order of two NSString objects using the -compare: family of methods. The -compare: method is very similar to the strcmp() function in C. The -compare: method returns an NSComparisonResult value, which is simply an integer value. Similar to strcmp(), -compare: will return zero if the two NSStrings are equivalent, though you can also use the constant NSOrderedSame instead of zero:
    // compare two NSStrings
    NSString *s1 = @"foo";
    NSString *s2 = @"bar";
    if ( [s1 compare:s2] == NSOrderedSame ) {
        NSLog(@"s1 is equivalent to s2");
    } else {
        NSLog(@"s1 is not equivalent to s2");
    }
Like strcmp(), the sign of the result gives the order: if the receiver of the -compare: message (the first NSString) comes before the first argument (the second NSString), negative one is returned; if the receiver comes after the first argument, positive one is returned. The constants NSOrderedAscending and NSOrderedDescending can be used instead of -1 and 1 respectively.
    // using NSComparisonResult
    NSComparisonResult comparisonResult = [s1 compare:s2];
    if (comparisonResult == NSOrderedAscending) {
        NSLog(@"s1 comes before s2");
    } else if (comparisonResult == NSOrderedDescending) {
        NSLog(@"s1 comes after s2");
    }
Case Insensitive -compare:
To test the equivalence of two NSString objects in a case insensitive manner, use -compare:options: with the NSCaseInsensitiveSearch flag.
    // case insensitive compare
    NSString *s1 = @"foo";
    NSString *s2 = @"FOO";
    if ( [s1 compare:s2 options:NSCaseInsensitiveSearch] == NSOrderedSame) {
        NSLog(@"s1 is equivalent to s2");
    }
Since case insensitive comparison is a common operation, NSString has a convenience method, -caseInsensitiveCompare:, which does the same thing.
    // case insensitive compare
    NSString *s1 = @"foo";
    NSString *s2 = @"FOO";
    if ( [s1 caseInsensitiveCompare:s2] == NSOrderedSame) {
        NSLog(@"s1 is equivalent to s2");
    }
Unicode and -compare:
By default, NSString is pretty smart about Unicode and automatically understands things like Unicode combining characters. For instance, you can represent é two ways, but -compare: knows that they represent equivalent strings:
    // comparing equivalent Unicode strings
    NSString *eAcute = @"\u00e9";      // single character 'é'
    NSString *ePlusAcute = @"e\u0301"; // 'e' + combining '´'
    if ( [eAcute compare:ePlusAcute] == NSOrderedSame ) {
        NSLog(@"'é' is equivalent to 'e' + '´'");
    }
Note that -isEqualToString: performs a literal comparison, so it reports these two strings as different unless you normalize them first. This can be surprising if you've only worked with ASCII or other single byte encodings. With NSString, you can't assume that equivalent strings have the same length and character sequence. Usually you don't care about the Unicode representation, but occasionally it's important. You can use the NSLiteralSearch flag along with -compare:options: to do a lexicographic comparison that compares strings character value by character value.
    // lexicographic comparison of Unicode strings
    if ( [eAcute compare:ePlusAcute options:NSLiteralSearch] != NSOrderedSame) {
        NSLog(@"'é' is not lexicographically equivalent to 'e' + '´'");
    }
combining options
The options constants used in the -compare:options: method are bit flags. You combine them using the bitwise or operator (|).
    // using multiple options
    NSString *eAcute = @"\u00e9";        // 'é'
    NSString *capitalEAcute = @"\u00c9"; // 'É'
    if ( [eAcute compare:capitalEAcute options:NSCaseInsensitiveSearch | NSLiteralSearch] == NSOrderedSame) {
        NSLog(@"'é' is equivalent to 'É'");
    }
comparing substrings
If you only want to compare parts of two NSString objects, you can use the -compare:options:range: method and specify an NSRange structure. The NSRange structure is composed of two parts: a starting location field named location and a length field named length. Usually it's convenient to use the NSMakeRange() function to generate the NSRange.
    // compare substrings
    NSString *s1 = @"foo";
    NSString *s2 = @"fooey";
    if ( [s1 compare:s2 options:0 range:NSMakeRange(0, 3)] == NSOrderedSame) {
        NSLog(@"both strings start with 'foo'");
    }
You pass in zero for the options to use the default comparison. -compare:options:range: is similar to strncmp() with one important difference: the NSRange you give must fall completely inside the receiver (the first string) or an NSRangeException will be thrown.
comparing using a specific locale
By default, the -compare: methods use the current locale to determine the ordering of two strings. The current locale is controlled by the user when they set their language and region for their iOS device. Most of the time you should respect the user's settings, but sometimes it's appropriate to compare strings using a fixed locale. Perhaps your app teaches French vocabulary and you want your French word list to sort in standard French order whether the user's phone is set to English, German or Japanese. In French ordering, an accent difference near the end of a word outweighs one earlier in the word, thus "côte" should come before "coté". If you use the default locale, the result of comparing "coté" and "côte" varies but will probably not give you the correct ordering.
    // compare using default locale
    NSString *coteAcute = @"cot\u00e9";      // "coté"
    NSString *coteCircumflex = @"c\u00f4te"; // "côte"
    if ( [coteAcute compare:coteCircumflex] == NSOrderedAscending) {
        NSLog(@"Not using a French locale");
    }
To remedy this, you can set the locale explicitly when you do your comparison:
    // compare using specific locale
    NSLocale *frenchLocale = [[[NSLocale alloc] initWithLocaleIdentifier:@"fr_FR"] autorelease];
    NSComparisonResult comparisonResult = [coteAcute compare:coteCircumflex
                                                     options:0
                                                       range:NSMakeRange(0, 4)
                                                      locale:frenchLocale];
    if (comparisonResult == NSOrderedDescending) {
        NSLog(@"Using a French locale");
    }
That sums up the options for comparing C strings and NSStrings. Next time, we'll look at slicing and dicing strings by creating substrings.
9 comments:
Very good article with an insightful look at string comparison operations. Also I really appreciate that you explore both C and Objective-C syntax and their respective methods. I'm looking forward to the next Objective-C Tuesday as you explain how to slice and dice strings. (^_^)
Thanks for the post, I missed these!
As the designated proofreader, I feel I should point out you probably meant "precedes" instead of "proceeds".
@Mike Doh! (fixed :-)
Thanks. Nice articles. Please continue
@Pavel Thanks! Glad you found it useful.
For the "Case Insensitive -compare:" you should use the -caseInsensitiveCompare: method
Thanks, I didn't realize that I overlooked the -caseInsensitiveCompare: method. I added an example of its use after the example for -compare:options: with the NSCaseInsensitiveSearch flag.
The "// comparing equivalent Unicode strings" snippet does not seem to be true in my testing (Snow Leopard and iOS 4): it does not print the "is equivalent" message.
I figured out that to make the "// comparing equivalent Unicode strings" snippet work, you first have to normalise each string with string = [string decomposedStringWithCanonicalMapping];