Welcome back after an end-of-Summer hiatus. Last time we looked at
concatenating strings in Objective-C. Today we look at another common string operation: comparison.
Identity
Comparing two variables or objects can sometimes be a tricky proposition. There are several different senses of equality. The most fundamental type of equality is
identity: do two variables represent the same thing in memory? Identity only makes sense for reference types, like C strings,
NSString
s and other pointer types. Value types like
int
s always designate separate things in memory. In C and Objective-C, identity equality is determined by comparing pointer values using the
==
operator.
// comparing two strings for identity
char const *s1 = "foo";
char const *s2 = s1;
if (s1 == s2) {
NSLog(@"s1 is identical to s2");
}
NSString *s3 = @"foo";
NSString *s4 = @"bar";
if (s3 != s4) {
NSLog(@"s3 is not identical to s4");
}
Equivalence
A more useful type of equality is
equivalence of value: do two variables represent equivalent data? Equivalence is useful when comparing value types as well as reference types, and is usually what programmers think of when comparing two strings.
For C strings, the primary equivalence test is done with the
strcmp()
function. The
strcmp()
function compares the data of two C strings
char
by
char
; if two C strings represent the same sequence of
char
values in memory, they are equivalent and
strcmp()
returns zero.
// checking two C strings for equal value
char const *s1 = "foo";
char const *s2 = "bar";
if (strcmp(s1, s2) == 0) {
NSLog(@"s1 is equivalent to s2");
} else {
NSLog(@"s1 is not equivalent to s2");
}
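To make the difference between identity and equivalence concrete, here is a minimal sketch in plain C. The helper names same_identity() and same_value() are made up for this illustration; they are not part of any library.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Identity: both pointers refer to the same location in memory. */
static bool same_identity(const char *a, const char *b) {
    return a == b;
}

/* Equivalence: both strings hold the same sequence of char values. */
static bool same_value(const char *a, const char *b) {
    assert(a != NULL && b != NULL); /* strcmp() needs valid strings */
    return strcmp(a, b) == 0;
}
```

Given two separate buffers such as char first[] = "foo"; and char second[] = "foo";, same_identity(first, second) is false because the arrays live at different addresses, while same_value(first, second) is true because their contents match.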
In addition to checking for equivalence,
strcmp()
also categorizes the sort order of the two C strings. If the first argument comes
before the second, a negative value is returned; if the first argument comes
after the second, a positive value is returned. The
strcmp()
function uses a lexicographic comparison, which means that the comparison is strictly on the basis of the integer values of the
char
s in the C strings. For ASCII strings, the string
"2"
(ASCII code 50) comes before
"A"
(ASCII code 65), which precedes
"a"
(ASCII code 97). Many sorting algorithms, including the
qsort()
function in the C standard library, require a function like
strcmp()
.
// using strcmp() result
int compareResult = strcmp(s1, s2);
if (compareResult < 0) {
NSLog(@"s1 comes before s2");
} else if (compareResult > 0) {
NSLog(@"s1 comes after s2");
}
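Since qsort() was mentioned, here is a rough sketch of how strcmp() might be plugged into it to sort an array of C strings. The names compare_c_strings() and sort_c_strings() are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort() hands the comparator pointers to the array elements.
   Each element here is a `const char *`, so the arguments are
   really `const char * const *` and need one dereference. */
static int compare_c_strings(const void *lhs, const void *rhs) {
    const char *a = *(const char * const *)lhs;
    const char *b = *(const char * const *)rhs;
    return strcmp(a, b);
}

/* Sort an array of C strings in place, in strcmp() order. */
static void sort_c_strings(const char **strings, size_t count) {
    assert(strings != NULL);
    qsort(strings, count, sizeof strings[0], compare_c_strings);
}
```

Sorting the array { "a", "A", "2" } this way should leave it as "2", "A", "a", matching the ASCII ordering described above. Note that strcmp() can't be passed to qsort() directly, because qsort() passes pointers to the elements rather than the elements themselves.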
Sometimes you only want to see if two strings have a common prefix, or you're working with character buffers that aren't null terminated. The
strncmp()
function will compare a limited number of characters, stopping early if it encounters a null terminator in either string. Thus these two strings are equivalent when the first three characters are compared:
if (strncmp("foo", "fooey", 3) == 0) {
NSLog(@"both start with foo");
}
// prints "both start with foo"
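The other use mentioned above, buffers that aren't null terminated, might look something like the following sketch. The helper buffer_has_tag() is a made-up name, and the caller is assumed to guarantee that the buffer really holds at least length bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if the first `length` bytes of `buffer` match `tag`.
   `buffer` does not need a null terminator as long as it holds
   at least `length` bytes. */
static bool buffer_has_tag(const char *buffer, const char *tag, size_t length) {
    assert(buffer != NULL && tag != NULL);
    return strncmp(buffer, tag, length) == 0;
}
```

For example, a four byte file signature stored as const char header[4] = { 'R', 'I', 'F', 'F' }; can be checked with buffer_has_tag(header, "RIFF", 4) even though header has no terminator, because strncmp() never reads past the fourth byte.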
When sorting with
strncmp()
, short strings come first:
if (strncmp("foo", "fooey", 5) < 0) {
NSLog(@"foo comes before fooey");
}
// prints "foo comes before fooey"
Case Insensitive
In languages that have upper and lower case letters, you often need to do a case insensitive comparison. The C standard library doesn't define a case insensitive string comparison function, but one is part of the
POSIX standard, and most compiler vendors and operating systems include one. The POSIX version is called
strcasecmp()
. Most modern Unix and Linux systems (including iOS and Mac OS X) have
strcasecmp()
available in the standard library. Older Unix systems and other operating systems may call this function
stricmp()
or
strcmpi()
. There is usually also a length limited version called
strncasecmp()
or
strnicmp()
.
The case insensitive comparison functions usually compare only ASCII characters, which limits their usefulness.
// case insensitive comparison
char const *s = "<HTML><HEAD>...";
if (strncasecmp(s, "<html>", 6) == 0) {
NSLog(@"looks like HTML");
}
Encoding Issues
The
strcmp()
function was created in the era when most computers used ASCII or other simple single byte encodings. In ASCII, there is only one byte sequence that represents any particular character sequence. This isn't true of many modern encodings, including the Unicode encodings. The Unicode character set contains precomposed accented characters such as "é" as well as a
combining accent character "´", so there are two ways to represent "é" in UTF-8 encoding:
Precomposed 'é' (U+00E9):
Address   | 64  | 65  |
Character | 'é'       |
Value     | 195 | 169 |

Decomposed 'e' + '´' (U+0301):
Address   | 64  | 65  | 66  |
Character | 'e' | '´'       |
Value     | 101 | 204 | 129 |
Obviously a lexicographic comparison function like
strcmp()
will not see these two strings as equivalent. Accounting for this requires performing
normalization on the Unicode characters in the string before doing the comparison. Unicode has several different types of normalization, which we won't dive into here. If you need to do a lot of low level processing of UTF-8 or other Unicode encoded text, you should look at the
International Components for Unicode, a library of C functions for Unicode processing that is included as part of iOS. Better yet, in most cases you should use
NSString
s when working with text.
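To see the problem in code, here is a minimal sketch that spells out both UTF-8 byte sequences by hand and compares them. The constant names and the bytes_match() helper are made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* "é" as one precomposed code point, U+00E9: bytes 195, 169 */
static const char E_ACUTE_PRECOMPOSED[] = "\xc3\xa9";

/* "é" as 'e' plus the combining acute accent, U+0301: bytes 101, 204, 129 */
static const char E_ACUTE_DECOMPOSED[] = "e\xcc\x81";

/* Byte-for-byte equality, which is all that strcmp() can tell us. */
static bool bytes_match(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}
```

Here bytes_match(E_ACUTE_PRECOMPOSED, E_ACUTE_DECOMPOSED) is false, and strlen() doesn't even agree on the lengths (2 versus 3), although both strings would be drawn as the same "é" on screen.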
NSString
equality
The
NSString
class defines the
-isEqualToString:
instance method for testing if an
NSString
is equivalent to another
NSString
:
// compare two NSStrings
NSString *s1 = @"foo";
NSString *s2 = @"bar";
if ( [s1 isEqualToString:s2] ) {
NSLog(@"The strings are equivalent.");
}
You can also use the
-isEqual:
instance method defined by
NSObject
to compare two
NSString
s, or to compare an
NSString
with any other object:
// compare two NSStrings using -isEqual:
NSString *s1 = @"foo";
NSString *s2 = @"bar";
if ( [s1 isEqual:s2] ) {
NSLog(@"The strings are equivalent.");
}
The difference between the two methods is in their declarations. The
-isEqualToString:
method is only for comparing one
NSString
to another; its declaration looks like:
// declaration of -isEqualToString:
- (BOOL)isEqualToString:(NSString *)aString
The
-isEqual:
method is for comparing any kind of
NSObject
to another object; its declaration looks like:
// declaration of -isEqual:
- (BOOL)isEqual:(id)anObject
It's possible to use
-isEqual:
to compare an
NSString
with an object of a different type, such as an
NSNumber
:
NSString *fiveString = @"5";
NSNumber *fiveNumber = [NSNumber numberWithInt:5];
if ( [fiveString isEqual:fiveNumber] ) {
NSLog(@"fiveString equals fiveNumber");
} else {
NSLog(@"Strings aren't equivalent to numbers, silly!");
}
You might hope that the
NSString
"5" is equivalent to the
NSNumber
"5" but unfortunately they are not; the code above will print out "Strings aren't equivalent to numbers, silly!". In general, objects of different classes aren't considered to be equivalent with one common exception: immutable classes like
NSString
can be equivalent to their mutable subclasses (
NSMutableString
in this case) and vice versa.
NSString *fiveString = @"5";
NSMutableString *fiveMutableString = [NSMutableString stringWithString:@"5"];
if ( [fiveString isEqual:fiveMutableString] ) {
NSLog(@"immutable and mutable strings can be equivalent");
}
And since
NSMutableString
is a subclass of
NSString
, you can also use
-isEqualToString:
to compare them:
if ( [fiveString isEqualToString:fiveMutableString] ) {
NSLog(@"immutable and mutable strings can be equivalent");
}
-compare:
In addition to testing for equivalence using
-isEqual:
or
-isEqualToString:
, you can also discover the relative order of two
NSString
objects using the
-compare:
family of methods. The
-compare:
method is very similar to the
strcmp()
function in C. The
-compare:
method returns an
NSComparisonResult
value, which is simply an integer value. Similar to
strcmp()
,
-compare:
will return zero if the two
NSString
s are equivalent, though you can also use the constant
NSOrderedSame
instead of zero:
// compare two NSStrings
NSString *s1 = @"foo";
NSString *s2 = @"bar";
if ( [s1 compare:s2] == NSOrderedSame ) {
NSLog(@"s1 is equivalent to s2");
} else {
NSLog(@"s1 is not equivalent to s2");
}
Like
strcmp()
, if the receiver of the
-compare:
message (the first
NSString
) comes
before the first argument (the second
NSString
), negative one is returned; if the receiver comes
after the first argument, positive one is returned. The constants
NSOrderedAscending
and
NSOrderedDescending
can be used instead of -1 and 1 respectively.
// using NSComparisonResult
NSComparisonResult comparisonResult = [s1 compare:s2];
if (comparisonResult == NSOrderedAscending) {
NSLog(@"s1 comes before s2");
} else if (comparisonResult == NSOrderedDescending) {
NSLog(@"s1 comes after s2");
}
Case Insensitive -compare:
To test the equivalence of two
NSString
objects in a case insensitive manner, use
-compare:options:
with the
NSCaseInsensitiveSearch
flag.
// case insensitive compare
NSString *s1 = @"foo";
NSString *s2 = @"FOO";
if ( [s1 compare:s2 options:NSCaseInsensitiveSearch] == NSOrderedSame) {
NSLog(@"s1 is equivalent to s2");
}
Since case insensitive comparison is a common operation,
NSString
has a convenience method,
-caseInsensitiveCompare:
which does the same thing.
// case insensitive compare
NSString *s1 = @"foo";
NSString *s2 = @"FOO";
if ( [s1 caseInsensitiveCompare:s2] == NSOrderedSame) {
NSLog(@"s1 is equivalent to s2");
}
Unicode and -compare:
By default,
NSString
is pretty smart about Unicode and automatically understands things like Unicode combining characters. For instance, you can represent é two ways, but
NSString
knows that they represent equivalent strings:
// comparing equivalent Unicode strings
NSString *eAcute = @"\u00e9"; // single character 'é'
NSString *ePlusAcute = @"e\u0301"; // 'e' + combining '´'
// note: -isEqualToString: compares literally, so use -compare: here
if ( [eAcute compare:ePlusAcute] == NSOrderedSame ) {
NSLog(@"'é' is equivalent to 'e' + '´'");
}
This can be surprising if you've only worked with ASCII or other single byte encodings. With
NSString
, you can't assume that equivalent strings have the same length and character sequence. Usually you don't care about the Unicode representation, but occasionally it's important. You can use the
NSLiteralSearch
flag along with
-compare:options:
to do a lexicographic comparison that compares strings character value by character value.
// lexicographic comparison of Unicode strings
if ( [eAcute compare:ePlusAcute options:NSLiteralSearch] != NSOrderedSame) {
NSLog(@"'é' is not lexicographically equivalent to 'e' + '´'");
}
combining options
The options constants used in the
-compare:options:
method are bit flags. You combine them using the bitwise or operator (
|
).
// using multiple options
NSString *eAcute = @"\u00e9"; // 'é'
NSString *capitalEAcute = @"\u00c9"; // 'É'
if ( [eAcute compare:capitalEAcute
options:NSCaseInsensitiveSearch | NSLiteralSearch]
== NSOrderedSame)
{
NSLog(@"'é' is equivalent to 'É'");
}
comparing substrings
If you only want to compare parts of two
NSString
objects, you can use the
-compare:options:range:
method and specify an
NSRange
structure. The
NSRange
structure is composed of two parts: a starting location field named
location
and a length field named
length
. Usually it's convenient to use the
NSMakeRange()
function to generate the
NSRange
.
// compare substrings
NSString *s1 = @"foo";
NSString *s2 = @"fooey";
if ( [s1 compare:s2
options:0
range:NSMakeRange(0, 3)] == NSOrderedSame)
{
NSLog(@"both strings start with 'foo'");
}
You pass in zero for the options to use the default comparison.
-compare:options:range:
is similar to
strncmp()
with one important difference: the
NSRange
you give must fall completely inside the receiver (the first string) or an
NSRangeException
will be thrown.
comparing using a specific locale
By default, the
-compare:
methods use the current locale to determine the ordering of two strings. The current locale is controlled by the user when they set their language and region for their iOS device. Most of the time you should respect the user's settings, but sometimes it's appropriate to compare strings using a fixed locale. Perhaps your app teaches French vocabulary and you want your French word list to sort in standard French order whether the user's phone is set to English, German or Japanese. In French, an accent difference later in a word takes precedence over one earlier in the word, thus "côte" should come before "coté". If you use the default locale, the result of comparing "coté" and "côte" varies but will probably not give you the correct ordering.
// compare using default locale
NSString *coteAcute = @"cot\u00e9"; // "coté"
NSString *coteCircumflex = @"c\u00f4te"; // "côte"
if ( [coteAcute compare:coteCircumflex] == NSOrderedAscending) {
NSLog(@"Not using a French locale");
}
To remedy this, you can set the locale explicitly when you do your comparison:
// compare using specific locale
NSLocale *frenchLocale = [[[NSLocale alloc] initWithLocaleIdentifier:@"fr_FR"] autorelease];
NSComparisonResult comparisonResult = [coteAcute compare:coteCircumflex
options:0
range:NSMakeRange(0, 4)
locale:frenchLocale];
if (comparisonResult == NSOrderedDescending) {
NSLog(@"Using a French locale");
}
That sums up the options for comparing C strings and
NSString
s. Next time, we'll look at
slicing and dicing strings by creating substrings.