Tuesday, November 29, 2011

Objective-C Tuesdays: sorting arrays

Good news everyone! Objective-C Tuesdays is back! We left off a few weeks ago talking about techniques for inserting and removing items in arrays. Today we will cover sorting C arrays as well as NSArrays and NSMutableArrays.

Sorting C arrays with qsort()
The C standard library includes only one built-in way to sort C arrays: the qsort() function. qsort() is an implementation of the quicksort algorithm, a good general purpose sorting algorithm. qsort() sorts an array in place.

The qsort() prototype looks like this:
qsort(void *array, size_t itemCount, size_t itemSize,
int (*comparator)(void const *, void const *));

If you're not experienced using function pointers in C, the declaration of qsort() will look confusing. That's because declaring a C function pointer type is confusing. I'll use a typedef to make things easier to understand:
// alternate declaration of qsort using a typedef:
typedef int (*Comparator)(void const *, void const *);

qsort(void *array, size_t itemCount, size_t itemSize,
Comparator comparator);

Function pointers are declared in the same way that functions are declared in C, except they have a * before the function name like other pointer variables do. But there's one hitch. In a function pointer declaration, the * is ambiguous; by default, C will assume the * is part of the return type of the function. So when you declare a function pointer, you need a set of parentheses around the * and the function name in order to clue in the compiler.
// declare a function returning a pointer to an int
int *returnAPointerToSomeInt(void);
// which is interpreted as:
// (int *)returnAPointerToSomeInt(void);

// declare a pointer to a function returning an int
int (*returnSomeInt)(void);

Now that we've cleared that up, let's look at the declaration of qsort() again.
typedef int (*Comparator)(void const *, void const *);

qsort(void *array, size_t itemCount, size_t itemSize,
Comparator comparator);

The first three parameters of qsort() specify the array to sort, the count of items in the array and the size (in bytes) of a single item. Note that the array is of type void *, which allows us to pass any type of C array into qsort().

The last parameter is the comparator function.
typedef int (*Comparator)(void const *item1, void const *item2);

The comparator takes pointers to two items from the array and returns a negative number if item1 belongs before item2, a positive number if item1 belongs after item2 and zero if the two items are equal (or have equivalent ordering). This style of comparator function is used in many languages for sorting. I think of the items positioned on a number line when trying to understand a comparator:

item1 <=====<< item2
<-+------+-----+------+------+->
-2 -1 0 1 2

item2 >>=====> item1
<-+------+-----+------+------+->
-2 -1 0 1 2

When item1 belongs before item2, the comparator function will return a negative number; when item1 belongs after item2, the comparator returns a positive number. This relationship is the difference between item1 and item2, or item1 - item2.

The C standard library doesn't include many comparator functions, but they're easy to write. Here's an example of a comparator for ints to sort them in natural order:
int compareInts(void const *item1, void const *item2) {
int const *int1 = item1;
int const *int2 = item2;
return *int1 - *int2;
}

We create local variables int1 and int2 since a void pointer will automatically convert to a typed pointer in C, and we know that qsort() will actually be giving us pointers to ints. The two parameters are also const, so we can't modify the int values that the items point to (which would cause qsort() to go haywire). Then we can simply return the difference between the two int values to determine their sort order. We could do this in one line using casts and parentheses:
int compareInts(void const *item1, void const *item2) {
return *( (int const *) item1) - *( (int const *) item2);
}

Either way, now we can use our compareInts() function to sort an array of ints:
int array[6] = { 42, 17, 57, 19, 11, 5 };

qsort(array, 6, sizeof(int), compareInts);
// array now contains 5, 11, 17, 19, 42, 57

Now suppose we wanted to sort our array in reverse order? We would still use qsort() to do the job, but employ a different comparator function. Here is compareIntsReversed():
int compareIntsReversed(void const *item1, void const *item2) {
int const *int1 = item1;
int const *int2 = item2;
return -(*int1 - *int2);
// or simply return *int2 - *int1;
}

By reversing the sign of the comparator's return value, you can reverse the order of sorting.
int array[6] = { 42, 17, 57, 19, 11, 5 };

qsort(array, 6, sizeof(int), compareIntsReversed);
// array now contains 57, 42, 19, 17, 11, 5

The qsort() function is very flexible. You can sort arrays of complex types as easily as arrays of ints. To sort an array of C strings in lexicographic order, use the strcmp() function from the C standard library.
char const *array[3] = { "red", "green", "blue" };

qsort(array, 3, sizeof(char const *), strcmp);
// array now contains "blue", "green", "red"

Sorting an array of CGPoints, first write a comparator:
int compareCGPoints(void const *item1, void const *item2) {
struct CGPoint const *point1 = item1;
struct CGPoint const *point2 = item2;

if (point1->x < point2->x) {
return -1;
} else if (point1->x > point2->x) {
return 1;
}

if (point1->y < point2->y) {
return -1;
} else if (point1->y > point2->y) {
return 1;
}

return 0;
}

Notice that we first compare the X coordinates of the points and only check the Y coordinates if the X coordinates are equal. This is a common pattern when sorting over multiple fields in a struct. Here's an example of the CGPoint comparator in action:
struct CGPoint pointArray[4] = {
{ 4.0f, 3.0f },
{ 2.0f, 1.0f },
{ 4.0f, 1.0f },
{ 2.0f, 3.0f }
};
qsort(pointArray, 4, sizeof(struct CGPoint), compareCGPoints);
// pointArray now contains:
// { 2.0f, 1.0f }
// { 2.0f, 3.0f }
// { 4.0f, 1.0f }
// { 4.0f, 3.0f }


Other C sorting functions
For most common sorting jobs, qsort() it a reasonable choice, and it's the only choice that's included in the C standard library. OS X and iOS also include a couple of other sorting functions worth mentioning: heapsort() and mergesort(), which have the same parameters as qsort() and are detailed on the qsort() man page. heapsort() is slower than qsort() but uses a limited amount of memory while sorting, whereas qsort() uses recursion and can potentially overflow the stack when sorting huge arrays. mergesort() can be significantly faster than qsort() when the array is mostly in sorted order already, but is significantly slower when the array is in random order.

Sorting an NSArray
Objective-C provides a number of ways to sort an NSArray. Because NSArray objects are immutable, all the sorting methods on NSArray return a new sorted NSArray object and leave the original one unchanged. The most basic sorting method is -sortedArrayUsingFunction:context:, which is similar to sorting a C array with qsort(). The full method declaration looks like this:
- (NSArray *)sortedArrayUsingFunction:(NSInteger (*)(id, id, void *))comparator 
context:(void *)context

The first parameter is a pointer to a comparator function. As we did for qsort(), I'll rewrite the method declaration using a typedef for the function pointer to make it a little easier to read:
typedef NSInteger (*Comparator)(id item1, id item2, void *context);

- (NSArray *)sortedArrayUsingFunction:(Comparator)comparator
context:(void *)context

The comparator function for NSArray sorting has a few differences from the qsort() one. It returns an NSInteger, which is simply a typedef for int. Instead of two items of type void const *, it takes two items of type id since NSArrays can only hold object types. And finally, there's an extra context parameter, a void pointer that allows you to pass extra information to the comparator if you need to (but you can safely ignore it if you don't need it).

Let's write a comparator function that orders NSStrings by their length.
static NSInteger compareStringsByLength(id item1, id item2, void *context) {
return [item1 length] - [item2 length];
}

Since we can send any message to a variable of type id, we don't even need casts or intermediate variables here. (Of course, we'll get a runtime error if we put the wrong type of objects into our array, but comparators for qsort() have the similar problems.)

And now let's see it in action:
NSArray *array = [NSArray arrayWithObjects:@"Florida", 
@"Texas", @"Mississippi", @"Delaware", nil];

NSArray *sortedArray = [array sortedArrayUsingFunction:compareStringsByLength
context:NULL];
// sortedArray contains:
// @"Texas"
// @"Florida"
// @"Delaware"
// @"Mississippi"

Now let's use the context parameter to make it easy to switch between normal and reversed sort order. We can pass a pointer to any kind of data we want, so we will use a pointer to a BOOL, where a YES value means reversed and a NO value means normal ordering.
NSInteger compareStringsByLength(id item1, id item2, void *context) {
BOOL *reversed = context;
NSInteger order = [item1 length] - [item2 length];
if (*reversed) {
return -order;
} else {
return order;
}
}

Now we need to pass something to the context parameter when we call -sortedArrayUsingFunction:context:
NSArray *array = [NSArray arrayWithObjects:@"Florida", 
@"Texas", @"Mississippi", @"Delaware", nil];

BOOL reversed = YES;
NSArray *sortedArray = [array sortedArrayUsingFunction:compareStringsByLength
context:&reversed];
// sortedArray contains:
// @"Mississippi"
// @"Delaware"
// @"Florida"
// @"Texas"

Note that we use the & operator to pass the address of the reversed variable as the context.

If you target iOS 3.2, OS X 10.6 or later, you can use the block version, -sortedArrayUsingComparator:. Blocks make one-off comparator functions easier to read and understand by putting the comparator definition right along side the sort method call. We can rewrite our example this way:
NSArray *array = [NSArray arrayWithObjects:@"Florida", 
@"Texas", @"Mississippi", @"Delaware", nil];

BOOL reversed = YES;
NSArray *sortedArray = [array sortedArrayUsingComparator:^(id item1, id item2) {
NSInteger order = [item1 length] - [item2 length];
if (reversed) {
return -order;
} else {
return order;
}
}];
// sortedArray contains:
// @"Mississippi"
// @"Delaware"
// @"Florida"
// @"Texas"

Since a block copies the value of variables like reversed in the enclosing scope at the time it's created, there's no need for a second context parameter to -sortedArrayUsingComparator:. Blocks are really handy, but if you find yourself writing the same comparator block in multiple places in your code, you might be better off using a plain old comparator function and -sortedArrayUsingFunction: instead to prevent duplicate code, in keeping with the DRY principle. (Or you might consider how to restructure your code so that sorting is defined in only one place.)

Since items in an NSArray are all objects, very often the items themselves have a useful comparator method. When this is the case, the -sortedArrayUsingSelector: method is very handy. NSArrays of NSStrings are very common, as is the need to sort in a case-insensitive manner. Using NSString's -caseInsensitiveCompare: method, we can sort this way:
NSArray *tagNames = [NSArray arrayWithObjects:@"H1", 
@"body", @"A", @"Head", nil];

NSArray *sortedTagNames = [tagNames sortedArrayUsingSelector:@selector(caseInsensitiveCompare:)];
// sortedTagNames contains:
// @"A"
// @"body"
// @"H1"
// @"Head"

There's one more interesting way to sort an NSArray: we'll look at the -sortedArrayUsingDescriptors: method and the NSSortDescriptor class next week.

No comments: