Monday, April 27, 2009

Love, Hate, C++ and Complexity

I've learned a lot of programming languages over the years. I enjoy diving into a new language, and I almost always come out a better programmer, able to think about code in new and different ways. C++ was the fourth major programming language I learned (BASIC, Pascal and C were the first three). I've been using it since the mid 80's, before templates and the STL, when it was closer to "C with Classes" than to the multi-paradigm beast it's become today.

I learned and used C before learning C++. This was both beneficial and detrimental. C is sometimes called "high level assembly language" because statements and expressions in C map very closely to the generated machine code; the C compiler doesn't perform a lot of magical code generation for you. This engenders a "WYSIWYG" mindset in C programmers. When you see an expression like x + y, you know that the compiler is going to MOV the contents of x into a register, MOV the contents of y into another and ADD them.

C++ started off as "C with Classes". At its core, C++ classes are basically C structs that act as namespaces for related functions. And there's a little syntactic sugar to hide the struct pointer (this) in these member functions and to perform initialization and clean up automatically. This doesn't seem like much, but when you're working with large code bases, going from:
  person_t *person = malloc(sizeof(person_t));
  init_person(person, "Brian", "Griffin");
  do_something_with_person(person);
  save_person(person, db);
  clean_up_person(person);
  free(person);
to something a little cleaner and less repetitive:
  Person *person = new Person("Brian", "Griffin");
  person->doSomething();
  person->saveIn(db);
  delete person;
makes a big difference when repeated over and over.
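
To make the "syntactic sugar" point concrete, here's a minimal sketch of the same data written both ways. The struct fields, the fullName() method and the two demo functions are my own illustration, not code from any real project; the point is only that the constructor stands in for init_person() and the hidden this pointer stands in for the explicit struct pointer.

```cpp
#include <string>

// C style: a plain struct plus free functions that take it as the first argument.
// The field names are assumptions made up for this sketch.
struct person_t {
    std::string first;
    std::string last;
};

void init_person(person_t *self, const char *first, const char *last) {
    self->first = first;
    self->last = last;
}

std::string person_full_name(const person_t *self) {
    return self->first + " " + self->last;
}

// C++ style: the same data and functions gathered into a class.
class Person {
public:
    // The constructor does the job of init_person().
    Person(const std::string &first, const std::string &last)
        : first_(first), last_(last) {}

    // Inside a member function, first_ is shorthand for this->first_.
    std::string fullName() const { return first_ + " " + last_; }

private:
    std::string first_;
    std::string last_;
};

// Hypothetical demo functions showing each style in use.
std::string full_name_c_style() {
    person_t person;
    init_person(&person, "Brian", "Griffin");
    return person_full_name(&person);
}

std::string full_name_cpp_style() {
    Person person("Brian", "Griffin");  // initialized on construction, cleaned up at scope exit
    return person.fullName();
}
```

Both demo functions produce the same string; the class version just has fewer places to forget a step.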

Code for any significant system is written once but read many times. Anything that aids readability and organization pays off in the long run. By encouraging programmers to define their data structures and related functions together in a class instead of free form, C++ brought a level of organization to large code bases that was often not present in C. And because using classes and methods is more succinct than plain structs and functions, readability of existing code increases. Not that you can't make a mess with C++, but if you're trying not to, there are some natural paths to organization in C++ that help.

However, Bjarne Stroustrup is a computer language geek and sometimes it seems like he and the ANSI C++ committee never met a language feature they didn't like. Like the hydra, you'd master one new C++ feature and two new ones would sprout up in its place. Multiple inheritance, function overloading, operator overloading, templates, the STL, run time type information, exceptions, placement new, virtual inheritance, partial specialization, implicit conversions, inline functions, access specifiers, namespaces -- I'm sure I'm forgetting at least a dozen important concepts here.

I was drawn to C++ by the pain of writing the same boilerplate in C over and over. For instance, most moderate to large sized C programs use a linked list or two, but C doesn't give you good tools to encapsulate a linked list in a clean, compact, typesafe and reusable way. C++ does. There's even a perfectly cromulent linked list class in the C++ standard library. Moving from C to C++ was originally like moving from a bicycle to a motorcycle: it felt powerful and could take you places further and faster than you'd ever been before. But it was also a little scary. You're speeding along open and exposed, the wind buffeting your helmet and you know you can send yourself sliding down the asphalt if you're not careful.
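
For anyone who hasn't used it, here's a small sketch of that standard library linked list, std::list, written in the pre-C++0x style I'd have used at the time. The helper names and the extra character names are made up for the example.

```cpp
#include <list>
#include <string>

// Hypothetical helper: builds a typesafe linked list of names.
std::list<std::string> make_names() {
    std::list<std::string> names;
    names.push_back("Brian");
    names.push_back("Stewie");
    names.push_front("Peter");  // constant-time insert at the head
    return names;
}

// Walks the list with an iterator and joins the names with commas.
std::string join_names(const std::list<std::string> &names) {
    std::string result;
    for (std::list<std::string>::const_iterator it = names.begin();
         it != names.end(); ++it) {
        if (!result.empty()) {
            result += ", ";
        }
        result += *it;
    }
    return result;
}
```

No hand-rolled node structs, no void pointers and no casts, and the compiler will refuse to let you push an int onto a list of strings.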

I went through this love-hate cycle. I'd get disgusted with the limitations of C and turn to C++. It was always empowering at first -- classes, constructors, destructors, RAII, smart pointers, strings, vectors, iterators. Then the dark underside of C++ would emerge. The impenetrable error messages, the mangled function names and oh my science, those paragraph-long template errors. I never had the misfortune of using Cfront, but those early versions of Borland and Microsoft C++ weren't very kind either. I'd lose hours, sometimes days trying to fix some bizarre syntax error. That was hard, but in the long run it drove me to acquire a deeper knowledge of C++. Once I understood what the little man behind the curtain was doing, things became a lot clearer.

The more insidious side of C++ is also one of its attractions. With all the tools that it gives you, there are often many ways to approach a problem. Should I make something a struct or a class? Should this data member be private or protected? Should I write a copy constructor for this class? Should I use inheritance, composition or templates to build my family of objects? If an exception is thrown in the middle of this constructor, will everything unwind properly? I declared a const instance of this class, but I still need to change some of its internal data -- should I just make the data members mutable or is this a code smell? Should I make this class a friend of this other class, or am I really making a mess here? It's chock full of fun and interesting things to ponder, but I was often finding my progress grinding to a snail's pace as I struggled with all the choices that C++ gave me.

Eventually I'd throw up my hands in disgust and go back to the happy world of C. Boilerplate be damned, at least I know what everything is doing and there's really not that many ways to do something. You write structs and functions that act on them. About the most contentious issue is whether to create constants using #define or enum. In many ways, C is far more productive. No classes, no exceptions, no access specifiers, no templates, no overloading. You just crank out code. I'll give C++ credit here though -- it made me a better C programmer. My C code today is cleaner and better organized and about as "object oriented" as you can make C; C++ taught me that.

After a while I'd tire of writing yet another linked list and revisit the exciting world of C++. This love-hate cycle eventually ended and I made my peace with both C and C++. I acquired a pretty expert knowledge of C++ and it's an old friend now. Mangled names no longer bother me and those template compiler error messages no longer daunt. It hasn't hurt that the spec was completed over a decade ago in 1998, with only a minor revision in 2003 and the TR1 library extensions in 2005. That's a lifetime in computer years.

My peaceful repose is about to end. The next major version of C++ is close to finalization. I had the good fortune of seeing Bjarne himself talk about it a couple of years ago at Google, but in the hour and a half he talked, he barely scratched the surface. Wikipedia has an extensive article on the new version. And recently, Stephan T. Lavavej of Microsoft's Visual C++ team has written a series of articles on the Visual C++ Team Blog covering some of the new features in detail. Microsoft is adding support for many of the upcoming features into the next version of Visual C++, slated to be part of Visual Studio 2010.

In his first part, Stephan covers the new syntax for defining anonymous functions in expressions (known as lambdas, lambda expressions or lambda functions), type inference in variable declarations with the auto keyword and static_assert for making compile-time assertions.
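
Here's a tiny sketch of my own (not taken from Stephan's article) that uses all three features together. The count_evens() helper is hypothetical, and the static_assert condition is one the language already guarantees, chosen only to show the syntax.

```cpp
#include <algorithm>
#include <vector>

// static_assert is evaluated at compile time; if the condition were false,
// the build would stop and print the message.
static_assert(sizeof(int) <= sizeof(long), "int must not be wider than long");

// Hypothetical helper: counts the even values in a vector.
int count_evens(const std::vector<int> &values) {
    // The lambda is an anonymous function object defined right where it's used.
    // auto deduces its type, which has no name you could write out yourself.
    auto is_even = [](int n) { return n % 2 == 0; };
    return static_cast<int>(
        std::count_if(values.begin(), values.end(), is_even));
}
```

Before lambdas, is_even would have been a separate functor class or free function defined somewhere far from the call to std::count_if.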

Stephan's second part covers rvalue references which enable move semantics and perfect forwarding. Move semantics enables you to optimize construction and assignment from temporary objects to minimize memory allocation and copying. Perfect forwarding allows you to write template functions that preserve the const and lvalue/rvalue quality of their arguments, which is important for writing template based wrappers.
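
A rough sketch of both ideas, using a hypothetical Buffer class and wrap() function that I made up for illustration (they aren't from Stephan's article). The move constructor steals the source's storage instead of copying it, and wrap() forwards its argument so that lvalues get copied and temporaries get moved.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical class that owns a block of bytes.
class Buffer {
public:
    explicit Buffer(std::size_t n) : bytes_(n) {}

    // Copy constructor: allocates new storage and duplicates every byte.
    Buffer(const Buffer &other) : bytes_(other.bytes_) {}

    // Move constructor: takes over the source's storage, leaving it empty.
    Buffer(Buffer &&other) : bytes_(std::move(other.bytes_)) {}

    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<char> bytes_;
};

// Perfect forwarding: T&& together with std::forward passes the argument on
// as an lvalue or an rvalue, exactly as the caller supplied it.
template <typename T>
Buffer wrap(T &&source) {
    return Buffer(std::forward<T>(source));
}

// Passing a named object selects the copy constructor; the source is untouched.
bool lvalue_is_copied() {
    Buffer source(1024);
    Buffer result = wrap(source);
    return source.size() == 1024 && result.size() == 1024;
}

// Passing std::move(source) selects the move constructor; the source is emptied.
bool rvalue_is_moved() {
    Buffer source(1024);
    Buffer result = wrap(std::move(source));
    return source.size() == 0 && result.size() == 1024;
}
```

Without the std::forward, source would always be treated as an lvalue inside wrap() and every call would copy, which is exactly the problem perfect forwarding solves for wrapper templates.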

The most recent part covers the new decltype keyword, which is used in template function declarations to specify return types that are deduced from function arguments.
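
As a minimal sketch (my own example, with a hypothetical add() function), here's decltype combined with the new trailing return type syntax. The return type is simply whatever type the expression a + b happens to have.

```cpp
#include <type_traits>

// Hypothetical generic add(): the return type is deduced from the arguments.
// The trailing "-> decltype(a + b)" is needed because a and b aren't in scope
// until after the parameter list.
template <typename A, typename B>
auto add(const A &a, const B &b) -> decltype(a + b) {
    return a + b;
}

// int + double yields double, and the compiler can confirm that for us.
static_assert(std::is_same<decltype(add(1, 2.5)), double>::value,
              "adding an int and a double should produce a double");
```

The same template works for any pair of types that supports operator+, without me having to guess the result type up front.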

This is all heady stuff, but if you use C++, it's worth spending some time going through Stephan's examples, particularly in part 2. And be sure to take a peek at the Wikipedia article to get a sense of just how much new stuff will be a part of the upcoming standard. It'll be like a whole new language: std::initializer_list and uniform initialization, a new for loop syntax, return types after the parameter list, concepts, nullptr, enum classes, template typedefs, variadic templates, Unicode string literals, long long int, std::unordered_map (hash table), std::regex (regular expressions) and more.
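
To give a taste of how different everyday code could look, here's a small sketch that combines a handful of these features: uniform initialization, the new for loop, nullptr, enum classes and std::unordered_map. The Species enum, the function names and the data are all invented for the example, and it assumes a compiler that already supports these parts of the draft.

```cpp
#include <string>
#include <unordered_map>

// enum class: the enumerators are scoped (Species::Dog) and don't
// silently convert to int.
enum class Species { Dog, Human };

// Hypothetical lookup table, filled in with a braced initializer list.
std::unordered_map<std::string, Species> make_cast() {
    return {{"Brian", Species::Dog},
            {"Peter", Species::Human},
            {"Lois", Species::Human}};
}

// Counts the entries of one species using the new range-based for loop.
int count_species(const std::unordered_map<std::string, Species> &cast,
                  Species wanted) {
    int count = 0;
    for (const auto &entry : cast) {
        if (entry.second == wanted) {
            ++count;
        }
    }
    return count;
}

// Returns a pointer to the species for a name, or nullptr if it's missing.
// nullptr is a real null pointer constant, not the integer 0 in disguise.
// The pointer is only valid while the map passed in is still alive.
const Species *find_species(
    const std::unordered_map<std::string, Species> &cast,
    const std::string &name) {
    auto it = cast.find(name);
    return it == cast.end() ? nullptr : &it->second;
}
```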

As an armchair computer language geek and closet masochist, I'm looking forward to some fun-filled evenings parsing bizarre new compiler errors. As a working programmer and entrepreneur, I'm glad that I'm currently doing most of my coding in Objective-C and Python.

(11:30 AM: Fixed mistake in example code.)

2 comments:

20thCenturyBoy said...

But you don't actually need to use any of the new features. A lot of them are there for library authors. But I do fear for the sanity of C++ book authors though!

Don McCaughey said...

That's true for a certain class of journeyman programmer who is content to simply crank out application logic in a well defined framework. But anyone who needs/wants to code beyond that level is going to have their hands full.

If you write or modify a class, you need to understand rvalue references in order to know when a move constructor/assignment operator is appropriate. And there's the new syntax for deleting copy constructors, forcing the creation of default constructors and prohibiting heap allocation. And the new support for constructors that take initializer lists.

I'm not saying that this is an unqualified bad thing; personally I can't wait to start playing around with this stuff. Yeah, but how do you write a book covering the whole language now in less than 3000 pages?