Stevebook - [ blog:comparing_c_0x_lambdas_with_scheme_an_object_with_mutable_state_no_classes_involved ]

Comparing C++0x lambdas with Scheme: An "object" with mutable state, no classes involved!

After reading SICP, I got the impression that, at least from a semantic point of view, the only real difference between programming languages is whether or not they support lambda expressions. (Of course, there are many pragmatic differences between languages, too many to list; but the point is that lambdas let you do whatever you can do in non-lambda-supporting languages, but not vice-versa.)

For instance, the book points out that you can implement a basic "methods operating on private data" system, as found in object-oriented languages, in Scheme using lambda constructors to allocate the memory:

> (define make-account (lambda ()
    (define balance 0)
    (list (lambda (x) (set! balance x))
          (lambda () balance))))
> (define a (make-account))
> ((cadr a))   ; get-balance method
0
> ((car a) 5)  ; set-balance! method
> ((cadr a))   ; get-balance method
5
> ((car a) 7)  ; set-balance! method
> ((cadr a))   ; get-balance method
7

The balance variable is hidden to outside code, there is no way to modify it except through the provided methods.

Well, due to this idea that the presence or non-presence of lambdas is what makes all the difference, I wondered for some time whether the new C++0x lambdas would allow for an entirely new style of programming in C++. The fact is that I got a little tired of class-oriented programming and have lately been doing everything in a functional style. (Especially in my Python code, which supports classes perfectly well, but also when I code in C... unfortunately the latter involves a lot of casting to void*, which makes things difficult since you lose all type safety.)

When I heard about C++0x lambdas, I started to wonder if this might renew my interest in the language. For one thing, there is little doubt that the it compiles to fairly efficient code, and supports types, so it might be interesting to try it as a typed functional programming platform. Of course, without further changes to the language ecosystem like encouraging the use of persistent data structures, it will always suffer the problems induced by mutability, but at least a functional style might encourage statelessness, instead of the current object-oriented style which does the exact opposite. On the other hand, I realized that most of my fatigue with OOP came from dealing with abstraction via complicated class hierarchies, not from mutability, and that is something that function-oriented programming really does solve, especially if combined with something as powerful as the C++ template engine! Regardless, I was wondering if I couldn't pull off some Scheme-like tricks in C++0x.

Quick review of C++0x lambda syntax:

auto f = [=](..args..){..body};
auto f = [&](..args..){..body};

The [=] means to create a closure where referenced variables are copied by value, and [&] means copy by reference. You can also specify how specific closed-over variables are handled.

So, I was curious how variables are handled. (To C++'ers, ignore the printf... as a C guy, I just happen to dislike std::cout.) By the way, you need g++ version 4.5 to compile this, which is easy to install on Ubuntu through the gcc-4.5 package.

lambda.cpp:

#include <stdio.h>

int main()
{
    int balance=6;
    printf("balance: %d\n", balance);
    auto v = [&] () {
        printf("balance: %d\n", balance);
        balance=2;
    };
    v();
    v();
    balance=8;
    v();
    return 0;
}

Run:

$ g++-4.5 -Wall -std=c++0x -o lambda lambda.cpp  && ./lambda
balance: 6
balance: 6
balance: 2
balance: 8

So, here we can see that balance is truly a reference. It's possible for both the lambda and the outside scope to modify the same variable. If we change [&] to [=], we get an error saying that balance is read-only. In fact, copied variables become const, but that's okay because it doesn't really make sense to modify variables that won't exist after the function runs. If we remove the balance=2 line, it compiles, and we get:

$ g++-4.5 -std=c++0x -o lambda lambda.cpp  && ./lambda
balance: 6
balance: 6
balance: 6
balance: 6

Since balance was copied when the lambda was created, modifying it later doesn't change its value in the closure.

Okay, is it possible to create an "object", with private state?

lambda2.cpp:

#include <stdio.h>
#include <tuple>

int main()
{
    auto account = [] () {
        int balance = 0;
        printf("Initial balance: %d\n", balance);
        auto set = [&] (int n) {
            printf("Changing balance from %d to %d.\n", balance, n);
            balance = n;
        };
        auto get = [&] () {
            return balance;
        };
        return std::make_tuple(set,get);
    }();
    printf("Balance: %d\n", std::get<1>(account)());
    printf("Set balance to 5.\n");
    std::get<0>(account)(5);
    printf("Balance: %d\n", std::get<1>(account)());
    return 0;
}

Notice we have to use an std::tuple<> to return a pair of functions. We don't know their types so it's certainly nice to have auto. Run it:

$ g++-4.5 -Wall -std=c++0x -o lambda2 lambda2.cpp  && ./lambda2
Initial balance: 0
Balance: 0
Set balance to 5.
Changing balance from -1217990668 to 5.
Balance: 5

Holy cow, what happened to the balance inside set? Well, balance is a variable in the contructor lambda. Once the constructor lambda is called, there is no longer a reference to it so it is automatically deallocated, but the inner lambda for set that was returned still contains a reference to that memory.

Notice that the compiler does not warn us about this!

Is there a solution? Clearly we need to ensure that the memory for balance is not deallocated. However, changing set to [=] doesn't help (read-only), so we have to explicitly make balance an allocated value. Problem being that we don't have a destructor for when the lambda gets deallocated, so how can we ensure that its memory also gets deallocated. The answer is to use a smart pointer type, shared_ptr, which performs reference counting for us. It goes out of "scope" when the lambda is destructed, and our memory is freed. Let's provide an explicit deleter function so we can track it and be sure:

lambda3.cpp:

#include <stdio.h>
#include <tuple>
#include <memory>

void deleter(int* p)
{
    printf("deleting %p\n", p);
    delete p;
}

int main()
{
    auto account = [] () {
        std::shared_ptr<int> balance(new int(0), deleter);
        printf("Initial balance: %d (%p)\n", *balance, balance.get());
        printf("balance.use_count(): %ld\n", balance.use_count());
        auto set = [=] (int n) {
            printf("Changing balance from %d to %d.\n", *balance, n);
            *balance = n;
        };
        auto get = [=] () {
            return *balance;
        };
        printf("balance.use_count(): %ld\n", balance.use_count());
        return std::make_tuple(set,get);
    }();
    printf("Balance: %d\n", std::get<1>(account)());
    printf("Set balance to 5.\n");
    std::get<0>(account)(5);
    printf("Balance: %d\n", std::get<1>(account)());
    return 0;
}

Something to notice here is that I had to use copy semantics ([=]) for the set and get lambdas, because the shared_ptr is a value type that must be copied for its reference-tracking to work. Taking a reference to it defeats the purposes. We're printing out the reference count for balance so we can understand what the shared_ptr is doing.

$ g++-4.5 -Wall -std=c++0x -o lambda3 lambda3.cpp  && ./lambda3
Initial balance: 0 (0x850e008)
balance.use_count(): 1
balance.use_count(): 3
Balance: 0
Set balance to 5.
Changing balance from 0 to 5.
Balance: 5
deleting 0x850e008

Here we see that the behaviour is as expected. Notice that the reference count is 3 after declaring the lambdas! Each closure has a reference, as does the constructor lambda before it returns. You can see that the balance memory is indeed freed at the end of the program when both closures go out of final scope.

So, we now have an intantiated "object" with its own methods, that manipulates private data, without using the class keyword at all! Although there's no advantage, in this short example, to avoiding real classes, this is interesting because it's such a departure from how you normally do this kind of thing in C++. It shows that you really could construct a whole complicated stateful system in C++ without using classes at all, allocating all dynamic memory through closures. I'm not going to try to say why you'd want to do that here, I just think it's cool that it's now possible at all.

Let's compare directly the bare bones of the two implementations, without all the cruft:

C++:

auto account = [] () {
    std::shared_ptr<int> balance(new int(0));
    return std::make_tuple([=] (int n) { *balance = n; },
                           [=] ()      { return *balance; });
}();

Scheme:

(define make-account (lambda ()
  (define balance 0)
  (list (lambda (x) (set! balance x))
        (lambda () balance))))

They are about the same length. The complexity of the C++ version makes it slightly harder to read at first glance, but C++'ers should be comfortable enough with it. Mainly the explicit handling of memory and the special need to use tuple for returning multiple values is what makes it slightly more complicated to look at; the lambdas themselves are pretty easy to read.

One thing I like about the C++ syntax is the control it gives you over the copy/reference semantics. You can specify it for each variable if you want.

Syntax aside, it's really striking how similar the two program structures are. This makes me wonder how interesting it would be to directly translate a Lispish- or ML-/Haskell-like shell syntax down to C++0x using a very basic parser. It could provide a nice (typed!) functional language that produces native and efficient code, and is directly able to interoperate with the massive C and C++ code base out there.

Discuss this article on reddit.


A bit of further experimentation with abstraction over lambdas led to finding one awkward oddity: it seems to be difficult to combine lambdas with templates. There is no template syntax for the lambda syntax, and you can't easily return a lambda from a templated regular function because it must be auto, requiring decltype, but you can't put the lambda in decltype, so the compiler can't figure out the return type. (Not 100% sure why it can't just infer it, but..)

Apparently there was an issue with something called "polymorphic lambdas" that just didn't make it into the C++0x standard, unfortunately. However, I was able to create a templated regular function that returns an std::function<>, and puts a lambda inside it. Not sure if that's really an ideal answer, but it allowed me to return a polymorphic closure:

#include <stdio.h>
#include <functional>

class A {
public:
    int value() { return 12; }
};

class B {
public:
    int value() { return 9; }
};

template<typename T>
std::function<int()> getT() {
    auto v = ([] () {T a; return a.value();});
    return v;
}

int main() {    
    printf("%d\n", getT<A>()());
    printf("%d\n", getT<B>()());
    return 0;
}