Example Using Cassandra With Thrift in C++

Due to a very exciting, recently launched project at work, I’ve had to interface with Cassandra through C++ code. As anyone who has done this can testify, the API docs are vague at best, and there are very few examples out there. The constant API changes between 0.x versions don’t help, and neither does the fact that Cassandra has its API docs and Thrift has its own, with nothing bridging the two. So at the moment it is very much a case of dissecting header files and reading the implementation in the Thrift-generated source files.

The only somewhat useful example of using Cassandra with C++ one can find online is this, but due to the API changes, this is now outdated (it’s still worth a read).

So in the hope that nobody else will have to spend the better part of a day piecing things together to achieve even the most basic thing, here’s an example which works with Cassandra 0.7 and Thrift 0.6.

First of all, create a new keyspace and a column family, using cassandra-cli:

[default@unknown] create keyspace nm_example;
c647b2c0-83e2-11e0-9eb2-e700f669bcfc
Waiting for schema agreement...
... schemas agree across the cluster
[default@unknown] use nm_example;
Authenticated to keyspace: nm_example
[default@nm_example] create column family nm_cfamily with comparator=BytesType and default_validation_class=BytesType;
30466721-83e3-11e0-9eb2-e700f669bcfc
Waiting for schema agreement...
... schemas agree across the cluster
[default@nm_example]

Now go to the directory where you have Cassandra installed, enter the interface/ directory and run:

thrift -gen cpp cassandra.thrift

This will create the gen-cpp/ directory. From this directory, copy all files except Cassandra_server.skeleton.cpp to wherever you intend to keep your sources. Here’s some example code which inserts, retrieves, updates, retrieves again and finally deletes a key:

#include "Cassandra.h"

#include <stdint.h>
#include <ctime>
#include <iostream>
#include <string>

#include <thrift/protocol/TBinaryProtocol.h>
#include <thrift/transport/TSocket.h>
#include <thrift/transport/TTransportUtils.h>

using namespace std;
using namespace apache::thrift;
using namespace apache::thrift::protocol;
using namespace apache::thrift::transport;
using namespace org::apache::cassandra;
using namespace boost;

static string host("127.0.0.1");
static int port= 9160;

int64_t getTS(){
    /* If you're doing things quickly, you may want to make use of tv_usec
     * or something here instead
     */
    time_t ltime;
    ltime=time(NULL);
    return (int64_t)ltime;

}

int main(){
    shared_ptr<TTransport> socket(new TSocket(host, port));
    shared_ptr<TTransport> transport(new TFramedTransport(socket));
    shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
    CassandraClient client(protocol);

    const string& key="your_key";

    ColumnPath cpath;
    ColumnParent cp;

    ColumnOrSuperColumn csc;
    Column c;

    c.name.assign("column_name");
    c.value.assign("Data for our key to go into column_name");
    c.timestamp = getTS();
    c.ttl = 300;

    cp.column_family.assign("nm_cfamily");
    cp.super_column.assign("");

    cpath.column_family.assign("nm_cfamily");
    /* This is required - thrift 'feature' */
    cpath.__isset.column = true;
    cpath.column="column_name";

    try {
        transport->open();
        cout << "Set keyspace to 'nm_example'.." << endl;
        client.set_keyspace("nm_example");

        cout << "Insert key '" << key << "' in column '" << c.name << "' in column family '" << cp.column_family << "' with timestamp " << c.timestamp << "..." << endl;
        client.insert(key, cp, c, org::apache::cassandra::ConsistencyLevel::ONE);

        cout << "Retrieve key '" << key << "' from column '" << cpath.column << "' in column family '" << cpath.column_family << "' again..." << endl;
        client.get(csc, key, cpath, org::apache::cassandra::ConsistencyLevel::ONE);
        cout << "Value read is '" << csc.column.value << "'..." << endl;

        c.timestamp++;
        c.value.assign("Updated data going into column_name");
        cout << "Update key '" << key << "' in column with timestamp " << c.timestamp << "..." << endl;
        client.insert(key, cp, c, org::apache::cassandra::ConsistencyLevel::ONE);

        cout << "Retrieve updated key '" << key << "' from column '" << cpath.column << "' in column family '" << cpath.column_family << "' again..." << endl;
        client.get(csc, key, cpath, org::apache::cassandra::ConsistencyLevel::ONE);
        cout << "Updated value is: '" << csc.column.value << "'" << endl;

        cout << "Remove the key '" << key << "' we just retrieved. Value '" << csc.column.value << "' timestamp " << csc.column.timestamp << " ..." << endl;
        client.remove(key, cpath, csc.column.timestamp, org::apache::cassandra::ConsistencyLevel::ONE);

        transport->close();
    }
    catch (NotFoundException &nf){
        cerr << "NotFoundException ERROR: "<< nf.what() << endl;
    }
    catch (InvalidRequestException &re) {
        cerr << "InvalidRequest ERROR: " << re.why << endl;
    }
    catch (TException &tx) {
        cerr << "TException ERROR: " << tx.what() << endl;
    }

    return 0;
}
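
The `cpath.__isset.column = true;` line above deserves a word. As far as I can tell from the generated sources, Thrift's C++ code only writes an optional field to the wire when its `__isset` flag is set, so assigning the value alone is not enough. Here is a minimal sketch of that behaviour. `FakeColumnPath` and `serialize` are hypothetical stand-ins I made up for illustration, not the real generated ColumnPath or any Thrift API, and the member is called `isset` here rather than `__isset`. The same would presumably apply to `c.ttl` if ttl is declared optional in your cassandra.thrift.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a Thrift-generated struct; NOT the real ColumnPath.
struct FakeColumnPath {
    struct IsSet {
        bool column;
        IsSet() : column(false) {}   // optional fields start out "not set"
    };

    std::string column_family;  // required field: always written
    std::string column;         // optional field: written only when flagged
    IsSet isset;                // Thrift names this member __isset
};

// Mimics what a generated write() does with an optional field:
// the field is skipped entirely unless its flag has been set.
std::string serialize(const FakeColumnPath& p) {
    std::ostringstream out;
    out << "column_family=" << p.column_family;
    if (p.isset.column) {
        out << ";column=" << p.column;
    }
    return out.str();
}
```

With only `column` assigned, `serialize` leaves the column out; once `isset.column` is set to true, it is included. That matches the symptom you get with the real client if you forget the flag: the server behaves as if no column had been given.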

Say we’ve called the file cassandra_example.cpp and you have the files mentioned above in the same directory; you can then compile things like this:

$ g++ -Wall cassandra_example.cpp cassandra_constants.cpp Cassandra.cpp cassandra_types.cpp -o cassandra_example -lthrift
$ ./cassandra_example
Set keyspace to 'nm_example'..
Insert key 'your_key' in column 'column_name' in column family 'nm_cfamily' with timestamp 1306008338...
Retrieve key 'your_key' from column 'column_name' in column family 'nm_cfamily' again...
Value read is 'Data for our key to go into column_name'...
Update key 'your_key' in column with timestamp 1306008339...
Retrieve updated key 'your_key' from column 'column_name' in column family 'nm_cfamily' again...
Updated value is: 'Updated data going into column_name'
Remove the key 'your_key' we just retrieved. Value 'Updated data going into column_name' timestamp 1306008339 ...
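
The timestamps in the run above are whole seconds, which is presumably why the update has to bump c.timestamp by one to win over the original insert. As the comment in getTS() hints, tv_usec gives you finer resolution. A minimal sketch, assuming a POSIX system with gettimeofday(); `getTSMicros` is a name I picked for illustration, not part of the Cassandra or Thrift API:

```cpp
#include <stdint.h>
#include <sys/time.h>
#include <cstddef>

// Microseconds since the Unix epoch, combining tv_sec and tv_usec.
// Hypothetical helper; a possible drop-in replacement for getTS().
int64_t getTSMicros() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    // Widen before multiplying so the result cannot overflow a 32-bit time_t.
    return static_cast<int64_t>(tv.tv_sec) * 1000000 + tv.tv_usec;
}
```

Whichever unit you pick, stick to it for every client writing to the same columns: Cassandra simply compares the numbers, so a write stamped in seconds will always lose against one stamped in microseconds.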

Another thing worth mentioning is Padraig O'Sullivan’s libcassandra, which may or may not be worth a look depending on what you want to do and what versions of Thrift and Cassandra you’re tied to.

May 21st, 2011