A little note on how I set up and bootstrap a local Cassandra cluster on macOS machines for development.
The instructions below were tested on macOS Sierra, and aim to spawn a 3-node cluster running the 2.1.x series.
Install ccm and its dependencies:
$ brew install ant
$ brew cask install java
$ pip install --upgrade ccm
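To double-check the installation, ccm should now be on the PATH and able to list clusters (the list will still be empty at this point):
$ ccm list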
Just in case we messed up a previous installation, let’s clean things up:
$ ccm switch test21
$ ccm stop test21
$ killall java
$ ccm remove
$ rm -rf "${HOME}/.ccm/test21"
Create a new 3-node cluster named test21 with the latest Cassandra release of the 2.1.x series:
$ ccm create test21 -v 2.1 -n 3
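Note that -v also accepts an exact release if you want the setup to be fully reproducible, for instance 2.1.12, the version the cqlsh banner reports further down:
$ ccm create test21 -v 2.1.12 -n 3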
Here is an example of how to alter the configuration shared by all nodes of the cluster, in this case to bump all timeouts tenfold:
$ tee -a ~/.ccm/test21/cluster.conf <<-EOF
config_options: {
  read_request_timeout_in_ms: 50000,
  range_request_timeout_in_ms: 100000,
  write_request_timeout_in_ms: 20000,
  request_timeout_in_ms: 100000,
  tombstone_failure_threshold: 10000000}
EOF
$ ccm updateconf
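If I recall correctly, ccm updateconf also accepts 'key: value' pairs directly, which avoids touching cluster.conf by hand; a sketch of the same timeout bump:
$ ccm updateconf 'read_request_timeout_in_ms: 50000' 'request_timeout_in_ms: 100000'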
I also sometimes had to increase Java's heap size, for instance to accommodate large data imports:
$ export CCM_MAX_HEAP_SIZE="12G"
$ export CCM_HEAP_NEWSIZE="2400M"
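As far as I can tell, ccm only picks these variables up when (re)starting nodes, so they can also be scoped to a single invocation instead of exported:
$ CCM_MAX_HEAP_SIZE="12G" CCM_HEAP_NEWSIZE="2400M" ccm start test21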
Before starting the cluster, we need to create the missing loopback aliases, one for each node to bind to:
$ sudo ifconfig lo0 alias 127.0.0.1 up
$ sudo ifconfig lo0 alias 127.0.0.2 up
$ sudo ifconfig lo0 alias 127.0.0.3 up
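These aliases do not survive a reboot, so expect to recreate them from time to time; a loop keeps that short, and ifconfig confirms they are up:
$ for i in 1 2 3; do sudo ifconfig lo0 alias "127.0.0.${i}" up; done
$ ifconfig lo0 | grep 'inet 127'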
We can now start the cluster:
$ ccm start test21
To get the state of the cluster:
$ ccm status
Cluster: 'test21'
-----------------
node1: UP
node3: UP
node2: UP
Or a much more detailed status:
$ ccm status -v
Cluster: 'test21'
-----------------
node1: UP
       auto_bootstrap=False
       thrift=('127.0.0.1', 9160)
       binary=('127.0.0.1', 9042)
       storage=('127.0.0.1', 7000)
       jmx_port=7100
       remote_debug_port=0
       byteman_port=0
       initial_token=-9223372036854775808
       pid=81379

node3: UP
       auto_bootstrap=False
       thrift=('127.0.0.3', 9160)
       binary=('127.0.0.3', 9042)
       storage=('127.0.0.3', 7000)
       jmx_port=7300
       remote_debug_port=0
       byteman_port=0
       initial_token=3074457345618258602
       pid=81381

node2: UP
       auto_bootstrap=False
       thrift=('127.0.0.2', 9160)
       binary=('127.0.0.2', 9042)
       storage=('127.0.0.2', 7000)
       jmx_port=7200
       remote_debug_port=0
       byteman_port=0
       initial_token=-3074457345618258603
       pid=81380
To get the detailed data ownership status, you need to go through a node and point nodetool at an existing keyspace (here named my_column_family):
$ ccm node1 status my_column_family

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load     Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  6.08 GB  1       100.0%            25e0440b-3ac9-490e-b0b0-260e96395f15  rack1
UN  127.0.0.2  6.22 GB  1       100.0%            848edc79-db1c-49bf-bdd8-3768b588460f  rack1
UN  127.0.0.3  6.14 GB  1       100.0%            75acd6c7-61c5-4ae7-9008-63d6426d1468  rack1
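For a token-centric view of the same ownership information, ccm can also dump the ring through any node:
$ ccm node1 ring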
For debugging, a node’s log is available through ccm:
$ ccm node1 showlog
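If you prefer tailing the file directly, the logs should live under the cluster's ccm directory, assuming ccm's default layout:
$ tail -f "${HOME}/.ccm/test21/node1/logs/system.log"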
And you can run queries directly through that node:
$ TZ=UTC cqlsh --cqlversion=3.2.1 127.0.0.1
Connected to test21 at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 2.1.12 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.
cqlsh> CONSISTENCY QUORUM;
Consistency level set to QUORUM.
cqlsh>
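On a 3-node cluster, a keyspace with a replication factor of 3 makes QUORUM reads and writes require 2 replicas out of 3, which is handy to exercise consistency behavior. A minimal sketch, with a made-up keyspace name:
cqlsh> CREATE KEYSPACE IF NOT EXISTS sandbox
   ... WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};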
Finally, to restore a bunch of table snapshots from your production cluster:
$ TABLES="table1 table2 table3"
$ DUMP_FOLDER="${HOME}/dump/2016-09-12/"
$ for host_folder in $(ls "${DUMP_FOLDER}"); do
>     for table in ${TABLES}; do
>         SSTABLE_FOLDER="${DUMP_FOLDER}/${host_folder}/my_column_family/${table}";
>         echo "Importing: ${SSTABLE_FOLDER} ...";
>         ccm bulkload "${SSTABLE_FOLDER}";
>     done
> done
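A quick, rough sanity check that data actually landed is to count rows in each table through cqlsh (counts are slow and capped by the LIMIT, but non-zero numbers are reassuring):
$ for table in ${TABLES}; do
>     echo "SELECT COUNT(*) FROM my_column_family.${table} LIMIT 1000000;" | cqlsh 127.0.0.1;
> done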
Forcing a repair on each table after a massive import doesn't hurt:
$ for table in ${TABLES}; do
>     ccm node1 nodetool repair my_column_family ${table};
> done
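To eyeball the result, nodetool cfstats (the 2.1-era name of what later became tablestats) reports per-table statistics such as SSTable counts and estimated keys:
$ ccm node1 nodetool cfstats my_column_family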