Set Up a Single-Node Cassandra Cluster on Ubuntu 16.04

Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.

Cassandra scales linearly: a new machine can be added to the cluster with no downtime or interruption to applications, and each added machine increases the cluster's read and write throughput.

Every node in a Cassandra cluster has the same role. Data is distributed across the cluster, so each node holds a different portion of it. Cassandra supports replication, including replication across multiple data centers, for redundancy, failover, and disaster recovery.

System Update

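Refresh the package index so that apt sees the latest package versions, including security patches. Note that the command below only updates the package lists; run sudo apt-get upgrade afterwards if you also want to install pending updates.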

$ sudo apt-get update

Install Java 8

Apache Cassandra runs on top of the Java Virtual Machine (JVM), so we’ll install Oracle JDK 8 before installing Cassandra itself. Apache Cassandra can also run on OpenJDK (for example, the openjdk-8-jdk package).

Add the Oracle Java PPA to your system, install Java, and then verify the installed version.

$ sudo add-apt-repository -y ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get -y install oracle-java8-installer

$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
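
Since Cassandra 3.11 expects Java 8, it can be handy to check the major version from a script before continuing. The following is only a minimal sketch and not part of the original setup: the function name java_major_version is hypothetical, and it assumes the version string is printed in double quotes, as in the output above.

```shell
# Hypothetical helper: reads `java -version` output on stdin and prints
# the major Java version (8 for "1.8.0_144", 11 for "11.0.2").
java_major_version() {
    awk -F'"' '/version/ {
        split($2, parts, "[.]")
        # Legacy numbering puts the major version second ("1.8" is Java 8).
        if (parts[1] == "1") print parts[2]; else print parts[1]
        exit
    }'
}

# Usage: `java -version` writes to stderr, so redirect it into the pipe:
#   java -version 2>&1 | java_major_version
```

The function only parses text, so it can be tried on a saved copy of the output without Java being installed.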

Install Cassandra

We will install Cassandra from the official package provided by the Apache Software Foundation, so add the Cassandra repository to make the package available to your system. The 311x component in the repository line selects the 3.11 release series.

$ echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list 
$ curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add - 
$ sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA 
$ sudo apt-get update 
$ sudo apt-get install cassandra 

Cassandra Service

The installation produces output like the following. Once it completes, the Cassandra service should be up and running on your system; check its status with systemctl.

krishna@Ubuntu16:~$ sudo apt-get install cassandra
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
 gyp javascript-common libjs-inherits libjs-jquery libjs-node-uuid libjs-underscore libssl-dev libssl-doc libuv1 libuv1-dev linux-headers-4.10.0-28 linux-headers-4.10.0-28-generic
 linux-image-4.10.0-28-generic linux-image-extra-4.10.0-28-generic node-abbrev node-ansi node-ansi-color-table node-archy node-async node-block-stream node-combined-stream
 node-cookie-jar node-delayed-stream node-forever-agent node-form-data node-fstream node-fstream-ignore node-github-url-from-git node-glob node-graceful-fs node-gyp node-inherits
 node-ini node-json-stringify-safe node-lockfile node-lru-cache node-mime node-minimatch node-mkdirp node-mute-stream node-node-uuid node-nopt node-normalize-package-data node-npmlog
 node-once node-osenv node-qs node-read node-read-package-json node-request node-retry node-rimraf node-semver node-sha node-sigmund node-slide node-tar node-tunnel-agent
 node-underscore node-which python-pkg-resources zlib1g-dev
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
 libopts25 ntp
Suggested packages:
 cassandra-tools ntp-doc
The following NEW packages will be installed:
 cassandra libopts25 ntp
0 upgraded, 3 newly installed, 0 to remove and 54 not upgraded.
Need to get 30.0 MB of archives.
After this operation, 41.0 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:2 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 libopts25 amd64 1:5.18.7-3 [57.8 kB]
Get:1 http://dl.bintray.com/apache/cassandra 311x/main amd64 cassandra all 3.11.0 [29.4 MB]
Get:3 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 ntp amd64 1:4.2.8p4+dfsg-3ubuntu5.7 [518 kB]
Fetched 30.0 MB in 4min 47s (105 kB/s)
Selecting previously unselected package libopts25:amd64.
(Reading database ... 263455 files and directories currently installed.)
Preparing to unpack .../libopts25_1%3a5.18.7-3_amd64.deb ...
Unpacking libopts25:amd64 (1:5.18.7-3) ...
Selecting previously unselected package ntp.
Preparing to unpack .../ntp_1%3a4.2.8p4+dfsg-3ubuntu5.7_amd64.deb ...
Unpacking ntp (1:4.2.8p4+dfsg-3ubuntu5.7) ...
Selecting previously unselected package cassandra.
Preparing to unpack .../cassandra_3.11.0_all.deb ...
Unpacking cassandra (3.11.0) ...
Processing triggers for libc-bin (2.23-0ubuntu9) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for systemd (229-4ubuntu19) ...
Processing triggers for ureadahead (0.100.0-19) ...
Setting up libopts25:amd64 (1:5.18.7-3) ...
Setting up ntp (1:4.2.8p4+dfsg-3ubuntu5.7) ...
Setting up cassandra (3.11.0) ...
Adding group `cassandra' (GID 130) ...
Done.
vm.max_map_count = 1048575
net.ipv4.tcp_keepalive_time = 300
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Processing triggers for libc-bin (2.23-0ubuntu9) ...
Processing triggers for systemd (229-4ubuntu19) ...
Processing triggers for ureadahead (0.100.0-19) ...
krishna@Ubuntu16:~$

krishna@Ubuntu16:~$ sudo systemctl status cassandra
● cassandra.service - LSB: distributed storage system for structured data
   Loaded: loaded (/etc/init.d/cassandra; bad; vendor preset: enabled)
   Active: active (running) since Thu 2017-09-21 16:13:21 IST; 1min 12s ago
     Docs: man:systemd-sysv-generator(8)
   CGroup: /system.slice/cassandra.service
           └─4006 java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:

Sep 21 16:13:21 Ubuntu16 systemd[1]: Starting LSB: distributed storage system for structured data...
Sep 21 16:13:21 Ubuntu16 systemd[1]: Started LSB: distributed storage system for structured data.
krishna@Ubuntu16:~$

Connecting to Cluster

Once the Cassandra service is up and running, check the status of the cluster.

krishna@Ubuntu16:~$ sudo nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  103.67 KiB  256          100.0%            1ccbcad7-693b-4d1b-ba95-6dda2a4e3214  rack1

krishna@Ubuntu16:~$

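In the output, ‘UN’ means Up and Normal: the first letter is the node's status (Up/Down) and the second is its state (Normal/Leaving/Joining/Moving).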
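
For scripted health checks, the status codes can be counted rather than read by eye. The sketch below is an illustration under assumptions, not an official tool: the function name count_unhealthy_nodes is hypothetical, and it assumes node rows begin with the two-letter code, as in the output above.

```shell
# Hypothetical helper: reads `nodetool status` output on stdin and prints
# how many nodes are in any state other than UN (Up and Normal).
count_unhealthy_nodes() {
    awk '
        # Node rows start with a two-letter code: U/D, then N/L/J/M.
        /^[UD][NLJM][ \t]/ { if ($1 != "UN") unhealthy++ }
        END { print unhealthy + 0 }
    '
}

# Usage:
#   sudo nodetool status | count_unhealthy_nodes
```

On the single-node cluster shown here the expected result is 0, since the only node is reported as UN.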

Connect to the cluster with cqlsh, the CQL shell. Typing help lists the available shell commands and CQL help topics.

krishna@Ubuntu16:~$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> help

Documented shell commands:
===========================
CAPTURE  CLS          COPY  DESCRIBE  EXPAND  LOGIN   SERIAL  SOURCE   UNICODE
CLEAR    CONSISTENCY  DESC  EXIT      HELP    PAGING  SHOW    TRACING

CQL help topics:
================
AGGREGATES               CREATE_KEYSPACE           DROP_TRIGGER      TEXT
ALTER_KEYSPACE           CREATE_MATERIALIZED_VIEW  DROP_TYPE         TIME
ALTER_MATERIALIZED_VIEW  CREATE_ROLE               DROP_USER         TIMESTAMP
ALTER_TABLE              CREATE_TABLE              FUNCTIONS         TRUNCATE
ALTER_TYPE               CREATE_TRIGGER            GRANT             TYPES
ALTER_USER               CREATE_TYPE               INSERT            UPDATE
APPLY                    CREATE_USER               INSERT_JSON       USE
ASCII                    DATE                      INT               UUID
BATCH                    DELETE                    JSON
BEGIN                    DROP_AGGREGATE            KEYWORDS
BLOB                     DROP_COLUMNFAMILY         LIST_PERMISSIONS
BOOLEAN                  DROP_FUNCTION             LIST_ROLES
COUNTER                  DROP_INDEX                LIST_USERS
CREATE_AGGREGATE         DROP_KEYSPACE             PERMISSIONS
CREATE_COLUMNFAMILY      DROP_MATERIALIZED_VIEW    REVOKE
CREATE_FUNCTION          DROP_ROLE                 SELECT
CREATE_INDEX             DROP_TABLE                SELECT_JSON

cqlsh>

Using Cassandra

Let’s try out the Cassandra installation by creating a keyspace (Cassandra's equivalent of a database) and a table, then inserting and querying a few rows.

krishna@Ubuntu16:~$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
cqlsh> show version;
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
cqlsh>
cqlsh> show host;
Connected to Test Cluster at 127.0.0.1:9042.
cqlsh>

cqlsh> CREATE KEYSPACE newdb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh>
cqlsh> use newdb;
cqlsh:newdb>
cqlsh:newdb> CREATE TABLE emp (id int PRIMARY KEY, name text, year text);
cqlsh:newdb>
cqlsh:newdb> DESC emp;

CREATE TABLE newdb.emp (
    id int PRIMARY KEY,
    name text,
    year text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

cqlsh:newdb>
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (1, 'Krishna', '2017');
cqlsh:newdb>
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (2, 'Chandra', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (3, 'Prajapati', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (4, 'Anula', '2018');
cqlsh:newdb>
cqlsh:newdb> SELECT * FROM emp;

 id | name      | year
----+-----------+------
  1 |   Krishna | 2017
  2 |   Chandra | 2017
  4 |     Anula | 2018
  3 | Prajapati | 2017

(4 rows)
cqlsh:newdb> SELECT * FROM emp WHERE id=3;

 id | name      | year
----+-----------+------
  3 | Prajapati | 2017

(1 rows)
cqlsh:newdb> SELECT id, name FROM emp;

 id | name
----+-----------
  1 |   Krishna
  2 |   Chandra
  4 |     Anula
  3 | Prajapati

(4 rows)
cqlsh:newdb> DESCRIBE keyspaces;

system_schema  system_auth  system  system_distributed  newdb  system_traces

cqlsh:newdb> SELECT * FROM emp LIMIT 2;

 id | name    | year
----+---------+------
  1 | Krishna | 2017
  2 | Chandra | 2017

(2 rows)
cqlsh:newdb> SELECT * FROM emp WHERE name = 'Krishna' ALLOW FILTERING;

 id | name    | year
----+---------+------
  1 | Krishna | 2017

(1 rows)
cqlsh:newdb>
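
When there are many rows to add, the INSERT statements used above can be generated instead of typed. The following is a rough sketch under stated assumptions: the function name csv_to_inserts and the file name inserts.cql are hypothetical, the input is assumed to be plain "id,name,year" lines, and commas inside values are not handled.

```shell
# Hypothetical helper: turns "id,name,year" lines on stdin into CQL INSERT
# statements for the newdb.emp table created above.
csv_to_inserts() {
    awk -F',' -v q="'" '
        # Skip lines that do not have exactly three fields and a numeric id.
        NF == 3 && $1 ~ /^[0-9]+$/ {
            name = $2; year = $3
            # CQL escapes a single quote inside a string by doubling it.
            gsub(q, q q, name)
            gsub(q, q q, year)
            printf "INSERT INTO newdb.emp (id, name, year) VALUES (%d, %s%s%s, %s%s%s);\n", $1, q, name, q, q, year, q
        }
    '
}

# Usage: write the statements to a file, then run it with cqlsh -f:
#   printf '5,Elisha,2418\n' | csv_to_inserts > inserts.cql
#   cqlsh -f inserts.cql
```

Writing the statements to a file first makes it possible to review them before they are executed.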

Data Export

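Here’s a way to export data from a Cassandra table to a CSV file, using the cqlsh COPY TO command. (The table contents have changed since the previous section, so the rows differ.)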

cqlsh:newdb> select * from emp;

 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017

(10 rows)
cqlsh:newdb> COPY emp(id, name, year) TO 'emp.csv';
Using 2 child processes

Starting copy of newdb.emp with columns [id, name, year].
Processed: 10 rows; Rate:      55 rows/s; Avg. rate:      55 rows/s
10 rows exported to 1 files in 0.186 seconds.
cqlsh:newdb> exit
krishna@Ubuntu16:~$ cat emp.csv
9,Lickho,5918
5,Elisha,2418
8,Amchro,5318
1,Jekyano,2718
7,Namka,2318
2,Chandra,2017
3,Prajapati,2017
4,Anula,2018
11,Syniyo,2918
10,Neoan,8918
krishna@Ubuntu16:~$
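
As the listing shows, the exported rows are not in id order, because Cassandra returns rows in partition-token order. If a sorted file is easier to work with, the CSV can be sorted afterwards. This is a small sketch, with the hypothetical function name sort_by_id, and it assumes the id is the first comma-separated column as in emp.csv.

```shell
# Hypothetical helper: sorts exported CSV rows numerically by the first
# column (id). Reads from the file given as $1, or stdin if omitted.
sort_by_id() {
    sort -t, -k1,1n "${1:--}"
}

# Usage:
#   sort_by_id emp.csv
```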

Data Import

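Here’s a way to import data from a CSV file into a Cassandra table, using the cqlsh COPY FROM command. To demonstrate it, we first empty the table with TRUNCATE and then load the rows back from emp.csv.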

cqlsh:newdb> select * from emp;

 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017

(10 rows)
cqlsh:newdb> TRUNCATE newdb.emp;
cqlsh:newdb> select * from emp;

 id | name | year
----+------+------

(0 rows)
cqlsh:newdb> COPY emp(id, name, year) FROM 'emp.csv';
Using 2 child processes

Starting copy of newdb.emp with columns [id, name, year].
Processed: 10 rows; Rate:      18 rows/s; Avg. rate:      26 rows/s
10 rows imported from 1 files in 0.380 seconds (0 skipped).
cqlsh:newdb>
cqlsh:newdb> SELECT * FROM emp;

 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017

(10 rows)
cqlsh:newdb>
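
One way to confirm that the round trip preserved every row is to export the table again and compare the two files while ignoring row order. The sketch below assumes a second export was written to a hypothetical file named emp_after.csv; the function name same_rows is likewise an illustration only.

```shell
# Hypothetical helper: succeeds (exit status 0) when two CSV files contain
# the same lines, regardless of row order.
same_rows() {
    a_sorted=$(sort "$1") || return 2
    b_sorted=$(sort "$2") || return 2
    [ "$a_sorted" = "$b_sorted" ]
}

# Usage, assuming a second export to emp_after.csv was made after the import:
#   same_rows emp.csv emp_after.csv && echo "round trip OK"
```

Sorting both files before comparing matters because the export order is not guaranteed to be the same between runs.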

Conclusion

We are done setting up a single-node Cassandra cluster.
Happy Reading …
