Set Up a Cassandra Cluster on CentOS 7

Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.

Cassandra scales linearly: adding a new machine to the cluster requires no downtime or interruption to applications, and it increases the read and write throughput of the cluster.

Every Cassandra node in the cluster has the same role. Data is distributed across the cluster, so each node holds a different portion of the data. Cassandra supports replication within and across datacenters for redundancy, failover, and disaster recovery.

System Update

Update the system with the latest security patches using the command below.

[root@db01 ~]# yum update

Install Java 8

Apache Cassandra runs on top of the Java Virtual Machine (JVM), so Java 8 must be installed before Cassandra. Cassandra runs on both Oracle JDK and OpenJDK; here we install OpenJDK 8 from the CentOS repositories and verify the version. The cqlsh shell shipped with Cassandra 3.11 also needs Python 2.7, which CentOS 7 provides by default.

[root@db01 ~]# yum install java-1.8.0-openjdk.x86_64
[root@db01 ~]# java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
[root@db01 ~]# python --version
Python 2.7.5

Install Cassandra

We will install Cassandra using the official package published by the Apache Software Foundation. Add the Apache Cassandra repository to /etc/yum.repos.d/cassandra.repo to make the package available to your system, for example for the latest 3.11 version:

[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS

Then install the package. yum asks you to confirm the transaction and to accept each of the Apache Cassandra signing keys.

[root@db01 ~]# yum install cassandra
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos.excellmedia.net
 * extras: centos.excellmedia.net
 * updates: centos.excellmedia.net
Resolving Dependencies
--> Running transaction check
---> Package cassandra.noarch 0:3.11.2-1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

==========================================================================================================
 Package                  Arch                  Version                    Repository                Size
==========================================================================================================
Installing:
 cassandra                noarch                3.11.2-1                   cassandra                 28 M

Transaction Summary
==========================================================================================================
Install  1 Package

Total download size: 28 M
Installed size: 37 M
Is this ok [y/d/N]: y
Downloading packages:
warning: /var/cache/yum/x86_64/7/cassandra/packages/cassandra-3.11.2-1.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID fe4b2bda: NOKEY
Public key for cassandra-3.11.2-1.noarch.rpm is not installed
cassandra-3.11.2-1.noarch.rpm                                                      |  28 MB  00:01:05
Retrieving key from https://www.apache.org/dist/cassandra/KEYS
Importing GPG key 0xF2833C93:
 Userid     : "Eric Evans <eevans@sym-link.com>"
 Fingerprint: cec8 6bb4 a0ba 9d0f 9039 7cae f835 8fa2 f283 3c93
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Importing GPG key 0x8D77295D:
 Userid     : "Eric Evans <eevans@sym-link.com>"
 Fingerprint: c496 5ee9 e301 5d19 2ccc f2b6 f758 ce31 8d77 295d
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Importing GPG key 0x2B5C1B00:
 Userid     : "Sylvain Lebresne (pcmanus) <sylvain@datastax.com>"
 Fingerprint: 5aed 1bf3 78e9 a19d ade1 bcb3 4bd7 36a8 2b5c 1b00
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Importing GPG key 0x0353B12C:
 Userid     : "T Jake Luciani <jake@apache.org>"
 Fingerprint: 514a 2ad6 31a5 7a16 dd00 47ec 749d 6eec 0353 b12c
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Importing GPG key 0xFE4B2BDA:
 Userid     : "Michael Shuler <michael@pbandjelly.org>"
 Fingerprint: a26e 528b 271f 19b9 e5d8 e19e a278 b781 fe4b 2bda
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : cassandra-3.11.2-1.noarch                                                              1/1
  Verifying  : cassandra-3.11.2-1.noarch                                                              1/1

Installed:
  cassandra.noarch 0:3.11.2-1

Complete!
[root@db01 ~]#

Cassandra Service

The RPM package does not start Cassandra automatically. Start the service, enable it at boot, and check its status. On systemd-based systems such as CentOS 7 you may need to run systemctl daemon-reload once so that the service is picked up.

[root@db01 ~]# systemctl daemon-reload
[root@db01 ~]# service cassandra start
[root@db01 ~]# chkconfig cassandra on
[root@db01 ~]# service cassandra status
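
Cassandra usually needs a little time after the service starts before it accepts client connections. As a minimal sketch, the helper below polls a TCP port until it answers. The function name wait_for_port and the retry count are made up for illustration, and the approach relies on bash's /dev/tcp pseudo-device, so it assumes bash rather than plain sh.

```shell
# Sketch: wait until HOST:PORT accepts TCP connections, trying once per second.
# Usage: wait_for_port HOST PORT [TRIES]   (function name is illustrative)
# Relies on bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
wait_for_port() {
    local host="$1" port="$2" tries="${3:-60}" i=0
    while [ "$i" -lt "$tries" ]; do
        # The subshell opens the connection and closes it again when it exits.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}
```

For example, wait_for_port 127.0.0.1 9042 120 && cqlsh would wait up to about two minutes for the CQL port shown in the cqlsh banner (9042) before opening a shell.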

Connecting to Cluster

Once the Cassandra service is up and running, check the status of the cluster with nodetool.

[root@db01 ~]# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns (effective)  Host ID                               Rack
UN  127.0.0.1  103.66 KiB  256          100.0%            09b3b1e0-0c66-4ce1-af8b-20394828f51f  rack1

In the output, ‘UN’ means Up and Normal.

Then connect to the node with cqlsh, the CQL shell.

[root@db01 ~]# cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> exit
[root@db01 ~]#

Inside cqlsh, the help command lists the available shell commands and CQL help topics.

cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> help

Documented shell commands:
===========================
CAPTURE  CLS          COPY  DESCRIBE  EXPAND  LOGIN   SERIAL  SOURCE   UNICODE
CLEAR    CONSISTENCY  DESC  EXIT      HELP    PAGING  SHOW    TRACING

CQL help topics:
================
AGGREGATES               CREATE_KEYSPACE           DROP_TRIGGER      TEXT
ALTER_KEYSPACE           CREATE_MATERIALIZED_VIEW  DROP_TYPE         TIME
ALTER_MATERIALIZED_VIEW  CREATE_ROLE               DROP_USER         TIMESTAMP
ALTER_TABLE              CREATE_TABLE              FUNCTIONS         TRUNCATE
ALTER_TYPE               CREATE_TRIGGER            GRANT             TYPES
ALTER_USER               CREATE_TYPE               INSERT            UPDATE
APPLY                    CREATE_USER               INSERT_JSON       USE
ASCII                    DATE                      INT               UUID
BATCH                    DELETE                    JSON
BEGIN                    DROP_AGGREGATE            KEYWORDS
BLOB                     DROP_COLUMNFAMILY         LIST_PERMISSIONS
BOOLEAN                  DROP_FUNCTION             LIST_ROLES
COUNTER                  DROP_INDEX                LIST_USERS
CREATE_AGGREGATE         DROP_KEYSPACE             PERMISSIONS
CREATE_COLUMNFAMILY      DROP_MATERIALIZED_VIEW    REVOKE
CREATE_FUNCTION          DROP_ROLE                 SELECT
CREATE_INDEX             DROP_TABLE                SELECT_JSON

cqlsh>

Configuring Cassandra

Most configuration in Cassandra is done via YAML properties set in cassandra.yaml. At a minimum, you should consider setting the following properties:

  • cluster_name: the name of your cluster.
  • seeds: a comma-separated list of the IP addresses of your cluster seeds.
  • storage_port: you don’t necessarily need to change this, but make sure that no firewall blocks this port.
  • listen_address: the IP address of your node. This is what allows other nodes to communicate with this node, so it is important that you change it. Alternatively, you can set listen_interface to tell Cassandra which interface, and consequently which address, to use. Set only one, not both.
  • native_transport_port: as for storage_port, make sure this port is not blocked by firewalls as clients will communicate with Cassandra on this port.
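
As a rough sketch of the settings above, the following shell function rewrites the three properties that most often differ per node or cluster. The function name, the argument order, and the example values are made up for illustration. The sed patterns assume the stock file layout (top-level cluster_name and listen_address keys, and a "- seeds:" entry under seed_provider), and GNU sed is assumed for the -i option.

```shell
# Sketch: set cluster_name, seeds and listen_address in a cassandra.yaml file.
# Usage: configure_cassandra_yaml FILE CLUSTER_NAME SEEDS LISTEN_ADDRESS
# The function name and argument order are illustrative, not part of Cassandra.
configure_cassandra_yaml() {
    local yaml_file="$1" name="$2" seeds="$3" listen="$4"

    # Keep a backup so the original settings can be restored.
    cp "$yaml_file" "$yaml_file.bak" || return 1

    # Each expression replaces the value of one key and keeps its indentation.
    sed -i \
        -e "s|^cluster_name:.*|cluster_name: '$name'|" \
        -e "s|^\([[:space:]]*- seeds:\).*|\1 \"$seeds\"|" \
        -e "s|^listen_address:.*|listen_address: $listen|" \
        "$yaml_file"
}
```

A call might look like configure_cassandra_yaml /etc/cassandra/conf/cassandra.yaml 'Test Cluster' '192.168.19.3,192.168.19.5' 192.168.19.3. The path is an assumption based on the usual RPM layout, so check where cassandra.yaml lives on your system, and restart Cassandra afterwards for the change to take effect.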

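Update the dc and rack values in cassandra-rackdc.properties. As the comment in the file says, these properties are used with GossipingPropertyFileSnitch, so they take effect only when endpoint_snitch in cassandra.yaml is set to that snitch.

[root@db01 conf]# cat cassandra-rackdc.properties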
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
#dc=dc1
#rack=rack1

dc=DC1
rack=RACK1
prefer_local=true

# Add a suffix to a datacenter name. Used by the Ec2Snitch and Ec2MultiRegionSnitch
# to append a string to the EC2 region name.
#dc_suffix=

# Uncomment the following line to make this snitch prefer the internal ip when possible, as the Ec2MultiRegionSnitch does.
# prefer_local=true
[root@db01 conf]#
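
If the same change has to be made on several nodes, it can be scripted. The sketch below replaces the active dc and rack lines in a properties file and leaves the commented examples alone. The function name set_rackdc is made up for illustration, and GNU sed is assumed for the -i option.

```shell
# Sketch: set dc and rack in a cassandra-rackdc.properties file.
# Usage: set_rackdc FILE DC RACK   (function name is illustrative)
set_rackdc() {
    local props_file="$1" dc="$2" rack="$3"

    # Drop any active dc=/rack= lines, leaving commented examples untouched.
    sed -i -e '/^dc=/d' -e '/^rack=/d' "$props_file"

    # Append the new values at the end of the file.
    printf 'dc=%s\nrack=%s\n' "$dc" "$rack" >> "$props_file"
}
```

For the values shown above, the call would be set_rackdc cassandra-rackdc.properties DC1 RACK1, run from the configuration directory.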

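After the same configuration has been applied on every node and Cassandra has been restarted, nodetool status lists all the nodes in the cluster.

[root@db01 conf]# nodetool status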
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens       Owns (effective)  Host ID                               Rack
UN  192.168.19.3  192.94 KiB  256          67.6%             09b3b1e0-0c66-4ce1-af8b-20394828f51f  rack1
UN  192.168.19.5  231.83 KiB  256          66.9%             dfc6d68b-749e-4cb4-8185-7cb293e9f5d1  rack1
UN  192.168.19.6  232.88 KiB  256          65.5%             0e4efa8d-fd66-455a-a59e-989061cdd99e  rack1

[root@db01 conf]#
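
The two-letter code at the start of each node row combines the Status and State legends printed above it, so a health check can be scripted. The following is a minimal sketch that reads nodetool status output on standard input and fails unless every node is UN. The function name is made up for illustration, and the parsing assumes the column layout shown above.

```shell
# Sketch: read `nodetool status` output on stdin and fail if any node is not UN.
# Node rows start with a two-letter code: U/D (up/down) then N/L/J/M (state).
all_nodes_up_normal() {
    awk '
        $1 ~ /^[UD][NLJM]$/ {
            total++
            if ($1 != "UN") { bad++; print "not UN: " $2 " (" $1 ")" }
        }
        # Succeed only if at least one node row was seen and none was bad.
        END { exit !(total > 0 && bad == 0) }
    '
}
```

It could be used as nodetool status | all_nodes_up_normal && echo "all nodes are Up and Normal".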

Using Cassandra

Let’s try out the Cassandra installation by creating a keyspace (Cassandra’s equivalent of a database) and a table.

krishna@Ubuntu16:~$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh>
cqlsh> show version;
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
cqlsh>
cqlsh> show host;
Connected to Test Cluster at 127.0.0.1:9042.
cqlsh>

cqlsh> CREATE KEYSPACE newdb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh>
cqlsh> use newdb;
cqlsh:newdb>
cqlsh:newdb> CREATE TABLE emp (id int PRIMARY KEY, name text, year text);
cqlsh:newdb>
cqlsh:newdb> DESC emp;

CREATE TABLE newdb.emp (
    id int PRIMARY KEY,
    name text,
    year text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

cqlsh:newdb>
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (1, 'Krishna', '2017');
cqlsh:newdb>
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (2, 'Chandra', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (3, 'Prajapati', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (4, 'Anula', '2018');
cqlsh:newdb>
cqlsh:newdb> SELECT * FROM emp;

 id | name      | year
----+-----------+------
  1 |   Krishna | 2017
  2 |   Chandra | 2017
  4 |     Anula | 2018
  3 | Prajapati | 2017

(4 rows)
cqlsh:newdb> SELECT * FROM emp WHERE id=3;

 id | name      | year
----+-----------+------
  3 | Prajapati | 2017

(1 rows)
cqlsh:newdb> SELECT id, name FROM emp;

 id | name
----+-----------
  1 |   Krishna
  2 |   Chandra
  4 |     Anula
  3 | Prajapati

(4 rows)
cqlsh:newdb> DESCRIBE keyspaces;

system_schema  system_auth  system  system_distributed  newdb  system_traces

cqlsh:newdb> SELECT * FROM emp LIMIT 2;

 id | name    | year
----+---------+------
  1 | Krishna | 2017
  2 | Chandra | 2017

(2 rows)
cqlsh:newdb> SELECT * FROM emp WHERE name = 'Krishna' ALLOW FILTERING;

 id | name    | year
----+---------+------
  1 | Krishna | 2017

(1 rows)
cqlsh:newdb>
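
The same statements can also be run without typing them interactively. As a sketch, the function below writes the keyspace, table, and sample rows from the session above to a file that cqlsh can execute with its -f option. The function name and the file name emp.cql are made up for illustration, and IF NOT EXISTS is added so the file can be run more than once.

```shell
# Sketch: write the statements from the session above to a file for cqlsh -f.
# Usage: write_emp_schema FILE   (function and file names are illustrative)
write_emp_schema() {
    cat > "$1" <<'EOF'
CREATE KEYSPACE IF NOT EXISTS newdb
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

CREATE TABLE IF NOT EXISTS newdb.emp (id int PRIMARY KEY, name text, year text);

INSERT INTO newdb.emp (id, name, year) VALUES (1, 'Krishna', '2017');
INSERT INTO newdb.emp (id, name, year) VALUES (2, 'Chandra', '2017');
INSERT INTO newdb.emp (id, name, year) VALUES (3, 'Prajapati', '2017');
INSERT INTO newdb.emp (id, name, year) VALUES (4, 'Anula', '2018');
EOF
}
```

Running write_emp_schema emp.cql && cqlsh -f emp.cql would then create the keyspace and load the rows in one step.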

Data Export

Here’s a way to export data from a Cassandra table to a CSV file.

cqlsh:newdb> select * from emp;

 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017

(10 rows)
cqlsh:newdb> COPY emp(id, name, year) TO 'emp.csv';
Using 2 child processes

Starting copy of newdb.emp with columns [id, name, year].
Processed: 10 rows; Rate:      55 rows/s; Avg. rate:      55 rows/s
10 rows exported to 1 files in 0.186 seconds.
cqlsh:newdb> exit
krishna@Ubuntu16:~$ cat emp.csv
9,Lickho,5918
5,Elisha,2418
8,Amchro,5318
1,Jekyano,2718
7,Namka,2318
2,Chandra,2017
3,Prajapati,2017
4,Anula,2018
11,Syniyo,2918
10,Neoan,8918
krishna@Ubuntu16:~$
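
The rows in the exported file are not ordered by id, so a quick sanity check can be reassuring. The sketch below verifies that the file has the expected number of rows and exactly three comma-separated fields per row. The function name is made up for illustration, and it assumes that no field contains a comma or a newline, which holds for the sample data.

```shell
# Sketch: check that a CSV export has the expected rows and three fields per row.
# Usage: check_emp_csv FILE EXPECTED_ROWS   (function name is illustrative)
# Assumes no field contains a comma or a newline, which holds for the sample data.
check_emp_csv() {
    awk -F, -v want="$2" '
        NF != 3 { bad++ }
        # Succeed only if the row count matches and every row had 3 fields.
        END { exit !(NR == want && bad == 0) }
    ' "$1"
}
```

For the export above, check_emp_csv emp.csv 10 should succeed, and sort -t, -k1,1n emp.csv prints the rows in id order if that is easier to read.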

Data Import

Here’s a way to import data from a CSV file into a Cassandra table.

cqlsh:newdb> select * from emp;

 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017

(10 rows)
cqlsh:newdb> TRUNCATE newdb.emp;
cqlsh:newdb> select * from emp;

 id | name | year
----+------+------

(0 rows)
cqlsh:newdb> COPY emp(id, name, year) FROM 'emp.csv';
Using 2 child processes

Starting copy of newdb.emp with columns [id, name, year].
Processed: 10 rows; Rate:      18 rows/s; Avg. rate:      26 rows/s
10 rows imported from 1 files in 0.380 seconds (0 skipped).
cqlsh:newdb>
cqlsh:newdb> SELECT * FROM emp;

 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017

(10 rows)
cqlsh:newdb>
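
One way to confirm that the import restored everything is to export the table again and compare the two files. Because the row order can differ between exports, the sketch below sorts both files before comparing them. The function name and the second file name, emp_after.csv, are made up for illustration.

```shell
# Sketch: succeed if two CSV files hold the same rows, ignoring row order.
# Usage: same_rows FILE_A FILE_B   (function name is illustrative)
same_rows() {
    local a b rc
    a=$(mktemp) && b=$(mktemp) || return 2

    # Sort both files so that only the set of rows matters, not their order.
    sort "$1" > "$a"
    sort "$2" > "$b"

    cmp -s "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return "$rc"
}
```

After a second COPY emp(id, name, year) TO 'emp_after.csv'; in cqlsh, same_rows emp.csv emp_after.csv returns success only when both exports contain the same rows.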

Conclusion

We are done with the setup of a single-node Cassandra cluster.
Happy Reading …
