Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
Cassandra scales linearly: a new machine can be added to the cluster with no downtime or interruption to applications, and doing so increases the cluster's read and write throughput.
Every Cassandra node in the cluster has the same role. Data is distributed across the cluster, so each node holds a different portion of the data. Cassandra supports replication, including multi-datacenter replication, for redundancy, failover, and disaster recovery.
System Update
Update the system with the latest security patches using the command below.
$ yum update
Install Java 8
Apache Cassandra runs on top of the Java Virtual Machine (JVM), so install Java 8 before installing Cassandra. This guide installs OpenJDK 8; Cassandra can also run on Oracle JDK 8.
Add the Apache Cassandra repository definition to /etc/yum.repos.d/cassandra.repo (shown here for the 3.11 series), then install OpenJDK 8 and verify the Java and Python versions:
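Because this guide targets Java 8, it can help to confirm which major version `java -version` reports once Java is installed. The following is a minimal sketch, not part of the original setup: the helper name `parse_java_major` is hypothetical, and the sample string is just the kind of banner OpenJDK 8 prints.

```python
import re


def parse_java_major(version_output: str) -> int:
    """Return the Java major version found in `java -version` output.

    Handles the legacy "1.8.0_161" scheme (major = 8) as well as the
    newer "11.0.2" scheme (major = 11).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no version string found")
    first, second = match.group(1), match.group(2)
    # Legacy scheme: the real major version is the second component.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


sample = 'openjdk version "1.8.0_161"\nOpenJDK Runtime Environment (build 1.8.0_161-b14)'
print(parse_java_major(sample))
```

Note that `java -version` writes its banner to standard error, so a script that captures it needs to read stderr rather than stdout.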
[cassandra]
name=Apache Cassandra
baseurl=https://www.apache.org/dist/cassandra/redhat/311x/
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://www.apache.org/dist/cassandra/KEYS

[root@db01 ~]# yum install java-1.8.0-openjdk.x86_64
[root@db01 ~]# java -version
openjdk version "1.8.0_161"
OpenJDK Runtime Environment (build 1.8.0_161-b14)
OpenJDK 64-Bit Server VM (build 25.161-b14, mixed mode)
[root@db01 ~]# python --version
Python 2.7.5
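If the repository file needs to be generated on several hosts, the stanza above can be rendered programmatically. This is only a sketch using Python's standard `configparser`; the function name `render_cassandra_repo` and its `series` parameter are hypothetical, and `311x` is the only series value taken from this guide.

```python
import configparser
import io


def render_cassandra_repo(series: str = "311x") -> str:
    """Render a yum repo stanza for the given Cassandra release series."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names exactly as written
    parser["cassandra"] = {
        "name": "Apache Cassandra",
        "baseurl": f"https://www.apache.org/dist/cassandra/redhat/{series}/",
        "gpgcheck": "1",
        "repo_gpgcheck": "1",
        "gpgkey": "https://www.apache.org/dist/cassandra/KEYS",
    }
    buffer = io.StringIO()
    # Write "key=value" without spaces, matching the stanza shown above.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_cassandra_repo())
```

The returned text would then be written to /etc/yum.repos.d/cassandra.repo by whatever provisioning step you already use.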
Install Cassandra
We will install Cassandra from the official package provided by the Apache Software Foundation. The Cassandra repository must be added first so that the package is available to your system.
[root@db01 ~]# yum install cassandra
Cassandra Service
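After installation, the Cassandra service should be up and running on your system; if it is not, start it, then check its status. The full output of the installation is shown below.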
[root@db01 ~]# yum install cassandra
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
 * base: centos.excellmedia.net
 * extras: centos.excellmedia.net
 * updates: centos.excellmedia.net
Resolving Dependencies
--> Running transaction check
---> Package cassandra.noarch 0:3.11.2-1 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
==========================================================================================================
 Package              Arch              Version              Repository              Size
==========================================================================================================
Installing:
 cassandra            noarch            3.11.2-1             cassandra               28 M
Transaction Summary
==========================================================================================================
Install  1 Package
Total download size: 28 M
Installed size: 37 M
Is this ok [y/d/N]: y
Downloading packages:
warning: /var/cache/yum/x86_64/7/cassandra/packages/cassandra-3.11.2-1.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID fe4b2bda: NOKEY
Public key for cassandra-3.11.2-1.noarch.rpm is not installed
cassandra-3.11.2-1.noarch.rpm                                          |  28 MB  00:01:05
Retrieving key from https://www.apache.org/dist/cassandra/KEYS
Importing GPG key 0xF2833C93:
 Userid     : "Eric Evans <eevans@sym-link.com>"
 Fingerprint: cec8 6bb4 a0ba 9d0f 9039 7cae f835 8fa2 f283 3c93
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Importing GPG key 0x8D77295D:
 Userid     : "Eric Evans <eevans@sym-link.com>"
 Fingerprint: c496 5ee9 e301 5d19 2ccc f2b6 f758 ce31 8d77 295d
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Importing GPG key 0x2B5C1B00:
 Userid     : "Sylvain Lebresne (pcmanus) <sylvain@datastax.com>"
 Fingerprint: 5aed 1bf3 78e9 a19d ade1 bcb3 4bd7 36a8 2b5c 1b00
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Importing GPG key 0x0353B12C:
 Userid     : "T Jake Luciani <jake@apache.org>"
 Fingerprint: 514a 2ad6 31a5 7a16 dd00 47ec 749d 6eec 0353 b12c
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Importing GPG key 0xFE4B2BDA:
 Userid     : "Michael Shuler <michael@pbandjelly.org>"
 Fingerprint: a26e 528b 271f 19b9 e5d8 e19e a278 b781 fe4b 2bda
 From       : https://www.apache.org/dist/cassandra/KEYS
Is this ok [y/N]: y
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : cassandra-3.11.2-1.noarch                                                    1/1
  Verifying  : cassandra-3.11.2-1.noarch                                                    1/1
Installed:
  cassandra.noarch 0:3.11.2-1
Complete!
[root@db01 ~]#
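The service status check mentioned above can also be scripted. The sketch below wraps `systemctl is-active`, which prints `active` and exits with status 0 when a unit is running. The unit name `cassandra` is an assumption based on the package name and may differ on your system; the `run` parameter is a hypothetical hook that lets the function be exercised without systemd.

```python
import subprocess


def service_is_active(unit: str = "cassandra", run=subprocess.run) -> bool:
    """Return True if `systemctl is-active <unit>` reports "active".

    The default unit name "cassandra" is an assumption, not something
    confirmed by the installation output above.
    """
    result = run(
        ["systemctl", "is-active", unit],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit just means "not active"
    )
    return result.returncode == 0 and result.stdout.strip() == "active"


# Example (requires a host with systemd):
#     print(service_is_active())
```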
Connecting to Cluster
Once the Cassandra service is up and running, check the status of the cluster.
[root@db01 ~]# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  103.66 KiB  256     100.0%            09b3b1e0-0c66-4ce1-af8b-20394828f51f  rack1
[root@db01 ~]# cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> exit
[root@db01 ~]#
In the output, 'UN' means the node is Up and in the Normal state.
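The two-letter code at the start of each node line combines the Status and State legends printed in the header. As a rough sketch of how that output could be read by a monitoring script, the hypothetical helper `parse_nodetool_status` below decodes the code and picks out the address and rack; it assumes the column layout shown above.

```python
from typing import Dict, List

STATUS = {"U": "Up", "D": "Down"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}


def parse_nodetool_status(output: str) -> List[Dict[str, str]]:
    """Extract one record per node line from `nodetool status` output."""
    nodes = []
    for line in output.splitlines():
        fields = line.split()
        # Node lines start with a two-letter code such as "UN" or "DN";
        # header lines ("--", "|/", "Datacenter:") do not match.
        if (
            len(fields) >= 2
            and len(fields[0]) == 2
            and fields[0][0] in STATUS
            and fields[0][1] in STATE
        ):
            nodes.append(
                {
                    "address": fields[1],
                    "status": STATUS[fields[0][0]],
                    "state": STATE[fields[0][1]],
                    "rack": fields[-1],
                }
            )
    return nodes


sample = """\
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  103.66 KiB  256     100.0%            09b3b1e0-0c66-4ce1-af8b-20394828f51f  rack1
"""
for node in parse_nodetool_status(sample):
    print(node)
```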
cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> help
Documented shell commands:
===========================
CAPTURE  CLS          COPY  DESCRIBE  EXPAND  LOGIN   SERIAL  SOURCE   UNICODE
CLEAR    CONSISTENCY  DESC  EXIT      HELP    PAGING  SHOW    TRACING
CQL help topics:
================
AGGREGATES               CREATE_KEYSPACE           DROP_TRIGGER      TEXT
ALTER_KEYSPACE           CREATE_MATERIALIZED_VIEW  DROP_TYPE         TIME
ALTER_MATERIALIZED_VIEW  CREATE_ROLE               DROP_USER         TIMESTAMP
ALTER_TABLE              CREATE_TABLE              FUNCTIONS         TRUNCATE
ALTER_TYPE               CREATE_TRIGGER            GRANT             TYPES
ALTER_USER               CREATE_TYPE               INSERT            UPDATE
APPLY                    CREATE_USER               INSERT_JSON       USE
ASCII                    DATE                      INT               UUID
BATCH                    DELETE                    JSON
BEGIN                    DROP_AGGREGATE            KEYWORDS
BLOB                     DROP_COLUMNFAMILY         LIST_PERMISSIONS
BOOLEAN                  DROP_FUNCTION             LIST_ROLES
COUNTER                  DROP_INDEX                LIST_USERS
CREATE_AGGREGATE         DROP_KEYSPACE             PERMISSIONS
CREATE_COLUMNFAMILY      DROP_MATERIALIZED_VIEW    REVOKE
CREATE_FUNCTION          DROP_ROLE                 SELECT
CREATE_INDEX             DROP_TABLE                SELECT_JSON
cqlsh>
Configuring Cassandra
Most configuration in Cassandra is done via YAML properties set in cassandra.yaml. At a minimum, you should consider setting the following properties:
cluster_name: the name of your cluster.
seeds: a comma-separated list of the IP addresses of your cluster seeds.
storage_port: you don't necessarily need to change this, but make sure that no firewall is blocking this port.
listen_address: the IP address of your node. This is what allows other nodes to communicate with this node, so it is important that you change it. Alternatively, you can set listen_interface to tell Cassandra which interface to use, and consequently which address to use. Set only one, not both.
native_transport_port: as with storage_port, make sure this port is not blocked by firewalls, as clients will communicate with Cassandra on this port.
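To make the properties listed above concrete, here is a small sketch that renders them as YAML text and enforces the "set only one, not both" rule for listen_address and listen_interface. The function name `render_minimal_settings` is hypothetical. The port defaults (7000 and 9042) and the `seed_provider` nesting follow the stock cassandra.yaml shipped with 3.11 as far as I know, so compare the result against your own file before using it.

```python
from typing import List, Optional


def render_minimal_settings(
    cluster_name: str,
    seeds: List[str],
    listen_address: Optional[str] = None,
    listen_interface: Optional[str] = None,
    storage_port: int = 7000,
    native_transport_port: int = 9042,
) -> str:
    """Render the minimum cassandra.yaml settings as YAML text."""
    # Cassandra accepts listen_address or listen_interface, never both.
    if listen_address is not None and listen_interface is not None:
        raise ValueError("set listen_address or listen_interface, not both")

    seed_list = ",".join(seeds)
    lines = [
        f"cluster_name: '{cluster_name}'",
        "seed_provider:",
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider",
        "      parameters:",
        f'          - seeds: "{seed_list}"',
        f"storage_port: {storage_port}",
    ]
    if listen_address is not None:
        lines.append(f"listen_address: {listen_address}")
    elif listen_interface is not None:
        lines.append(f"listen_interface: {listen_interface}")
    # If neither is given, the line is omitted and the existing value stays.
    lines.append(f"native_transport_port: {native_transport_port}")
    return "\n".join(lines) + "\n"


print(
    render_minimal_settings(
        "Test Cluster",
        ["192.168.19.3", "192.168.19.5"],
        listen_address="192.168.19.3",
    )
)
```

The output is meant to be merged by hand into the existing cassandra.yaml, not to replace it, since the real file contains many more settings.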
Update the dc and rack values in cassandra-rackdc.properties. As the comment in the file notes, these properties are used with GossipingPropertyFileSnitch.
[root@db01 conf]# cat cassandra-rackdc.properties
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
#dc=dc1
#rack=rack1
dc=DC1
rack=RACK1
prefer_local=true
# Add a suffix to a datacenter name. Used by the Ec2Snitch and Ec2MultiRegionSnitch
# to append a string to the EC2 region name.
#dc_suffix=
# Uncomment the following line to make this snitch prefer the internal ip when possible, as the Ec2MultiRegionSnitch does.
# prefer_local=true
[root@db01 conf]#
[root@db01 conf]# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load        Tokens  Owns (effective)  Host ID                               Rack
UN  192.168.19.3  192.94 KiB  256     67.6%             09b3b1e0-0c66-4ce1-af8b-20394828f51f  rack1
UN  192.168.19.5  231.83 KiB  256     66.9%             dfc6d68b-749e-4cb4-8185-7cb293e9f5d1  rack1
UN  192.168.19.6  232.88 KiB  256     65.5%             0e4efa8d-fd66-455a-a59e-989061cdd99e  rack1
[root@db01 conf]#
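Because the properties file mixes commented-out defaults with active settings, it is easy to misread which values are in effect. The sketch below, with a hypothetical helper named `parse_rackdc`, keeps only the uncommented key=value pairs; the sample text is a shortened copy of the file shown above.

```python
def parse_rackdc(text: str) -> dict:
    """Return the active key=value pairs, skipping comments and blank lines."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


sample = """\
# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
#dc=dc1
#rack=rack1
dc=DC1
rack=RACK1
prefer_local=true
#dc_suffix=
"""
print(parse_rackdc(sample))
```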
Using Cassandra
Let's try out the Cassandra installation by creating a keyspace (Cassandra's equivalent of a database) and a table.
krishna@Ubuntu16:~$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> show version;
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
cqlsh> show host;
Connected to Test Cluster at 127.0.0.1:9042.
cqlsh> CREATE KEYSPACE newdb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> use newdb;
cqlsh:newdb> CREATE TABLE emp (id int PRIMARY KEY, name text, year text);
cqlsh:newdb> DESC emp;
CREATE TABLE newdb.emp (
    id int PRIMARY KEY,
    name text,
    year text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (1, 'Krishna', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (2, 'Chandra', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (3, 'Prajapati', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (4, 'Anula', '2018');
cqlsh:newdb> SELECT * FROM emp;
 id | name      | year
----+-----------+------
  1 |   Krishna | 2017
  2 |   Chandra | 2017
  4 |     Anula | 2018
  3 | Prajapati | 2017
(4 rows)
cqlsh:newdb> SELECT * FROM emp WHERE id=3;
 id | name      | year
----+-----------+------
  3 | Prajapati | 2017
(1 rows)
cqlsh:newdb> SELECT id, name FROM emp;
 id | name
----+-----------
  1 |   Krishna
  2 |   Chandra
  4 |     Anula
  3 | Prajapati
(4 rows)
cqlsh:newdb> DESCRIBE keyspaces;
system_schema  system_auth  system  system_distributed  newdb  system_traces
cqlsh:newdb> SELECT * FROM emp LIMIT 2;
 id | name    | year
----+---------+------
  1 | Krishna | 2017
  2 | Chandra | 2017
(2 rows)
cqlsh:newdb> SELECT * FROM emp WHERE name = 'Krishna' ALLOW FILTERING;
 id | name    | year
----+---------+------
  1 | Krishna | 2017
(1 rows)
cqlsh:newdb>
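Typing INSERT statements by hand gets tedious beyond a few rows. As a hedged sketch, the hypothetical helpers below build statements of the same shape as the ones in the session above; the resulting text could be saved to a file and run with `cqlsh -f`. String building like this is only reasonable for trusted data, and it handles just integers and text.

```python
def cql_literal(value) -> str:
    """Format an int or str as a CQL literal; single quotes are doubled."""
    if isinstance(value, int):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def insert_statement(table: str, row: dict) -> str:
    """Build one INSERT statement from a column-name -> value mapping."""
    columns = ", ".join(row)
    values = ", ".join(cql_literal(v) for v in row.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"


rows = [
    {"id": 1, "name": "Krishna", "year": "2017"},
    {"id": 2, "name": "Chandra", "year": "2017"},
]
for row in rows:
    print(insert_statement("emp", row))
```

For application code, a driver with prepared statements is the safer route, since it avoids quoting problems altogether.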
Data Export
Here's a way to export data from a Cassandra table to a CSV file.
cqlsh:newdb> select * from emp;
 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017
(10 rows)
cqlsh:newdb> COPY emp(id, name, year) TO 'emp.csv';
Using 2 child processes
Starting copy of newdb.emp with columns [id, name, year].
Processed: 10 rows; Rate: 55 rows/s; Avg. rate: 55 rows/s
10 rows exported to 1 files in 0.186 seconds.
cqlsh:newdb> exit
krishna@Ubuntu16:~$ cat emp.csv
9,Lickho,5918
5,Elisha,2418
8,Amchro,5318
1,Jekyano,2718
7,Namka,2318
2,Chandra,2017
3,Prajapati,2017
4,Anula,2018
11,Syniyo,2918
10,Neoan,8918
krishna@Ubuntu16:~$
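The exported file has no header row, and its rows are not sorted by id. The sketch below shows one way to read such a file back with Python's `csv` module and sort it; the `Emp` type and `read_emp_csv` helper are hypothetical names, and the column order is assumed to match the `COPY emp(id, name, year)` command above.

```python
import csv
import io
from typing import List, NamedTuple


class Emp(NamedTuple):
    id: int
    name: str
    year: str  # stored as text in the table definition


def read_emp_csv(handle) -> List[Emp]:
    """Read rows exported by COPY emp(id, name, year) TO ..., sorted by id."""
    rows = [Emp(int(r[0]), r[1], r[2]) for r in csv.reader(handle) if r]
    return sorted(rows, key=lambda emp: emp.id)


# A few lines copied from the exported file above.
sample = io.StringIO("9,Lickho,5918\n5,Elisha,2418\n1,Jekyano,2718\n")
for emp in read_emp_csv(sample):
    print(emp)
```

With a real file, pass `open("emp.csv", newline="")` instead of the in-memory sample.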
Data Import
Here's a way to import data from a CSV file into a Cassandra table.
cqlsh:newdb> select * from emp;
 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017
(10 rows)
cqlsh:newdb> TRUNCATE newdb.emp;
cqlsh:newdb> select * from emp;
 id | name | year
----+------+------
(0 rows)
cqlsh:newdb> COPY emp(id, name, year) FROM 'emp.csv';
Using 2 child processes
Starting copy of newdb.emp with columns [id, name, year].
Processed: 10 rows; Rate: 18 rows/s; Avg. rate: 26 rows/s
10 rows imported from 1 files in 0.380 seconds (0 skipped).
cqlsh:newdb> SELECT * FROM emp;
 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017
(10 rows)
cqlsh:newdb>
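When the CSV comes from another program instead of a previous export, it has to list the columns in the same order as the COPY command and, by default, carry no header row. A minimal sketch of producing such a file follows; `write_emp_csv` is a hypothetical helper, and the rows are sample values from this guide.

```python
import csv
import io


def write_emp_csv(rows, handle) -> int:
    """Write (id, name, year) tuples in the order COPY emp(id, name, year) FROM expects."""
    writer = csv.writer(handle, lineterminator="\n")
    count = 0
    for emp_id, name, year in rows:
        # id must be an integer; year is a text column in the table.
        writer.writerow([int(emp_id), name, str(year)])
        count += 1
    return count


buffer = io.StringIO()
written = write_emp_csv([(2, "Chandra", "2017"), (4, "Anula", "2018")], buffer)
print(written)
print(buffer.getvalue(), end="")
```

Using `csv.writer` instead of manual string joins means names containing commas or quotes are quoted correctly in the file.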
Conclusion
We are done with the setup of a single-node Cassandra cluster.
Happy Reading …