Apache Cassandra is a free and open-source distributed NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low latency operations for all clients.
Cassandra scales linearly: adding new machines to the cluster increases read and write throughput, with no downtime or interruption to applications.
Every node in a Cassandra cluster has the same role. Data is partitioned across the cluster, so each node holds a different subset of the data. Cassandra also supports replication, including replication across multiple datacenters, for redundancy, failover, and disaster recovery.
System Update
Update the system with the latest security patches. The first command refreshes the package index and the second installs the pending upgrades.
$ sudo apt-get update
$ sudo apt-get -y upgrade
Install Java 8
Apache Cassandra runs on top of the Java Virtual Machine (JVM). We’ll install Oracle JDK 8 on the system before we install Apache Cassandra. Apache Cassandra can also run on OpenJDK.
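Add the Oracle Java PPA to your system before proceeding with the installation of Java. Note that the ppa:webupd8team/java PPA was discontinued in 2019; if it is unavailable on your system, install OpenJDK 8 instead with sudo apt-get install openjdk-8-jdk.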
$ sudo add-apt-repository -y ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get -y install oracle-java8-installer
$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
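The version check above can also be scripted. The following is a minimal sketch, not part of the Cassandra tooling: java_major_version and installed_java_version are hypothetical helper names, and the parsing assumes the quoted version string that java -version prints, such as "1.8.0_144".

```shell
#!/usr/bin/env bash
# java_major_version (hypothetical helper): extract the major version from a
# version string such as "1.8.0_144" (legacy scheme) or "11.0.2" (newer scheme).
java_major_version() {
  v="$1"
  case "$v" in
    1.*) v="${v#1.}" ;;      # legacy scheme: 1.8.0_144 -> 8.0_144
  esac
  echo "${v%%[._-]*}"        # keep everything before the first '.', '_' or '-'
}

# installed_java_version (hypothetical helper): read the quoted version string
# that `java -version` prints on stderr.
installed_java_version() {
  java -version 2>&1 | awk -F'"' '/version/ {print $2; exit}'
}

if command -v java >/dev/null 2>&1; then
  major="$(java_major_version "$(installed_java_version)")"
  if [ "$major" = "8" ]; then
    echo "Java 8 detected"
  else
    echo "Found Java $major; this guide assumes Java 8" >&2
  fi
else
  echo "java not found on PATH" >&2
fi
```

Run it after the installation step; anything other than "Java 8 detected" suggests the JDK setup should be revisited before installing Cassandra.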
Install Cassandra
We will install Cassandra from the official package published by the Apache Software Foundation, so add the Cassandra repository to make the package available to your system.
$ echo "deb http://www.apache.org/dist/cassandra/debian 311x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
$ curl https://www.apache.org/dist/cassandra/KEYS | sudo apt-key add -
$ sudo apt-key adv --keyserver pool.sks-keyservers.net --recv-key A278B781FE4B2BDA
$ sudo apt-get update
$ sudo apt-get install cassandra
Cassandra Service
The Cassandra service should now be up and running on your system. The output below shows the package installation followed by the service status check.
krishna@Ubuntu16:~$ sudo apt-get install cassandra
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages were automatically installed and are no longer required:
  gyp javascript-common libjs-inherits libjs-jquery libjs-node-uuid libjs-underscore libssl-dev libssl-doc libuv1
  libuv1-dev linux-headers-4.10.0-28 linux-headers-4.10.0-28-generic linux-image-4.10.0-28-generic
  linux-image-extra-4.10.0-28-generic node-abbrev node-ansi node-ansi-color-table node-archy node-async
  node-block-stream node-combined-stream node-cookie-jar node-delayed-stream node-forever-agent node-form-data
  node-fstream node-fstream-ignore node-github-url-from-git node-glob node-graceful-fs node-gyp node-inherits
  node-ini node-json-stringify-safe node-lockfile node-lru-cache node-mime node-minimatch node-mkdirp
  node-mute-stream node-node-uuid node-nopt node-normalize-package-data node-npmlog node-once node-osenv node-qs
  node-read node-read-package-json node-request node-retry node-rimraf node-semver node-sha node-sigmund
  node-slide node-tar node-tunnel-agent node-underscore node-which python-pkg-resources zlib1g-dev
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  libopts25 ntp
Suggested packages:
  cassandra-tools ntp-doc
The following NEW packages will be installed:
  cassandra libopts25 ntp
0 upgraded, 3 newly installed, 0 to remove and 54 not upgraded.
Need to get 30.0 MB of archives.
After this operation, 41.0 MB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:2 http://in.archive.ubuntu.com/ubuntu xenial/main amd64 libopts25 amd64 1:5.18.7-3 [57.8 kB]
Get:1 http://dl.bintray.com/apache/cassandra 311x/main amd64 cassandra all 3.11.0 [29.4 MB]
Get:3 http://in.archive.ubuntu.com/ubuntu xenial-updates/main amd64 ntp amd64 1:4.2.8p4+dfsg-3ubuntu5.7 [518 kB]
Fetched 30.0 MB in 4min 47s (105 kB/s)
Selecting previously unselected package libopts25:amd64.
(Reading database ... 263455 files and directories currently installed.)
Preparing to unpack .../libopts25_1%3a5.18.7-3_amd64.deb ...
Unpacking libopts25:amd64 (1:5.18.7-3) ...
Selecting previously unselected package ntp.
Preparing to unpack .../ntp_1%3a4.2.8p4+dfsg-3ubuntu5.7_amd64.deb ...
Unpacking ntp (1:4.2.8p4+dfsg-3ubuntu5.7) ...
Selecting previously unselected package cassandra.
Preparing to unpack .../cassandra_3.11.0_all.deb ...
Unpacking cassandra (3.11.0) ...
Processing triggers for libc-bin (2.23-0ubuntu9) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for systemd (229-4ubuntu19) ...
Processing triggers for ureadahead (0.100.0-19) ...
Setting up libopts25:amd64 (1:5.18.7-3) ...
Setting up ntp (1:4.2.8p4+dfsg-3ubuntu5.7) ...
Setting up cassandra (3.11.0) ...
Adding group `cassandra' (GID 130) ...
Done.
vm.max_map_count = 1048575
net.ipv4.tcp_keepalive_time = 300
update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
Processing triggers for libc-bin (2.23-0ubuntu9) ...
Processing triggers for systemd (229-4ubuntu19) ...
Processing triggers for ureadahead (0.100.0-19) ...
krishna@Ubuntu16:~$ sudo systemctl status cassandra
● cassandra.service - LSB: distributed storage system for structured data
   Loaded: loaded (/etc/init.d/cassandra; bad; vendor preset: enabled)
   Active: active (running) since Thu 2017-09-21 16:13:21 IST; 1min 12s ago
     Docs: man:systemd-sysv-generator(8)
   CGroup: /system.slice/cassandra.service
           └─4006 java -Xloggc:/var/log/cassandra/gc.log -ea -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=1000003 -XX:

Sep 21 16:13:21 Ubuntu16 systemd[1]: Starting LSB: distributed storage system for structured data...
Sep 21 16:13:21 Ubuntu16 systemd[1]: Started LSB: distributed storage system for structured data.
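Cassandra can take some time after installation before the service reports as active and the node answers requests. As a rough sketch, a small retry loop can wait for that instead of checking by hand. The helper name retry_until is hypothetical; the commented usage lines assume systemd manages the service, as in the status output above.

```shell
#!/usr/bin/env bash
# retry_until (hypothetical helper): run a command repeatedly until it succeeds
# or the given number of attempts is used up.
#   $1 = maximum attempts, $2 = delay in seconds between attempts, rest = command
retry_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Only sleep when another attempt will follow.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage, to be run on the machine where Cassandra was installed:
#   retry_until 30 2 systemctl is-active --quiet cassandra && echo "service active"
#   retry_until 30 2 nodetool status && echo "node is answering nodetool"
```

The attempt count and delay are arbitrary values for illustration; tune them to how long the node takes to start on your hardware.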
Connecting to Cluster
Once the Cassandra service is up and running, check the status of the cluster.
krishna@Ubuntu16:~$ sudo nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  103.67 KiB  256     100.0%            1ccbcad7-693b-4d1b-ba95-6dda2a4e3214  rack1
In the output, ‘UN’ means the node is Up and in the Normal state.
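For a scripted health check, the status column can be parsed rather than read by eye. This is a minimal sketch under stated assumptions: count_unhealthy_nodes is a hypothetical helper, and it assumes node rows begin with the two-letter code shown in the legend (U or D, followed by N, L, J or M). The sample input is copied from the output above.

```shell
#!/usr/bin/env bash
# count_unhealthy_nodes (hypothetical helper): read `nodetool status` output on
# stdin and print how many node rows report anything other than UN.
count_unhealthy_nodes() {
  awk '
    # Node rows start with a status letter (U or D) and a state letter (N, L, J or M).
    $1 ~ /^[UD][NLJM]$/ { if ($1 != "UN") bad++ }
    END { print bad + 0 }
  '
}

# Sample input taken from the single-node output shown above.
sample='Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load        Tokens  Owns (effective)  Host ID                               Rack
UN  127.0.0.1  103.67 KiB  256     100.0%            1ccbcad7-693b-4d1b-ba95-6dda2a4e3214  rack1'

printf '%s\n' "$sample" | count_unhealthy_nodes

# Against a live node, pipe the real command instead:
#   sudo nodetool status | count_unhealthy_nodes
```

For the single-node cluster in this guide the helper prints 0, since the only node is UN.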
Next, connect to the cluster with cqlsh, the CQL shell, and run help to list the available shell commands and CQL help topics.
krishna@Ubuntu16:~$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> help

Documented shell commands:
===========================
CAPTURE  CLS          COPY  DESCRIBE  EXPAND  LOGIN   SERIAL  SOURCE   UNICODE
CLEAR    CONSISTENCY  DESC  EXIT      HELP    PAGING  SHOW    TRACING

CQL help topics:
================
AGGREGATES               CREATE_KEYSPACE           DROP_TRIGGER      TEXT
ALTER_KEYSPACE           CREATE_MATERIALIZED_VIEW  DROP_TYPE         TIME
ALTER_MATERIALIZED_VIEW  CREATE_ROLE               DROP_USER         TIMESTAMP
ALTER_TABLE              CREATE_TABLE              FUNCTIONS         TRUNCATE
ALTER_TYPE               CREATE_TRIGGER            GRANT             TYPES
ALTER_USER               CREATE_TYPE               INSERT            UPDATE
APPLY                    CREATE_USER               INSERT_JSON       USE
ASCII                    DATE                      INT               UUID
BATCH                    DELETE                    JSON
BEGIN                    DROP_AGGREGATE            KEYWORDS
BLOB                     DROP_COLUMNFAMILY         LIST_PERMISSIONS
BOOLEAN                  DROP_FUNCTION             LIST_ROLES
COUNTER                  DROP_INDEX                LIST_USERS
CREATE_AGGREGATE         DROP_KEYSPACE             PERMISSIONS
CREATE_COLUMNFAMILY      DROP_MATERIALIZED_VIEW    REVOKE
CREATE_FUNCTION          DROP_ROLE                 SELECT
CREATE_INDEX             DROP_TABLE                SELECT_JSON
Using Cassandra
Let’s try out the Cassandra installation by creating a keyspace (Cassandra’s equivalent of a database) and a table, then inserting and querying a few rows.
krishna@Ubuntu16:~$ cqlsh
Connected to Test Cluster at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
Use HELP for help.
cqlsh> show version;
[cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec 3.4.4 | Native protocol v4]
cqlsh> show host;
Connected to Test Cluster at 127.0.0.1:9042.
cqlsh> CREATE KEYSPACE newdb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
cqlsh> use newdb;
cqlsh:newdb> CREATE TABLE emp (id int PRIMARY KEY, name text, year text);
cqlsh:newdb> DESC emp;

CREATE TABLE newdb.emp (
    id int PRIMARY KEY,
    name text,
    year text
) WITH bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (1, 'Krishna', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (2, 'Chandra', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (3, 'Prajapati', '2017');
cqlsh:newdb> INSERT INTO emp (id, name, year) VALUES (4, 'Anula', '2018');
cqlsh:newdb> SELECT * FROM emp;

 id | name      | year
----+-----------+------
  1 |   Krishna | 2017
  2 |   Chandra | 2017
  4 |     Anula | 2018
  3 | Prajapati | 2017

(4 rows)
cqlsh:newdb> SELECT * FROM emp WHERE id=3;

 id | name      | year
----+-----------+------
  3 | Prajapati | 2017

(1 rows)
cqlsh:newdb> SELECT id, name FROM emp;

 id | name
----+-----------
  1 |   Krishna
  2 |   Chandra
  4 |     Anula
  3 | Prajapati

(4 rows)
cqlsh:newdb> DESCRIBE keyspaces;

system_schema  system_auth  system  system_distributed  newdb  system_traces

cqlsh:newdb> SELECT * FROM emp LIMIT 2;

 id | name    | year
----+---------+------
  1 | Krishna | 2017
  2 | Chandra | 2017

(2 rows)
cqlsh:newdb> SELECT * FROM emp WHERE name = 'Krishna' ALLOW FILTERING;

 id | name    | year
----+---------+------
  1 | Krishna | 2017

(1 rows)
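The same statements can be run non-interactively, which is handy for repeatable setups. The sketch below writes a few of them to a CQL script; the file path is an arbitrary choice, and the IF NOT EXISTS clauses are an addition (not in the session above) so that the script can be re-run safely.

```shell
#!/usr/bin/env bash
# Sketch: collect the schema and two sample rows from the session above in a CQL script.
# The path below is an arbitrary choice for this example.
cql_file="${TMPDIR:-/tmp}/emp_setup.cql"

cat > "$cql_file" <<'CQL'
CREATE KEYSPACE IF NOT EXISTS newdb
  WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
CREATE TABLE IF NOT EXISTS newdb.emp (id int PRIMARY KEY, name text, year text);
INSERT INTO newdb.emp (id, name, year) VALUES (1, 'Krishna', '2017');
INSERT INTO newdb.emp (id, name, year) VALUES (2, 'Chandra', '2017');
SELECT * FROM newdb.emp;
CQL

echo "Wrote $cql_file"

# To execute it against the local node (assumes cqlsh is on PATH and Cassandra is running):
#   cqlsh -f "$cql_file"
# A single statement can also be passed inline:
#   cqlsh -e "SELECT * FROM newdb.emp;"
```

Table names are qualified with the keyspace here so the script does not depend on a prior use newdb.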
Data Export
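Here’s a way to export data from a Cassandra table to a CSV file, using the cqlsh COPY ... TO command. Note that the table now holds ten rows rather than the four inserted earlier.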
cqlsh:newdb> select * from emp;

 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017

(10 rows)
cqlsh:newdb> COPY emp(id, name, year) TO 'emp.csv';
Using 2 child processes

Starting copy of newdb.emp with columns [id, name, year].
Processed: 10 rows; Rate: 55 rows/s; Avg. rate: 55 rows/s
10 rows exported to 1 files in 0.186 seconds.
cqlsh:newdb> exit
krishna@Ubuntu16:~$ cat emp.csv
9,Lickho,5918
5,Elisha,2418
8,Amchro,5318
1,Jekyano,2718
7,Namka,2318
2,Chandra,2017
3,Prajapati,2017
4,Anula,2018
11,Syniyo,2918
10,Neoan,8918
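A quick sanity check after an export is to compare the number of lines in the CSV file with the row count that COPY reported. The following is a minimal sketch: csv_row_count is a hypothetical helper, and it assumes one record per line and no header row, which matches the default COPY ... TO output shown above. The sample file is recreated from that output so the example is self-contained.

```shell
#!/usr/bin/env bash
# csv_row_count (hypothetical helper): count the non-empty lines in a CSV file.
# Assumes one record per line and no header row.
csv_row_count() {
  grep -c -v '^[[:space:]]*$' "$1"
}

# Recreate the exported file from the output above in a temporary location.
sample_csv="$(mktemp)"
cat > "$sample_csv" <<'CSV'
9,Lickho,5918
5,Elisha,2418
8,Amchro,5318
1,Jekyano,2718
7,Namka,2318
2,Chandra,2017
3,Prajapati,2017
4,Anula,2018
11,Syniyo,2918
10,Neoan,8918
CSV

expected=10   # the row count reported by COPY in the session above
actual="$(csv_row_count "$sample_csv")"
if [ "$actual" -eq "$expected" ]; then
  echo "row count matches: $actual"
else
  echo "row count mismatch: expected $expected, got $actual" >&2
fi
```

On a real export, point csv_row_count at emp.csv instead of the temporary sample file.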
Data Import
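Here’s a way to import data from a CSV file into a Cassandra table, using the cqlsh COPY ... FROM command. The table is truncated first so that the import starts from an empty table.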
cqlsh:newdb> select * from emp;

 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017

(10 rows)
cqlsh:newdb> TRUNCATE newdb.emp;
cqlsh:newdb> select * from emp;

 id | name | year
----+------+------

(0 rows)
cqlsh:newdb> COPY emp(id, name, year) FROM 'emp.csv';
Using 2 child processes

Starting copy of newdb.emp with columns [id, name, year].
Processed: 10 rows; Rate: 18 rows/s; Avg. rate: 26 rows/s
10 rows imported from 1 files in 0.380 seconds (0 skipped).
cqlsh:newdb> SELECT * FROM emp;

 id | name      | year
----+-----------+------
  5 |    Elisha | 2418
 10 |     Neoan | 8918
 11 |    Syniyo | 2918
  1 |   Jekyano | 2718
  8 |    Amchro | 5318
  2 |   Chandra | 2017
  4 |     Anula | 2018
  7 |     Namka | 2318
  9 |    Lickho | 5918
  3 | Prajapati | 2017

(10 rows)
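To confirm that an import restored exactly what was exported, one option is to export the table again and compare the two CSV files. Row order in the export is not guaranteed to match between runs (the CSV above is already in a different order from the SELECT output), so the comparison below sorts both files first. This is only a sketch: same_rows is a hypothetical helper and emp_after.csv is a hypothetical file name for the second export.

```shell
#!/usr/bin/env bash
# same_rows (hypothetical helper): succeed when two CSV files contain the same
# lines, regardless of the order in which the lines appear.
same_rows() {
  a_sorted="$(mktemp)"
  b_sorted="$(mktemp)"
  sort "$1" > "$a_sorted"
  sort "$2" > "$b_sorted"
  if cmp -s "$a_sorted" "$b_sorted"; then
    status=0
  else
    status=1
  fi
  rm -f "$a_sorted" "$b_sorted"
  return "$status"
}

# Usage after re-exporting the imported table to a second file
# (emp_after.csv is a hypothetical name for this example):
#   cqlsh -e "COPY newdb.emp (id, name, year) TO 'emp_after.csv';"
#   same_rows emp.csv emp_after.csv && echo "import matches export"
```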
Conclusion
This completes the setup of a single-node Cassandra cluster.
Happy Reading …