Keys and Search restrictions in Cassandra

3 min readMar 27, 2022

In this article, we are going to have a brief introduction on different keys and search restrictions in Cassandra. In one of my previous articles, I had mentioned about Cassandra’s columnar based approach. You may refer that article here .

Start Cassandra on local

cassandra -f

2. Open the command shell for cassandra

cqlsh

3. Create a keyspace in cassandra

cqlsh> CREATE KEYSPACE sample WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};

4. Using a keyspace

cqlsh> USE sample;

5. Create a table in keyspace

CREATE TABLE employee(emp_id int, emp_name text, emp_city text, emp_sal varint,emp_phone varint, PRIMARY KEY(emp_id));

6. Difference between different keys in Cassandra-

Partition Key — As Cassandra is a distributed database consisting of data shared over multiple nodes. On which node the data will reside is decided by the hash of the partition key. It also uniquely identifies a row. It can be composite in nature aswell. It’s the first argument in the PRIMARY KEY declaration.

Clustering Key — Within a single node, how is the data going to be sorted is determined by the clustering key

Primary Key — It’s a combination of Partition Key and the clustering keys.

Following example will make it further clear-

PRIMARY KEY(emp_id) — Here emp_id is the partition key.

PRIMARY KEY(emp_id, emp_sal) — Here emp_id is partition key and salary is the clustering key

PRIMARY KEY((emp_id, emp_city)) — It’s a composite partition key consisting of emp_id and emp_city.

PRIMARY KEY(emp_id, emp_city, emp_sal) — Here emp_id will be partition key and emp_city, emp_sal will be clustering key and the data will be sorted first on emp_city and then on emp_sal.

PRIMARY KEY((emp_id, emp_city), emp_sal) — Here, the combination of emp_id and emp_city will be partition key and only emp_sal will be clustering key

PRIMARY KEY((emp_id, emp_city), emp_name, emp_sal) — Here the combination and emp_id and emp_city will be partition key and both emp_name and emp_sal will be clustering key.

As I said partition key is used to determine on which node the data should reside, then it’s always better to have partition key as random as possible so that data is evenly distributed across different node.

Search Restrictions in Cassandra

Cassandra is a distributed database, which is write-optimized. It’s meant for use cases where we have heavy writes and read/search operations are limited in nature-

Cassandra only allows search by partition key. So, if in the above example you want to search by emp_name then it’s not possible, Cassandra will not allow that. For such kind of situation there are work arounds

Allow filtering

You may execute your query with “allow filtering”, but this will have a significant impact on performance of your DB as it will be a full table scan. It’s going to go through all the records and then filter the results out-

2. Secondary Index

You may create indexes on columns, but this will impact the performance of your write operations as for every write the indexes will be updated. The indexes are created locally at all nodes, so the query will be sent to all the nodes.

3. Separate table with a different partition key

One of the cleanest solution to handle such problem is to create another table with a different partition key and whenever we write, we write data in both tables. As Cassandra is extremely fast in writes, hence writing in multiple tables will not be a costly operation

CREATE TABLE employee_by_name(emp_id int, emp_name text, emp_city text, emp_sal varint,emp_phone varint, PRIMARY KEY(emp_name));

In my experience so far with Cassandra, it’s better to know your queries before hand before shifting to cassandra. Just because you have large data, that doesn’t implies you should use Cassandra.

4. Materialized view

One final approach which is same as #3 above, but here the responsibility to maintain data in multiple tables shifts from application to cassandra. From 3.0 onwards, Cassandra has started giving a feature of materialized view and cassandra maintains data into two tables. For more information on materialized view, click here .

Keys and Search restrictions in Cassandra

Written by Dhruv Saksena

No responses yet