Keys and Search restrictions in Cassandra
In this article, we are going to have a brief introduction on different keys and search restrictions in Cassandra. In one of my previous articles, I had mentioned about Cassandra’s columnar based approach. You may refer that article here .
- Start Cassandra on local
cassandra -f
2. Open the command shell for cassandra
cqlsh
3. Create a keyspace in cassandra
cqlsh> CREATE KEYSPACE sample WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};
4. Using a keyspace
cqlsh> USE sample;
5. Create a table in keyspace
CREATE TABLE employee(emp_id int, emp_name text, emp_city text, emp_sal varint,emp_phone varint, PRIMARY KEY(emp_id));
6. Difference between different keys in Cassandra-
Partition Key — As Cassandra is a distributed database consisting of data shared over multiple nodes. On which node the data will reside is decided by the hash of the partition key. It also uniquely identifies a row. It can be composite in nature aswell. It’s the first argument in the PRIMARY KEY declaration.
Clustering Key — Within a single node, how is the data going to be sorted is determined by the clustering key
Primary Key — It’s a combination of Partition Key and the clustering keys.
Following example will make it further clear-
PRIMARY KEY(emp_id) — Here emp_id is the partition key.
PRIMARY KEY(emp_id, emp_sal) — Here emp_id is partition key and salary is the clustering key
PRIMARY KEY((emp_id, emp_city)) — It’s a composite partition key consisting of emp_id and emp_city.
PRIMARY KEY(emp_id, emp_city, emp_sal) — Here emp_id will be partition key and emp_city, emp_sal will be clustering key and the data will be sorted first on emp_city and then on emp_sal.
PRIMARY KEY((emp_id, emp_city), emp_sal) — Here, the combination of emp_id and emp_city will be partition key and only emp_sal will be clustering key
PRIMARY KEY((emp_id, emp_city), emp_name, emp_sal) — Here the combination and emp_id and emp_city will be partition key and both emp_name and emp_sal will be clustering key.
As I said partition key is used to determine on which node the data should reside, then it’s always better to have partition key as random as possible so that data is evenly distributed across different node.
Search Restrictions in Cassandra
Cassandra is a distributed database, which is write-optimized. It’s meant for use cases where we have heavy writes and read/search operations are limited in nature-
Cassandra only allows search by partition key. So, if in the above example you want to search by emp_name then it’s not possible, Cassandra will not allow that. For such kind of situation there are work arounds
- Allow filtering
You may execute your query with “allow filtering”, but this will have a significant impact on performance of your DB as it will be a full table scan. It’s going to go through all the records and then filter the results out-
2. Secondary Index
You may create indexes on columns, but this will impact the performance of your write operations as for every write the indexes will be updated. The indexes are created locally at all nodes, so the query will be sent to all the nodes.
3. Separate table with a different partition key
One of the cleanest solution to handle such problem is to create another table with a different partition key and whenever we write, we write data in both tables. As Cassandra is extremely fast in writes, hence writing in multiple tables will not be a costly operation
CREATE TABLE employee_by_name(emp_id int, emp_name text, emp_city text, emp_sal varint,emp_phone varint, PRIMARY KEY(emp_name));
In my experience so far with Cassandra, it’s better to know your queries before hand before shifting to cassandra. Just because you have large data, that doesn’t implies you should use Cassandra.
4. Materialized view
One final approach which is same as #3 above, but here the responsibility to maintain data in multiple tables shifts from application to cassandra. From 3.0 onwards, Cassandra has started giving a feature of materialized view and cassandra maintains data into two tables. For more information on materialized view, click here .