Amazon DynamoDB

DynamoDB Basics

DynamoDB Overview

DynamoDB is a NoSQL database managed by AWS. It offers the following:

  • Encryption at rest
  • Scale throughput without downtime
  • On-demand backup and point-in-time recovery
  • Data stored on SSDs and replicated across multiple AZs

DynamoDB Components

table -> items -> attributes

DynamoDB tables

A table is a collection of data items. For example, a customer table stores information about all customers, and an order table stores order information. It is similar in concept to a table in a relational database.

DynamoDB items

A table consists of a collection of items, and an item is a collection of attributes. Each item is unique and has its own identity: in a customer table each item is a distinct, identifiable customer, and in an orders table each item is a distinct order. An item is made unique by an identifier known as the primary key.
An item can be thought of as a row in a relational database. Unlike a relational database, however, DynamoDB tables are schemaless apart from the primary key, so each customer item can have a different set of attributes. An item can be at most 400 KB in size.

DynamoDB attributes

An item consists of one or more attributes. An attribute is a fundamental data element that does not need to be broken down further. For example, customer name, phone number, and birthdate are attributes of a customer item; similarly, order ID, name, and type are attributes of an order item. An attribute can hold a single value, a set, or a nested document (list or map). For example, an address attribute can be a map containing first line, second line, city, and so on. DynamoDB allows nesting up to 32 levels deep.
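As a rough illustration of nested attributes, the sketch below writes a hypothetical customer item in DynamoDB's attribute-value format and counts how deep each attribute nests. The item contents and the `nesting_depth` helper are assumptions made for this example, and the way levels are counted here is one plausible reading of the 32-level limit rather than DynamoDB's exact rule.

```python
# Hypothetical customer item in DynamoDB's attribute-value format.
# S = string, N = number (sent as a string), M = map, L = list, SS = string set.
customer_item = {
    "customer_id": {"S": "C-1001"},
    "name": {"S": "Ada Example"},
    "loyalty_points": {"N": "120"},
    "phones": {"SS": ["+1-555-0100", "+1-555-0101"]},
    "address": {
        "M": {
            "line1": {"S": "1 Sample Street"},
            "city": {"S": "Springfield"},
            "geo": {"L": [{"N": "39.78"}, {"N": "-89.65"}]},
        }
    },
}

MAX_NESTING_DEPTH = 32  # limit stated in the text


def nesting_depth(attribute_value):
    """Return how many map/list levels deep an attribute value goes."""
    type_tag, value = next(iter(attribute_value.items()))
    if type_tag == "M":
        return 1 + max((nesting_depth(v) for v in value.values()), default=0)
    if type_tag == "L":
        return 1 + max((nesting_depth(v) for v in value), default=0)
    return 0  # scalars and sets do not add a level


depths = {name: nesting_depth(av) for name, av in customer_item.items()}
assert all(d <= MAX_NESTING_DEPTH for d in depths.values())
print(depths)
```

Here the address map contains a list, so it nests two levels deep, while the scalar and set attributes nest zero levels.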

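DynamoDB Primary Key

A primary key uniquely identifies an item. There are two types of primary keys: a simple key made up of a partition key only, and a composite key made up of a partition key and a sort key.

Partition Key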

A partition key is a primary key made up of a single attribute. DynamoDB passes the partition key value to an internal hash function, and the result determines the partition in which the item is stored. A partition is a physical unit of storage. The choice of partition key is important: its values should hash in such a way that the data is distributed evenly among the partitions. A partition key is also known as a hash attribute.
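To make the effect of key choice concrete, here is a minimal sketch of hash-based partition selection. DynamoDB's actual hash function and partition count are internal, so MD5, the four partitions, and the key values below are all stand-ins chosen for illustration.

```python
import hashlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed for illustration; DynamoDB decides this itself


def pick_partition(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key value to a partition index via a hash.

    MD5 is only a stand-in: DynamoDB's real hash function is internal.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


# High-cardinality key (one value per customer): items spread out.
spread = Counter(pick_partition(f"customer-{i}") for i in range(10_000))

# Low-cardinality key (a country code): at most two partitions are ever used.
skewed = Counter(pick_partition(c) for c in ["US"] * 9_000 + ["CA"] * 1_000)

print("per-customer key:", dict(spread))
print("country key:     ", dict(skewed))
```

A key with many distinct values gives the hash something to spread; a key with only a handful of values concentrates the data no matter how good the hash is.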

Sort Key

A sort key is used in addition to the partition key; together they form a composite primary key. The partition key determines the partition, and the sort key keeps the items that share a partition key value stored in sorted order within it. A sort key is also known as a range attribute.
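The "hash" and "range" names show up directly in the table definition. The sketch below builds the keyword arguments that boto3's `create_table` call accepts for a table with a composite key; the table name and attribute names are hypothetical, and the call itself is left commented out because it needs AWS credentials.

```python
# Keyword arguments for boto3's DynamoDB create_table call.
# Table and attribute names are hypothetical.
orders_table = {
    "TableName": "Orders",
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},   # partition key
        {"AttributeName": "order_date", "KeyType": "RANGE"},   # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity
}

# With AWS credentials and boto3 installed, the table could be created with:
#   import boto3
#   boto3.client("dynamodb").create_table(**orders_table)

key_types = {k["AttributeName"]: k["KeyType"] for k in orders_table["KeySchema"]}
print(key_types)
```

Only the key attributes are declared; every other attribute is left undeclared because the table is schemaless.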

Secondary Indexes

Secondary indexes let you query data using a key other than the table's primary key. DynamoDB maintains indexes automatically as items are added, updated, or deleted. You have to specify which attributes, besides the keys, are copied into the index (known as projection). DynamoDB synchronizes the data between the table and the index in an eventually consistent manner.

Global Secondary Index (GSI)

A GSI can have a different partition key and sort key from the table. The maximum allowed is 20 per table. The index data is stored in a different physical location from the main table. Like every secondary index, a GSI is created on a single table.

Local Secondary Index (LSI)

An LSI has the same partition key as the table but a different sort key. The maximum allowed is 5 per table.
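The difference between the two index types is easiest to see in their key schemas. The following sketch shows index definitions in the shape boto3's `create_table` expects for its `GlobalSecondaryIndexes` and `LocalSecondaryIndexes` parameters; the index names, attribute names, and the `partition_key` helper are hypothetical.

```python
# Hypothetical index definitions in the shape boto3's create_table expects.
global_secondary_indexes = [
    {
        "IndexName": "StatusByDate",
        "KeySchema": [
            {"AttributeName": "order_status", "KeyType": "HASH"},  # differs from table
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        # Projection: copy only this extra attribute into the index.
        "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["total"]},
    }
]

local_secondary_indexes = [
    {
        "IndexName": "ByTotal",
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},  # same as table
            {"AttributeName": "total", "KeyType": "RANGE"},       # different sort key
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }
]

MAX_GSI, MAX_LSI = 20, 5  # limits stated in the text
TABLE_PARTITION_KEY = "customer_id"


def partition_key(index):
    """Return the name of the HASH attribute in an index definition."""
    return next(k["AttributeName"] for k in index["KeySchema"] if k["KeyType"] == "HASH")


assert len(global_secondary_indexes) <= MAX_GSI
assert len(local_secondary_indexes) <= MAX_LSI
assert all(partition_key(i) == TABLE_PARTITION_KEY for i in local_secondary_indexes)
```

In a real table definition every key attribute used by an index also needs an entry in `AttributeDefinitions`, and a GSI on a provisioned-mode table carries its own `ProvisionedThroughput`.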

DynamoDB Design Concepts

Read Consistency

Since the data in DynamoDB is replicated to multiple AZs, every write has to be replicated to all of them. This leads to two read consistency options:

Eventually Consistent Reads

The response is returned as soon as the data is written to one of the AZs. If you read immediately after a write and the read is served from a different AZ, you might miss the latest write.

Strongly Consistent Reads

The response is returned only after the data is written to multiple AZs, so a subsequent read always returns the latest write. For example, data is written to 2 out of 3 AZs before a success message is sent back to the user, and a read returns the most recent value found in any 2 of the 3 AZs.
Strongly consistent reads are not available for GSIs.
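The 2-out-of-3 example can be checked exhaustively with a toy model. This is only a sketch of the quorum idea described above, not a description of DynamoDB's internals; the AZ names, the version numbers, and the `write` and `read` helpers are assumptions.

```python
from itertools import combinations

REPLICAS = ("az-a", "az-b", "az-c")  # hypothetical AZ names


def write(store, value, version, targets):
    """Apply a versioned write to the chosen replicas only."""
    for az in targets:
        store[az] = (version, value)


def read(store, sources):
    """Return the value with the highest version among the replicas asked."""
    return max(store[az] for az in sources)[1]


stale_single_reads = 0
for write_set in combinations(REPLICAS, 2):          # write reaches 2 of 3
    store = {az: (1, "old") for az in REPLICAS}
    write(store, "new", version=2, targets=write_set)

    # Reading any 2 of 3 overlaps the write set, so the read is never stale.
    assert all(read(store, rs) == "new" for rs in combinations(REPLICAS, 2))

    # Reading a single replica may hit the one that missed the write.
    stale_single_reads += sum(read(store, (az,)) == "old" for az in REPLICAS)

print("stale single-replica reads:", stale_single_reads)
```

Any two read replicas must share at least one replica with any two write replicas, which is why the two-replica read always sees the new value while a one-replica read sometimes does not.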

Read Write Capacity

You are charged based on how much you read from or write to the DynamoDB table. There are two read/write capacity modes.

On Demand Mode

You pay per request for reads and writes. Latency stays the same irrespective of demand. Capacity is still bounded by throughput: there is a maximum number of reads and writes, although you can ask AWS to increase it.
One read request unit (RRU) = 1 strongly consistent read of up to 4 KB = 2 eventually consistent reads of up to 4 KB. An 8 KB read requires 2 RRUs for a strongly consistent read and 1 RRU for an eventually consistent read. One write request unit (WRU) = 1 write of up to 1 KB.
Throttling can occur if traffic exceeds double your previous peak within 30 minutes. If usage rises to a higher level over more than 30 minutes, that level becomes the new peak. The initial peak is 2,000 WRUs or 6,000 RRUs. Switching from provisioned to on-demand mode might take some time.
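The request-unit arithmetic above can be written as two small functions. This is a sketch under the unit sizes given in the text (4 KB per read unit, 1 KB per write unit); the function names are made up for the example.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one read unit covers up to 4 KB
WRITE_UNIT_BYTES = 1024      # one write unit covers up to 1 KB


def read_request_units(item_bytes: int, strongly_consistent: bool) -> float:
    """Units consumed by reading one item of the given size."""
    blocks = max(1, math.ceil(item_bytes / READ_UNIT_BYTES))
    return float(blocks) if strongly_consistent else blocks / 2


def write_request_units(item_bytes: int) -> int:
    """Units consumed by writing one item of the given size."""
    return max(1, math.ceil(item_bytes / WRITE_UNIT_BYTES))


print(read_request_units(8 * 1024, strongly_consistent=True))   # 2.0
print(read_request_units(8 * 1024, strongly_consistent=False))  # 1.0
print(write_request_units(1500))                                # 2
```

Sizes are rounded up to the next whole unit, so a 1,500-byte write costs as much as a 2 KB write.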

Provisioned Mode

Read throughput is defined in read capacity units (RCUs) and write throughput in write capacity units (WCUs). The per-unit sizes are the same as in on-demand mode (4 KB per read unit, 1 KB per write unit), measured per second. Transactional reads and writes need twice the RCUs and WCUs.
You specify a fixed number of RCUs and WCUs, and you can change the number based on traffic. Use this mode when you need to control DynamoDB usage. Requests beyond the provisioned amount are throttled.

Provisioned Autoscaling

Define a range of RCUs and WCUs and a target utilization. Provisioned capacity is then increased and decreased dynamically based on utilization.
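The relationship between consumption, target utilization, and the configured range can be sketched as follows. This is a simplified model, not the actual scaling algorithm, which reacts to alarms over time; the `scaled_capacity` function and the policy numbers are hypothetical.

```python
import math


def scaled_capacity(consumed_units: float, target_utilization_pct: int,
                    min_units: int, max_units: int) -> int:
    """Capacity that would put utilization at the target, kept within range."""
    if not 0 < target_utilization_pct <= 100:
        raise ValueError("target_utilization_pct must be between 1 and 100")
    desired = math.ceil(consumed_units * 100 / target_utilization_pct)
    return max(min_units, min(max_units, desired))


# Hypothetical policy: 70% target utilization, between 5 and 500 RCUs.
for consumed in (1, 140, 900):
    print(consumed, "->", scaled_capacity(consumed, 70, 5, 500))
```

With this policy, light traffic is held at the minimum of 5 units, 140 consumed units leads to 200 provisioned units, and a spike to 900 is capped at the maximum of 500, beyond which requests would be throttled.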

Provisioned, On-Demand, or Provisioned Autoscaled?

On-demand is more expensive than provisioned. With autoscaling you also pay for the alarms, and 8 alarms are needed for each read/write, so that is an additional cost. Use these criteria to decide which one to use:

  • For development work, use on-demand
  • For predictable usage in production, use provisioned with reserved capacity
  • For unpredictable usage in production, use autoscaling

Effective Partitioning

A partition is a DynamoDB implementation detail that the user does not control. A table is stored in multiple partitions, each backed by SSD storage. The number of partitions can grow if a partition cannot support the required read or write capacity, or if more storage is required.
DynamoDB computes a hash value from the partition key and uses it to determine which partition to write the data to or read it from. Within that partition it uses the sort key to store items in order and to return a range of items when asked.
If a partition receives a much higher volume of reads and writes than the other partitions, it is known as a hot partition. The total provisioned capacity is divided equally among all the partitions, so a partition that receives more reads or writes than the others might reach its throttling limit even when your total consumption is below the provisioned value. Adaptive capacity helps prevent that, and it is enabled by default.
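A small worked example shows how a hot partition can be throttled while the table as a whole is under its limit. The sketch models only the equal split of capacity described above and deliberately leaves out adaptive capacity; the function name and the traffic figures are hypothetical.

```python
def throttled_partitions(provisioned_units: int, traffic_per_partition: list[int]) -> list[int]:
    """Indexes of partitions whose traffic exceeds an equal share of capacity.

    Models only the equal split; adaptive capacity, which can shift unused
    capacity to a busy partition, is deliberately left out.
    """
    share = provisioned_units / len(traffic_per_partition)
    return [i for i, used in enumerate(traffic_per_partition) if used > share]


provisioned = 1000                    # hypothetical table-level RCUs
traffic = [100, 120, 400, 80]         # hypothetical per-partition demand

print("total demand:", sum(traffic), "of", provisioned)
print("throttled partitions:", throttled_partitions(provisioned, traffic))
```

Total demand is 700 of 1,000 units, yet the third partition asks for 400 against an equal share of 250, so it alone would be throttled.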

Queries

The Query operation finds items in a table based on the primary key. Queries can be run on a table or on a secondary index. Provide a key condition expression (--key-condition-expression in the AWS CLI) of the form 'partitionKeyName = :value'. Optionally, add a condition on the sort key to the key condition expression.
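For reference, this is roughly what such a query looks like as keyword arguments to the low-level boto3 client's `query` call. The table name, attribute names, and values are hypothetical, and the call itself is commented out because it needs AWS credentials.

```python
# Keyword arguments for the low-level boto3 client's query call.
# Table, attribute names, and values are hypothetical.
query_args = {
    "TableName": "Orders",
    # Equality on the partition key is required; the sort key part is optional.
    "KeyConditionExpression": "customer_id = :cid AND order_date BETWEEN :start AND :end",
    "ExpressionAttributeValues": {
        ":cid": {"S": "C-1001"},
        ":start": {"S": "2024-01-01"},
        ":end": {"S": "2024-12-31"},
    },
}

# With AWS credentials and boto3 installed:
#   import boto3
#   response = boto3.client("dynamodb").query(**query_args)
#   orders = response["Items"]

# Every placeholder in the expression needs a matching value.
placeholders = [t for t in query_args["KeyConditionExpression"].split() if t.startswith(":")]
assert set(placeholders) == set(query_args["ExpressionAttributeValues"])
print(placeholders)
```

To query a secondary index instead of the table, the same arguments would also carry an `IndexName`.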

Filters

A filter reduces the number of items returned by discarding some of the items that the query matched. Note that the RCUs consumed are based on what the query reads, not on what the filter lets through: you pay for everything that is read before the filter is applied. A query can read at most 1 MB of data per call, and that limit applies before filtering. A filter expression cannot contain key attributes.
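The cost effect of filtering can be illustrated with a few made-up items. The sketch assumes the query sums the sizes of everything it reads and rounds up to 4 KB blocks; the item sizes and statuses are hypothetical.

```python
import math

# Hypothetical items matched by a key condition: (status, size in bytes).
matched = [("SHIPPED", 3000), ("PENDING", 2500), ("SHIPPED", 1500), ("PENDING", 1000)]

# The filter runs after the read, so it only trims the response.
returned = [item for item in matched if item[0] == "PENDING"]

# The charge is based on all bytes read, rounded up to 4 KB blocks.
bytes_read = sum(size for _, size in matched)
rcus_strong = math.ceil(bytes_read / 4096)

print(f"read {len(matched)} items ({bytes_read} bytes), returned {len(returned)}")
print("strongly consistent RCUs charged:", rcus_strong)
```

Half the items are filtered out, but the charge is for all 8,000 bytes read. Moving the filtered attribute into the sort key, where possible, avoids reading the unwanted items at all.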

Scans

A scan reads all items of a table. You can reduce the number of attributes returned. A single call returns at most 1 MB of data; use pagination to retrieve more. Use filters to filter items, and use Limit to limit the number of items.
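Pagination works by passing the `LastEvaluatedKey` from one response as the `ExclusiveStartKey` of the next request. The sketch below shows that loop against a stand-in client so it can run without AWS; `scan_all` and `FakeClient` are hypothetical helpers, and the fake's key shape is simplified.

```python
def scan_all(client, **scan_args):
    """Yield every item from a scan, following LastEvaluatedKey across pages."""
    while True:
        page = client.scan(**scan_args)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            return  # no more pages
        scan_args["ExclusiveStartKey"] = last_key


class FakeClient:
    """Stand-in for a boto3 DynamoDB client that serves two items per page."""

    def __init__(self, items):
        self.items = items
        self.calls = 0

    def scan(self, TableName, ExclusiveStartKey=None, **_):
        self.calls += 1
        # A real LastEvaluatedKey holds the item's key attributes, not a position.
        start = 0 if ExclusiveStartKey is None else ExclusiveStartKey["pos"]
        page = {"Items": self.items[start:start + 2]}
        if start + 2 < len(self.items):
            page["LastEvaluatedKey"] = {"pos": start + 2}
        return page


client = FakeClient([{"id": {"N": str(i)}} for i in range(5)])
items = list(scan_all(client, TableName="Customers"))
print(len(items), "items in", client.calls, "calls")
```

The same loop should work with a real boto3 client passed in place of `FakeClient`; boto3 also offers `get_paginator("scan")`, which hides the loop.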

DynamoDB Transactions

You can group multiple actions and submit them as a single all-or-nothing operation. TransactWriteItems groups up to 100 write actions, all of which have to succeed for any of them to take effect. Transactions do not cover writes to GSIs, streams, or backups. TransactGetItems groups up to 100 Get operations.
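As an illustration of the all-or-nothing behaviour, the sketch below builds the keyword arguments for boto3's `transact_write_items` call: one action records an order and another decrements stock, each guarded by a condition. The tables, keys, and conditions are hypothetical, and the call itself is commented out because it needs AWS credentials.

```python
# Keyword arguments for boto3's transact_write_items call.
# Tables, keys, and the stock-check condition are hypothetical.
transaction = {
    "TransactItems": [
        {
            "Put": {
                "TableName": "Orders",
                "Item": {"customer_id": {"S": "C-1001"}, "order_date": {"S": "2024-05-01"}},
                # Fail the whole transaction if this order already exists.
                "ConditionExpression": "attribute_not_exists(customer_id)",
            }
        },
        {
            "Update": {
                "TableName": "Inventory",
                "Key": {"sku": {"S": "SKU-42"}},
                "UpdateExpression": "SET units_in_stock = units_in_stock - :one",
                # Fail the whole transaction if nothing is left in stock.
                "ConditionExpression": "units_in_stock >= :one",
                "ExpressionAttributeValues": {":one": {"N": "1"}},
            }
        },
    ]
}

# With AWS credentials and boto3 installed:
#   import boto3
#   boto3.client("dynamodb").transact_write_items(**transaction)

actions = [next(iter(entry)) for entry in transaction["TransactItems"]]
print(actions)
```

If either condition fails, neither the order nor the stock change is written.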

DynamoDB Accelerator (DAX)

Provides microsecond response times for eventually consistent reads. Use cases:

  • Applications that need the fastest possible read times, e.g. trading apps
  • Mitigating hot keys, i.e. some data is read much more often than the rest
  • Offloading reads to a DAX cluster to reduce the cost of RCUs or dynamic reads
  • Highly repeated reads
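The effect of a cache on repeated reads can be shown with a toy read-through cache. This is not the DAX client API, only a sketch of the general idea; the class, the TTL value, and the backing dictionary standing in for a table are all assumptions.

```python
import time


class ReadThroughCache:
    """Toy read-through cache with a time-to-live; not the DAX client API."""

    def __init__(self, load_from_table, ttl_seconds=300.0, clock=time.monotonic):
        self._load = load_from_table      # called only on a cache miss
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}                # key -> (expires_at, item)
        self.table_reads = 0              # reads that would consume RCUs

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[0] > self._clock():
            return entry[1]               # hit: served from memory
        self.table_reads += 1             # miss: go to the table
        item = self._load(key)
        self._entries[key] = (self._clock() + self._ttl, item)
        return item


# Hypothetical backing "table" and a hot key read 1,000 times.
table = {"AAPL": {"price": "189.30"}}
cache = ReadThroughCache(table.get)
for _ in range(1000):
    cache.get("AAPL")

print("table reads:", cache.table_reads)
```

Only the first read reaches the table. The trade-off is that a cached item can be out of date until its TTL expires, which is why reads served this way are eventually consistent.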
