AWS Database

This post is talking about Amazon Relational Database Service (RDS), Amazon DynamoDB, Amazon Redshift, and Amazon Aurora.

Unmanaged vs managed services

AWS solutions typically fall into one of two categories: unmanaged or managed.

Unmanaged services

Unmanaged services are typically provisioned in discrete portions as specified by the user.You must manage how the service responds to changes in load, errors, and situations where resources become unavailable. The benefit to using an unmanaged service is that you have more fine-tuned control over how your solution handles changes in load, errors, and situations where resources become unavailable.

If you move to a database that runs on an Amazon Elastic Compute Cloud (Amazon EC2) instance, you no longer need to manage the underlying hardware or handle data center operations. However, you are still responsible for patching the OS and handling all software and backup operations.

managed services

Managed services require the user to configure them. For example, you create an Amazon Simple Storage Service (Amazon S3) bucket and then set permissions for it. However, managed services typically require less configuration. Say that you have a static website that you host in a cloud-based storage solution, such as Amazon S3. The static website does not have a web server. However, because Amazon S3 is a managed solution, features such as scaling, fault-tolerance, and availability would be handled automatically and internally by Amazon S3.

If you set up your database on Amazon RDS or Amazon Aurora, you reduce your administrative responsibilities. By moving to the cloud, you can automatically scale your database, enable high availability, manage backups,and perform patching. Thus, you can focus on what really matters most—optimizing your application.

Amazon RDS

The basic building block of Amazon RDS is the database instance. A database instance is an isolated database environment that can contain multiple user-created databases. It can be accessed by using the same tools and applications that you use with a standalone database instance. Amazon RDS currently supports six databases: MySQL, Amazon Aurora, Microsoft SQL Server, PostgreSQL, MariaDB, and Oracle.

RDS in VPC

You can run an instance by using Amazon Virtual Private Cloud (Amazon VPC). When you use a virtual private cloud (VPC), you have control over your virtual networking environment.

You can select your own IP address range, create subnets, and configure routing and access control lists (ACLs). The basic functionality of Amazon RDS is the same whether or not it runs in a VPC. Usually, the database instance is isolated in a private subnet and is only made directly accessible to indicated application instances. Subnets in a VPC are associated with a single Availability Zone, so when you select the subnet, you are also choosing the Availability Zone (or physical location) for your database instance.

High availbilty with Muti-AZ deployment

One of the most powerful features of Amazon RDS is the ability to configure your database instance for high availability with a Multi-AZ deployment. After a Multi-AZ deployment is configured, Amazon RDS automatically generates a standby copy of the database instance in another Availability Zone within the same VPC. After seeding the database copy, transactions are synchronously replicated to the standby copy. Running a database instance in a Multi-AZdeployment can enhance availability during planned system maintenance, and it can help protect your databases against database instance failure and Availability Zone disruption.

Therefore, if the main database instance fails in a Multi-AZ deployment, Amazon RDS automatically brings the standby database instance online as the new main instance. The synchronous replication minimizes the potential for data loss. Because your applications reference the database by name by using the Amazon RDS Domain Name System (DNS) endpoint, you don’t need to change anything in your application code to use the standby copy for failover, which means the only change is the IP address.

RDS read replicas

Amazon RDS also supports the creation of read replicas for MySQL, MariaDB, PostgreSQL, and Amazon Aurora. Updates that are made to the source database instance are asynchronously copied to the read replica instance. You can reduce the load on your source database instance by routing read queries from your applications to the read replica. Using read replicas, you can also scale out beyond the capacity constraints of a single database instance for read-heavy database workloads. Read replicas can also be promoted to become the primary database instance, but this requires manual action because of asynchronous replication.

Read replicas can be created in a different Region than the primary database. This feature can help satisfy disaster recovery requirements or reduce latency by directing reads to a read replica that is closer to the user.

Use cases

Amazon RDS works well for web and mobile applications that need a database with high throughput, massive storage scalability, and high availability. Because Amazon RDS does not have any licensing constraints, it fits the variable usage pattern of these applications.
For small and large ecommerce businesses, Amazon RDS provides a flexible, secure, and low-cost database solution for online sales and retailing.
Mobile and online games require a database platform with high throughput and availability. Amazon RDS manages the database infrastructure, so game developers do not need to worry about provisioning, scaling, or monitoring database servers.

When to use RDS

Use Amazon RDS when your application requires:

Complex transactions or complex queries
A medium to high query or write rate – up to 30,000 IOPS (15,000 reads + 15,000 writes)
No more than a single worker node or shard
High durability

Do not use Amazon RDS when your application requires:

Massive read/write rates (for example 150,000 writes per second)
Sharding due to high data size or throughput demands
Simple GET or PUT requests and queries that a NoSQL database can handle
Or, relational database management system (RDBMS) customization

For circumstances when you should not use Amazon RDS, consider either using a NoSQL database solution (such as DynamoDB) or running your relational database engine on Amazon EC2 instances (RDBMS) instead of Amazon RDS (which will provide you with more options for customizing your database).

RDS vs EC2 vs EBS

When managing your own database software directly on Amazon EC2, you should also consider the availability of fault-tolerant and persistent storage. For this purpose, we recommend that databases running on Amazon EC2 use Amazon EBS volumes, which are similar to network-attached storage. For EC2 instances running a database, you should place all database data and logs on EBS volumes. These will remain available even if the database host fails. This configuration allows for a simple failover scenario, in which a new EC2 instance can be launched if a host fails, and the existing EBS volumes can be attached to the new instance. The database can then pick up where it left off.

EBS - traditional disk storage
EC2 - a server VM that can use EBS volumes for its disks. you can install a DB on this and administer it completely yourself.
RDS - AWS will spin up the required EC2 instances and attach EBS volumes etc and do most of the low level DB management for you

Amazon DynamoDB

DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit-millisecond latency at any scale. One of the benefits of a NoSQL database is that items in the same table can have different attributes.

DynamoDB components

A table is a collection of data.
Items are a group of attributes that is uniquely identifiable among all the other items.
Attributes are a fundamental data element, something that does not need to be broken down any further.

Primary key in DynamoDB

In the first method, the query operation takes advantage of partitioning to effectively locate items by using the primary key.
The second method is via a scan, which enables you to locate items in the table by matching conditions on non-key attributes. The second method gives you the flexibility to locate items by other attributes. However, the operation is less efficient because DynamoDB will scan through all the items in the table to find the ones that match your criteria.

Amazon Redshift

Amazon Redshift is a fast and powerful, fully managed data warehouse that is simple and cost-effective to set up, use, and scale. It enables you to run complex analytic queries against petabytes of structured data by using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel data processing. Most results come back in seconds.

Parallel processing architecture

The leader node manages communications with client programs and all communication with compute nodes. It parses and develops plans to carry out database operations—specifically, the series of steps that are needed to obtain results for complex queries. The leader node compiles code for individual elements of the plan and assigns the code to individual compute nodes. The compute nodes run the compiled code and send intermediate results back to the leader node for final aggregation.

Use cases

Enterprise data warehouse (EDW). Many customers migrate their traditional enterprise data warehouses to Amazon Redshift with the primary goal of agility. Customers can start at whatever scale they want and experiment with their data without needing to rely on complicated processes with their IT departments to procure and prepare their software.
Big data customers have one thing in common: massive amounts of data that stretch their existing systems to a breaking point. Smaller customers might not have the resources toprocure the hardware and expertise that is needed to run these systems. With Amazon Redshift, smaller customers can quickly set up and use a data warehouse at a comparatively low price point.
Software as a service (SaaS) customers can take advantage of the scalable, easy-to-manage features that Amazon Redshift provides. Some customers use the Amazon Redshift to provide analytic capabilities to their applications. Some users deploy a cluster per customer,and use tagging to simplify and manage their service level agreements (SLAs) and billing. Amazon Redshift can help you reduce hardware and software costs.

Amazon Aurora

Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database that is built for the cloud. Why might you use Amazon Aurora over other options, like SQL with Amazon RDS? Most of that decision involves the high availability and resilient design that Amazon Aurora offers.

Amazon Aurora is designed to be highly available: itstores multiple copies of your data across multiple Availability Zones with continuous backups to Amazon S3. Amazon Aurora can use up to 15 read replicas can be used to reduce the possibility of losing your data. Additionally, Amazon Aurora is designed for instant crash recovery if your primary database becomes unhealthy.

After a database crash, Amazon Aurora does not need to replay the redo log from the last database checkpoint. Instead, it performs this on every read operation. This reduces the restart time after a database crash to less than 60 seconds in most cases.

With Amazon Aurora, the buffer cache is moved out of the database process, which makes it available immediately at restart. This reduces the need for you to throttle access until the cache is repopulated to avoid brownouts.

Reference: AWS Academy

Thank you for reading!