Selecting a database platform – Creating Scalable and Secure Databases

In the previous chapter, we began Section 4, Applications and Databases, by looking at web-based application components in Azure and the different architectural patterns that can be leveraged to create resilient and performant applications.

Nearly all applications use a database to hold information, and therefore this chapter looks at the data tier. We start by examining the two main database options in Azure— Azure Structured Query Language (Azure SQL) and Azure Cosmos DB. We then progress into the different pricing and service tiers available for each service and look at how these impact scalability and resilience.

Finally, we look at how to secure database services through encryption and the different areas where this can be applied.

With this in mind, this chapter covers the following four topics:

  • Selecting a database platform
  • Understanding database service tiers
  • Designing scalable databases
  • Securing databases with encryption
Technical requirements

This chapter will use the Azure portal (https://portal.azure.com) for examples throughout.

Selecting a database platform

Within Azure, there are two main types of database—Azure Cosmos DB and Azure SQL. There are also several different kinds of SQL databases—Azure SQL Database, Azure SQL Managed Instance, or even traditional Microsoft SQL running on a virtual machine (VM). In this section, however, we will consider why you might choose Cosmos DB over SQL Server.

With the choice between Cosmos DB and SQL and a choice between relational SQL databases and hierarchical NoSQL databases, we need to understand the differences between them. We will start with SQL.

Understanding SQL databases

SQL databases are built around the concept of tables, and within each table, you would have rows of data split into cells. Each cell would contain an individual piece of data within a record, and each row would be an entire record.

SQL databases have a schema that lays out your data structure in tables, columns, and rows. The schema must be defined before you can enter data into it. If you want to store a customer record containing a name and address, you first need to create a table with each of the columns defined for each data type, such as the first name, address line, city, country, and so on.

Traditionally, SQL tables are built as relational—this means that rather than having extensive records containing all the information, you split the records into multiple tables and then join them as part of a query. For example, you could store core personal information, such as first and last name, in one table but the address details in another table, with a link in each record to join the related tables together.

There are many reasons why databases used a relational structure, one being because of the computing resources available. Years ago, computer components were costly, and the technology available meant computers only had a limited amount of random-access memory (RAM), central processing units (CPUs), and storage. This meant that creating tables with lots of columns caused them to become slow and difficult to manage. Large records therefore needed to be broken down into smaller schemas and linked together.

Another useful facet of relational data is the ability to create constraints on the data you enter. Elements that need to be consistent across records—for example, a category—would be defined in a separate table, with a category ID link in the person record rather than the category itself. When users needed to create a record that uses a category, they would select the category from the category table’s values. The category would then be stored in the parent record as a link to the record’s ID in the category table.

The following screenshot shows an example of how this might look:

Figure 12.1 – Example relational table

Another benefit of a relational table is that any change to a category name would automatically be reflected across all parent records. You only need to update the data in the category table.

Relational tables can also have one-to-many relationships—this means that rather than linking a single address record to a person, you can link multiple address records. This could be used to store separate addresses for billing and shipping, for example.

Some downsides to SQL are that it requires a lot of planning upfront because you must define the schema before entering data. Any changes to that structure can then be challenging to manage, especially when tables are related and you need to change those relationships. Querying across tables can result in very complex queries, which, as well as being tricky to write, results in them running slower.

As computing power became more powerful and cheaper, some of the reasoning for using relational tables became not just redundant but also constraining. This became especially true for reporting. Because complex relational structures are slow to query but great for management, many systems started to move data between the core database and a second reporting database. The reporting database’s structure is simplified and therefore optimized for querying, but there is additional overhead in the movement of data between them. This process is often performed once or twice a day, meaning there would be a lag between the live data and the ability to report on it.

Despite these issues, SQL is still extremely popular and sometimes the best choice, depending on your requirements. However, in recent years, new technology has emerged—NoSQL.


Tags:


Comments

Leave a Reply

Your email address will not be published. Required fields are marked *