Many databases today strive to do whatever it takes to provide high performance, scalability, and availability, while minimizing complexity and maintenance costs. Azure Cosmos DB is a great example of a DBMS that can easily provide these qualities. This article describes its capabilities, along with limitations that may not be obvious at first glance and at the same time become a serious problem in the future if they are not taken into account when designing a system.

Further in the article, some facts will be marked with special symbols:

👍 Features that can be very useful in practice.

⚠️ Limitations to keep in mind, but if you din’t know about them and find them during development, not a problem, they can be circumvented.

⛔ Limitations, that can be a big problem if you don’t take them into account at the design stage.

Cosmos DB supports multiple APIs: SQL, Cassandra, MongoDB, Gremlin, Table. SQL here refers to the document-oriented API, which used to be called DocumentDB, and it differs significantly from the relational databases we are used to. This article is based on experience with the SQL API. I would like to draw your attention to the fact that the API type is selected when creating a storage instance, it will not work with the data of the same instance through different APIs.

The Cosmos DB document model stores data in containers, which are made up of items. Earlier in the documentation, they were called collections and documents. All scaling, bandwidth, indexing settings are specified at the container level (with a few exceptions, which we will talk about later). A database – union of containers.

And here it’s worth mentioning the first limitation right away:

⛔ The maximum size of an item in a container is 2 MB. Shouldn’t be a problem in most cases, but keep in mind that size is limited and storing media content in container elements is not a good idea.

On-the-fly scaling

Cosmos DB lets you manage performance metrics (bandwidth) for each container individually.

Throughput is measured in units of request per second (Request Units per second or abbreviated RU / sec). An approximate equivalent of a request unit is reading a 1 KB item by its identifier. For example, the required throughput of a container that allows 500 reads and 100 writes of 1K elements per second will be approximately 500 * 1 + 100 * 5 = 1000 RU. However, in the general case, it is practically impossible to accurately calculate the required bandwidth, since the complexity of requests and the size of the elements can be very different.

Any operation performed by the database has its own “price” in terms of RU. The bandwidth indicated on the container is the limit that Cosmos DB is allowed to “spend” per second. The base tracks the total “price” of requests within a second, and if the limit has already been reached, subsequent requests are not accepted for execution until the amount per second returns to a value less than the specified limit.

The lowest possible throughput for a container is 400 RU and will cost about $ 25 per month. The cost grows linearly with an increase in RU – a container with 800 RU will already cost about $ 50 per month.

👍 The throughput is indicated at the container level and can be changed at any time without any additional procedures.

👍 It is possible to specify the bandwidth at the database level. In this case, all containers will use the same “capacity”.

👍 There is also an “autopilot” mode (currently in the preview stage), which allows you to automatically increase or decrease the declared bandwidth, depending on the load on the database. Its disadvantage is that the minimum and maximum values ​​that you can configure are in a ratio of 1:10, i.e. you cannot make it so that during peak hours the metric is set at 40,000 RU, and at rest – 400 RU.

⚠️ In case of exceeding the stated bandwidth, Cosmos DB simply does not take new requests for execution and in response returns a special HTTP 429 status (“Request rate too large”).

⛔ To switch between configuring bandwidth at the base level and configuring at the container level, the database must be re-created, there is no smoother transition yet.

Partitioning for Infinite Scalability

Cosmos DB used to have two options for containers: with partitions and without. Partitionless containers were limited to a maximum bandwidth of 10,000 RU and a size of 10 gigabytes. At the moment, all containers can have partitions, so when creating them, you must specify the partition key.

Understanding the difference between logical and physical partitions is important: a logical partition is made up of elements that have the same partition key value, while a physical partition is a Cosmos DB “computational unit”, a node in its physical infrastructure that can handle multiple logical partitions. Depending on the data and distribution of records by the partition key, the system can create new physical partitions and redistribute logical partitions between them.

⛔ All bandwidth allocated to a container is evenly distributed among its physical partitions. Imagine that initially, a 1000 RU container has one partition. When its size reaches the 10 GB limit, the system will split it in two. This will mean that if any request executed on this container had 1000 RU per second at its disposal, now requests related to the same section will already be limited to a bandwidth of 500 RU.

⛔ Although the number of partitions is not limited, the maximum amount of data for one physical partition is 10 GB and the maximum bandwidth is 10,000 RU. The partition key should be chosen so that the probability of reaching these limits is minimal. It should be noted that the same logical partition cannot be shared between physical ones.

Auto-indexing

Instead of creating indexes for individual fields or combinations of them, Cosmos DB lets you customize indexing for paths within an object. A policy is a set of attributes: which paths to include in indexing, which to exclude, which types of indexes to use, etc.

Cosmos DB uses an inverted index instead of the classic B-tree, making it efficient when searching across multiple criteria. search across multiple fields.

👍 Indexing policy can be changed at any time.

👍 There are two indexing modes available: consistent and lazy. The lazy one makes the write faster, but in a consistent write and read, because in this case, the indexing happens in the background after the write operation has completed. While the index is being updated, queries may return out of date data.

👍Composite indexes can be created to speed up ORDER BY operations on multiple fields, otherwise composite indices are useless.

👍There is support for spatial indices.

⚠️ The policy created on the container by default (“index all fields”) can cause a large consumption of RU when writing elements with a large number of fields.

Tracking changes

Tracking changes in Cosmos DB is possible thanks to a mechanism called Change Feed. It returns documents that have changed since a specific point in time, in the order in which they were changed points in time. This initial moment of time can be flexibly controlled: either the moment of creation of the change stream itself, or a fixed timestamp, or the moment of container creation.

Changes can be processed asynchronously and incrementally, and the result can be distributed among one or more consumers for parallel processing. All this gives a great deal of flexibility in various integration scenarios.

⚠️ If you are using the low-level API for Change Feed, be sure to take into account the section splitting that can occur when the section reaches its size limit.

⛔ This mechanism does not track deletion, so it is better to use soft delete in order to be able to respond to the deletion of items.

Stored procedures and transactions

Cosmos DB supports stored procedures and triggers written in JavaScript.

⚠️ JavaScript I / O is completely asynchronous, and since async / await is not yet supported in Cosmos DB, there are a lot of callbacks to write, which doesn’t make the code very readable.

⚠️ There is no convenient way to return additional information to errors from stored procedures. The only way to solve is to add information to the error message, and then on the client “strip” this data from the messages.

⚠️ Transactions only work within stored procedures and functions or transactional packages available starting with the .NET SDK version 3.4 and higher.

⛔ Only documents from the same logical partition can be included in one transaction. Accordingly, write operations to different containers cannot be performed transactionally.

⛔ The execution time of one stored procedure is limited (5 seconds), and if the duration of a transaction exceeds this limit, it will be canceled. There are ways to implement “long-lived” transactions through several calls to the server, but they violate atomicity.

Executing queries

Although Microsoft calls the document API “SQL,” its language is only similar to SQL and differs a lot from what we are used to seeing in relational databases.

The API has parameters that help prevent expensive queries from being executed: EnableCrossPartitionQuery, EnableScanInQuery (see the FeedOptions class in the documentation). Cross-partition queries are received when the condition does not contain a fixed value for the partitioning key. Dataset scans can occur when the query condition contains non-indexed fields. Setting both parameters to false is a good way to combat excessive RU consumption. However, in some cases, running a query across multiple partitions can be useful.

👍 GROUP BY is already supported (added in November 2019).

👍 The aggregate functions MIN and MAX do not appear to use indexes.

⚠️ The JOIN keyword is present in the language, but it is used to “expand” nested collections. It is impossible to combine elements from different containers in one query, like ordinary SQL.

⚠️ Since Cosmos DB does not imply a strict data schema, the specified field may not exist in some items. Therefore, any query that imposes conditions on optional fields must also contain a check for their existence through the IS_DEFINED function. Checking for NULL may not be enough.

You can find more hints about queries in the cheat sheets published by Microsoft.

Other useful features

  • Lifetime of elements – can be specified either by default at the container level, or the application can set the lifetime for each element individually.
  • Five levels of integrity: bounded staleness, session (default), consistent prefix, eventual.
  • Unique keys (note that the container will have to be re-created to change the structure of the unique key).
  • Optimistic locking – each item has a special “_etag” field that is updated by the DBMS itself. When updating an element, the application code can specify a condition – to allow writing only if the value of the “_etag” field in the object passed by the application is equal to the value stored for this element in the database.
  • Geo-replication is very easy to configure for any number of regions. In a single “master” configuration, you can switch the main region at any time, or this switch will be automatic in the event of a failure at the data center level. The client from the SDK automatically reacts to these switches without requiring any additional actions from the developers to handle such situations.
  • Data encryption
  • Multi-Region Recording – Allows you to scale your recording operations. It should be noted that the presence of multiple copies of data that can be written almost always implies possible conflicts: the same element was changed in two regions at the same time, several elements with the same primary key were added in different regions, etc. Fortunately, Cosmos DB provides two ways to resolve conflicts: automatic (the last record wins) or “custom”, in which you can implement an algorithm that suits your conditions.

As you can see from the above, Azure Cosmos DB has many benefits that make it a good choice for a variety of projects. But nothing is perfect, and some restrictions can be a serious obstacle to the application of this technology: if you need transactions consisting of actions on several containers, or you need long transactions involving hundreds and thousands of objects, if the data cannot be effectively partitioned and in this case, they can go beyond 10 GB, etc. If none of the limitations mentioned in this article seems like a big issue for your project, it makes sense to consider Azure Cosmos DB for usage.