Sometimes an application is initially developed to be installed only on the client side. Such an application can be called boxed software, or software as a product. A client buys the box and deploys the application on their own servers (there are many examples of such applications).

Over time, however, the developer company may decide that it would be good to host the application in the cloud and rent it out (software as a service). This deployment method has advantages for both customers and the developer company. Customers quickly get a working system and do not have to worry about deployment and administration. Renting an application also does not require a large one-time investment.

The developer company, in turn, gains new customers, along with new tasks: deploying the application in the cloud, administering it, updating it to new versions, migrating data during updates, backing up data, monitoring performance and errors, and fixing problems when they occur.

Why should the application in the cloud be multi-tenant?

An application does not have to be multi-tenant to be hosted in the cloud. But then the following problem arises: a dedicated cloud stand (environment) with the rented application has to be deployed for each client, and that is costly both in the resources the stand consumes and in administration effort. It is more cost-effective to implement multi-tenancy in the application so that one instance can serve several clients (organizations).

If one application instance can handle 1000 concurrent users, it makes sense to group clients (organizations) so that together they produce that load of 1000 users per instance. This gives the most efficient use of cloud resources.

Suppose each organization rents the application for 20 users (its employees). Then 50 such organizations need to be grouped to reach the target load. It is important to isolate the organizations from each other. An organization rents the application, lets only its own employees in, stores only its own data, and does not see that other organizations are served by the same instance.

Implementing multi-tenancy does not mean that the application can no longer be deployed locally on the organization’s server. You can support two deployment methods at the same time:

  •     multi-tenant application in the cloud;
  •     single-tenant application on the client server.

Our application has followed a similar path: from non-tenant to multi-tenant. In this article I will share some of the approaches we used when developing multi-tenancy.

How to implement multi-tenancy in an application that is designed as non-tenant?

Let us limit the topic right away: we will consider only development and will not touch on testing, releasing versions, deployment, or administration. Multi-tenancy has to be taken into account in all of those areas too, but for now we will talk only about development.

To show what kind of application went from non-tenant to multi-tenant, I will describe its purpose, its services, and the technologies it uses.

This is an ECM system (DirectumRX), which consists of 10 services (5 monolithic services and 5 microservices). All these services can be placed either on one powerful server, or on several servers.

Stack of technologies used:

.NET + SQLServer / Postgres + NHibernate + IIS + RabbitMQ + Redis

So, how do you make the services multi-tenant? You need to rework the following mechanisms in the services, that is, make each of them aware of tenants:

  •     data storage;
  •     ORM;
  •     data caching;
  •     request processing;
  •     processing queue messages;
  •     configuration;
  •     logging;
  •     performing background tasks;
  •     interaction with microservices;
  •     interaction with the message broker.

In the case of our application, these were the main places that required improvements. Let’s consider them separately.

Choosing a data storage method

Articles about multi-tenancy usually start with how to organize data storage. It is indeed an important point.

For our ECM system, the main storage is a relational database with about 100 tables. How can the data of many organizations be stored so that organization A can never see the data of organization B?

Several schemes are known (a lot has already been written about these schemes):

  •     create a separate database for each organization (for each tenant);
  •     use one database for all organizations, but give each organization its own schema in the database;
  •     use one database for all organizations, but add a “tenant / organization key” column to each table.

The choice of scheme should not be arbitrary. In our case, it was enough to look at the system administration cases to see which option is preferable. The cases are as follows:

  •     add a tenant (a new organization rents the system);
  •     remove a tenant (an organization stops renting);
  •     move a tenant to another cloud stand (redistribute the load between cloud stands when one stand can no longer cope with it).

Let’s consider the tenant transfer case. The main task is to move the organization’s data to another stand. This is easy if the tenant has its own database, but it becomes a headache if the data of different organizations is mixed in 100 tables: you would have to extract only the relevant rows from each table and move them into another database that already holds data of other tenants, without letting identifiers collide.

The next case is adding a new tenant. This case is not simple either. Adding a tenant means populating system directories, users, and access rights so that it is possible to log in to the system at all. This task is best solved by cloning a reference database that already contains everything needed.

The tenant removal case is very easily solved by disabling the tenant database.

For these reasons, we chose a scheme: one tenant – one database.

ORM

With the data storage method chosen, the next question is how to teach the ORM to work with it.

We use NHibernate. We needed NHibernate to work with several databases and switch to the right one as required, for example depending on the http request. If we are processing a request from organization A, database A is used; if the request is from organization B, database B is used.

NHibernate supports this. You need to override the implementation of NHibernate.Connection.DriverConnectionProvider. Whenever NHibernate wants to open a database connection, it calls DriverConnectionProvider to get a connection string. This is where we substitute the one we need:
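The idea can be sketched in a language-neutral way. The Python below is only an illustration of the technique, not NHibernate code: the `current_tenant` holder, the `TenantConnectionProvider` class, and the connection strings are all hypothetical names and values chosen for the example.

```python
import contextvars

# Hypothetical holder for the tenant bound to the current thread or task.
current_tenant = contextvars.ContextVar("current_tenant")


class TenantConnectionProvider:
    """Returns the connection string of whichever tenant is current."""

    def __init__(self, connection_strings):
        # Mapping of tenant id -> connection string (illustrative values).
        self._connection_strings = dict(connection_strings)

    def connection_string(self):
        # Raises LookupError when no tenant is bound, which is safer than
        # silently falling back to some default database.
        tenant_id = current_tenant.get()
        return self._connection_strings[tenant_id]


provider = TenantConnectionProvider({
    "org_a": "Server=db1;Database=org_a",
    "org_b": "Server=db1;Database=org_b",
})

current_tenant.set("org_a")
print(provider.connection_string())  # Server=db1;Database=org_a
current_tenant.set("org_b")
print(provider.connection_string())  # Server=db1;Database=org_b
```

Failing loudly when no tenant is bound is a design choice of this sketch: a query that runs against the wrong database is harder to notice than an exception.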

Data caching

Services often cache data to minimize database queries or to avoid computing the same thing many times. The problem is that caches holding tenant data must be partitioned by tenant. It is unacceptable for data cached for one organization to be used when processing a request from another organization. The simplest solution is to add the tenant identifier to the key of each cache entry:
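A minimal sketch of tenant-qualified keys, written in Python for illustration only. The `current_tenant` holder and the helper names are assumptions made for the example, not part of the product.

```python
import contextvars

# Hypothetical holder for the tenant bound to the current thread or task.
current_tenant = contextvars.ContextVar("current_tenant")

_cache = {}


def tenant_key(key):
    """Qualify a cache key with the current tenant id."""
    return (current_tenant.get(), key)


def cache_put(key, value):
    _cache[tenant_key(key)] = value


def cache_get(key, default=None):
    return _cache.get(tenant_key(key), default)


current_tenant.set("org_a")
cache_put("document_types", ["contract", "invoice"])

current_tenant.set("org_b")
print(cache_get("document_types"))  # None: org_b does not see org_a's entry
```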

This has to be kept in mind whenever a cache is created. Our services have a lot of caches. To avoid forgetting the tenant identifier in any of them, it is better to unify the work with caches, for example by building a common caching mechanism that is tenant-aware out of the box.
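One possible shape for such a common mechanism is a small wrapper that qualifies every key itself, so callers cannot forget to do it. This is a Python sketch under assumptions: `TenantCache`, `get_or_add`, and `clear_tenant` are hypothetical names, and the factory runs under the lock for simplicity.

```python
import contextvars
import threading

# Hypothetical holder for the tenant bound to the current thread or task.
current_tenant = contextvars.ContextVar("current_tenant")


class TenantCache:
    """A cache whose entries are always partitioned by the current tenant."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def get_or_add(self, key, factory):
        # The tenant id is added here, once, instead of at every call site.
        full_key = (current_tenant.get(), key)
        with self._lock:
            if full_key not in self._items:
                self._items[full_key] = factory()
            return self._items[full_key]

    def clear_tenant(self, tenant_id):
        # Drop everything cached for one tenant, e.g. when it is removed.
        with self._lock:
            for full_key in [k for k in self._items if k[0] == tenant_id]:
                del self._items[full_key]


settings_cache = TenantCache()

current_tenant.set("org_a")
print(settings_cache.get_or_add("language", lambda: "en"))  # en
current_tenant.set("org_b")
print(settings_cache.get_or_add("language", lambda: "de"))  # de
```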

Logging

Sooner or later something will go wrong in the system, and you will need to open the log file and study it. The first question is: on behalf of which user, and of which organization, were these actions performed?

It is convenient when every log line contains the tenant identifier and the user name. This information becomes as necessary as, for example, the message time.

The developer should not have to think about which tenant to write to the log; this should be automated and hidden inside the logging system.

We use NLog, so I will use it as the example. The easiest way to capture the tenant identifier is to create a custom NLog.LayoutRenderers.LayoutRenderer that returns the tenant identifier for each log entry, and then to use this LayoutRenderer in the log layout.
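The same idea can be illustrated with Python's standard `logging` module rather than NLog: a filter plays the role of the renderer by stamping every record with the current tenant and user, and the format string plays the role of the layout. The `current_tenant` and `current_user` holders are hypothetical, and the timestamp is left out of the format only to keep the output deterministic.

```python
import contextvars
import io
import logging

# Hypothetical holders; "-" is logged when nothing is bound.
current_tenant = contextvars.ContextVar("current_tenant", default="-")
current_user = contextvars.ContextVar("current_user", default="-")


class TenantContextFilter(logging.Filter):
    """Adds tenant and user fields to every record passing through."""

    def filter(self, record):
        record.tenant = current_tenant.get()
        record.user = current_user.get()
        return True  # never drops a record, only enriches it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(TenantContextFilter())
handler.setFormatter(
    logging.Formatter("%(levelname)s [%(tenant)s] [%(user)s] %(message)s"))

log = logging.getLogger("tenant_demo")
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False

current_tenant.set("org_a")
current_user.set("alice")
log.info("document saved")  # the call site does not mention the tenant
print(stream.getvalue(), end="")  # INFO [org_a] [alice] document saved
```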

 

Code execution

In the examples above, I often used the expression TenantRegistry.Instance.CurrentTenant.

It is time to explain what it means. But first, the approach we follow in our services: at any point during code execution, you can ask “Which tenant is this thread working for?” or, put another way, “What is the current tenant?”

TenantRegistry.Instance.CurrentTenant is the current tenant for the current thread. In our applications a thread and a tenant can be bound to each other. The binding is temporary, for example for the duration of processing an http request or a message from the queue. One way to bind a tenant to a thread looks like this:
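A minimal Python sketch of such a registry, for illustration only. The class borrows the article's `TenantRegistry` name, but the `bind` method and everything else here are assumptions. In this sketch the previous binding is restored when the block exits.

```python
import contextvars
from contextlib import contextmanager

# Context variables are isolated per thread and per async task.
_current_tenant = contextvars.ContextVar("current_tenant", default=None)


class TenantRegistry:
    """Hypothetical single access point for the current tenant."""

    @property
    def current_tenant(self):
        return _current_tenant.get()

    @contextmanager
    def bind(self, tenant_id):
        # Bind the tenant for the duration of the `with` block only.
        token = _current_tenant.set(tenant_id)
        try:
            yield tenant_id
        finally:
            _current_tenant.reset(token)


registry = TenantRegistry()  # module-level instance acts as the singleton


def do_work():
    # Code deep in the call stack can ask for the current tenant.
    return f"working for {registry.current_tenant}"


def handle_request(tenant_id):
    with registry.bind(tenant_id):
        return do_work()


print(handle_request("org_a"))  # working for org_a
print(registry.current_tenant)  # None
```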

A tenant bound to a thread can be obtained anywhere in the code through TenantRegistry, a singleton that serves as the access point for working with tenants. This is how NHibernate and NLog can reach it (at their extension points) to find out the connection string or the tenant identifier.

Background Tasks

Services often have background tasks that run on a timer. A background task may access the organization’s database, in which case it must be run for each tenant. This does not require a separate timer or thread per tenant: the task can be run for different tenants within a single thread or timer. In the timer handler, we iterate over the tenants, bind each tenant to the thread in turn, and run the background task:

Two tenants cannot be bound to a thread at the same time; binding one unbinds the other. We use this approach extensively so as not to multiply threads and timers for background tasks.

How to correlate an http request with a tenant

To process a client’s http request, you need to know which organization it came from. If the user is already authenticated, the tenant identifier can be stored in the authentication cookie (if the application is used through a browser) or in the JWT token. But what if the user has not authenticated yet? For example, an anonymous user opens the application website and wants to log in, so they send a request with a login and password. In which organization’s database should this user be looked up?

Anonymous requests also arrive for the application’s login page, which may differ between organizations, for example in its localization language.

To correlate an anonymous http request with an organization (tenant), we use subdomains. The subdomain name is derived from the name of the organization, and users must use their organization’s subdomain to work with the system.

The same multi-tenant web service is available at all of these addresses, but now the service can tell from the domain name which organization an anonymous http request comes from.
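The resolution logic can be sketched as follows, in Python for illustration. The host names, the `tenant` claim name, and the `resolve_tenant` function are all hypothetical. The sketch assumes the token or cookie claims have already been validated elsewhere.

```python
# Hypothetical binding of host names to tenant ids.
HOST_TO_TENANT = {
    "org-a.example.com": "org_a",
    "org-b.example.com": "org_b",
}


def resolve_tenant(host, token_claims=None):
    """Return the tenant id for a request, or None if it cannot be found.

    host          value of the Host header, possibly with a port
    token_claims  claims from an already validated JWT or auth cookie
    """
    # Authenticated request: the tenant travels with the credentials.
    if token_claims and "tenant" in token_claims:
        return token_claims["tenant"]
    # Anonymous request: fall back to the subdomain.
    hostname = host.split(":")[0].lower()
    return HOST_TO_TENANT.get(hostname)


print(resolve_tenant("org-a.example.com:443"))                    # org_a
print(resolve_tenant("internal-service", {"tenant": "org_b"}))    # org_b
print(resolve_tenant("unknown.example.com"))                      # None
```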

The binding between domain names and tenants is defined in the web service configuration file. Service configuration is described below.

Microservices. Data storage

When I said that the ECM system needs 100 tables, I was talking about the monolithic services. But a microservice may also need relational storage, with 2-3 tables for its data. Ideally, each microservice has its own storage that only it can access, and the microservice itself decides how to store data per tenant.

We went a different way: we decided to store all of an organization’s data in one database. If a microservice needs relational storage, it uses the existing organization database, so that the data is collected in one database rather than scattered across different storages. The monolithic services use the same database.

Microservices work only with their own tables in the database and do not touch the tables of the monolith or of other microservices. This approach has pros and cons.

 

Pros:

  •     organization data in one place;
  •     easy to back up and restore organization data;
  •     in the backup, the data of all services is consistent.

Cons:

  •     one database for all services is a bottleneck when scaling (demands on DBMS resources grow);
  •     microservices have physical access to each other’s tables, even though they do not use it.

Microservices. Knowledge of tenants is not always required

A microservice may not need to know that it works in a multi-tenant environment. Consider one of our services, which converts documents to html.

What the service does:

  •     Takes a document conversion message from a RabbitMQ queue.
  •     Extracts the document identifier and tenant identifier from the message.
  •     Downloads the document from the document storage service, sending a request that contains the document identifier and tenant identifier.
  •     Converts the document to html.
  •     Passes the html to the service that stores conversion results.

The service stores neither documents nor conversion results. Its knowledge of tenants is indirect: the tenant identifier simply passes through the service in transit.
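The steps above can be sketched as a message handler that never interprets the tenant identifier and only forwards it. This is a Python illustration under assumptions: the message field names and the three injected callables are hypothetical stand-ins for the queue payload and the neighbouring services.

```python
import json


def handle_conversion_message(body, download_document, convert_to_html,
                              store_result):
    """Process one queue message; the tenant id is only passed along."""
    message = json.loads(body)
    document_id = message["document_id"]   # hypothetical field name
    tenant_id = message["tenant_id"]       # hypothetical field name

    # The storage service decides what the tenant id means.
    document = download_document(document_id, tenant_id)
    html = convert_to_html(document)       # conversion is tenant-agnostic
    store_result(document_id, tenant_id, html)


# Usage with in-memory stand-ins for the neighbouring services.
stored = {}
handle_conversion_message(
    json.dumps({"document_id": 42, "tenant_id": "org_a"}),
    download_document=lambda doc_id, tenant_id: f"text of {doc_id} for {tenant_id}",
    convert_to_html=lambda text: f"<p>{text}</p>",
    store_result=lambda doc_id, tenant_id, html: stored.update(
        {(tenant_id, doc_id): html}),
)
print(stored)  # {('org_a', 42): '<p>text of 42 for org_a</p>'}
```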

Microservices. Subdomains are not needed

I wrote above that subdomains help solve the problem of anonymous http requests. But not all services handle anonymous requests; most require that authentication has already taken place. So microservices that work over http often do not care which host name a request came to: they get all the tenant information from the JWT token or authentication cookie that accompanies each request.

Configuration

Services need to be configured so that they know about tenants:

  •     specify the connection strings for the tenant databases;
  •     bind domain names to tenants;
  •     specify the default language and time zone of each tenant.

Tenants can have many settings. For our services, tenant settings are kept in XML configuration files. This is neither web.config nor app.config but a separate XML file, and changes to it must be picked up without restarting the services, so that adding a new tenant does not restart the entire system.

The list of settings is like this:
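As a hedged illustration, the Python sketch below writes a small tenant file and re-reads it whenever the file changes. The XML element and attribute names, the `TenantConfig` class, and the `write_config` helper are all hypothetical and do not reflect the actual file format of the product.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def write_config(path, tenant_ids):
    """Write a demo config file; the XML layout is hypothetical."""
    root = ET.Element("tenants")
    for tenant_id in tenant_ids:
        ET.SubElement(root, "tenant", {
            "id": tenant_id,
            "host": f"{tenant_id}.example.com",
            "connectionString": f"Server=db1;Database={tenant_id}",
            "language": "en",
            "timeZone": "UTC",
        })
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


class TenantConfig:
    """Re-reads the file whenever its modification time or size changes."""

    def __init__(self, path):
        self._path = path
        self._signature = None
        self._tenants = {}

    def tenants(self):
        stat = os.stat(self._path)
        signature = (stat.st_mtime_ns, stat.st_size)
        if signature != self._signature:
            root = ET.parse(self._path).getroot()
            self._tenants = {
                t.get("id"): dict(t.attrib) for t in root.iter("tenant")
            }
            self._signature = signature
        return self._tenants


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "tenants.xml")
    write_config(path, ["org-a"])
    config = TenantConfig(path)
    print(sorted(config.tenants()))  # ['org-a']

    write_config(path, ["org-a", "org-b"])  # a new organization arrives
    print(sorted(config.tenants()))  # ['org-a', 'org-b']
```

Checking the file on access is only one option; a file system watcher would serve the same purpose without polling.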

When a new organization rents the service, a new tenant has to be added to the configuration file. Ideally, other organizations should not notice this, which means the services should not be restarted.

Not all of our services can pick up configuration changes without a restart, but the most critical ones (the monoliths) can.

Conclusion

When an application becomes multi-tenant, it seems at first that development has become dramatically more complex. But you get used to multi-tenancy and start treating its support as a normal requirement.

It is also worth remembering that multi-tenancy affects not only development but also testing, administration, deployment, updates, backups, and data migrations. Those are topics for another time.