Multitenancy with db-per-tenant

tudorghircoias · April 3, 2020, 10:45am

Hi guys,

First of all, I want to thank you, there have been already 3 years since we work with this platform and have done some nice projects with it.
Now we want to begin our biggest projects of all, migrating our ERP-accounting applications on cuba.
We use PostgreSQL as DBMS, and now we have multiple databases which can even be created and dropped at runtime. Actually each database corresponds to a company for which the accounting is managed.
Each database has identical structure, only the name is different. Some users have of course 2 or more companies (there can be even 150) for which they want to use our accounting software. That corresponds to 2, 150 or more databases.
There has been a long discussion with the db-per-tenant approach, explained in this example: GitHub - cuba-labs/db-per-tenant: A multi-tenant CUBA application with separate databases for tenants and discussed here: High productivity application development platform.
It implies a main database in which all the configurations are stored, and addidtional data-stores for the other databases. We tested this and it worked. With this approach there are some limitations explained by Konstantin here: GitHub - cuba-labs/db-per-tenant: A multi-tenant CUBA application with separate databases for tenants. The biggest of them is that all the configurations that are stored in the main database are common between all the other databases.
That time there was version 6.xx, when tenantId wasn’t implemented, but now, with the current version we want to combine tenantId from version 7xx with db-per-tenant. This means that after login the user can choose the database he wants to connect to, and that database should be the tenantId. Doing this we hope we can overcome the limitations explained by Konstantin. That means that all the configurations, although they are stored in the main database, are distinct for each additional database.

With this approach we hit against another issue: a user can be assigned to only one tenant. This means that a user can login to only one database (since tenantId is replaced by the database).
So, should you consider implementing another idea: similar to tenantId there should be a databaseId and treated in the same way? Do you have a better solution?

Thank you.

krivopustov · April 9, 2020, 6:19am

Hi Tudor,

Thank you very much for appreciation of our work!

Unfortunately, I cannot say anything concrete on the topic. As you know, we provide one way for building multi-tenant applications out-of-the-box with the help of Multitenancy add-on. We don’t have unlimited resources and at the moment developing other multi-tenancy solutions is not on our task list. We will certainly come back to this problem later, but I cannot promise any timeframe.

Also please don’t forget that it’s open source after all, so you could test your ideas on a fork and provide a PR.

Regards,
Konstantin

stukalov · April 9, 2020, 2:38pm

Hi @tudorghircoias,

What’s the purpose of having a multi-tenant application in your case? Do you need better level of data isolation? Why couldn’t you just have an SSO that redirects you to different installations?

Regards,
Aleksey

tudorghircoias · April 10, 2020, 6:20am

Hi Aleksey,

We need also a good level of data isolation. We have a client with 200 db’s (this is a maximum number but the db’s are quite small in size), so we want to avoid multiple tomcat instances. The resources necessary for a cloud deployment with such many tomcat instances is not feasible.
Or maybe you mean something else, or I’m missing something?

Thanks.

stukalov · April 10, 2020, 10:25am

Hi Tudor,

From my experience having a single app server with multiple dbs is a huge pain when updating the middleware after invasive changes in the database model. I mean that you will need to stop serving all your customers for quite significant time to update each db instance and this time will grow with your customer base. Since you hit, say, 500 this may become not acceptable for some of your clients.

Also it tremendously grows the mistake price. Any update of you middleware can cause an uncontrollable collapse, as one error in middleware will affect all users at once. If fact you are loosing a chance for the gradual update, when you run a new version for a limited number of your customers for some time and then apply them for the rest of them with full confidence.

I think there is something you are missing - progress in devops :). Deployment can be significantly automated and narrowed down to a simple button click. Copying tomcat is not the preferable way nowadays, docker is mostly used to make maintenance easier. There are different PaaS solutions are available on the market, e.g. OpenShift https://www.openshift.com/ that are intended to help with the operational part.

Finally, if you still want to have a single application server, I would look towards sharding solutions, like this one Database sharding explained in plain English.

Feel free to contact me via private message if you are interested in learning our experience in continuous integration and DevOps. Also we can discuss sharding solution adoption for CUBA apps.

Regards,
Aleksey

ghircoiascamil1 · April 10, 2020, 1:17pm

In reality, the example Tudor explained is a single customer (ie accounting company) with multiple companies they manage locally on their server. Every accounting company has from one to hundreds of databases that their users can access. Usually up to 20-30 users are connecting to one or more of the databases at a time. That’s the reality. Mentenance is done now automatically by scripts we distribute with our actual software (window forms based), so that wouldn’t be a problem but with cuba paradigm, if we cannot bypass that multidatabase support, is an almost unbreakable barrier we would like to overcome and didn’t find any good solution. Hope you understand what i explained.

stukalov · April 10, 2020, 2:11pm

Hi @ghircoiascamil1,

Thanks for the explanation.

Still, don’t understand why not to set up SSO (even custom one, not CUBA one) and re-direct to the right url. All apps servers work with their own db. Since you manage dbs, managing apps should take nearly no efforts. Users will not see any difference.

Another way is, as it was already mentioned, looking towards sharding solutions. These are developed to cover exactly the case you explain.

Finally, it’s not CUBA paradigm, but your custom requirements that definitely can be handled in some of the proposed ways. We do provide commercial service to meet custom requirements, let me know if you are interested.

Regards,
Aleksey

tudorghircoias · April 11, 2020, 2:09pm

Hi,

By sharding the Postgresql database you mean using foreign data wrapper and table partitioning in order to have one single logical database and a number of n physical databases that are accesible by using a distinct column (like databaseId) in each foreign table?

stukalov · April 11, 2020, 3:11pm

This is what I mean Shard (database architecture) - Wikipedia. There are quite a few solutions for Postgres. I have already shared a link to Citus Data as one of possible options.