Easy horizontal scaling for BE servers (app-core)

Hi,

I’ve been thinking about this for quite a while. To illustrate the situation, I’ve sketched the current deployment strategy in my AWS environment (aws-cuba-vpc-direct-be.png). Basically it’s multiple FE servers behind a load balancer, and each of them has two direct connections to the backend servers configured.

What I’m wondering is whether it is possible to make use of horizontal scaling for the backend servers. By that I mean that instead of giving the FE servers direct connections to the BE servers, they would only get a connection to one or two BE load balancers. This seems a bit more aligned with what is possible with something like AWS Elastic Container Service (ECS).

The problem with this is that when the ECS cluster needs to start new BE servers, either because of load requirements or because an instance is no longer available, it can’t easily spin up a new instance. The reason is that the backend servers would have to get to know each other in a fully meshed network.

My question is whether there is a way to achieve this. I could imagine that the user sessions could be stored in the DB or in an additional external cache or something. To understand the problem better, it would be interesting to know what exactly the BE servers share. Can you give a short explanation on this topic?

Bye,
Mario

[attachment: aws-cuba-vpc-direct-be.png]

Hi,

I think I totally missed the point. My assumption was that the connectionUrlList attribute is used not only for FE servers to communicate with BE servers, but between BE servers as well. After going through the docs once again, it seems that the BE (middleware) servers use a self-discovery mechanism in order to talk between the different BE server nodes.

So from this point of view I would not need to maintain connection lists between the BE servers, which is very good for my desired scenario.
The only remaining question is whether it is possible to put a load balancer between the FE and BE servers. That would remove the need to manually maintain the connection lists on the FE servers and would open up the possibility described above to add another BE node if required. Since a self-discovery mechanism like the one implemented in JGroups is a good basis for more than two BE servers, I think it should work, right?

Nevertheless, can you explain a little how JGroups communicates between nodes? In particular, since I saw that UDP broadcast is the default in jgroups.xml, do you have any experience with (or problems with) running this scenario within Docker?

Bye,
Mario

Hi Mario,
You are right, the cuba.connectionUrlList application property is designed primarily for connecting clients to middleware servers. It can be used on the middleware too if a middleware block needs to connect to another middleware block as a client, but this is not a typical use case.
Each CUBA client block contains a kind of embedded load-balancer with sticky-session behavior. The balancer is set up with the cuba.connectionUrlList application property. This is convenient for local deployments where you can easily control your hosts and IP addresses, but it becomes cumbersome in a distributed cloud deployment. So you can use a regular external load-balancer between clients and middleware and point cuba.connectionUrlList to it.

There is one hidden complexity in this scenario: your load-balancer will redirect each call to a different middleware server, so immediately after login the next call may go to a server that has not received the user session yet, because the session was created on the first server and user session replication is asynchronous by default. To ensure the new user session exists on all cluster members, you should turn on synchronous replication for login. It can be done for all clients by using the cuba.syncNewUserSessionReplication application property on the middleware (see ServerConfig.getSyncNewUserSessionReplication()), or only for clients of some kind. In the latter case, these clients should pass the cuba.syncNewUserSessionReplication parameter with the value true to the LoginService.login() method.
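For the latter case, a rough sketch of the client-side call could look like the code below. This is just an illustration, not taken from a real project: it assumes the CUBA 6.x LoginService.login() overload that accepts a parameter map, and the wrapper class itself is hypothetical.

import java.util.Collections;
import java.util.Locale;
import java.util.Map;

import com.haulmont.cuba.security.app.LoginService;
import com.haulmont.cuba.security.global.LoginException;
import com.haulmont.cuba.security.global.UserSession;

public class SyncReplicatedLogin {

    private final LoginService loginService;

    public SyncReplicatedLogin(LoginService loginService) {
        this.loginService = loginService;
    }

    public UserSession login(String login, String passwordHash, Locale locale) throws LoginException {
        // Ask the middleware to replicate the new user session synchronously,
        // so the very next call can safely land on another cluster member
        // behind the load-balancer.
        Map<String, Object> params = Collections.singletonMap(
                "cuba.syncNewUserSessionReplication", (Object) Boolean.TRUE);
        return loginService.login(login, passwordHash, locale, params);
    }
}

Clients that log in this way get synchronous replication of their new session, while all other clients keep the default asynchronous behavior.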

Regarding the cluster communication between middleware blocks, I should note that the default JGroups config using UDP-based auto-discovery of cluster members is not suitable for cloud deployments - UDP just won’t work there. A TCP-based configuration works well, but it is not convenient in the cloud because you have to provide static IP addresses. So you should use one of the discovery protocols listed in the JGroups manual: Chapter 7. List of Protocols
We use S3_PING in one of our projects deployed on Amazon AWS; see the example config file below.


<config xmlns="urn:org:jgroups"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-2.12.xsd">
    <TCP bind_port="7800"
         bind_addr="<this_mw_block_host>"
         loopback="true"
         recv_buf_size="20M"
         send_buf_size="640K"
         discard_incompatible_packets="true"
         max_bundle_size="64K"
         max_bundle_timeout="30"
         enable_bundling="true"
         use_send_queues="true"
         sock_conn_timeout="300"

         timer_type="new"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"

         thread_pool.enabled="true"
         thread_pool.min_threads="1"
         thread_pool.max_threads="10"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="false"
         thread_pool.queue_max_size="100"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.enabled="true"
         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"/>

    <S3_PING
      location="<my_bucket>"
      access_key="<my_access_key>"
      secret_access_key="<my_secret_access_key>"
      timeout="2000"
      num_initial_members="2"
      skip_bucket_existence_check="true"
    />
    <MERGE2  min_interval="10000"
             max_interval="30000"/>
    <FD_SOCK/>
    <FD timeout="3000" max_tries="3" />
    <VERIFY_SUSPECT timeout="1500"  />
    <BARRIER />
    <pbcast.NAKACK
                   use_mcast_xmit="false"
                   retransmit_timeout="300,600,1200,2400,4800"
                   discard_delivered_msgs="true"/>
    <UNICAST timeout="300,600,1200" />
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000"
                view_bundling="true"/>
    <UFC max_credits="2M"
         min_threshold="0.4"/>
    <MFC max_credits="2M"
         min_threshold="0.4"/>
    <FRAG2 frag_size="60K"  />
    <pbcast.STATE_TRANSFER/>
</config>

Finally, I would like to note that CUBA mechanisms do not use JGroups directly. Instead, they work with the ClusterManagerAPI bean, which has an implementation based on JGroups. So it is possible to completely replace JGroups with something different in an application project just by overriding the bean.
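Just for illustration, application code on the middleware talks to the cluster roughly like this. It is a minimal sketch: the message class and bean name are made up, and the package names and the generic ClusterListenerAdapter base class are assumed from CUBA 6.x and may differ in your version.

import java.io.Serializable;

import javax.annotation.PostConstruct;
import javax.inject.Inject;

import org.springframework.stereotype.Component;

import com.haulmont.cuba.core.app.ClusterListenerAdapter;
import com.haulmont.cuba.core.app.ClusterManagerAPI;

@Component("demo_CacheInvalidationSupport")
public class CacheInvalidationSupport {

    // Hypothetical message type carried between the middleware blocks.
    public static class InvalidateCacheMsg implements Serializable {
        public final String cacheName;

        public InvalidateCacheMsg(String cacheName) {
            this.cacheName = cacheName;
        }
    }

    @Inject
    protected ClusterManagerAPI clusterManager;

    @PostConstruct
    public void init() {
        // React to messages sent by the other middleware blocks.
        clusterManager.addListener(InvalidateCacheMsg.class,
                new ClusterListenerAdapter<InvalidateCacheMsg>() {
                    @Override
                    public void receive(InvalidateCacheMsg message) {
                        // drop the local copy of the named cache here
                    }
                });
    }

    public void invalidateEverywhere(String cacheName) {
        // Delivery is handled by whatever ClusterManagerAPI implementation
        // is in use - JGroups by default.
        clusterManager.send(new InvalidateCacheMsg(cacheName));
    }
}

The point is that such code does not depend on JGroups at all, so swapping the transport only affects the ClusterManagerAPI implementation.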

Hi Konstantin,

thank you for the explanation. Regarding the load balancer and the user session that has not been received yet:
Would it help if the BE load balancer were configured to use sticky sessions for the FE --> BE communication? Then this problem should not occur, right? Or is the synchronous session replication not a big deal?

Thanks for the tip with the AWS S3 ping mechanism. Since a VPC doesn’t allow broadcast at all, this is very handy. What I wonder about is that your settings differ to some degree from the ones in the default jgroups.xml file in the project (which I copied over here):


    <TCP bind_port="7800"
         bind_addr="${jgroups.bind_addr:192.168.11.11}"
         recv_buf_size="${tcp.recv_buf_size:5M}"
         send_buf_size="${tcp.send_buf_size:5M}"
         max_bundle_size="64K"
         sock_conn_timeout="300"

         timer_type="new3"
         timer.min_threads="4"
         timer.max_threads="10"
         timer.keep_alive_time="3000"
         timer.queue_max_size="500"

         thread_pool.min_threads="2"
         thread_pool.max_threads="8"
         thread_pool.keep_alive_time="5000"
         thread_pool.queue_enabled="true"
         thread_pool.queue_max_size="10000"
         thread_pool.rejection_policy="discard"

         oob_thread_pool.min_threads="1"
         oob_thread_pool.max_threads="8"
         oob_thread_pool.keep_alive_time="5000"
         oob_thread_pool.queue_enabled="false"
         oob_thread_pool.queue_max_size="100"
         oob_thread_pool.rejection_policy="discard"></TCP>

    <TCPPING async_discovery="true"
             initial_hosts="${jgroups.tcpping.initial_hosts:192.168.11.11[7800],192.168.11.12[7800]}"
             port_range="2"></TCPPING>
    <MERGE3 min_interval="10000"
            max_interval="30000"></MERGE3>
    <FD_SOCK></FD_SOCK>
    <FD timeout="3000"
        max_tries="3"></FD>
    <VERIFY_SUSPECT timeout="1500"></VERIFY_SUSPECT>
    <BARRIER></BARRIER>
    <pbcast.NAKACK2 use_mcast_xmit="false"
                    discard_delivered_msgs="true"/>
    <UNICAST3></UNICAST3>
    <pbcast.STABLE stability_delay="1000"
                   desired_avg_gossip="50000"
                   max_bytes="4M"/>
    <pbcast.GMS print_local_addr="true"
                join_timeout="2000"
                view_bundling="true"/>
    <MFC max_credits="2M"
         min_threshold="0.4"></MFC>
    <FRAG2 frag_size="60K"></FRAG2>
    <RSVP resend_interval="2000"
          timeout="10000"></RSVP>
    <pbcast.STATE_TRANSFER/>

Examples of these differences are recv_buf_size, send_buf_size, …

Is there a reason behind it?

Bye,
Mario

> Would it help if the BE load balancer were configured to use sticky sessions for the FE --> BE communication?
It would help, but I don’t know how to configure the load-balancer for sticky sessions, because there are no standard HTTP sessions in this communication channel. If you manage to do this, please let us know.

> Or is the synchronous session replication not a big deal?
It’s definitely not a big deal because it is synchronous only for the first request on login.

> your settings differ to some degree from the ones in the default jgroups.xml file in the project
Yes, but I believe the differences in recv_buf_size, send_buf_size are not important - these are just default values from some previous JGroups version.

> Yes, but I believe the differences in recv_buf_size, send_buf_size are not important - these are just default values from some previous JGroups version.

Do you mean the values in your example are older, or the ones that come with CUBA (in my case: 6.1.2)? OK, I can live with the recv_buf_size changes, but there are some bigger differences, like “thread_pool.queue_enabled” or the absence of the “pbcast.NAKACK2” XML tag.

Yes, this config is probably older (it uses pbcast.NAKACK instead of NAKACK2), but it should work for newer versions too. As I said, I took it from a real application in production. You can play with all these protocols and parameters, but I’m sure the default values will be fine for most cases.