Deployment Question - AWS Beanstalk Timeouts During Excel Import

Hello CUBA Community,

Looking for deployment guidance. We have an app deployed to AWS Elastic Beanstalk as a single WAR, with an AWS Application Load Balancer in front of it. (We currently have two instances of the app running in different availability zones, behind the one Application Load Balancer, plus a database.)

We use the Data Import add-on to upload Excel files in the app. Uploads work perfectly fine in development. My problem is that a file that may have taken only a few seconds to import locally takes several minutes to do the same in AWS (for example, 300+ rows with 12 attributes takes approx. 7 minutes). Some uploads take so long that the Tomcat service stops. (During the import there are also uniqueness checks, calculations, and sorting.) Other than the issues during large imports, the rest of the app works perfectly fine.

I have extended the NGINX configuration in the Beanstalk environment to increase the timeouts, and done the same on the Application Load Balancer. While this has helped with the timeouts, I still have long upload times for files with many rows, which I would like to speed up.
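For reference, the NGINX side is roughly the following (a sketch of what I added; depending on the Beanstalk platform version this goes in an .ebextensions config or a file under .platform/nginx/conf.d/, and the 300-second values are just what I picked):

```nginx
# Allow long-running import requests to finish before NGINX gives up
proxy_connect_timeout 300;
proxy_send_timeout    300;
proxy_read_timeout    300;
```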

I have been reading up on application scaling in the CUBA manual. It suggests having separate Middleware and Web Client Tomcat instances instead of a single WAR deployment:
https://doc.cuba-platform.com/manual-latest/scaling.html

Would it be better to have a separate middleware server in this case? Could the single-WAR deployment be the cause of my speed issues, or is there something else I should look at?

Should I instead focus on giving the Tomcat servers more memory (currently 2 GB each), or giving the database server more memory?

I really appreciate any additional experiences or thoughts anyone could share.

Thank you

Hi,

Are you using the add-on directly through the UI (e.g. the import wizard)?

If so, I would suggest developing a solution that runs in the background instead: store the file as a FileDescriptor first and then use the DataImport API to perform the import. For this use case I normally create an entity (e.g. ClaimImport if it's about insurance claims) that has a reference to the FileDescriptor and is created once for every import execution.
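A minimal sketch of such an entity could look like this (all names here, e.g. ClaimImport, APP_CLAIM_IMPORT, and the status attribute, are just examples for illustration):

```java
package com.company.app.entity;

import com.haulmont.cuba.core.entity.FileDescriptor;
import com.haulmont.cuba.core.entity.StandardEntity;

import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.Table;

// One instance per import execution, pointing at the uploaded Excel file
@Table(name = "APP_CLAIM_IMPORT")
@Entity(name = "app_ClaimImport")
public class ClaimImport extends StandardEntity {

    // Reference to the uploaded file in CUBA's file storage
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "FILE_ID")
    protected FileDescriptor file;

    // Simple status the frontend can poll (e.g. PENDING / RUNNING / DONE / FAILED)
    @Column(name = "STATUS")
    protected String status;

    public FileDescriptor getFile() { return file; }
    public void setFile(FileDescriptor file) { this.file = file; }

    public String getStatus() { return status; }
    public void setStatus(String status) { this.status = status; }
}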

The import itself can then run in a background thread on the middleware, so you won't run into any HTTP timeouts. You can start it automatically when the entity is saved. If you need to show progress to the user, you can poll for the entity's status on the frontend.
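As a rough sketch of the trigger on the core module (ClaimImportService is a hypothetical bean standing in for whatever does the actual import; note that onAfterInsert fires before the transaction commits, so the worker should reload the entity in its own transaction):

```java
package com.company.app.core;

import com.company.app.entity.ClaimImport;
import com.haulmont.cuba.core.listener.AfterInsertEntityListener;
import org.springframework.stereotype.Component;

import javax.inject.Inject;
import java.sql.Connection;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Core-module entity listener: kicks off the import in a background thread
// once a ClaimImport record is inserted. It still has to be registered on
// the entity via @Listeners("app_ClaimImportListener").
@Component("app_ClaimImportListener")
public class ClaimImportListener implements AfterInsertEntityListener<ClaimImport> {

    // A single worker thread so parallel imports don't compete for heap
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    @Inject
    private ClaimImportService claimImportService; // hypothetical bean calling the add-on

    @Override
    public void onAfterInsert(ClaimImport entity, Connection connection) {
        // Hand off to the worker; the HTTP request that saved the entity
        // returns immediately instead of waiting for the whole import.
        executor.submit(() -> claimImportService.runImport(entity.getId()));
    }
}
```

Inside runImport you would load the FileDescriptor from the entity, look up your ImportConfiguration, and hand both to the add-on's import API, updating the status attribute along the way; that status is what the frontend polls (a Timer on the screen works fine for this).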

Cheers
Mario

Here you can find examples of this approach: https://github.com/mariodavid/cuba-example-using-data-import/tree/master/modules/core/src/de/diedavids/ddcdit

Hi,

The AWS Management Console shows monitoring information for each resource. There you can see which component is under the heavier load during the upload process: the database or the Beanstalk instances. That tells you which layer needs scaling.
I would start by upgrading the DB instance size (if it's the database), or by upgrading the instance size and giving Tomcat more memory (if it's the Beanstalk side).
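If it turns out to be the Tomcat side, on the Beanstalk Tomcat platform the heap can be raised without touching the instances directly, roughly like this in an .ebextensions config (the values are only examples and should match the instance size):

```yaml
# .ebextensions/jvm.config -- raise Tomcat's JVM heap
option_settings:
  aws:elasticbeanstalk:container:tomcat:jvmoptions:
    Xms: 2048m
    Xmx: 3072m
```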