Db changelog scripts merging with git branches and reversed date filename timestamps

mario · March 1, 2017, 12:32pm

hi,

i have a question regarding the database changelog scripts. Here’s the scenario with the following sample application:

In the VCS we have a trunk based version and a 1.0.x branch.

Both contain 10.create-db.sql which will create a table customer (with a name attribute, length 255, not mandatory).

on 2017-03-01 there is a 1.0.0 release and the branch 1.0.x gets created
on 2017-03-02 there is a change in trunk which will increase the size of the name column to 4000
on 2017-03-03 there is a change in 1.0.x which will define the name column as mandatory
on 2017-03-04 there is a 1.0.1 release with the changes in branch but not in trunk

1.0.0 and 1.0.1 are now in production.

So here is the question:

On 2017-03-10 there will be a 1.1.0 release which should contain the changes from 1.0.x and the change (2.) from trunk.

How will the database migration tool react if the filename of the change from 3. is “170303-2-updateCustomerNameMandatory.sql” and the trunk filename of change (2.) is “170302-2-updateCustomerNameLength4000.sql”

Normally from the sort by name, so it would execute the trunk file (2.) first and the branch file (3.) after that, correct? Although in my examples the changes don’t depend on each other, it could be possible that this is. In this case, what would you do?

I can think of the following approaches:

Option 1: rename the “170303-2-updateCustomerNameMandatory.sql” (3.) to a date that is before the change in the trunk and then copy it over to trunk. This would enable the SYS_DB_CHANGELOG table to see have the file from branch and after switching to trunk it would just append the trunk files (3.). But does that mean, that every change in the branch has to be executed before the first trunk change? What are potential downsides on that?

Option 2: leave the name as it is and merge it over to trunk. In trunk there will be the following files after that:


# ~/dev/projects/examples/cuba-question-changelog-with-branch/modules/core/db/update on git:master ?? [13:18:37]
? tree 
.
??? hsql
    ??? 17
        ??? 170302-2-updateCustomerNameLength4000.sql
        ??? 170303-2-updateCustomerNameMandatory.sql

2 directories, 2 files

Then in the SYS_DB_CHANGELOG there will be already a entry for the “170303-2-updateCustomerNameMandatory.sql” (3.) that will not get executed anymore. In this case, i’m not sure if it is just enough to do so and then the DbUpdater will pick up the trunk file (2.) and execute it. Im not really sure how this will look like. Is the sorting of the filename irrelevant in this case?

Im just trying to get my head around this stuff and perhaps this has nothing to do with CUBA but more with the possibility to use branches correctly and how to deal with these db changes in this cases.

Nevertheless, i would really be interested in how you do handle this kinds of situations and what would be the right decision if the trunk change has some expectations on the state of the db that the branch change will invalidate. Or would we just simply adjust the trunk db change (which should normally never be done on a changelog script), because it has never been released?

Bye
Mario

iskandarov · March 1, 2017, 4:07pm

Hi Mario,

At first, let me explain how the DB - update works.

If there is no sys_db_changelog table: it is created, all init scripts are executed and update scripts are written to changelog (just marked as executed).

When the sys_db_changelog already exists:

All DB - scripts are assembled and sorted considering their paths.
It means scripts in the “10-cuba” folder precede those in “20-reports”.
“Init” precedes “Update” and “170227-1-createChild.sql” occurs before “170227-2-createchild.sql”.
Sorted scripts are handled one by one.
The script is executed if the sys_db_changelog table has no corresponding entry. After successful execution, the script is written to changelog.

So when you update production server and deliver new scripts they will be executed anyway. Of course, “dates” will affect the execution order and you can manage it by changing names of files.
Note, if to rename an already executed script the platform tries to run it again (and fails as the DB is already in the target state).

I hope this could help.

Regards.

krivopustov · March 2, 2017, 11:26am

Hi Mario,

Unfortunately there is no simple solution to the problem of updating the database with scripts merged from different branches. There always will be a probability of conflict.

I would suggest the following approach:

After merging the branches, try to update the target database.
In case of incorrect sequence of scripts causing errors, either change script names to adjust the order, or add some conditional operators to scripts, like (in pseudo-code) “if the field does not exist, add it”. At this point you can use the update scripts for a common test or production database.
Commit your changes in scripts and notify all developers working with the branch that they must re-create their local databases from init scripts after updating from VCS. At this point all developers will have their local databases in sync with the target DB.