Some errors with JPQL/SQL

o.jeskulke · October 22, 2019, 10:56am

Hello,

I encountered a few strange behaviors using JPQL and SQL I would like to show, asking if there are bugs or if I am doing something wrong. I transferred all examples to the sec_userRole entity so you can try it in any project. For the ease of use I was executing these examples in the SQL-/JPQL-console from the Admin Tools Plugin.

Basically I want to see how many different versions of roles are there and how many users have those roles. I only want to count users that don’t have a deleted role. Usually I would write it like this (sorry about the blockquote, the preformatting doesn’t seem to work):

SELECT
s.version,
COUNT(DISTINCT CASE WHEN s.deletedBy IS NULL THEN s.user ELSE null END)
FROM sec$UserRole s
GROUP BY s.version

This gives me a syntax error which may be caused by the fact, that the generated SQL ommits s.user from the CASE-statement:

SELECT t0.VERSION,
COUNT(DISTINCT(CASE WHEN (t0.DELETED_BY IS NULL) THEN ELSE ? END))
FROM SEC_USER_ROLE t0, SEC_USER t1
WHERE ((t0.DELETE_TS IS NULL) AND ((t1.ID = t0.USER_ID) AND (t1.DELETE_TS IS NULL))) GROUP BY t0.VERSION

When I replace s.user with a literal e.g. 1 and the null with 2 then I get a (useless) result so the error may have to do with counting entities and null together(?).

I then tried to write the statement in a different way and had success with is:

SELECT
s.version,
(SELECT COUNT(DISTINCT s2.user) FROM sec$UserRole s2 WHERE s2.version=s.version AND s2.deletedBy IS NULL)
FROM sec$UserRole s
GROUP BY s.version

Here I got the result, that there are 2 users with non-deleted roles of version 0.
But then I also wanted to count the users no matter what their deletedBy was so I added just another field:

SELECT
s.version,
COUNT(s.user),
(SELECT COUNT(DISTINCT s2.user) FROM sec$UserRole s2 WHERE s2.version=s.version AND s2.deletedBy IS NULL)
FROM sec$UserRole s
GROUP BY s.version

With the result, that my subselect went to 0. So I got version=0 and total users=2 but the users of non-deleted roles are now 0!

I thought maybe there is some weird JPQL transformation going on and the resulting SQL is broken but it doesn’t seem to be and to be sure, I wrote the query in very simple form in SQL:

SELECT
t0.VERSION,
COUNT(t0.USER_ID),
(SELECT COUNT(DISTINCT(t3.USER_ID))
FROM SEC_USER_ROLE t3
WHERE t3.VERSION = t0.VERSION
AND t3.DELETED_BY IS NULL
)
FROM SEC_USER_ROLE t0
GROUP BY t0.VERSION

The result is the same, my subselect is 0 and only delivers the correct number when I ommit the other COUNT.

I tried this method in another database (Oracle) outside of CUBA and this is not a problem. So is it a bug in the HSQL database or does the SQL get transformed by CUBA or what could it be?

My worries are that I have to make quite a number of statistical work and have been spending a lot of time to find out what is working in JPQL and what not and I don’t want to spend more time avoiding obstacles than writing the queries.

Thank you very much in advance, if you need any more clarification I will provide everything that is needed.

mario · October 23, 2019, 5:31am

hi oli,

without going into the details that you provided for the different cases I just wanted to give you some information and pragmatic approach on how to solve 80% of those problems.

First question: why do you think you should use JPQL for any kind of DB query interaction? What benefits do you expect?

Generally using CUBA’s JPQL functionality there are multiple abstraction layers between the native database SQL operations that are executed against the database and the JPQL query that is expressed in application code. Those are the major abstraction layers:

The Java persistence API itself, which provides a specification (JSR 338). This compared to standard SQL (and even more against vendor SQL) is a subset of functionalities. This standard is very mature, probably one of the most mature Java specifications. However, it has to unite different database vendors under one umbrella, which sometimes causes additional friction and limitations.
the JPA reference implementation: EclipseLink, which CUBA uses. Although EclipseLink is the official reference implementation of JPA, by the fact that it is an implementation, there is probably a subset of the JPA standard (I assume the delta is comparably little though). But furthermore - once again, since it is an implementation it also contains bugs.
The CUBA JPQL parsing logic. Although CUBA is using EclipseLink under the hood, the JPQL that is passed into the dataManager API is parsing the JPQL and there is some interpretation layer. So, the JPQL is not natively passed into the EclipseLink interpreter I would assume. There is some heavy lifting going on (if you look at the JPQL related code you see even some ANTLR parser. There are some extensions to the JPQL language like the JPQL macros that are only available in CUBA, but not in the JPA specification. On the other hand there are some functionalities available in JPA, but not in CUBA JPQL (see this comparison in the docs). When you search the forum or the issue tracker, this piece sometimes causes further problems either because of incompatibilities or because of bugs.

So what this means is that all those abstraction layers will make the subset of functionalities smaller. This means that certain edge cases will cause problems. It is questionable if a CASE operation is already considered to be an edge case, but I think you get the point .

In my current project, we connect CUBA quite successfully to an Oracle DB, which has the majority of the “business logic” in the database as stored procedures and packages, views (with and without “instead of” triggers) and materialized views.

What I see working quite well is to utilize database Views in the DB wherever possible, where the majority of those kinds of statements are hidden. Then you can put a JPA entity on top of this view. With that you hide away potential problematic statements within the View and the JPQL that you ultimately will write “just” be a SELECT foo FROM bar WHERE foo = .... This is not possible for all cases though.

For other cases, you can use JooQ which is a fantastic Java library which gives you native Access to the DB with almost 100% feature set and additional compile time checks e.g.

Next alternative is to convert the query problem into a persistence problem. Instead of doing problematic operations at read time, you can duplicate data into another table and store it in a form that is optimized for reading. Then the read operations oftentimes get trivial as well.

I know that does not really help you with your concrete questions, but perhaps it helps you to think about if the approach you choose is the right one

Cheers
Mario

o.jeskulke · October 23, 2019, 10:25am

Hi Mario,

So nice to hear from you and thank you for taking the time to clarify those details. I got a better picture now of the interplay of the components involved.

I don’t think JPQL should be used for every kind of DB interaction, for the simple tasks there are methods that can load one entity etc. But I need to create new entities based on fields and aggregates of several entities. This also can be done with CUBA methods and Java-code but seems unreasonable for big data sets (e.g. like 100,000 investigation objects involved).
Normally this would be a task for SQL in the database like you mentioned you are using in your application. But I got the request to stay DB independent and that it should be easy to see which entities I am using in my code. That way, a developer who changes an entity may quickly assess if that would concern my code that is doing the statistics.

And I understand that using the abstraction layers, I am losing some proprietary functions of one special DB, but my examples are so basic and adding a column to a SELECT should not change another columns value that I got a very bad feeling commiting to that. It just feels that I can’t rely on it.

The idea of using views in the DB is most likely to be tested next by me. I wasn’t aware you can put an entity directly to a view. If you could point out a section in the manual for doing so that would be really great.

JooQ is also worth checking out although - like DB views - we get very DB dependent, unless we code only standard SQL which is not always that easy (e.g. substr vs. substring).

Gathering data in another table using trigger and doing optimization is a good idea but the requirements for the statistics (e.g. what investigation should be counted) can change quite a bit over time so the already processed data will not be valid anymore.

Thanks again for you help and sharing of insight, I appreciate that
All the best
Oli

mario · October 23, 2019, 11:36am

hi,

I think you misunderstood my question. It was not about “why do you use JPQL and not use java code?” but it was more like “why do you use JPQL instead of SQL” - but you already clarified that afterwards. DB independence - interesting topic.

Absolutely makes sense. I would not do mass data operation in Java mostly. I mean you get some benefits to it as well, like ability to test your code etc. but in general I would agree.

Ok let’s unfold that a little bit here. I understand the wish to stay DB independent in general. Reason might be less effort on maintaining dialects of different database vendors.

What this boils down to, is to use high level abstractions in order to not deal with lower level details. But this inherently means that you have to live within those abstractions - which is a good truism .

However, there is a difference between database independence and database agnostic. CUBA is not database independent (i would even argue most applications / application frameworks are not). If you look any CUBA application, it contains DDL scripts for each DB vendor for the application. If you want to support multiple DBMS, you have to create multiple DDL scripts, that contains the specific information for creating the tables, views, indexes etc. So this means, if your application should support multiple DBMS, someone had to do the effort of creating the DDL files (in case of Studio it is just generating the files, because Studio can do that out of the box).

The same is true for all other DB specifics. If you create a view, the view definition has to be created for all DBMS systems that the application should support. And those View definitions can be DB vendor specific.

So this means, that CUBA gives you the ability support multiple DBMS, but the act of supporting is up to you. The amount of support that you have to do depends on how much percent of code you write in those high level abstractions (like JPQL) and how much code lives in the underlying SQL views (e.g.).

You can think of it like this:

The effort of support is different depending on where the logic lives. However both variants are possible.
If you need more flexibility in the implementation you perhaps decide to put it under the JPA abstraction and make it DB dependent. Then it is more effort, but as you see it is more a trade off.

I think what makes sense is to differentiate the use cases. Like in your case of making a statistic - you can split this task into two subparts (just making this up here). First one: provide data in a somewhat easy to consume form. Second one: aggregate the easy to consume data, do some data crunching on it and provide the final statical value.

When you start to split those two subtasks in your head, you can now make a dedicated decision for each. Perhaps it is a good idea to make Subtask 1 underneath the JPA abstraction and Subtask 2 above it. With that you basically apply the complexity where needed, but not everywhere.

I think perhaps a dogmatic approach like: “100% in JPA, 0% in SQL” is a little too black / white. Perhaps “80% / 20 %” or “60 % / 40%” is a more healthy approach in order to fulfil the other requirements that you have as well.

Not sure if I understand correctly. So you mean it will become a save refactoring? Not sure if this is even true for plan JPQL, because it is just a String in the code.

However, I would assume that this should anyhow be solved by integration tests against the different databases, correct?

Not only functionality of specific DB dialects but the abstraction layers also take out pieces of the ANSI SQL standard (CUBA JPQL is a subset of JPQL, which is just a subset of SQL).

Right, to a certain degree you are totally correct here. This is why I said "it is questionable if a CASE statement is an edge case already…

So, there is not that much documentation of it actually (perhaps because there is not much too it). In the Studio UI for the data model, you can select the checkbox “DB View” - which basically enables that feature.

I create a couple of months ago an example which does that here: GitHub - mariodavid/cuba-example-calculated-values: CUBA example that shows to render calculated values in entity lists. It actually is only slightly about this topic, but the link describes a view based solution.

Studio also allows you to generate JPA entity classes out of existing views, which is another time saver. It allows you to create the view in the SQL editor of your choice and then use Studios Generate Model functionality to generate the JPA entity plumbing.

Right. Data duplication also has downsides. I just mentioned it because it is also oftentimes a valid solution for a lot of problems (probably more than most people think).

There you go - no problem

Have a nice day!

o.jeskulke · October 28, 2019, 8:43am

Hi Mario,

Thanks again for sharing your expertise and the huge effort and detail you put into your answer. Sorry for the late reply, I was discussing some of those issues with my colleagues first.

You are totally right, no need to do 100% pure SQL either. I think I got an idea how to split those tasks in my head and will put it to a test this week.

Well, not true refactoring like IntelliJ supports it but a dev could search the code to see if the entity he wants to change is used in some JPQL. Searching through the SQL is a bit harder because you have to look up the tables used by the entity first.

Thanks for your example project as well, also for the pointer to the automatic entity generation from a view. I will check it out later this week, all of this sounds really promising.

Have a nice week!

o.jeskulke · November 12, 2019, 9:53am

Hi Mario,

I got one additional question regarding your view based solution you posted earlier (GitHub - mariodavid/cuba-example-calculated-values: CUBA example that shows to render calculated values in entity lists)

How do I provide parameters for such a database view? For example if I want the view only return customers with a total turnover above a certain limit? If I use JPQL I can use query.setParameter(…) but if I chose to put all my complicated query content in a database view, how do I provide the parameters for it?

My query is a bit more complex and I want the view to do all the filtering in the database before returning the results, so in your example I don’t want the total turnover calculated for every customer, return all those customers and then do the filter on the CustomerTurnoverView entity. I don’t think CUBA would be able to handle this because instead of adding a WHERE clause to the view SQL, a HAVING would be necessary. And there are even more complex scenarios with multiple parameters (I will post an example below).

Thanks for any hint, I was not able to find something in the documentation!

-----EXAMPLE with two params:

WHERE nth.gueltigVon<=:zrbis
  AND (nth.gueltigBis>=:zrvon OR nth.gueltigBis IS NULL)
  AND (SELECT SUM(ta.anzahl)
         FROM vet_TierangabenNutztierhaltung ta
        WHERE ta.nutztierhaltung=nth
          AND ( ta.gueltigVon>=:zrvon AND ta.gueltigVon<=:zrbis
              OR ta.gueltigVon=(SELECT MAX(taVor.gueltigVon)
                                  FROM vet_TierangabenNutztierhaltung taVor 
                                 WHERE taVor.nutztierhaltung=nth
                                      AND taVor.katTierart=ta.katTierart 
                                      AND taVor.gueltigVon<:zrvon)
                                )
              )>0

o.jeskulke · November 12, 2019, 10:10am

I have to add, I know that normally in views you can’t use params and that you query the view and use the params on that outer query. However, I not only need those params in the WHERE but also in the SELECT (like CASE WHEN gueltigVon<:zrvon THEN 1 ELSE 0 END).
And my main reason for using DB views besides performance is that I can pass my SQL unchanged to the database. Maybe I will have to checkout JooQ that you already mentioned. But that would leave me with those complex SQLs directly in my code again…

mario · November 24, 2019, 2:59pm

Hi,

I think you very precisely described the limitations of this idea. You can use a view when the limitation of only providing a filter on the outer layer is acceptable.

Actually this was one part of the investigation of this demo app - to see if it is possible to do something like this. Unfortunately i’m not a big expert when it comes to JPQL and eclipselink. My idea was that perhaps it is possible to call a stored procedure / package, every time a select * from entity is executed - but at least natively I did not find something out of the box.

But there are some resources, that might help you:

https://www.baeldung.com/jpa-stored-procedures

Furthermore - for sure such a thing should be possible, when creating a complete own dataStore for this particular entity. I have created an example in this blog post some time ago: CUBA with DynamoDB – Road to CUBA and beyond.... In particular it is about the section: Implement a custom DynamoDB datastore.

JooQ is for sure another alternative. But since you are writing database aware java code, you somehow would have to create an abstraction that you based on the configured DB activate / deactivate. Something like

public interface VetTierangabenNutztierhaltungDataSupplier {

   List<TierangabenNutztierhaltung> fetchData(Map<String, Object> params);
}

with multiple implementations for each DB. Then you would configure the Spring application context to wire a particular implementation based on the DB choice / even do the wiring completely on your own based on a factory or something.

But for that scenario - you are most likely then back in the situation that you cannot really reuse most of the CUBA standard stuff, like the Filter component.

To sum it up - all options are not ideal and have their drawbacks I guess.

Cheers
Mario

o.jeskulke · November 25, 2019, 2:55pm

Hi,

Thanks for the info, at least it is good to know that I am not missing some big secret
I will have a look at the links you provided to get a better understanding of the whole DB related processes in JPA/CUBA. Right now the reading data part is working well but the writing/commit takes a lot of time.

Thanks again!
Oli