Many inserts of BaseDbGeneratedIdEntity are very slow

Hello,

this is either a problem report or a warning that could be put in CUBA’s documentation. We came across a big performance issues when trying to save lot’s of entities (inherited from BaseDbGeneratedIdEntity) using CUBA’s modified version of EntityManager. (This was done during importing data from a third party file). Given the profiler output, most of the time was spent during calling of equals on IdProxy class. This is being called from PersistenceImplSupport::beforeStore, where all the entities are stored in Set withoutPossibleChanges. As hashCode in IdProxy is always zero, if entity is not of HasUuid, equals need to be called on all entites whenever new entity is added to the set. That is due to this line:


this.hashCode = 0; // trade-off, see com.haulmont.cuba.primary_keys.IdentityTest.testEquality

We did not quite find what the comment means. So the warning could be - implement hashCode in your entities, if they’re not HasUuid otherwise you may experience problem with standard Sets, Maps and the framework itself.

1 Like

Hi Jan,

This is a complicated problem. In CUBA, we try to always provide an entity identity right from creating an instance in memory. It means that we need an identifier that is defined on creation and provides equals() and hashCode contracts for the whole instance lifecycle and when the instance goes through all tiers and to the database and back. This is not a problem if we can generate the identifier on creation time - whether it UUID or Long. For Identity key generated in the database, we have to use this IdProxy class with a special logic in equals(). It works for equals() contract, but hashCode() can use only the state of the instance, so it was impossible to do it consistent for the case when the key appears on saving. We need some state that is the same before and after saving to implement hashCode in the right way. That’s why we recommend implementing HasUuid for subclasses of BaseIdentityIdEntity, it is mentioned in the docs.

Thank you for pointing out to the performance issue with saving. We’ll try to fix it somehow, maybe IdentityHasMap will work there.

Hello Konstantin,

thank you for your response. I do understand the necessity of using IdProxy. What I still don’t understand is why you don’t use the hashCode of this.uuid if it is not a HasUuid instance. Looking at this:


if (entity instanceof HasUuid) {
    this.uuid = ((HasUuid) entity).getUuid();
    if (this.uuid == null)
        throw new IllegalStateException("Entity " + entity.getClass() + " implements HasUuid but its instance has no UUID assigned");
    this.hashCode = this.uuid.hashCode();
} else {
    this.uuid = UuidProvider.createUuid();
    this.hashCode = 0; // trade-off, see com.haulmont.cuba.primary_keys.IdentityTest.testEquality
}

It seems to me that if I did implement HasUuid in my entities using transient member and UuidProvider.createUuid() then I would only mimic the implementation of IdProxy whilst having this.hashCode = this.uuid.hashCode(). If there are no problems with this implementation, I don’t why not to create IdProxy that would fill hashCode like this for entities that do not implement HasUuid. On the other hand, if there are problems with this implementation (and there most probably are, if you decided not to do it, but I fail to see what problems) then I would come across these problems using this kind of implementation HasUuid.

Please answer me these two questions to help me understand:

  1. Why this.hashCode = 0 and not this.hashCode = this.uuid.hashCode()
  2. If implementing HasUuid helps, what is the right why to do it - what I showed seems to have the same problems and I would like to avoid storing two IDs for one entity.

Thank you very much!

Jan,

Let me explain the problem by reproducing the test code from IdentityTest.testEquality():


Field idField = BaseIdentityIdEntity.class.getDeclaredField("id");
idField.setAccessible(true);

// e1 & e2 are different instances with the same Id
IdentityEntity e1 = metadata.create(IdentityEntity.class);
idField.set(e1, 100L);

IdentityEntity e2 = metadata.create(IdentityEntity.class);
idField.set(e2, 100L);

// they should be equal and with the same hashCode
assertEquals(e1, e2);
assertTrue(e1.hashCode() == e2.hashCode());

This is the case when you load the same entity instance from the database twice, or you have one object returned from persist() and the second loaded from the database afterwards.

How can you ensure the last assert about hashCode if your hash function is based on a transient attribute?

Well I don’t know if I’m missing something obvious, but the answer seems simple to me:

The same way as you do it in com.haulmont.cuba.core.entity.IdProxy#equals, where it needs to work as well. IdProxy#equals fallbacks to comparison of this.uuid if this.value == null or that.value = = null and similarly for this.entity.getDbGeneratedId(). So if your IdProxy#hashCode looked like this:


public int hashCode() {
    if (this.value != null)
        return this.value.hashCode();
    if (this.entity.getDbGeneratedId() != null)
        return this.entity.getDbGeneratedId().hashCode();
    return uuid.hashCode();
}

If I am not mistaken, your test should pass now. Also no need to have field private int hashCode. I know I can implement my own proxy, I am just worried, that something will not get broken in the internals of CUBA. If I am realling missing something obvious I apologize for wasting your time.

Thank you,

I see, it is the contract of the hashCode function.

Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified

But that is not the case here, because information used in equals comparison has changed. Example (slightly changed code from the same test class):


        IdentityEntity e1 = metadata.create(IdentityEntity.class);
        // e1 & e3 are different instances with the same UUID
        IdentityEntity e3 = TestSupport.reserialize(e1);

        idField.set(e3, 100L);
        idField.set(e1, 101L);

        assertNotEquals(e1, e3); // will pass
        assertNotEquals(e1.hashCode(), e3.hashCode()); // will not pass

The whole problem is, that IdProxy#equals is not compliant with the equals contract - that’s why you cannot implement hashCode properly. Your equals uses two different implementations depending on what it is compared to. That makes it non-transitive:


        IdentityEntity e1 = metadata.create(IdentityEntity.class);
        IdentityEntity e2 = TestSupport.reserialize(e1);
        IdentityEntity e3 = TestSupport.reserialize(e1);


        idField.set(e1, 100L);
        idField.set(e3, 101L);


        assertEquals(e1, e2); // will pass
        assertEquals(e2, e3); // will pass
        assertEquals(e1, e3); // should be - transitivity, but will not pass

I’ve checked your proposed hashCode() implementation and it failed on another test case. It’s in the same test method and checks an entity instance before and after persisting it (i.e. when ID has just been generated).

The last test code that you provided I didn’t quite understand - it must fail on the first assert because e1 has ID and e2 has not. Maybe I missed something?

Actually I would be happy to fix the issue with identity and inefficient hashcode in a general way and I would appreciate your help. You can test your ideas on the test suite in the com.haulmont.cuba.primary_keys package. Let me know if you have any difficulties with running the tests.

Hello Konstantin,

I’ve done some research and some thinking dedicated to the problem. One of the interesting sources of thoughts was this forum https://forum.hibernate.org/viewtopic.php?f=1&t=928172&sid=c9bfdc41235c02dbd80f6fb63f2ec5bd

Basically the problematic invariant (contract, if you will) is the one corresponding to the part of the test you mentioned as not passing - Entity after persisting has to be equal to a “fork”(resereliazed entity) of this entity before persisting. Example (from the test):


		IdentityEntity e1 = metadata.create(IdentityEntity.class);
		// e1 & e3 are different instances with the same UUID
		IdentityEntity e3 = TestSupport.reserialize(e1);
		// one of them has an Id, other has not - this is the case when a newly committed instance returns from
		// middleware to the client
		idField.set(e3, 100L);
		assertEquals(e1, e3);
		assertTrue(e1.hashCode() == e3.hashCode());

I came to conclusion that there basically only two options here:

  1. give up this invariant - entities may not be equal in this scenario
  2. implement HasUuid and in the way, that uuid is actually persisted in the DB

The hacky solution in IdProxy where equals for a persisted entity is implemented in the way that it uses two different implementations of comparing depending on what the entity is compared to - if the second entity has only uuid compare only those, if the second has generated id as well compare those. This hacky solution is good enough for equals, but as in hashCode there is no way of knowing to what the generated hashCode is going to be compared to, this hack cannot be applied to it (they came to a similar problem in the aforementioned topic on hibernate forum). As a second problem I see that this implementation of equals is not transitive - however improbable the scenario is.

Conclusion for us - and maybe for you and the docs - use BaseUuidEntity. It implements HasUuid in the right way and does not store any other redundant primary keys. Any other implementation will either have two different primary keys or its hashCode is not gonna work well and big performance issues may be expected. Understanding all this, I think I would be happier if CUBA did not let me use anything else then BaseUuidEntity. Our next project based on CUBA is definitely going to be built with BaseUuidEntity.

Another big advantage of using BaseUuidEntityis avoiding the requirement to retrieve IDs from DB after every insert. That way inserts can be further optimized using http://www.eclipse.org/eclipselink/documentation/2.5/jpa/extensions/p_jdbc_batchwriting.htm If IDENTITY strategy is used this property is ignored as ID is retrieved after every insert.

In the meantime we are testing our version of IdProxy that breaks the invariant and everything seems to be working, but when we come to a first problem with the invariant we are going to redo our entities to make them BaseUuidEntity - another reason for this is Entity Log and similar framework features - btw. when do you think the 6.5 is going to be released?

I attached our version of IdProxy. There is another performance issue that I think I fixed in IdProxy and that is method copy creating new instances all over. That is IMHO not required and I attempted to fix that. That has nothing to do with the invariant and you can safely adopt, if you want.

Sorry for a delay and really long comment. Thank you for your time,

IdProxy2.java (4.9K)

Hi Jan,

Thank you for your research and feedback. I absolutely agree with you that BaseUuidEntity is the base class suitable for most cases. In fact, it was a single option for several years when the platform was an internal tool and we have implemented support for other key types by requests from community. While using Long/Integer keys can be justified by performance and database size concerns, I think that Identity or Composite keys should be used only when working with a legacy database. Actually I didn’t realize that someone would use Identity without a strong reason, so based on your feedback we should make the documentation more explicit in this part.

As for your suggestion about IdProxy, we’ll investigate it a bit later.

Hi Jan,

The problem with performance should be solved in the platform version 6.5.0.

Hi Rostislav,

I would be very surprised, given what Konstantin said. Could you point me to the solution in the code?

Thx,

Hi Jan,

Here you can see that we have changed HashSets to IdentityHashSets and implemented your idea about using a cached instance of IdProxy on copying.

:ticket: See the following issue in our bug tracker:

https://youtrack.cuba-platform.com/issue/PL-8762