Motivation:
Doing batching correctly, especially against MySQL, requires several settings. Moreover, the Spring Data deleteAllInBatch() and deleteInBatch() names are misleading! They don't do batching. They do bulk operations!
Description:
Batch deletes that don't involve associations in MySQL.
Note: Spring deleteAllInBatch() and deleteInBatch() don't use delete batching and don't take advantage of the automatic optimistic locking mechanism to prevent lost updates (e.g., @Version is ignored). They rely on Query.executeUpdate() to trigger bulk operations. These operations are fast, but Hibernate doesn't know which entities are removed, therefore the Persistence Context is not updated accordingly (it's up to you to flush (before delete) and close/clear (after delete) the Persistence Context to avoid issues created by unflushed (if any) or outdated (if any) entities).
The first one (deleteAllInBatch()) simply triggers a delete from entity_name statement and is very useful for deleting all records. The second one (deleteInBatch()) triggers a delete from entity_name where id=? or id=? or id=? ... statement, therefore it is prone to cause issues if the generated DELETE statement exceeds the maximum accepted size. This issue can be controlled by deleting the data in chunks, relying on the IN operator, and so on.
Bulk operations are faster than batching, which can be achieved via the deleteAll(), deleteAll(Iterable<? extends T> entities) or delete() method. Behind the scenes, the two flavors of deleteAll() rely on delete(). The delete()/deleteAll() methods rely on EntityManager.remove(), therefore the Persistence Context is synchronized accordingly. Moreover, if the automatic optimistic locking mechanism (to prevent lost updates) is enabled, then it will be used.
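The "delete the data in chunks" idea mentioned above can be sketched in plain Java: split the list of IDs into fixed-size chunks and issue one delete per chunk so no single generated DELETE statement grows too large. The partition() helper and the chunk size of 30 are illustrative, not part of Spring Data:

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedDelete {

    // Split a list of IDs into fixed-size chunks so each generated
    // "delete from entity_name where id=? or id=? ..." stays small
    static <T> List<List<T>> partition(List<T> ids, int chunkSize) {
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += chunkSize) {
            chunks.add(ids.subList(i, Math.min(i + chunkSize, ids.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        for (long i = 1; i <= 100; i++) {
            ids.add(i);
        }
        // 100 IDs in chunks of 30 -> 4 chunks (30, 30, 30, 10)
        List<List<Long>> chunks = partition(ids, 30);
        System.out.println(chunks.size() + " " + chunks.get(3).size());
        // In a real application, each chunk would be passed to a
        // repository delete method (e.g., deleteInBatch(chunk)) here
    }
}
```

The same chunking works whether the per-chunk delete is deleteInBatch() or an IN-based JPQL query.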
Key points for regular delete batching:
For deleting in batches, rely on the deleteAll(), deleteAll(Iterable<? extends T> entities) or delete() method
In application.properties set spring.jpa.properties.hibernate.jdbc.batch_size
Before Hibernate 5, we need to set in application.properties a setting for enabling batching for versioned entities during update and delete operations (entities that contain @Version for implicit optimistic locking): spring.jpa.properties.hibernate.jdbc.batch_versioned_data; starting with Hibernate 5, this setting is true by default
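Taken together, the batching settings above might look like this in application.properties (the batch size of 30 is just an illustrative value):

```properties
spring.jpa.properties.hibernate.jdbc.batch_size=30
# needed before Hibernate 5; true by default starting with Hibernate 5
spring.jpa.properties.hibernate.jdbc.batch_versioned_data=true
```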
Optimizations that are specific to MySQL
In application.properties set the JDBC URL with rewriteBatchedStatements=true (a MySQL Connector/J optimization; batched statements get rewritten into a single string buffer and sent to the server in a single request; PostgreSQL's JDBC driver offers a similar reWriteBatchedInserts setting)
In application.properties set the JDBC URL with cachePrepStmts=true (this enables caching and is useful if you decide to set prepStmtCacheSize, prepStmtCacheSqlLimit, etc. as well; without this setting the cache is disabled)
In application.properties set the JDBC URL with useServerPrepStmts=true (this way you switch to server-side prepared statements, which may lead to a significant performance boost)
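A MySQL JDBC URL combining the flags above might look like this in application.properties (the host, database name, and cache sizes are illustrative values, not recommendations):

```properties
spring.datasource.url=jdbc:mysql://localhost:3306/mydb?cachePrepStmts=true&useServerPrepStmts=true&rewriteBatchedStatements=true&prepStmtCacheSize=250&prepStmtCacheSqlLimit=2048
```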
Testing time:
Tam Ta Da Dam! :) The complete application is available on GitHub.
If you need a deep dive into the performance recipes exposed in this repository then I am sure that you will love my book "Spring Boot Persistence Best Practices".