Motivation:
This article is useful if you want to perform JDBC batch inserts concurrently. For example, suppose we want to batch-insert the content of a huge file; by employing concurrent batching we can do it much faster than with the sequential approach.
Description:
This is a Spring Boot application that reads a relatively big JSON file (200,000+ lines) and inserts its content into MySQL (it can be any other database) via batching, using ForkJoinPool, JdbcTemplate, and the HikariCP connection pool.
Key points:
We use the MySQL json type for our rows
First, we read the file content into memory, into a List
Next, the list is halved via the Fork/Join framework and subtasks are created until a sublist's size is smaller than the batch size (e.g., we set the batch size to 30):
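A minimal sketch of the forking side, assuming a RecursiveAction subclass (the class name ForkingTask, the executeBatch() method, and the way JoiningComponent is passed in are illustrative assumptions, not necessarily the repository's exact code):

```java
import java.util.List;
import java.util.concurrent.RecursiveAction;

// Recursively halves the list until a sublist is smaller than the batch
// size, then hands that sublist to JoiningComponent for insertion.
public class ForkingTask extends RecursiveAction {

    private static final int BATCH_SIZE = 30;

    private final List<String> jsonList;
    private final JoiningComponent joiningComponent;

    public ForkingTask(List<String> jsonList, JoiningComponent joiningComponent) {
        this.jsonList = jsonList;
        this.joiningComponent = joiningComponent;
    }

    @Override
    protected void compute() {
        if (jsonList.size() < BATCH_SIZE) {
            // Small enough: run the JDBC batch on this worker thread
            joiningComponent.executeBatch(jsonList);
        } else {
            // Too big: split in half and process both halves as subtasks
            int middle = jsonList.size() / 2;
            invokeAll(
                new ForkingTask(jsonList.subList(0, middle), joiningComponent),
                new ForkingTask(jsonList.subList(middle, jsonList.size()), joiningComponent)
            );
        }
    }
}
```

Kicking things off is then a matter of submitting the root task to a pool sized to the number of threads under test, e.g., new ForkJoinPool(8).invoke(new ForkingTask(allLines, joiningComponent)).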
Next, the JoiningComponent performs the JDBC batching via JdbcTemplate:
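A minimal sketch of the joining side (the table name lots, the column name lot, the executeBatch() signature, and the REQUIRES_NEW transaction are assumptions for illustration): JdbcTemplate.batchUpdate() sends each sublist as one JDBC batch over a single connection borrowed from HikariCP:

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;
import org.springframework.jdbc.core.BatchPreparedStatementSetter;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;
import org.springframework.transaction.annotation.Propagation;
import org.springframework.transaction.annotation.Transactional;

@Component
public class JoiningComponent {

    // Assumed DDL: CREATE TABLE lots (id BIGINT AUTO_INCREMENT PRIMARY KEY, lot JSON)
    private static final String INSERT_SQL = "INSERT INTO lots (lot) VALUES (?)";

    private final JdbcTemplate jdbcTemplate;

    public JoiningComponent(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    // REQUIRES_NEW so each worker thread commits its own batch independently
    @Transactional(propagation = Propagation.REQUIRES_NEW)
    public void executeBatch(List<String> jsonList) {
        jdbcTemplate.batchUpdate(INSERT_SQL, new BatchPreparedStatementSetter() {
            @Override
            public void setValues(PreparedStatement ps, int i) throws SQLException {
                ps.setString(1, jsonList.get(i)); // one JSON document per row
            }

            @Override
            public int getBatchSize() {
                return jsonList.size();
            }
        });
    }
}
```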
Some hints:
Set up HikariCP to provide a number of database connections that ensures the database achieves minimal context switching (e.g., 2 * number of CPU cores); see the properties sketch after these hints.
In order to run the application you have to unzip citylots.zip in the current location. This is the big JSON file, collected from the Internet and stored in the application root path as a ZIP archive.
If you want to see details about the batching process in the log, simply activate DatasourceProxyBeanPostProcessor.java by uncommenting its // @Component line. This is needed because the application relies on DataSource-Proxy.
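For the pool-sizing hint above, a hedged application.properties sketch (the value 8 assumes a 4-core machine; adjust to 2 * your core count):

```properties
# HikariCP pool sized to 2 * number of CPU cores (here: 4 cores -> 8 connections)
spring.datasource.hikari.maximum-pool-size=8
```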
Testing time (1000, 10000 and 25000 inserts via 1, 4 and 8 threads):
Tam Ta Da Dam! :) The complete application is available on GitHub.
If you need a deep dive into the performance recipes exposed in this repository, then I am sure you will love my book "Spring Boot Persistence Best Practices".