When handling millions of records, a single-threaded batch process might be too slow.
Parallel batch processing divides large datasets into smaller chunks and processes them concurrently. This technique significantly improves throughput and overall performance when working with Firebird in Spring Boot.
1. Why Parallel Batch Processing
Large-scale updates or inserts can quickly bottleneck your system if done sequentially. Parallel processing allows multiple batches to execute simultaneously, taking advantage of modern multi-core CPUs.
This approach is ideal for:
- Migrating or transforming data.
- Cleaning up large tables.
- Aggregating historical records.
2. Splitting Large Datasets into Chunks
Start by splitting your data into smaller chunks before processing:
public List<List<Employee>> partitionList(List<Employee> employees, int size) {
    List<List<Employee>> partitions = new ArrayList<>();
    for (int i = 0; i < employees.size(); i += size) {
        // subList returns a view, so no data is copied here
        partitions.add(employees.subList(i, Math.min(i + size, employees.size())));
    }
    return partitions;
}
If you have 100,000 rows, you can divide them into 100 chunks of 1,000 records each.
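As a quick sanity check, the partitioning logic can be exercised with plain integers. The generic variant of partitionList below is assumed for illustration; it mirrors the method above:

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionDemo {
    // Generic variant of the partitionList method shown above (illustrative)
    static <T> List<List<T>> partitionList(List<T> items, int size) {
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            partitions.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) rows.add(i);

        List<List<Integer>> chunks = partitionList(rows, 1000);
        System.out.println(chunks.size());         // 100 chunks
        System.out.println(chunks.get(99).size()); // last chunk holds 1000 rows
    }
}
```

Note that 100,000 divides evenly by 1,000; with uneven sizes, the final chunk is simply smaller.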
3. Running Batches in Parallel
Use Java’s ExecutorService to process these partitions concurrently:
@Autowired
private EmployeeRepository employeeRepository;

public void processInParallel(List<Employee> employees) throws InterruptedException {
    ExecutorService executor = Executors.newFixedThreadPool(4);
    List<List<Employee>> partitions = partitionList(employees, 1000);
    for (List<Employee> chunk : partitions) {
        executor.submit(() -> {
            // saveAll runs in its own transaction, so each chunk commits independently
            employeeRepository.saveAll(chunk);
        });
    }
    executor.shutdown();
    executor.awaitTermination(10, TimeUnit.MINUTES);
}
This code creates four parallel workers, each handling a separate chunk of data.
Adjust the thread count based on your CPU and Firebird’s connection limits.
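One way to pick that thread count programmatically is to take the number of available cores and cap it at the database connection-pool size, so workers never queue waiting for a Firebird connection. The connection limit below is a hypothetical value; substitute your pool's actual maximum:

```java
public class PoolSizing {
    // Hypothetical cap matching e.g. HikariCP's maximumPoolSize for Firebird
    static final int MAX_DB_CONNECTIONS = 8;

    // Worker threads beyond the connection limit would just block,
    // so take the smaller of CPU cores and available connections
    static int workerThreads() {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.min(cores, MAX_DB_CONNECTIONS);
    }

    public static void main(String[] args) {
        System.out.println("Using " + workerThreads() + " worker threads");
    }
}
```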
4. Optimizing Batch Configuration
To make batch inserts and updates more efficient, configure Hibernate batch settings in application.properties:
spring.jpa.properties.hibernate.jdbc.batch_size=50
spring.jpa.properties.hibernate.order_inserts=true
spring.jpa.properties.hibernate.order_updates=true
spring.jpa.properties.hibernate.jdbc.fetch_size=100
Firebird performs best when inserts and updates are grouped into medium-sized batches (around 50–200 rows each).
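To see what batch_size buys you, a quick back-of-the-envelope calculation: grouping statements into batches reduces the number of JDBC round trips from one per row to one per batch. A minimal sketch:

```java
public class BatchMath {
    // Number of JDBC round trips needed to insert `rows` records
    // when statements are grouped into batches of `batchSize`
    static long roundTrips(long rows, int batchSize) {
        return (rows + batchSize - 1) / batchSize; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(100_000, 1));  // 100000 trips without batching
        System.out.println(roundTrips(100_000, 50)); // 2000 trips with batch_size=50
    }
}
```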
5. Ensuring Transaction Isolation
Parallel threads must not conflict over shared data. Use read-committed isolation for concurrent writes:
spring.datasource.hikari.transaction-isolation=TRANSACTION_READ_COMMITTED
Also, ensure each thread operates on a unique dataset range (for example, by ID range or timestamp) to prevent deadlocks.
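Splitting by ID range can be done up front, before any thread starts. The sketch below (illustrative names; the IdRange record is not part of the article's code) divides [minId, maxId] into contiguous, non-overlapping slices so that each worker updates a disjoint part of the table:

```java
import java.util.ArrayList;
import java.util.List;

public class IdRangePartitioner {
    // Half-open ID range [from, to) handled by one worker thread
    record IdRange(long from, long to) {}

    // Split [minId, maxId] into `parts` contiguous ranges; disjoint slices
    // mean two threads never update the same rows, avoiding deadlocks
    static List<IdRange> split(long minId, long maxId, int parts) {
        List<IdRange> ranges = new ArrayList<>();
        long span = maxId - minId + 1;
        long step = (span + parts - 1) / parts; // ceiling division
        for (long from = minId; from <= maxId; from += step) {
            ranges.add(new IdRange(from, Math.min(from + step, maxId + 1)));
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (IdRange r : split(1, 100_000, 4)) {
            System.out.println("WHERE ID >= " + r.from() + " AND ID < " + r.to());
        }
    }
}
```

Each range then maps to one WHERE clause, so every thread queries and writes its own slice only.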
6. Using Spring Batch for Structured Parallelism
If your workload involves complex steps (reading, processing, writing), integrate Spring Batch, which supports chunk-based processing out of the box.
Example configuration snippet:
@Bean
public Step employeeBatchStep() {
    return stepBuilderFactory.get("employeeBatchStep")
            .<Employee, Employee>chunk(1000)
            .reader(employeeReader())
            .processor(employeeProcessor())
            .writer(employeeWriter())
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .build();
}
This lets Spring handle transaction boundaries, parallel threads, and error recovery automatically.
7. Monitoring and Logging
Add logs to measure batch execution time and detect performance bottlenecks:
long start = System.currentTimeMillis();
processInParallel(employees);
System.out.println("Batch completed in " + (System.currentTimeMillis() - start) + " ms");
You can also expose metrics via Spring Boot Actuator for better observability.
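The timing pattern above can be wrapped into a small reusable helper (the timed method is a hypothetical utility, not a Spring API) so every batch run is measured the same way:

```java
public class BatchTimer {
    // Run a task and return its wall-clock duration in milliseconds
    static long timed(Runnable task) {
        long start = System.currentTimeMillis();
        task.run();
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        // Stand-in for processInParallel(employees)
        long elapsed = timed(() -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("Batch completed in " + elapsed + " ms");
    }
}
```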