on spring, hibernate and batch insert-update

Most programming examples are about small amounts data processed, as it is the sufficient amount to demonstrate ideas. However one can sometimes run into cases involving insertion/update of 100k+ rows in a short amount of time and see that it's not practical with all default settings.

Batch INSERTs and UPDATEs

In most cases, every INSERT or UPDATE command to the database has a cost of sending a command through the network stack. The DBMS spends microsecond-level time to fulfil the command and the network takes millisecond-level time for the trip. JDBC API has a batch capability so that one can do more efficient than a command (and a millisecond trip) per row to be inserted.

Batch Updates with Hibernate

One can declare a batch-size attribute in the configuration and those entities begin. Not a bad idea if
  • There's no need for insertion or the key property is not identity, as there's no standard way to batch-read generated keys. Objects with keys generated by the database are inserted one by one.
  • You know that you have to clean up session cache regularly so that unneeded objects do not pile up.   It's both a memory limitation and a time limitation; hibernate does get slower when it has to manage lots of objects.
In short, one can still stick to standard hibernate session in most cases.

Using (or not using) StatelessSession

The manual mentions about it but I was unable to find an example with spring declarative transaction management. As I am typing this line it is an open issue with a partial solution attached. Maybe it is a neat solution in your case.

As none of my teammates write (or willing to maintain) manual transaction management, StatelessSession is not the tool to be used for the time being.

Spring JdbcTemplate

As there is no standard way for batch-reading generated keys, there's need for extra work in case you want to make use of the capabilities that your DBMS has, for example DB2 offers this capability for a few extra lines of coding. One can put it into JdbcTemplate such as:

public interface GeneratedKeySetter {
  /**
   * Used to batch-set generated keys.
   * @param keyValue Generated value returned by getGeneratedKeys() or the alternative
   * @param rowIndex The index of the row in the batch
   */
  public void setGeneratedKeys(int keyValue, int rowIndex);
}

public class MyJdbcTemplate extends JdbcTemplate {
  /**
   *
   */
  public supportsBatchGeneratedKeys() {
    return ps instanceof DB2PreparedStatement;
  }
  /**
   * Alternative batchUpdate() with generated keys.
   * See: SPR-1836. We had to write it as nobody is willing to.
   */
  public int[] batchUpdateWithGeneratedKeys(String sql, final BatchPreparedStatementSetter pss, final GeneratedKeySetter keySetter) {
    return execute(new SimplePreparedStatementCreatorWithGeneratedKeys(sql), new PreparedStatementCallback() {
      @Override
      public int[] doInPreparedStatement(PreparedStatement ps) throws SQLException {
        try {
          int batchSize = pss.getBatchSize();
          InterruptibleBatchPreparedStatementSetter ipss = (pss instanceof InterruptibleBatchPreparedStatementSetter ? (InterruptibleBatchPreparedStatementSetter) pss: null);
          if (supportsBatchGeneratedKeys(ps) && JdbcUtils.supportsBatchUpdates(ps.getConnection())) {
            for (int i = 0; i < batchSize; i++) {
              pss.setValues(ps, i);
              if (ipss != null && ipss.isBatchExhausted(i)) {
                break;
              }
              ps.addBatch();
            }
            int[] rowsAffectedArray = ps.executeBatch();
            ResultSet[] nativeKeyResults = ((DB2PreparedStatement)ps).getDBGeneratedKeys();
            int nAssigned = 0;
            for (int rowIndex = 0; rowIndex < nativeKeyResults.length; rowIndex++) {
              while(nativeKeyResults[rowIndex].next()) {
                int key = nativeKeyResults[rowIndex].getBigDecimal(1).intValue();
                keySetter.setGeneratedKeys(key, rowIndex);
                nAssigned++;
              }
            }
            if (nAssigned != rowsAffectedArray.length)
              throw new RuntimeException("Not all ID values are retrieved.");
            return rowsAffectedArray;
          } else {
            List rowsAffected = new ArrayList();
            for (int i = 0; i < batchSize; i++) {
              pss.setValues(ps, i);
              if (ipss != null && ipss.isBatchExhausted(i)) {
                break;
              }
              rowsAffected.add(ps.executeUpdate());
              ResultSet genKeyResult = ps.getGeneratedKeys();
              genKeyResult.next();       
              keySetter.setGeneratedKeys(genKeyResult.getInt(1), i);
            }
            int[] rowsAffectedArray = new int[rowsAffected.size()];
            for (int i = 0; i < rowsAffectedArray.length; i++) {
              rowsAffectedArray[i] = rowsAffected.get(i);
            }
            return rowsAffectedArray;
          }
        } finally {
          if (pss instanceof ParameterDisposer) {
            ((ParameterDisposer) pss).cleanupParameters();
          }
        }
      }
    });
  }
}

The motivation for batch-fetching generated keys is to propagate them into other objects, enabling them to be batch-inserted as well e.g. batch-inserting Pet objects after batch-inserting Owner objects. Describing that process will take another article.

Comments

Popular posts from this blog

Motivation