
Jun 5, 2012

Batch processing in Java with Spring batch - part 1


Make sure that you work through this beginner tutorial first: Beginner Spring batch tutorial with simple reader and writer

Q. What do you understand by batch processing, and why do you need it?
A. The idea behind batch processing is to allow a program to run without the need for human intervention, usually scheduled to run periodically at a certain time or every x minutes. Batch processing is a good fit for
  • execution that takes a lengthy amount of time (e.g. consolidated reports, re-balancing a financial portfolio, etc.)
  • processing a large input data set via an ETL (Extract, Transform and Load) operation.
  • automating a business process that doesn't necessarily need to be triggered by a human.
In general, a batch process involves

  • Reading data from data sources like databases, files, etc
  • Processing the data through a number of steps. This can involve transforming the data, performing calculations, generating lengthy reports, etc.
  • Finally, writing the processed data to databases or file systems.
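
The read, process, and write stages above can be sketched in plain Java. This is only an illustration of the idea, not the Spring Batch API: the class name SimpleBatch, the run method, and the chunk size are all made up for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical illustration only: plain Java, not the Spring Batch API.
class SimpleBatch {

    // Reads items one at a time, transforms each, and writes them out in chunks.
    // Returns the total number of items handed to the writer.
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int chunkSize) {
        List<O> chunk = new ArrayList<>();
        int written = 0;
        while (reader.hasNext()) {
            chunk.add(processor.apply(reader.next()));   // read + process
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk));   // write a full chunk
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {                          // write the remainder
            writer.accept(new ArrayList<>(chunk));
            written += chunk.size();
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        Function<Integer, String> processor = n -> "item-" + n;
        Consumer<List<String>> writer = sink::addAll;

        int count = run(List.of(1, 2, 3, 4, 5).iterator(), processor, writer, 2);
        System.out.println(count + " items written: " + sink);
    }
}
```

Writing in chunks rather than item by item keeps the number of write operations low, which matters when the writer is a database.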

Q. What libraries do you need to get started with Spring batch?
A. The pom.xml file is the starting point. Fill in the appropriate versions and any additional dependencies required.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.myapp</groupId>
  <artifactId>mybatchapp</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>mybatchapp</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
        <groupId>org.springframework.batch</groupId>
        <artifactId>spring-batch-core</artifactId>
        <version>....</version>
    </dependency>
    <dependency>
        <groupId>commons-lang</groupId>
        <artifactId>commons-lang</artifactId>
        <version>....</version>
    </dependency>
    <dependency>
        <groupId>org.springframework</groupId>
        <artifactId>spring</artifactId>
        <version>...</version>
    </dependency>
 
    <!-- ...other dependencies like hibernate -->
 
  </dependencies>
  ...
</project> 

Q. What are the key terms of Spring batch?  
A.


  • JobLauncher: Launches a job. It uses a JobRepository to obtain a valid JobExecution.
  • JobRepository: A persistent store for all the job meta-data. It stores JobInstance, JobExecution, and StepExecution information to a database or a file system. The repository is required because a job could be a rerun of a previously failed job.
  • Job: Represents a job. For example, a portfolio processing job that calculates the portfolio value.
  • JobInstance: A logical run of a job. If a job runs every week night, then there will be 5 instances of that job in a week.
  • JobParameters: Parameters that are used by the JobInstance. For example, the portfolio processing job requires the list of account numbers as the input parameters. The job parameters can be passed as command line arguments.
  • JobExecution: Every attempt to run a JobInstance results in a JobExecution. If a job fails and is then rerun successfully, we end up with a single JobInstance with 2 JobExecutions (1 failed execution and 1 successful execution).
  • Step: Each job is made up of one or more steps.
  • StepExecution: Similar to JobExecution, a StepExecution represents an attempt to run a Step in a Job.
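
The relationship between a JobInstance and its JobExecutions can be modelled with a toy example. The sketch below is plain Java with hypothetical names (JobTermsSketch, Repository, instanceKey, the runDate parameter); it does not use the Spring Batch classes, and only mirrors the idea that a job name plus its parameters identify an instance, while each attempt adds an execution.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the terms above. These are NOT the Spring Batch classes.
class JobTermsSketch {

    enum Status { COMPLETED, FAILED }

    // A job instance is identified by the job name plus its parameters.
    // Sorting the parameters gives the same key regardless of insertion order.
    static String instanceKey(String jobName, Map<String, String> parameters) {
        return jobName + new TreeMap<>(parameters);
    }

    // Toy in-memory stand-in for a job repository: one list of execution
    // outcomes per job instance.
    static final class Repository {
        private final Map<String, List<Status>> executions = new HashMap<>();

        void recordExecution(String instanceKey, Status status) {
            executions.computeIfAbsent(instanceKey, k -> new ArrayList<>()).add(status);
        }

        int instanceCount() {
            return executions.size();
        }

        List<Status> executionsOf(String instanceKey) {
            return executions.getOrDefault(instanceKey, List.of());
        }
    }

    public static void main(String[] args) {
        Repository repo = new Repository();
        String monday = instanceKey("portfolioJob", Map.of("runDate", "2012-06-04"));
        String tuesday = instanceKey("portfolioJob", Map.of("runDate", "2012-06-05"));

        repo.recordExecution(monday, Status.FAILED);     // first attempt fails
        repo.recordExecution(monday, Status.COMPLETED);  // rerun of the same instance
        repo.recordExecution(tuesday, Status.COMPLETED); // a different instance

        System.out.println(repo.instanceCount() + " instances");
        System.out.println("Monday executions: " + repo.executionsOf(monday));
    }
}
```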

Q. Can you describe the key steps involved in a typical batch job scenario you have worked on?
A. A simplified batch process scenario is explained below.

The scenario is basically a batch job that runs overnight to go through all the accounts from the accounts table and calculates the available_cash by subtracting debit from the credit.

Step 1: Define a batch control table that keeps metadata about the batch job. The example below shows a batch_control table that processes accounts in chunks: account numbers 000 - 999 are processed by one job and account numbers 1000 - 1999 by another. This table also holds information about when the job started, when the job finished, status (i.e. COMPLETED or FAILED), last processed account_no, etc.

job_id | job_name               | start_timestamp           | end_timestamp             | status    | account_no_from | account_no_to | last_account_no
1      | accountValueUpdateJob1 | 21/04/2012 3:49:11.053 AM | 21/04/2012 3:57:55.480 AM | COMPLETED | 000             | 999           | 845
2      | accountValueUpdateJob2 | 21/04/2012 3:49:11.053 AM | 21/04/2012 3:57:55.480 AM | FAILED    | 1000            | 1999          | 1200

Step 2: To keep it simple, a single data table is used. You need to define a job and its steps. A job can have more than one step. Each step can have a reader, a processor, and a writer. An ItemReader will read data from the accounts table shown below for account numbers between account_no_from and account_no_to, which are read from the batch_control table shown above. An ItemProcessor will calculate the available_cash and an ItemWriter will update the available_cash on the accounts table.

account_no | account_name | debit    | credit | available_cash
001        | John Smith   | 200.00   | 4000.0 | 0.0
1199       | Peter Smith  | 55000.50 | 787.25 | 0.0
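
The processing step for this table boils down to available_cash = credit - debit. Here is a minimal sketch of that calculation in plain Java. The Account class and the AvailableCashCalculator name are hypothetical; in a Spring Batch job this logic would typically sit inside an ItemProcessor implementation rather than a static method.

```java
import java.math.BigDecimal;

// Hypothetical sketch: Account and AvailableCashCalculator are illustrative names.
class AvailableCashCalculator {

    // Immutable view of one row of the accounts table.
    static final class Account {
        final String accountNo;
        final String accountName;
        final BigDecimal debit;
        final BigDecimal credit;
        final BigDecimal availableCash;

        Account(String accountNo, String accountName,
                BigDecimal debit, BigDecimal credit, BigDecimal availableCash) {
            this.accountNo = accountNo;
            this.accountName = accountName;
            this.debit = debit;
            this.credit = credit;
            this.availableCash = availableCash;
        }
    }

    // Returns a copy of the account with available_cash = credit - debit.
    // BigDecimal avoids the rounding errors of double for monetary values.
    static Account process(Account in) {
        BigDecimal available = in.credit.subtract(in.debit);
        return new Account(in.accountNo, in.accountName, in.debit, in.credit, available);
    }

    public static void main(String[] args) {
        Account john = new Account("001", "John Smith",
                new BigDecimal("200.00"), new BigDecimal("4000.0"), BigDecimal.ZERO);
        System.out.println(process(john).availableCash);
    }
}
```

Note that the second sample row has a debit larger than its credit, so its available_cash comes out negative; whether that is acceptable is a business rule outside this sketch.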

Step 3: Once the batch job is completed, the batch_control table will be updated accordingly.

Step 4: Listeners can be used to handle errors (e.g. onProcessError(....), onReadError(....), etc.) and other pre- and post-item events like beforeRead(...), afterRead(...), etc. The spring-batch framework makes use of an XML configuration file, for example batch-context.xml, and Java classes annotated with @Component to wire up the components and implement the logic.
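
To show how such callbacks fit around item processing, here is a rough plain-Java sketch. The ProcessListener interface below is a local, hypothetical interface whose method names merely echo the callbacks mentioned above; it is not the Spring Batch listener API, and skipping a failed item is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: a local interface that echoes the callback names
// mentioned above. It is not the Spring Batch listener API itself.
class ListenerSketch {

    interface ProcessListener<I, O> {
        void beforeProcess(I item);
        void afterProcess(I item, O result);
        void onProcessError(I item, Exception e);
    }

    // Runs the processor over each item and notifies the listener around each call.
    // An item whose processing throws is reported to the listener and skipped.
    static <I, O> List<O> processAll(List<I> items,
                                     Function<I, O> processor,
                                     ProcessListener<I, O> listener) {
        List<O> results = new ArrayList<>();
        for (I item : items) {
            listener.beforeProcess(item);
            O result;
            try {
                result = processor.apply(item);
            } catch (RuntimeException e) {
                listener.onProcessError(item, e);
                continue;
            }
            listener.afterProcess(item, result);
            results.add(result);
        }
        return results;
    }
}
```

Keeping error handling in a listener means the processor itself only has to contain the business calculation.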

Step 5: The batch job can be executed via a shell or batch script that invokes the spring-batch framework as shown below. CommandLineJobRunner is the Spring class that initiates the job by wiring up the relevant components, listeners, DAOs, etc. via the configuration file batch-context.xml. The job parameter that is passed is "accountValueUpdateJob1", which is used to retrieve the relevant job metadata from the batch_control table.

my_job_run.sh  accountValueUpdateJob1 accountValueUpdateJob1.log
   $0                 $1                    $2

The my_job_run.sh script looks something like this:
...
JOB_CLASS=org.springframework.batch.core.launch.support.CommandLineJobRunner
APPCONTEXT=batch-context.xml
SPRING_JOB=availableBalanceJob
CLASSPATH=$JAVA_HOME/bin:....................... 
JOB_TO_RUN=$1
...

# The job parameter is jobName. Its value is the first script argument ($1), held in JOB_TO_RUN.
$JAVA_HOME/bin/java -classpath ${CLASSPATH} ${JOB_CLASS} ${APPCONTEXT} ${SPRING_JOB} jobName=${JOB_TO_RUN}
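
The trailing jobName=... argument is a key=value pair. As a rough illustration of how such pairs can be turned into named parameters, here is a small plain-Java sketch. KeyValueArgs is a hypothetical helper written for this example; it is not the parsing code that CommandLineJobRunner itself uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not the parser Spring Batch uses.
class KeyValueArgs {

    // Splits each "key=value" argument at the first '=' and collects the pairs
    // in the order they were given.
    static Map<String, String> parse(String... args) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + arg);
            }
            params.put(arg.substring(0, eq), arg.substring(eq + 1));
        }
        return params;
    }

    public static void main(String[] args) {
        System.out.println(parse("jobName=accountValueUpdateJob1"));
    }
}
```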

Step 6: The shell script (e.g. my_job_run.sh) or batch file will be invoked by a job scheduler like Quartz or a Unix cron job at a particular time without any human intervention.

This is basically the big picture. The wiring up of spring-batch will be explained in a different post.





5 Comments:

Blogger Arvind said...

Thanks for your detailed explanation.I was searching for good example and finally I landed to the right page. Appreciate it.

7:20 PM, June 12, 2012  
Blogger Unknown said...

Stay tuned, there will be 2 more parts coming up on spring-batch.

11:24 PM, June 12, 2012  
Anonymous Anonymous said...

could you please provide us a link to download this source code?

8:06 PM, July 27, 2012  
Blogger Unknown said...

These are simplified code snippets.

11:14 AM, July 28, 2012  
Blogger Brady... said...

Thank you Arul, it is very much helpful. Keep the going...

8:40 PM, November 07, 2013  
