TWS recover and rerun job until it succeeds
Scenario
There are two jobs: Job 1 and Job 2. Job 2 depends on Job 1, so Job 1 has to succeed before Job 2 can run.
Dilemma
Out of the box, a job can only recover and rerun once. So how do you rerun Job 1 until it succeeds?
Solution
Use the Recovery options to run a recovery job when Job 1 fails. The recovery job launches a further job that reruns Job 1. This cycle continues until Job 1 succeeds.
Details
Job stream ORB_SCHED1

Job | Script/Command
ORB_JOB1 | job_one.pl
ORB_JOB1R | recov_job.pl
ORB_JOB1RR | rerun_job.pl
ORB_JOB2 | env
Copy the scripts (see the table above) to the target workstation; this example uses the directory /orbdata/TWS/scripts. Configure the workstation to use a variable table that holds the locations of the scripts and files, as in the following example:
CPUNAME LAPHROAIG
DESCRIPTION "laphroaig on Solaris x64"
VARTABLE ORB_LAPHROAIG
OS UNIX
NODE laphroaig.scotchwhisky.local TCPADDR 31211
TIMEZONE Europe/London
DOMAIN MASTERDM
FOR MAESTRO
TYPE FTA
AUTOLINK ON
BEHINDFIREWALL OFF
FULLSTATUS ON
END
Then create the variable table ORB_LAPHROAIG:
VARTABLE ORB_LAPHROAIG
MEMBERS
TWSFILES_PATH "/orbdata/TWS/scripts/files"
TWSSCRIPT_PATH "/orbdata/TWS/scripts/"
END
Create a job stream that includes Job 1 and Job 2, with Job 2 following Job 1, as in the following example:
SCHEDULE LAPHROAIG#ORB_SCHED1
MATCHING PREVIOUS
:
LAPHROAIG#ORB_JOB1
LAPHROAIG#ORB_JOB2
FOLLOWS ORB_JOB1
END
Create the jobs with the parameter sleep=<seconds> to delay execution. Create Job 1 and configure it with the recovery option stop and a recovery job, as in the following example:
LAPHROAIG#ORB_JOB1
SCRIPTNAME "perl ^TWSSCRIPT_PATH^/job_one.pl ^TWSFILES_PATH^ sleep=10"
STREAMLOGON twsuserw
TASKTYPE OTHER
RCCONDSUCC "RC=0"
RECOVERY STOP
AFTER LAPHROAIG#ORB_JOB1R
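To make the role of the variable table concrete, here is a minimal Python sketch of how ^NAME^ tokens in a SCRIPTNAME such as the one for ORB_JOB1 resolve against ORB_LAPHROAIG. TWS performs this substitution itself; the function expand is hypothetical and only illustrates the resulting command line.

```python
import re

# Values copied from the ORB_LAPHROAIG variable table in this article.
VARTABLE = {
    "TWSFILES_PATH": "/orbdata/TWS/scripts/files",
    "TWSSCRIPT_PATH": "/orbdata/TWS/scripts/",
}


def expand(command, variables):
    """Replace each ^NAME^ token with its value; unknown names are left untouched."""
    def lookup(match):
        return variables.get(match.group(1), match.group(0))

    return re.sub(r"\^([A-Za-z0-9_]+)\^", lookup, command)


resolved = expand(
    "perl ^TWSSCRIPT_PATH^/job_one.pl ^TWSFILES_PATH^ sleep=10", VARTABLE
)
print(resolved)
```

Because TWSSCRIPT_PATH ends with a slash and the SCRIPTNAME adds another, the resolved path contains a double slash, which UNIX treats as a single separator here.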
Create the recovery job as in the following example:
LAPHROAIG#ORB_JOB1R
SCRIPTNAME "perl ^TWSSCRIPT_PATH^/recov_job.pl ^TWSFILES_PATH^ sleep=10 rerunjob=ORB_JOB1RR"
STREAMLOGON twsuserw
TASKTYPE OTHER
RCCONDSUCC "RC=0"
RECOVERY STOP
The recovery job launches the rerun job. It then has to exit with a failure, otherwise Job 2 will treat Job 1 as successful.
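The behaviour of the recovery job can be sketched as follows. This is a Python illustration, not the original recov_job.pl: it assumes the rerun job is submitted with the conman submit job command (sbj) into the same job stream, and the function names and the hard-coded workstation and job stream are hypothetical.

```python
import time


def parse_args(argv):
    """Split the files directory (first argument) from the key=value options."""
    files_dir = argv[0]
    options = dict(arg.split("=", 1) for arg in argv[1:] if "=" in arg)
    return files_dir, options


def build_submit_command(rerun_job, workstation, job_stream):
    """Assumed conman call that submits the rerun job into the same job stream."""
    return f'conman "sbj={workstation}#{rerun_job};into={workstation}#{job_stream}"'


def main(argv):
    _, options = parse_args(argv)
    time.sleep(int(options.get("sleep", "0")))
    command = build_submit_command(options["rerunjob"], "LAPHROAIG", "ORB_SCHED1")
    print(command)  # a real script would execute this, for example with subprocess.run
    return 1  # non-zero on purpose, so that ORB_JOB2 does not see a success
```

Called as main(["/orbdata/TWS/scripts/files", "sleep=10", "rerunjob=ORB_JOB1RR"]), the sketch waits ten seconds, prints the submit command and returns 1, which a wrapper would pass to sys.exit.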
Create the rerun job as in the following example:
LAPHROAIG#ORB_JOB1RR
SCRIPTNAME "perl ^TWSSCRIPT_PATH^/rerun_job.pl ^TWSFILES_PATH^ sleep=10"
STREAMLOGON twsuserw
TASKTYPE OTHER
RCCONDSUCC "RC=0"
RECOVERY STOP
To run this example, submit the job stream ORB_SCHED1 and view the jobs, recovery jobs and rerun jobs. Look at the job stream ORB_SCHED1 in the TDWC to see how the rerun job and ORB_JOB1 get executed. To make ORB_JOB1, ORB_JOB2 and the job stream succeed, create the file job1.test in the directory /orbdata/TWS/scripts/files.
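The success condition for Job 1 can be pictured with a short Python sketch. It assumes, based on the instructions above, that the job simply checks for the file job1.test in the files directory after sleeping; the original job_one.pl may do more, and the function name run_job_one is hypothetical.

```python
import os
import time


def run_job_one(files_dir, sleep_seconds=0, marker="job1.test"):
    """Return 0 if the marker file exists in files_dir, otherwise 1."""
    time.sleep(sleep_seconds)
    return 0 if os.path.isfile(os.path.join(files_dir, marker)) else 1
```

With RCCONDSUCC "RC=0", a return code of 1 puts the job into ABEND and triggers the recovery job, while 0 lets Job 2 start.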
The rerun job looks at the jobs in the current job stream. It finds the job in status ABEND and runs the conman rr command for that job. The rerun job itself has to exit successfully. Job 2 then switches to follow the rerunning Job 1.
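The selection step of the rerun job might look like the following Python sketch. It takes already parsed (job name, state) pairs rather than raw conman output, since the output layout is not shown here. The skip parameter is a hypothetical addition: the recovery job also ends in failure by design, and whether the original rerun_job.pl needs to exclude it is not stated.

```python
def find_abended(jobs):
    """Return the names of jobs whose state is ABEND, in the order given."""
    return [name for name, state in jobs if state == "ABEND"]


def build_rerun_commands(workstation, job_stream, jobs, skip=()):
    """Build one conman rr call per abended job, leaving out any names in skip."""
    return [
        f'conman "rr {workstation}#{job_stream}.{name}"'
        for name in find_abended(jobs)
        if name not in skip
    ]


# Example states for the jobs in ORB_SCHED1 after Job 1 has failed.
jobs = [
    ("ORB_JOB1", "ABEND"),
    ("ORB_JOB1R", "ABEND"),
    ("ORB_JOB1RR", "EXEC"),
    ("ORB_JOB2", "HOLD"),
]
commands = build_rerun_commands("LAPHROAIG", "ORB_SCHED1", jobs, skip=("ORB_JOB1R",))
print(commands)
```

A real script would run each command and then exit with 0, as required for the rerun job.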
The recovery jobs and the rerun jobs are visible in the same job stream. The job stream shows status successful once Job 1 has finally run successfully, followed by a successful run of Job 2.
[Figure: Job stream ORB_SCHED1 successful]
[Figure: Jobs in job stream ORB_SCHED1]
The scripts used in this example can be found here: job_one.pl recov_job.pl rerun_job.pl