TWS recover and rerun job until it succeeds

by Anders Soderback

Scenario

There are two jobs: Job 1 and Job 2. Job 2 depends on Job 1, i.e. Job 1 has to succeed before Job 2 can run.

Dilemma

Out of the box, a job can only recover and rerun once. So how do you rerun Job 1 until it succeeds?

Solution

Use the Recovery options to run a recovery job when Job 1 fails. The recovery job launches a further job that reruns Job 1. This cycle continues until Job 1 succeeds.
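The cycle can be pictured with a small simulation. This is a Python sketch only, not TWS code: the function name run_until_success and its parameters are hypothetical, and the max_attempts cap exists purely to keep the simulation finite (the scheme described here has no such cap).

```python
def run_until_success(attempts_needed, max_attempts=10):
    """Simulate the recover-and-rerun chain for Job 1.

    attempts_needed is a hypothetical knob: the attempt on which Job 1
    finally succeeds. Returns the order in which the jobs would run.
    """
    events = []
    for attempt in range(1, max_attempts + 1):
        events.append("ORB_JOB1")
        if attempt >= attempts_needed:
            # Job 1 succeeded, so the dependency of Job 2 is released.
            events.append("ORB_JOB2")
            return events
        # Job 1 failed: the recovery job runs and launches the rerun job,
        # which in turn reruns Job 1 on the next loop iteration.
        events.append("ORB_JOB1R")
        events.append("ORB_JOB1RR")
    return events  # cap reached; Job 2 never ran


for name in run_until_success(3):
    print(name)
```

With attempts_needed=3 the listing shows two full recovery and rerun rounds before ORB_JOB1 succeeds and ORB_JOB2 runs.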

Details

Job stream ORB_SCHED1

Job          Script/Command
ORB_JOB1     job_one.pl
ORB_JOB1R    recov_job.pl
ORB_JOB1RR   rerun_job.pl
ORB_JOB2     env

Copy the scripts listed above to the target workstation; this example uses the directory /orbdata/TWS/scripts. Configure the workstation to use a variable table that holds the locations of the scripts and files, as in the following example:

CPUNAME LAPHROAIG
  DESCRIPTION "laphroaig on Solaris x64"
  VARTABLE ORB_LAPHROAIG
  OS UNIX
  NODE laphroaig.scotchwhisky.local TCPADDR 31211
  TIMEZONE Europe/London
  DOMAIN MASTERDM
  FOR MAESTRO
    TYPE FTA
    AUTOLINK ON
    BEHINDFIREWALL OFF
    FULLSTATUS ON
END

Then create the variable table ORB_LAPHROAIG:

VARTABLE ORB_LAPHROAIG
  MEMBERS
  TWSFILES_PATH "/orbdata/TWS/scripts/files"
  TWSSCRIPT_PATH "/orbdata/TWS/scripts/"
END

Create a job stream containing Job 1 and Job 2, with Job 2 following Job 1, as in the following example:

SCHEDULE LAPHROAIG#ORB_SCHED1
MATCHING PREVIOUS
:
LAPHROAIG#ORB_JOB1
LAPHROAIG#ORB_JOB2
 FOLLOWS ORB_JOB1
END

Each job takes a sleep=<seconds> parameter that delays its execution. Create Job 1 and configure it with the recovery option STOP and a recovery job, as in the following example:

LAPHROAIG#ORB_JOB1
 SCRIPTNAME "perl ^TWSSCRIPT_PATH^/job_one.pl ^TWSFILES_PATH^ sleep=10"
 STREAMLOGON twsuserw
 TASKTYPE OTHER
 RCCONDSUCC "RC=0"
 RECOVERY STOP
  AFTER LAPHROAIG#ORB_JOB1R
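To see what command line the SCRIPTNAME above produces once the ^NAME^ references are filled in from the variable table, here is a minimal Python sketch. It only mimics the textual substitution; TWS resolves the variables itself. The helper name resolve is hypothetical, and leaving unknown names untouched is a choice made for this sketch, not documented TWS behaviour.

```python
import re

# Values copied from the ORB_LAPHROAIG variable table in this article.
VARTABLE = {
    "TWSFILES_PATH": "/orbdata/TWS/scripts/files",
    "TWSSCRIPT_PATH": "/orbdata/TWS/scripts/",
}


def resolve(command, table):
    """Replace each ^NAME^ with its table value (hypothetical helper).

    Names missing from the table are left as-is in this sketch.
    """
    return re.sub(
        r"\^(\w+)\^",
        lambda m: table.get(m.group(1), m.group(0)),
        command,
    )


print(resolve("perl ^TWSSCRIPT_PATH^/job_one.pl ^TWSFILES_PATH^ sleep=10", VARTABLE))
# perl /orbdata/TWS/scripts//job_one.pl /orbdata/TWS/scripts/files sleep=10
```

Because TWSSCRIPT_PATH already ends in a slash, the resolved path contains a doubled slash, which UNIX path resolution treats the same as a single one.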

Create the recovery job as in the following example:

LAPHROAIG#ORB_JOB1R
 SCRIPTNAME "perl ^TWSSCRIPT_PATH^/recov_job.pl ^TWSFILES_PATH^ sleep=10 rerunjob=ORB_JOB1RR"
 STREAMLOGON twsuserw
 TASKTYPE OTHER
 RCCONDSUCC "RC=0"
 RECOVERY STOP

The recovery job has to exit with a failure; otherwise Job 2 will treat Job 1 as having succeeded. The recovery job launches the rerun job.
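The behaviour described for the recovery job (wait, launch the rerun job, then fail on purpose) can be sketched as follows. This is a Python illustration, not the recov_job.pl script: the function names are hypothetical, and how the real script launches the rerun job is not shown in this article, so the sketch takes the launch action as a callable named submit.

```python
import time


def parse_args(argv):
    """Collect key=value arguments such as sleep=10 rerunjob=ORB_JOB1RR."""
    return dict(arg.split("=", 1) for arg in argv if "=" in arg)


def recovery_job(argv, submit, sleep=time.sleep):
    """Hypothetical outline of the recovery job.

    submit is whatever launches the rerun job (an assumption of this
    sketch). The return value is the exit code the job would report.
    """
    opts = parse_args(argv)
    sleep(int(opts.get("sleep", 0)))   # delay, as with sleep=<seconds>
    submit(opts["rerunjob"])           # launch the rerun job
    return 1                           # non-zero: fail on purpose


launched = []
exit_code = recovery_job(
    ["/orbdata/TWS/scripts/files", "sleep=0", "rerunjob=ORB_JOB1RR"],
    submit=launched.append,
)
print(exit_code, launched)
```

With RCCONDSUCC "RC=0", the non-zero exit code is what marks the recovery job as failed.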

Create the rerun job as in the following example:

LAPHROAIG#ORB_JOB1RR
 SCRIPTNAME "perl ^TWSSCRIPT_PATH^/rerun_job.pl ^TWSFILES_PATH^ sleep=10"
 STREAMLOGON twsuserw
 TASKTYPE OTHER
 RCCONDSUCC "RC=0"
 RECOVERY STOP

To run this example, submit the job stream ORB_SCHED1 and watch it in the TDWC to see the recovery job, the rerun job and ORB_JOB1 being executed in turn. To make ORB_JOB1, ORB_JOB2 and the job stream succeed, create the file job1.test in the directory /orbdata/TWS/scripts/files.
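Since creating job1.test is what lets Job 1 succeed, its success test presumably comes down to a file-existence check. A minimal Python sketch of such a check follows; it is an assumption about what job_one.pl does, not a copy of it, and the function name is hypothetical.

```python
import os
import tempfile


def job_one_exit_code(files_dir, marker="job1.test"):
    """Return 0 if the marker file exists in files_dir, otherwise 1."""
    return 0 if os.path.isfile(os.path.join(files_dir, marker)) else 1


# Demonstration in a throwaway directory instead of /orbdata/TWS/scripts/files.
with tempfile.TemporaryDirectory() as files_dir:
    print(job_one_exit_code(files_dir))   # prints 1: marker absent, job fails
    open(os.path.join(files_dir, "job1.test"), "w").close()
    print(job_one_exit_code(files_dir))   # prints 0: marker present, job succeeds
```

An exit code of 0 matches the RCCONDSUCC "RC=0" condition in the job definitions, so the job counts as successful only once the file is in place.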

The rerun job looks at the jobs in the current job stream, finds the job in status ABEND and runs the conman rr command for that job. The rerun job itself has to exit successfully. Job 2 then switches to follow the rerun of Job 1.
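The selection step can be sketched in Python as below. This is not the rerun_job.pl script: the job list is passed in as already-parsed (name, state) pairs rather than read from conman output, the function name and the skip parameter are hypothetical, and the workstation#jobstream.job form of the rr argument is an assumption about the job selection syntax. The sketch only builds the commands and does not run them.

```python
def build_rerun_commands(jobs, workstation, job_stream, skip=()):
    """Build one 'conman rr' command per job in status ABEND.

    jobs: iterable of (job_name, state) pairs (hypothetical input form).
    skip: job names that must not be rerun, e.g. the recovery job,
          which ends in failure by design (an assumption of this sketch).
    """
    return [
        ["conman", "rr {0}#{1}.{2}".format(workstation, job_stream, name)]
        for name, state in jobs
        if state == "ABEND" and name not in skip
    ]


jobs = [
    ("ORB_JOB1", "ABEND"),
    ("ORB_JOB1R", "ABEND"),
    ("ORB_JOB1RR", "EXEC"),
    ("ORB_JOB2", "HOLD"),
]
for command in build_rerun_commands(jobs, "LAPHROAIG", "ORB_SCHED1", skip={"ORB_JOB1R"}):
    print(command)
# ['conman', 'rr LAPHROAIG#ORB_SCHED1.ORB_JOB1']
```

Passing the command as an argument list rather than a single shell string avoids quoting problems if it is later handed to something like subprocess.run.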

The recovery jobs and the rerun jobs are visible in the same job stream. The job stream shows status successful once Job 1 has finally run successfully, followed by a successful run of Job 2.

[Figure: Job stream ORB_SCHED1 successful]

[Figure: Jobs in job stream ORB_SCHED1]

The scripts used in this example can be found here: job_one.pl recov_job.pl rerun_job.pl
