Having been a Mainframe programmer since 1986, when times were so, so hard, and now witnessing all the modern tools and languages other people are using, I thought I would be a little different and write my blog about debugging program dumps and failures the ‘old school’ way.
Whilst the modern-day Java or C# developer has a vast array of wonderfully named utilities designed to make the programmer's job of debugging easier, all the humble Mainframe COBOL programmer had was the equally (un)excitingly named AbendAid!
This product did, as its name suggests, aid you in solving your abend (abnormal end).
There were several prerequisites before you could use this powerful solution. The main one was that you had to have an up-to-date (printed) listing of the failing program so that you could pinpoint the exact spot where the code had crashed.
So you are tucked up in bed at 2am and the dreaded pager (remember those?) beeps incessantly. It is reminding you that it is indeed you who are on call supporting the overnight batch this evening, and that the main program MassiveBatch.exe has failed with an S0C7 at displacement 000001FC. You start to wish you had brought every one of those 5000 up-to-date program listings home with you (remember, there were no laptops or dial-up software in those days). The alternative is to get dressed, scrape the ice off the car and drive the 20 miles to the office so that you can debug, fix and rerun the offending code, and thus be the hero of the office for keeping the system alive.
So, you arrive at the office at 3am and make your way to the ‘Program Listings’ cabinet (or similar) for an up-to-date copy of the compiled program listing.
Bingo! You are in luck and the listing is there and up to date.
You return to your desk, crack your knuckles in readiness and ‘fire’ up your state-of-the-art IBM 3270 terminal with its accompanying ‘springy’ keyboard.
You bring up the failed job in your equally state-of-the-art AbendAid software and scan for the failing program:
Model - 902X OPSYS - MVS/SP 5 Job - MYJOB01
CP FMID - CBA0010 Step - RUNIT
System - S001 Time - 14.25.13
DFSMS/MVS - V1R2M0
JES2 - SP 5
Completion Code - S0C7
* Next Sequential Instruction Section *
The next sequential instruction to be executed in program
TEST1 was at displacement 000001FC.
The program was compiled on 03 FEB and is 00000400 bytes long.
It is part of load module TEST1.
The module was loaded from STEPLIB library
It was link edited on 03 FEB and is 00000890 bytes long.
The last known I/O operation or call was issued from program
TEST1 at displacement 000001E6.
You are now armed with two essential pieces of information: the completion code S0C7 and the next sequential instruction displacement 000001FC. From these you know that a data exception has occurred and where the NEXT sequential instruction was. You flick through the listing to the section that shows the COBOL verbs being issued and their displacements:
LINE # HEXLOC VERB LINE # HEXLOC VERB LINE # HEXLOC VERB
000053 0001D8 DISPLAY 000054 0001E6 MOVE 000055 0001EC ADD
000061 00020A GOBACK
*** TGT MEMORY MAP ***
You know the next sequential instruction was at displacement 1FC, so the verb immediately PRIOR to it is the ADD statement at displacement 1EC, which is line 55 of the program.
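You have just done this lookup by eye: find the last verb that starts before the next sequential instruction (NSI). It can be sketched in a few lines of Python. This is purely illustrative. The table is copied from the condensed listing above, and `VERB_TABLE` and `find_failing_verb` are hypothetical names that are not part of AbendAid or the compiler. The sketch assumes the NSI is the address just after the failing instruction, which is why the comparison is strictly less-than.

```python
# Condensed verb listing for TEST1, copied from the compile listing:
# (listing line number, hex offset of the verb's first instruction, verb)
VERB_TABLE = [
    (53, 0x1D8, "DISPLAY"),
    (54, 0x1E6, "MOVE"),
    (55, 0x1EC, "ADD"),
    (61, 0x20A, "GOBACK"),
]


def find_failing_verb(nsi_offset, verb_table):
    """Return the (line, offset, verb) entry holding the failing instruction.

    The NSI is the address *after* the failing instruction, so the culprit
    is the last verb whose first instruction starts strictly before the NSI.
    """
    candidates = [entry for entry in verb_table if entry[1] < nsi_offset]
    if not candidates:
        raise ValueError(f"no verb starts before offset {nsi_offset:X}")
    return max(candidates, key=lambda entry: entry[1])


# NSI taken from the AbendAid report: displacement 000001FC
line, offset, verb = find_failing_verb(0x1FC, VERB_TABLE)
print(f"Line {line:06d} at offset {offset:06X}: {verb}")
```

This prints `Line 000055 at offset 0001EC: ADD`. Treat it as a rule of thumb. Optimised code may move instructions around, so the mapping from verbs to offsets can be less tidy than in this small program.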
You look at the code and see what line 55 is doing:
COBOL Compile Listing
PP 5668-958 IBM VS COBOL II Release 4.0 09/15/92 Date 02/03/00 Time 14:25:05 Page 1
000001 000100 IDENTIFICATION DIVISION.
000002 000200 PROGRAM-ID. TEST1.
000004 000900 ENVIRONMENT DIVISION.
000005 001000 CONFIGURATION SECTION.
000007 001500 INPUT-OUTPUT SECTION.
000008 001600 FILE-CONTROL.
000013 002100 DATA DIVISION.
000014 002200 FILE SECTION.
000040 005503 01 HEADER-LINE-1. BLW=0000+000
000041 005504 05 FILLER PIC X(59) VALUE SPACES. BLW=0000+000,0000000
000042 005505 05 FILLER PIC X(07) VALUE 'PURELY '. BLW=0000+03B,000003B
000043 005506 05 FILLER PIC X(10) VALUE 'FICTITIOUS'. BLW=0000+042,0000042
000044 005508 05 FILLER PIC X(56) VALUE SPACES. BLW=0000+04C,000004C
000046 006106 01 COUNTERS-ALL. BLW=0000+088
000047 006107 05 SAMPLE-ACC PIC 9(04) VALUE 0 USAGE COMP-3. BLW=0000+088,0000000
000048 006108 05 EMPLOYEE-ACC PIC 9(02) VALUE 1 USAGE COMP-3. BLW=0000+08B,0000003
000049 006109 05 PAGE-NUM-ACC PIC 9(03) VALUE 1 USAGE COMP-3. BLW=0000+08D,0000005
000052 006500 PROCEDURE DIVISION.
000053 006510 DISPLAY 'PROGRAM TEST1'.
000054 006600 MOVE ALL '!' TO COUNTERS-ALL.
000055 006610 ADD 1 TO EMPLOYEE-ACC
You can see that EMPLOYEE-ACC is a numeric (packed-decimal, COMP-3) field. Adding 1 to it shouldn’t cause a problem, as it is initialised to a value of 1... or is it?
Looking at the line above, you see that COUNTERS-ALL (the group item containing EMPLOYEE-ACC) has non-numeric characters, a string of '!', moved into it. That MOVE overwrites every byte of the packed-decimal fields beneath it, so as soon as any of them is used in arithmetic, the data exception occurs.
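To see why the '!' characters are fatal, it helps to look at the bytes. The Python sketch below mimics packed-decimal (COMP-3) storage under a few assumptions:

- EBCDIC code page 037 is in use, where '!' is X'5A'.
- Unsigned fields carry an X'F' sign nibble.
- There are no slack bytes inside COUNTERS-ALL.

The helper names (`packed_length`, `encode_packed`, `is_valid_packed`, `show`) are hypothetical. On the mainframe itself, the processor's decimal arithmetic instructions make this check when the ADD executes.

```python
# Hypothetical helpers mimicking IBM packed-decimal (COMP-3) storage.

def packed_length(digits):
    """Bytes used by an unsigned PIC 9(n) COMP-3 field: n digit nibbles + 1 sign nibble."""
    return digits // 2 + 1


def encode_packed(value, digits, sign_nibble=0xF):
    """Pack a non-negative integer; X'F' is assumed as the unsigned sign nibble."""
    if value < 0 or len(str(value)) > digits:
        raise ValueError(f"{value} does not fit in PIC 9({digits})")
    width = packed_length(digits) * 2 - 1  # digit nibbles, including any leading pad
    nibbles = [int(c) for c in str(value).rjust(width, "0")] + [sign_nibble]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def is_valid_packed(field):
    """True if every digit nibble is 0-9 and the last (sign) nibble is A-F."""
    if not field:
        return False
    nibbles = [n for byte in field for n in (byte >> 4, byte & 0x0F)]
    *digits, sign = nibbles
    return all(d <= 9 for d in digits) and sign >= 0xA


# COUNTERS-ALL from the listing: (name, digits in PIC 9(n), VALUE)
FIELDS = [("SAMPLE-ACC", 4, 0), ("EMPLOYEE-ACC", 2, 1), ("PAGE-NUM-ACC", 3, 1)]


def show(label, group):
    """Print each subordinate field's offset, hex bytes and validity."""
    print(label)
    offset = 0
    for name, digits, _ in FIELDS:
        size = packed_length(digits)
        chunk = group[offset:offset + size]
        print(f"  {name:<13} +{offset:X} {chunk.hex().upper():<6} valid={is_valid_packed(chunk)}")
        offset += size


initial = b"".join(encode_packed(value, digits) for _, digits, value in FIELDS)
show("After the VALUE clauses:", initial)

# MOVE ALL '!' TO COUNTERS-ALL: the group move fills every byte with '!' (X'5A').
after_move = "!".encode("cp037") * len(initial)
show("After MOVE ALL '!' TO COUNTERS-ALL:", after_move)
```

The field offsets it prints (+0, +3, +5) match the 0000000, 0000003 and 0000005 shown against the three fields in the compile listing. Each field starts out valid (EMPLOYEE-ACC is X'001F') and ends up as a run of X'5A' bytes. The trailing X'A' of each field is actually a legal (positive) sign nibble. What the hardware rejects is the X'A' sitting in a digit position.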
You are delighted to have found the root cause so quickly, thanks to this amazing piece of software the company so kindly bought for the department at great expense (resulting in no pay rise or bonus for the next calendar year ☹). You amend the program, recompile it and instruct the operators to resubmit the job.
You are sure it will work on the rerun but can’t really settle (or go home) until that magical return code of zero appears.
It works! It's now 7am, so there is no point going home. You get a coffee and wait for your colleagues to come into the office and lavish you with praise for a job well done.
I would like to acknowledge the ownership of certain bits of this blog:
Created by http://www.theamericanprogrammer.com. You may copy this document provided this notice is attached.