You've never had it so good

By Martyn Daly - Jurassic Park
Last updated: 12.07.2017
mainframe debugging

Having been a Mainframe Programmer since 1986, when times were so, so hard, and now witnessing all the modern tools and languages other people are using, I thought I would be a little different and write my blog about debugging program dumps and failures the ‘old school’ way.

 

Whilst the modern day Java or C# developer has a vast array of utilities to assist them in debugging:

Takipi

Splunk

JSwat

Eclipse

 

...and other wonderfully named tools designed to make the programmer's job easier, all the humble Mainframe Cobol programmer had was the equally (un)excitingly named tool AbendAid!

This product did, as its name suggests, aid you in solving your abend.

There were several prerequisites before you could utilise this powerful solution, the main one being that you had to have an up-to-date (printed) listing of the program that had failed so that you could ascertain the exact spot where the code had crashed.

So when you are tucked up in bed at 2am and the dreaded pager (remember those?) beeps incessantly, reminding you that it is indeed you who is on call supporting the overnight batch this evening, and that the main program MassiveBatch.exe has failed with an S0C7 at displacement 000001FC, you start to wish you had brought every one of those 5000 up-to-date program listings home with you (remember, there were no laptops or dial-up software in those days). Instead you have to get dressed, scrape the ice off the car and drive the 20 miles to the office so that you can debug, fix and rerun the offending code and thus be the hero of the office for keeping the system alive.

So, you arrive in the office at 3am and make your way to the ‘Program Listings’ cabinet or such for an up-to-date copy of the compiled program listing.

Bingo! You are in luck and the listing is there and up to date.

You return to your desk, crack your knuckles in readiness and ‘fire’ up your state of the art IBM 3270 terminal with its accompanying ‘springy’ keyboard.

You bring up the failed job via your state of the art AbendAid software and scan for the problematic program failure:

 

                            Model - 902X          OPSYS     - MVS/SP 5        Job  - MYJOB01

                                                  CP FMID   - CBA0010         Step - RUNIT

                                                  System    - S001            Time - 14.25.13

                                                  DFSMS/MVS - V1R2M0

                                                  JES2      - SP 5

                                                  Completion Code - S0C7

 

                                       *******************************************

                                       *   Next Sequential Instruction Section   *

                                       *******************************************

 

                               The next sequential instruction to be executed in program

                                         TEST1 was at displacement 000001FC.

 

                          The program was compiled on 03 FEB  and is 00000400 bytes long.

 

                                           It is part of load module TEST1.

                               The module was loaded from STEPLIB library

                                mytest.LOAD

                            It was link edited on 03 FEB and is 00000890 bytes long.

 

                               The last known I/O operation or call was issued from program

                                            TEST1 at displacement 000001E6.

 

Armed with these essential pieces of information (the completion code S0C7 and the next sequential instruction displacement 000001FC) you now know that a data exception has occurred and where the NEXT sequential instruction was. Flicking through the listing, you make your way to the part that shows the Cobol verbs being issued and the displacement of each verb:

 

LINE #  HEXLOC  VERB                        LINE #  HEXLOC  VERB                        LINE #  HEXLOC  VERB

 000053 0001D8 DISPLAY                       000054 0001E6 MOVE                          000055 0001EC ADD

 000061 00020A GOBACK

 

                   *** TGT MEMORY MAP ***

                    TGTLOC

 

You know the next sequential instruction was at displacement 1FC, so the verb that failed is the one starting immediately PRIOR to that: the ADD statement at displacement 1EC, which is line 55 of the program.
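For anyone who would rather not do the lookup by eye at 3am, the same reasoning can be sketched in a few lines of Python. This is purely an illustration, not anything AbendAid provided: the table is typed in by hand from the verb listing above, and the function name failing_verb is made up for this example.

```python
import bisect

# (line number, hex offset, verb), copied by hand from the verb table above
VERBS = [
    (53, 0x1D8, "DISPLAY"),
    (54, 0x1E6, "MOVE"),
    (55, 0x1EC, "ADD"),
    (61, 0x20A, "GOBACK"),
]

def failing_verb(next_instruction_offset, verbs=VERBS):
    """Return the (line, offset, verb) entry that starts immediately
    before the next sequential instruction."""
    offsets = [offset for _, offset, _ in verbs]
    # The instruction that failed sits before the next sequential
    # instruction, so take the last verb starting strictly below it.
    index = bisect.bisect_left(offsets, next_instruction_offset) - 1
    if index < 0:
        raise ValueError("offset is before the first verb in the table")
    return verbs[index]

line, offset, verb = failing_verb(0x1FC)
print(f"line {line:06d}  offset {offset:06X}  {verb}")
```

Fed the displacement 1FC from the dump, it lands on the ADD at line 55, the same answer the paper listing gives.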

You look at the code and see what line 55 is doing:

 

COBOL Compile Listing

PP 5668-958 IBM VS COBOL II Release 4.0 09/15/92                      Date 02/03/00 Time 14:25:05   Page   1

   000001         000100 IDENTIFICATION DIVISION.

   000002         000200 PROGRAM-ID.  TEST1.

   000003         000800*

   000004         000900 ENVIRONMENT DIVISION.

   000005         001000 CONFIGURATION SECTION.

   000006         001100*

   000007         001500 INPUT-OUTPUT SECTION.

   000008         001600 FILE-CONTROL.

   000009         001700*

   000013         002100 DATA DIVISION.

   000014         002200 FILE SECTION.

   000015         002300*

   000039         005502*

   000040         005503 01  HEADER-LINE-1.                                          BLW=0000+000

   000041         005504     05 FILLER    PIC X(59) VALUE SPACES.                    BLW=0000+000,0000000

   000042         005505     05 FILLER    PIC X(07) VALUE 'PURELY '.                 BLW=0000+03B,000003B

   000043         005506     05 FILLER    PIC X(10) VALUE 'FICTITIOUS'.              BLW=0000+042,0000042

   000044         005508     05 FILLER    PIC X(56) VALUE SPACES.                    BLW=0000+04C,000004C

   000045         005560*

   000046         006106 01  COUNTERS-ALL.                                                BLW=0000+088

   000047         006107     05 SAMPLE-ACC     PIC 9(04) VALUE 0 USAGE COMP-3.            BLW=0000+088,0000000

   000048         006108     05 EMPLOYEE-ACC   PIC 9(02) VALUE 1 USAGE COMP-3.            BLW=0000+08B,0000003

   000049         006109     05 PAGE-NUM-ACC   PIC 9(03) VALUE 1 USAGE COMP-3.            BLW=0000+08D,0000005

   000050         006110*

   000052         006500 PROCEDURE DIVISION.

   000053         006510     DISPLAY 'PROGRAM TEST1'.

   000054         006600     MOVE ALL '!' TO COUNTERS-ALL.

   000055         006610     ADD 1 TO EMPLOYEE-ACC

   000056         006620*

   000057         006630*

 

 

You can see that the variable EMPLOYEE-ACC is numeric, so adding 1 to it shouldn’t cause a problem as it is initialised to value 1 ... or is it?

Looking at the line above, you see that COUNTERS-ALL (the group item that contains EMPLOYEE-ACC) is having non-numeric characters ('!') moved to it. Therefore, as soon as any of the packed-decimal fields within it is used in arithmetic, the data exception occurs.
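To see why the '!' characters upset the ADD, here is a rough Python sketch of the check the hardware is effectively making on a packed-decimal (COMP-3) field: every nibble except the last must be a digit 0-9, and the last nibble must be a sign code A-F. Some assumptions to flag: the job is taken to have run under a US EBCDIC code page (cp037), where '!' is X'5A'; the field lengths are worked out from the PIC clauses rather than read from the listing; and the helper name is_valid_packed is invented for this example.

```python
def is_valid_packed(data: bytes) -> bool:
    """True if data holds a valid packed-decimal (COMP-3) value."""
    if not data:
        return False
    nibbles = []
    for byte in data:
        nibbles.append(byte >> 4)      # high nibble
        nibbles.append(byte & 0x0F)    # low nibble
    *digits, sign = nibbles
    # Digit nibbles must be 0-9; the final nibble must be a sign code A-F.
    return all(d <= 9 for d in digits) and sign >= 0xA

# Offsets come from the BLW column of the listing; lengths are derived
# from the PIC clauses (assumption): 9(04) -> 3 bytes, 9(02) -> 2, 9(03) -> 2.
FIELDS = {
    "SAMPLE-ACC": (0, 3),
    "EMPLOYEE-ACC": (3, 2),
    "PAGE-NUM-ACC": (5, 2),
}

# MOVE ALL '!' TO COUNTERS-ALL fills the whole 7-byte group with X'5A'
# (assumption: EBCDIC code page 037).
group = "!".encode("cp037") * 7

for name, (start, length) in FIELDS.items():
    raw = group[start:start + length]
    status = "valid" if is_valid_packed(raw) else "invalid"
    print(f"{name:<13} X'{raw.hex().upper()}'  {status}")
```

EMPLOYEE-ACC ends up as X'5A5A': the trailing A passes as a sign, but the A in a digit position does not, which is exactly the sort of data that produces an S0C7.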

 

Delighted that you have quickly found the root cause using this amazing piece of software the company has so kindly bought the department at great expense (resulting in no pay rise or bonus for the next calendar year :( ), you amend the program, recompile and instruct the operators to re-submit the job.

You are sure that it will work on re-run but can’t really settle (or go home) until that magical return code zero appears.

 

It works, and it's 7am so there is no point going home. You get a coffee and wait for your colleagues to come into the office and lavish you with praise for a job well done.

 

 

 

I would like to acknowledge the ownership of certain bits of this blog:

Created by http://www.theamericanprogrammer.com. You may copy this document provided this notice is attached.