Automating IBM DataStage Director logs using Python

In this blog post, I am going to explain how to Automate IBM DataStage Director using Python with RegEx and pandas.
Bonus: I will also include the script automate the mail of logs in excel format if I get comments asking for it.

The log file director can export is in txt format, which obviously is not very readable, as I will show you the example of it soon.
Excel is much more readable than the txt file.
Also, to reduce the manual work and improve the efficiency of the team.

Obtaining Raw Data:
To obtain the job logs, we can simply print the job logs and select all entries to print all the entries in the current view to a text file. Click the link below to read the official documentation.

Now we have a log.txt file that contains out raw data like this:

DataStage Report - Detail Status of jobs:  
 Produced on: 02-05-2020 15:47:27
 Project: testing
 Host system: TESTING
 Items: 1 - 3
 Sorted on: Job name
Item #: 1
    Job name: J_CTRL_JOB
    Status: Finished
    Started: 01:46
    On date: 02-05-2020
    Last ran: 01:57
    On date: 02-05-2020
    Elapsed time: 00:11:39
Item #: 2
    Job name: J_TEST1
    Status: Finished
    Started: 01:46
    On date: 02-05-2020
    Last ran: 01:48
    On date: 02-05-2020
    Elapsed time: 00:02:39
Item #: 3
    Job name: J_TEST2
    Status: Finished
    Started: 01:48
    On date: 02-05-2020
    Last ran: 01:49
    On date: 02-05-2020
    Elapsed time: 00:00:28

End of report.


  • Python (Version 3.x Recommended)
  • Basic knowledge of Regular Expressions (Or you can just copy-paste my code)
  • Pandas library of Python to save data to excel (a bit overkill but works like a charm)

Let’s do it now:

  • Just put the log.txt file and the in the same folder
  • Open Terminal
  • run this command: python

Here is the script for

import re
from pandas import DataFrame

with open('log.txt', 'r', encoding='utf-8') as f:
     log =

search_pattern = (
 r"""\bItem #: (\d+)\s+"""
 r"""Job name: (.+)\s+"""
 r"""Status: (.+)\s+"""
 r"""Started: (.+)"""
 r"""\s+On date: (.+)\s+"""
 r"""Last ran: (.+)\s+"""
 r"""On date: (.+)\s+"""
 r"""Elapsed time: (.+)\s+"""

pattern = re.compile(search_pattern)

item, job_name, status, \
 start_date_time, end_date_time, \
 elapsed_time, description  = ([] for _ in range(7))

matches = pattern.finditer(log)
for match in matches:
     start_date_time.append(' '
     end_date_time.append(' '

df = DataFrame(
  {'Item': item,
  'Job Name': job_name,
  'Status': status,
  'Start Time': start_date_time,
  'End Time': end_date_time,
  'Elapsed Time': elapsed_time,
  'Description': description,
df.to_excel('job_status.xlsx', sheet_name='sheet1', index=False)

On Successfully, following all the steps, and running the python script, you should see a new excel file named ‘job_status.xlsx’ in the same folder. That’s it. It is this simple automating IBM DataStage Director. Yayyy!

Hackerman meme IBM InfoSphere DataStage Director

Mailing this excel file to the team members:

Now that we have the excel file ready which looks so much better in terms of readability and usability(you can use filters and colour the blocks and what not!). Some people may not need to mail this file by automation. So I am not writing about it as of now.

You can find all the code with log file on my Github Repo:

Check out my other posts:
Squared Error vs Absolute Error
why use harmonic mean to calculate f1 score

Read my StackExchange answers:

Feel free to write comments in case of any doubt or query, I will reply asap.

Show CommentsClose Comments

Leave a comment