Friday 1 April 2016

IBM InfoSphere DataStage - Fork joins and buffer resistance

What is a fork join in DataStage?
        Assume we have three stages: A, B, and C.
                   A produces two outputs:
                                 A1 is consumed by B
                                 A2 is consumed by C
                   B consumes A1 and sends its own output to C.
                   C consumes input from both A and B.
In this example, A produces its two outputs at almost the same time.
                  So A's output for C is available right away, but C still has to wait for B's output (B obviously needs some time to process A's output before it can send anything to C).

         This situation is called a fork join: the data flow forks at A and joins again at C.

This slows down the overall process and can also cause a deadlock.
A deadlock can happen as follows:
                    C is forced to wait because it does not yet have all the data it needs to process.
                    Because C is not ready to consume A's output, A cannot write any more output to C.
                    Therefore A is forced to stop.
                    If A cannot write to C, then A may not be able to write to B either (assume A is just a Copy stage, which sends every record to both outputs).
                    B in turn gets no more input, so C may never receive the data it is waiting for.
                    So nothing is running and everything just hangs. This is a deadlock.

To avoid this situation, DataStage automatically inserts a buffer operator between A and C, so that A can keep producing output regardless of whether C can consume it yet. A's output is kept in the buffer until C is ready to consume it.
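
The deadlock and the effect of the buffer can be sketched with a toy simulation. This is only an illustration in Python, not how DataStage is implemented. The function name run_fork_join and its parameters are made up for this example, and it assumes that B behaves like a sort (it emits nothing until it has read all of its input) and that A is a copy that can only emit a record when both of its output links have room.

```python
from collections import deque


def run_fork_join(num_records, a_to_c_capacity, max_steps=1000):
    """Toy model of A -> B -> C plus A -> C; returns 'done' or 'deadlock'."""
    a_to_b = deque()  # link A -> B, holds at most 1 record
    a_to_c = deque()  # link A -> C, holds at most a_to_c_capacity records
    b_to_c = deque()  # link B -> C, holds at most 1 record
    produced = b_seen = b_emitted = consumed = 0

    for _ in range(max_steps):
        progressed = False

        # A is a copy: it emits only if BOTH output links have room.
        if produced < num_records and len(a_to_b) < 1 and len(a_to_c) < a_to_c_capacity:
            a_to_b.append(produced)
            a_to_c.append(produced)
            produced += 1
            progressed = True

        # B reads whatever A has sent it.
        if a_to_b:
            a_to_b.popleft()
            b_seen += 1
            progressed = True

        # Assumption: B is sort-like and emits only after seeing all input.
        if b_seen == num_records and b_emitted < num_records and len(b_to_c) < 1:
            b_to_c.append(b_emitted)
            b_emitted += 1
            progressed = True

        # C needs one record from A and one from B to make progress.
        if a_to_c and b_to_c:
            a_to_c.popleft()
            b_to_c.popleft()
            consumed += 1
            progressed = True

        if consumed == num_records:
            return "done"
        if not progressed:
            return "deadlock"  # no stage could move in this step
    return "timeout"


print(run_fork_join(5, a_to_c_capacity=1))  # deadlock
print(run_fork_join(5, a_to_c_capacity=5))  # done
```

With room for only one record on the A to C link, A stalls after the first record and B never sees the rest of its input. Giving that link enough room to hold all of A's output plays the role of the buffer operator, and the job completes.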

This default behavior can be controlled using the environment variable APT_BUFFERING_POLICY (values include AUTOMATIC_BUFFERING, FORCE_BUFFERING, and NO_BUFFERING).

There is also an environment variable called APT_BUFFER_MAXIMUM_MEMORY, which sets the maximum amount of memory a buffer can use. The default is 3 MB (3145728 bytes).

The environment variable APT_BUFFER_FREE_RUN defines how much of the buffer may fill up before the buffer starts telling the producer "no more please". It is given as a fraction of the maximum buffer memory. This push-back is called buffer resistance.
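
As a rough sketch of the arithmetic, the point at which a buffer starts to resist can be computed from the two settings. The helper names below are made up for this example, the free run value of 0.5 is an assumption about the usual default, and the real buffer operator does more than this (for example, it can also spill data to disk).

```python
DEFAULT_MAX_MEMORY = 3 * 1024 * 1024  # 3 MB, the default for APT_BUFFER_MAXIMUM_MEMORY
ASSUMED_FREE_RUN = 0.5                # assumed value for APT_BUFFER_FREE_RUN


def free_run_threshold(max_memory_bytes, free_run):
    """Bytes the buffer accepts freely before it starts to resist."""
    return int(max_memory_bytes * free_run)


def should_resist(buffered_bytes,
                  max_memory_bytes=DEFAULT_MAX_MEMORY,
                  free_run=ASSUMED_FREE_RUN):
    """True once the buffered data exceeds the free run threshold."""
    return buffered_bytes > free_run_threshold(max_memory_bytes, free_run)


print(free_run_threshold(DEFAULT_MAX_MEMORY, ASSUMED_FREE_RUN))  # 1572864
print(should_resist(1_000_000))  # False
print(should_resist(2_000_000))  # True
```

Under these assumptions, a buffer accepts about 1.5 MB without complaint and then begins to slow the producer down.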

 
