Search This Blog

Friday 1 April 2016

What is bounded length variable data means in InfoSphere DataStage

"Bounded length filed" is a Datastage terminology for handling the variable length data while processing in memory and storing in persistent datasets.
 
For performance optimization , datastage allocates memory sufficient to store maximum length of a field with data type as VARCHAR, RAW, STRING and UNSTRIG. For example if filed is defined as "fieldname:string[max=5000]" then 5000 bytes will be allocated while processing this field in memory. You can see the record schema if set OSH_PRINT_SCHEMAS=True. This approach will turn out to be inefficient if data model is most of varchars and have max length defined with higher value. This will require lot of memory even tough actual data is very small in size.
 
If we do not like this feature,
1.     We can’t control this for persistent data. Starting from V8.5, by default variable lenght data is stored as bounded length field i.e. field is allocated with max length size irrespective of how much data actually stored. This will increase the performance as datastage does not need additional processing to find out the actual length of data. Before prior V8.5, we can change this behavior set setting environment variable APT_COMPRESS_BOUNDED_FIELDS=True.
2.     But for scratch disk  memory requirements, we can control this behavior to reduce the memory requirement (Example for SORT) by setting APT_OLD_BOUNDED_LENGTH=True. This will reduce the CPU utilization (i.e. impacts the performance) but reduces the memory usage.
 
In contrast, "Unbounded length filed" are the variable length field whose max length is not defined. Example "fieldname:string"… in this case the memory allocated for this filed will be actual data length, while working with datasets only. But this will not have performance advantage that "bounded length filed" will have as every time unbounded length fields are referenced, datastage needs to understand the length first.

More details could be found at IBM Link

In case of oracle connection, if the length of varchar fields missing then Connector stage assumes the length of 4000. Use environment variable CC_ORA_UNBOUNDED_STRING_LENGTH to set this default length. So the fields are processed as "Bounded length filed" irrespective of whether we give length or max length or not.

In case of DB2 connection, if the length is missing then Connector stage assumes the length of 32.

No comments:

Post a Comment