
Tuesday 26 July 2016

Linux - Identify the disk type and the disk partitions where mount points are allocated

We can use the lsscsi command. This shows the disk device type and model.

In order to find which physical device a disk is assigned to, we can follow its sysfs symlink:
ls -l /sys/block/sda

This gives a long string such as "../../devices/pci0000:00/0000:00:0d.0/host..." where "0000:00:0d.0" is a PCI device ID. Use "lspci" to identify it. lspci lives in sbin, so it may require superuser access (or the full path).
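Putting these together, a minimal sketch (the device name and PCI ID below are examples and will differ per system):

lsscsi                      # list SCSI devices with their type and model
ls -l /sys/block/sda        # resolve sda to its PCI device path
lspci -s 0000:00:0d.0       # identify the PCI device found in the path above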

"parted -l" also gives information about disks.

We can check the file "/proc/partitions" for the full list of disk partitions to which mount points are linked.

All the mount points can be seen in the file "/proc/mounts".
"fdisk" can also be used to analyse the disks, but it requires superuser access.

If you are looking for IBM SAN disks, one option is to check whether the file system is GPFS.
Note from Wikipedia: IBM discontinued selling the SAN File System in April 2007. It has been replaced by the IBM General Parallel File System (GPFS).

Monday 4 July 2016

Working with Memory in Unix

In Unix, files are arranged in a hierarchy called the file system. Files on different devices (or on the same device) can be mounted into one hierarchy so that they can all be accessed as a single file system. Each mount point can be assigned a space limit. The mount points, their space usage and the available space on each can be listed using the following command:
df -h

When a mount point reaches its space limit, processes writing to that mount point start failing because no space is available there, even though plenty of space may still be free elsewhere on the disk.

In this scenario we have two options:
         1)  Increase the space limit for the mount point (if more space is available on the disk)
         2) Purge some files to release space
 
The command that can be used to increase the logical extents for the mount point is lvextend (this assumes the mount point sits on an LVM logical volume).
In terms of releasing space, it can be as easy as using "rm -rf".
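As a minimal lvextend sketch (the volume group and logical volume names here are made up; use the resize tool that matches your filesystem):

sudo lvextend -L +5G /dev/vg01/lv_data    # grow the logical volume by 5 GB
sudo resize2fs /dev/vg01/lv_data          # grow an ext3/ext4 filesystem to match the new size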

But if we do not want to delete the existing folder structure and want to delete only the old or big files, we need to do the following:
          1) Identify the folders that occupy the most space
          2)   Delete the older and larger files

To identify the folders that occupy the most space in a mount point, we can use the following command:
du -h --max-depth=1

To delete files based on size and age, use find as below; this example removes files under /tmp/var that are older than 15 days and larger than 2 MB:
find /tmp/var -type f -mtime +15 -size +2M -exec rm -rf {} \;


If we want to check the RAM size and memory usage on a Linux system, we can use any of the following:

view the /proc/meminfo file

or use the "free" command

or use vmstat for memory statistics.
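For example (the sampling interval below is arbitrary):

cat /proc/meminfo | head -5   # detailed kernel memory counters
free -m                       # totals and usage in megabytes
vmstat 5 3                    # three samples of memory/swap statistics, 5 seconds apart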



Tuesday 21 June 2016

Working with Dates in Oracle

The Oracle DATE data type can be used to store both the date and the time part. Does this mean it always stores both parts, or do we have to store the time part explicitly? If it automatically stores the time part, why is the time part not displayed when I query dates in SQL Developer or SQL*Plus? Can I use arithmetic operators directly on date fields? If yes, is the result in days, months or years? How do we extract a specific part (i.e. day, year, seconds, etc.) from a date field?
This article aims to answer most of these questions.

Storage
The Oracle DATE data type stores both the date and the time part by default.
The DATE datatype stores the year (including the century), the month, the day, the hours, the minutes, and the seconds (after midnight).
While inserting data, if the time part is omitted, zeros (i.e. 00:00:00 A.M.) are stored in the time part. Remember, the DATE data type does not store fractions of a second; if we need fractional seconds, we need to use TIMESTAMP.
We can use the DUMP function to get the storage details:
select dump (sysdate) from dual;

DUMP(SYSDATE)
_________________________________________
Typ=13 Len=8: 7,218,11,26,14,21,24,0

The above result indicates that the data is stored as an 8-byte field.

Display
Oracle DATE fields are displayed as per the NLS_DATE_FORMAT setting in the Oracle database.
select * from SYS.NLS_DATABASE_PARAMETERS where parameter='NLS_DATE_FORMAT';
To display dates in a different format, change the value of NLS_DATE_FORMAT.
Ex. alter session set nls_date_format='DD/MM/YYYY HH.MI.SS';
Or
dbms_session.set_nls('nls_date_format', 'DD/MM/YYYY HH.MI.SS');
For changing it at the database level, use ALTER SYSTEM.
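A quick before/after check (the displayed values are only illustrative):

select sysdate from dual;
-- 26-JUL-16 (with the default DD-MON-RR format)
alter session set nls_date_format='DD/MM/YYYY HH.MI.SS';
select sysdate from dual;
-- 26/07/2016 02.21.24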

Date Arithmetic Functions
In order to understand how dates can be used with arithmetic operators such as add, multiply etc., we first need to understand which data types the arithmetic operators accept, and how Oracle converts data types internally.
Excerpt from Oracle documentation:
You can use an arithmetic operator with one or two arguments to negate, add, subtract, multiply, and divide numeric values. Some of these operators are also used in datetime and interval arithmetic. The arguments to the operator must resolve to numeric datatypes or to any datatype that can be implicitly converted to a numeric datatype.
Since dates are stored as numbers inside the Oracle database, we can perform add/subtract operations on dates, but multiply/divide are not supported.
When two dates are used in a binary +/- operation, the result is a number of days. If the date fields also have a time part, the result will include a fraction of a day (e.g. 2.33, 3.44). If we want an integer result, we have to truncate the time parts first using the TRUNC function. The reason the result is a number of days is that dates are stored internally in day units.
Therefore, apart from being able to perform +/- on two dates, we can also perform +/- between a date and an integer (representing a number of days).
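Two small illustrations (the first result is exact for these literals; the second depends on today's date):

select date '2016-07-26' - date '2016-07-01' from dual;   -- 25 (days)
select trunc(sysdate) + 7 from dual;                      -- midnight, seven days from today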

Important date functions

EXTRACT
  This function can be used to extract a date or time part from fields with datetime or interval data types.
  Ex: select extract(month from sysdate) from dual;

ADD_MONTHS
   This function returns the given date plus an integer number of months. Ex: add_months(sysdate,1) returns the same day next month.

INTERVAL
   We have both INTERVAL expressions and INTERVAL data types; in both cases they are a means of dealing with date/time intervals. For example, if we want to store a year-to-month interval (e.g. 2 years 2 months), we can add a column with the data type INTERVAL YEAR [(year_precision)] TO MONTH.
Similarly, if we just want to increase a date by 20 seconds, we can use an interval expression:
sysdate + INTERVAL '20' SECOND
Try the query below:
select to_char(sysdate,'YYYY/MM/DD HH:MI:SS'),to_char(sysdate + INTERVAL '20' SECOND,'YYYY/MM/DD HH:MI:SS') from dual;

MONTHS_BETWEEN
This function returns the number of months between two dates.
For example, select months_between('02-FEB-15','01-JAN-15') from dual; returns
1.03225806451612903225806451612903225806
The fractional value appears because the difference is 1 month and 1 day; the extra day is converted into a fraction of a month (1/31) in the result.

ROUND and TRUNC
ROUND and TRUNC are closely related date functions.
TRUNC is meant to truncate the time part, and ROUND is meant to round the date to the nearest day, month or year.
For example, if the date field holds "2016/06/21 19:53:35", then TRUNC returns 21-JUN-16 but ROUND returns 22-JUN-16, because the time part is after noon.
You can only see the difference between these functions when you do not specify a format string and the time part of the date is after 12:00 noon; otherwise both functions return the same result.
We can also use ROUND and TRUNC to return the first day of the year (i.e. TRUNC(date_value, 'YEAR')), and similarly for month, week etc.
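All three behaviours in one query (the input literal is arbitrary):

select trunc(to_date('2016/06/21 19:53:35','YYYY/MM/DD HH24:MI:SS')) d_trunc,
       round(to_date('2016/06/21 19:53:35','YYYY/MM/DD HH24:MI:SS')) d_round,
       trunc(sysdate,'YEAR') year_start
from dual;
-- d_trunc = 21-JUN-16, d_round = 22-JUN-16, year_start = first day of the current year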

Wednesday 25 May 2016

Continuous Integration in software development

Continuous integration plays a very important role in the software development process. But what exactly is continuous integration?
Imagine we have unit test, integration test and UAT environments. Every developer can modify everything in the unit test environment, but no developer can modify or compile code in the higher environments such as integration test or UAT. Therefore, any fix or code change tested in the unit test environment needs to be elevated (promoted, deployed) to the higher environments before it can be tested there. To elevate to the next environment, we first check the code in to a central repository or version control tool, so that the code lives in exactly one version-controlled repository. The code from that repository is then built (compiled, linked, variables bound, etc.) and the executables are deployed to the next higher environment, or to a workspace from which elevation happens. This process is called integration.

If we make changes and check code in to the version-controlled repository very frequently, and the executables are regenerated continuously in the next higher environment or workspace, it becomes continuous integration. Continuous integration is normally automated, so that as soon as a check-in happens the build starts and the executables are deployed to the next environment, workspace or server, from where elevation to the next environment takes place.
The continuous integration process varies depending on the type of code. The above explanation holds good if the code is, for example, Java that needs to be compiled and packaged into a JAR or WAR. But if it is just Unix shell scripting, which does not require compiling, continuous integration may be nothing more than copying the code from the version control tool to the designated environment. That is why we may not see CI tools like Hudson or Jenkins playing any role in Unix shell scripting projects: the process can be automated with simple scripting in the native OS.


Assuming we are working on a Java project, continuous integration could be implemented as below.
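As a rough sketch of the automated steps in shell (the repository branch, build tool, host and path below are made up; a CI server such as Jenkins or Hudson would trigger these on every check-in):

git pull origin master                           # fetch the latest checked-in code
mvn -B clean package                             # compile, run unit tests, build the JAR/WAR
scp target/app.war deploy@int-test:/opt/apps/    # deploy the executable to the next environment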


Wednesday 20 April 2016

Does a commit in a called program commit the transactions of the calling program or main procedure?

Yes, in the normal process. But if we do not want this, then we have to run the called (child) program's transactions as autonomous transactions.
An autonomous transaction is an independent transaction started by another transaction, the main transaction. Autonomous transactions perform SQL operations and commit or roll back without committing or rolling back the main transaction.
To start an autonomous transaction, we need to use the AUTONOMOUS_TRANSACTION pragma (compiler directive) in the declarative section.
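A minimal sketch (the table and procedure names are made up for illustration):

create or replace procedure log_message (p_msg varchar2) is
  pragma autonomous_transaction;  -- this procedure runs in its own transaction
begin
  insert into app_log (msg, logged_at) values (p_msg, sysdate);
  commit;                         -- commits only this insert, not the caller's work
end;
/

The caller can invoke log_message and later roll back its own changes; the logged row survives.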

Autonomous Transactions Vs Nested transactions
By default, the main procedure/function shares its transaction context with nested/child procedures/functions. We can create autonomous transactions using the AUTONOMOUS_TRANSACTION pragma; this marks the nested transaction as autonomous, so it runs in its own context.
Therefore, autonomous transactions
·       Do not share resources such as locks with the main transaction
·       Do not depend on the main transaction
·       Changes they commit are visible to other transactions immediately

·       When an autonomous transaction calls a non-autonomous program, that program's transaction becomes nested and therefore shares the same context as the parent autonomous transaction


Monday 11 April 2016

Create a virtual machine in Windows 8 and Windows 10, then install Linux Mint OS


Hyper-V (the hypervisor and virtual machine management services) is part of the Windows 8 and Windows 10 operating systems, but the service is not enabled/installed by default. In order to install or enable the virtual machine management services in Windows 8/10, we need to follow the steps below.

1.     First we need to enable virtualization technology (VT) at the processor level by accessing the BIOS:
    • BIOS is the basic input/output system; its configuration controls the machine before the operating system loads.
    • To access the BIOS, restart the computer from Settings → Change PC settings → Update and recovery → Recovery → Advanced startup (Restart now) → Troubleshoot → Advanced options → Startup settings; then, while it is restarting, keep Enter pressed.
    • Once the BIOS is open, you can enable virtualization technology from the Configuration tab.
2.     Once VT is enabled in the BIOS, go to Add/Remove Programs in the Control Panel, click "Turn Windows features on or off", select all the Hyper-V options and click OK. This installs Hyper-V.
3.     From Hyper-V Manager, create a new virtual machine, selecting "install OS later" while creating it.
4.     Download the Linux Mint ISO file from https://www.linuxmint.com/download.php
5.     Then start the virtual machine from Hyper-V Manager, choose the ISO file for OS installation, and follow the installation steps.
6.     OpenSUSE can likewise be installed after downloading its ISO file, but that installation needs a network connection from the virtual machine. To establish one, create an external virtual switch using the Virtual Switch Manager and attach it while creating the virtual machine.

Now, connecting to this virtual machine's Linux box from a PuTTY session on the host:
1.     Check the virtual machine's IP address:
ip addr show
→ under eth0, the inet address is the IP address we are looking for
2.     Using the above IP address we can connect via PuTTY using the SSH connection type. But before this we need to install SSH in the guest Linux OS (in the virtual machine), which we can do with "sudo apt-get install openssh-server".
3.     Once SSH is installed, we need to open port 22 (edit the file with "sudo vi /etc/ssh/sshd_config" and uncomment the Port 22 line), then stop and start the SSH service with "sudo /sbin/service sshd stop" and "sudo /sbin/service sshd start".
4.     Check that SSH is enabled and running with "netstat -lnpt | grep 22" (a consolidated sketch of steps 2-4 appears after this list).
5.     You can test whether SSH is working by running "ssh <vm ip>" in the virtual machine itself.
6.     If there are issues establishing the SSH connection from the host to the Hyper-V virtual machine, configure the virtual machine with a static IP. To do this, open the YaST Control Center, go to System → Network Settings, and in the Overview tab edit the address to be static, giving it some IP address. Then set this same address as the default IPv4 gateway in the Routing tab. This will stop the internet connection working in the virtual machine, but it enables the SSH connection from the host.

7.     Once successfully connected over SSH via that IP, revert the static IP to dynamic using the YaST Control Center. This time both the internet and the PuTTY SSH connection will work.
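The guest-side SSH setup from steps 2-4 in one place (this assumes a Debian-based guest such as Mint; package and service names may differ on other distributions):

sudo apt-get install openssh-server    # install the SSH server
sudo vi /etc/ssh/sshd_config           # uncomment the Port 22 line
sudo /sbin/service sshd stop           # restart the service to pick up the change
sudo /sbin/service sshd start
netstat -lnpt | grep 22                # confirm sshd is listening on port 22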

Wednesday 6 April 2016

IBM Infosphere QualityStage simplified


IBM InfoSphere QualityStage is part of the Information Server suite and is available with an additional licence. Buying QualityStage enables us to use more stages (plus the match specification and rule designers, for use in those stages); that's it. Everything else is the same and is part of DataStage.
Do not confuse this with Information Analyzer: the Data Rules stage is not part of QualityStage, it is part of Information Analyzer.

What are the additional stages that come with QualityStage?

1.      Investigate

2.      Standardize

3.      One-source Match

4.      Two-source Match

5.      Match Frequency

6.      Survive



 

If we understand the steps in the data quality assurance process, then understanding these stages is easy, because each stage corresponds to a step in that process.


While working with any data quality stages, it is important to understand single-domain columns and free-form columns.

Single-domain columns represent one specific business attribute, for example first name or customer id.

Free-form columns contain free text, a combination of many attributes. A name, for example, could contain first and last name, or first, middle and last name; similarly an address could contain only the house number and street name, or the entire address along with the postcode.

Analysing single-domain columns is easy because we know up front what data the field contains. Analysing free-form fields is not that easy, because we first need to define a set of rules describing what the field might contain (using the rule designer). The rules work on the fact that a free-form field is nothing but a bunch of tokens: we map each token to a pattern, then decide the output action when a token is as expected (i.e. conforms to the pattern).

To standardise single-domain columns we may need a rule set in some cases, for example if we want to look up a correct value, correct a spelling, or change a description to an acronym; otherwise we can just use the Modify or Transformer stage to deal with nulls, change the format, etc.

The Standardize stage is normally used to standardise free-form columns, splitting them into single-domain columns. We need rule sets for this purpose.

The following QualityStage stages use rule sets:

·        Investigate

·        Standardize

·        Multinational Standardize (MNS)

A rule set contains:

·        Classifications

·        Output Columns

·        Rules

o   Rules basically define the input patterns and the actions taken when a pattern matches. As part of this pattern matching we can use pattern specifications, classification tables and lookup tables.

o   We can specify what action is taken when a pattern matches and what output is written to the output columns.

 

When it comes to matching process,

        Matching is nothing but comparing two records and checking whether they are duplicates or not. Then we can use the Survive stage to keep one record and drop the other record of the duplicate pair.

Match stages take a match specification as input; match specifications are created using the match specification designer.

The match specification tells:

·        On which keys we need to match

·        What the group-by columns are, used to divide the total records into blocks so we operate on blocks instead of the entire data set. Dividing the data into blocks is very important, because we are not trying to identify only 100% matches: two records that match by, say, 20% may be good enough for us to decide they could be duplicates and should be reviewed by someone before being confirmed (we call such matches clerical). So, in order to identify which record matches which, we first divide the data into blocks where we are likely to find the duplicates.

·        How many passes the match process should run before finalising the match weights




 

Tuesday 5 April 2016

IBM Infosphere DataStage - Command Line Interface commands for administration


Common administration tasks performed by a DataStage admin:

1.      Managing assets and asset migration (for example, exporting security assets such as users and user groups)

a.      istool

istool is a command line interface, present on both the client and engine tiers. The location of the CLI is /opt/IBM/InformationServer/Clients/istools/cli.

istool is used for managing assets, i.e. building packages, sending or deploying packages, importing/exporting assets, etc.

Example:

istool export -security '-securityUser -userident "*" -includeRoles' -domain host:port -username user -password password -archive ExportISFSecurityRoles.isx
 

2.      Enable or disable product features or set/unset the product configuration properties

a.      iisadmin

iisadmin can be used to activate or deactivate an edition or feature pack.


 

For example, if we want to see which features are active, we can use the command below:

./iisAdmin.sh -display -key com.ibm.iis.datastage.license.*

Results:

com.ibm.iis.datastage.license.option.parallel=1

com.ibm.iis.datastage.license.option.qualitystage=0

 

The above results indicate that QualityStage is not enabled.

 

3.      Administering the services tier, i.e. starting or stopping services (a consolidated start-sequence sketch follows these steps)

a.      Starting services

        i.     First start the application server:

1.      Go to /opt/IBM/InformationServer/ASBServer/bin

2.      Run “./MetadataServer.sh start” or “./MetadataServer.sh run”

3.      "run" echoes the output (i.e. runs in the foreground), while "start" runs in the background

        ii.     Then start the engine service:

1.      Go to /opt/IBM/InformationServer/Server/DSEngine

2.      ./bin/uv -admin -start

        iii.     Then start the ASB agent:

1.      Go to /opt/IBM/InformationServer/ASBNode/bin

2.      ./NodeAgents.sh start

b.      Shutting down or stopping services

        i.     First stop the DataStage engine services (MetaData Server services, DSRPC Server, etc.):

1.      Go to /opt/IBM/InformationServer/Server/DSEngine

2.      ./bin/uv -admin -stop

        ii.     Then stop the agents:

1.      Go to /opt/IBM/InformationServer/ASBNode/bin

2.      ./NodeAgents.sh stop

        iii.     Then stop the application server:

1.      Go to /opt/IBM/InformationServer/ASBServer/bin

2.      ./MetadataServer.sh stop
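Putting the start sequence together as one sketch (default install paths assumed; stopping uses the same locations in the order shown in 3.b):

cd /opt/IBM/InformationServer/ASBServer/bin && ./MetadataServer.sh start   # application server
cd /opt/IBM/InformationServer/Server/DSEngine && ./bin/uv -admin -start    # engine
cd /opt/IBM/InformationServer/ASBNode/bin && ./NodeAgents.sh start         # ASB agent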

 

4.      AppServerAdmin Command

a.      Run this command whenever the admin user account password is changed, so that the new password is reflected across the configuration of all Information Server suite components

5.      SessionAdmin command

a.      Use this to manage and monitor the active sessions

6.      DirectoryAdmin tool

a.      Use this to access the metadata repository and user registry, and to perform a variety of actions on the user registry, including adding a new user, changing a password, deleting a user/group, changing a user's role, etc.

7.      DirectoryCommand tool

a.      Similar to DirectoryAdmin, but not the same; it can be used to add/delete users and groups, etc.

8.      Encrypt

a.      Use this command to encrypt the user credentials

9.      Orchadmin

a.      Use this command to inspect and manage datasets (illustrative calls follow below):
         orchadmin check: checks the configuration file
         orchadmin copy / delete / describe / truncate: copy, delete, describe and truncate datasets
         orchadmin dump: dumps dataset data to readable format files
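A couple of illustrative calls (the dataset path is hypothetical, and available options can vary by version):

orchadmin check                             # validate the current configuration file
orchadmin describe /data/ds/customers.ds    # show the dataset's schema and partition layout
orchadmin dump /data/ds/customers.ds        # print the dataset records in readable form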

10.   What is the ASB agent?

a.      This is the background process that conveys requests from the client tier to the services tier.

11.   What is dsrpcd (the DSRPC service)?

a.      This service allows the client tier to connect to the server engine.

12.   Another very important command is "dsjob". Though administrators do not normally use this command, it is very important to know and understand how dsjob works.
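A few common dsjob invocations (the project and job names are made up):

dsjob -run -jobstatus PROJ1 LoadCustomers   # run a job and wait, returning its final status
dsjob -jobinfo PROJ1 LoadCustomers          # show the job's current status
dsjob -logsum PROJ1 LoadCustomers           # summarise the job's log entries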