IBM InfoSphere QualityStage is part of the Information
Server suite and is available with an additional license. Buying QualityStage
enables us to use more stages (plus the match specification and rule designers
used by these stages). That's it. Everything else is the same and is part of
DataStage.
Do not confuse QualityStage with Information Analyzer: the Data Rules stage is not part of QualityStage, it's part of Information Analyzer.
What additional stages come with QualityStage?
1. Investigate
5. Match Frequency
6. Survive
If we understand the steps in the data quality assurance
process, then understanding these stages will be easy. The data quality process
steps and the corresponding QualityStage stages are
While working with any data quality stage, it's important to
understand single domain columns and free form columns.
Single domain columns represent one
specific business attribute, for example first name or customer ID.
Free form columns contain free
text, a combination of many attributes. For example, a name column could contain
first name and last name, or first, middle and last name. Similarly, an address
column could contain only the house number and street name, or the entire
address along with the post code.
Analyzing single domain columns is easy because we know
upfront what data the field contains. Analyzing free form fields is not that easy
because we first need to set some rules describing what the free form field might
contain, i.e. to analyze free form fields, we need to define a set of rules
(using the rule designer). The rules work on the fact that a free form
field is nothing but a bunch of tokens: we map each token to a pattern,
then decide the output action when the token is as expected (i.e. conforms to the
pattern).
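The token idea can be illustrated with a small Python sketch. This is not QualityStage rule syntax; the classification table, the class symbols (`^`, `+`, `?`, `T`) and the function names are assumptions made up for this illustration.

```python
import re

# Hypothetical classification table: known token -> class symbol.
CLASSIFICATIONS = {
    "STREET": "T", "ST": "T", "ROAD": "T", "RD": "T", "AVENUE": "T", "AVE": "T",
}

def tokenize(value):
    """Split a free form value into upper-case tokens on whitespace and commas."""
    return [t for t in re.split(r"[\s,]+", value.upper()) if t]

def classify(token):
    """Map one token to a class symbol (symbols chosen for this sketch only)."""
    if token in CLASSIFICATIONS:
        return CLASSIFICATIONS[token]   # token found in the classification table
    if token.isdigit():
        return "^"                      # purely numeric token
    if token.isalpha():
        return "+"                      # purely alphabetic, unclassified token
    return "?"                          # anything else (mixed, punctuation, ...)

def pattern_of(value):
    """Build the pattern string for a whole free form value."""
    return "".join(classify(t) for t in tokenize(value))

print(tokenize("123 Main Street"))
print(pattern_of("123 Main Street"))
```

Once every value is reduced to a pattern string like this, counting how often each pattern occurs shows which shapes of data a rule needs to handle.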
To standardize single domain columns, we may need a rule set
in some cases, for example if we want to look up the correct value, correct the
spelling or change a description to an acronym. Otherwise we could just use the
Modify or Transformer stage to deal with nulls, change the format, etc.
The QualityStage Standardize stage is normally used to standardize
free form columns, i.e. to split the free form columns into single domain columns.
We need rule sets for this purpose.
The following QualityStage stages use rule sets:
· Investigate
· Standardize
· Multinational Standardize (MNS)
A rule set contains:
· Classifications
· Output Columns
· Rules
o Rules basically define the input patterns and
the actions when a pattern matches. As part of this pattern matching we can use
pattern specifications, classification tables and lookup tables.
o We can specify what action is taken when a
pattern matches and what output is written to the output columns.
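To make the three parts of a rule set concrete, here is a minimal Python sketch of a rule set that splits a free form address into single domain columns. It only mimics the idea; the table contents, column names and the `standardize` function are hypothetical and do not reflect the real rule set file formats.

```python
# Hypothetical rule set pieces: classifications, a lookup table, output columns, rules.
CLASSIFICATIONS = {"STREET": "T", "ST": "T", "ROAD": "T", "RD": "T"}
STANDARD_FORMS = {"STREET": "ST", "ST": "ST", "ROAD": "RD", "RD": "RD"}
OUTPUT_COLUMNS = ["HouseNumber", "StreetName", "StreetType", "Unhandled"]

# Each rule: input pattern -> output column for each token position (the action).
RULES = {
    "^+T": ["HouseNumber", "StreetName", "StreetType"],
    "+T": ["StreetName", "StreetType"],
}

def classify(token):
    """Class symbol for one token; symbols are chosen for this sketch only."""
    if token in CLASSIFICATIONS:
        return CLASSIFICATIONS[token]
    if token.isdigit():
        return "^"
    if token.isalpha():
        return "+"
    return "?"

def standardize(value):
    """Apply the first matching rule and return one value per output column."""
    tokens = value.upper().replace(",", " ").split()
    pattern = "".join(classify(t) for t in tokens)
    record = {column: "" for column in OUTPUT_COLUMNS}
    targets = RULES.get(pattern)
    if targets is None:
        # No rule for this pattern: keep the raw text for later review.
        record["Unhandled"] = " ".join(tokens)
        return record
    for token, column in zip(tokens, targets):
        # Write the standard form if the lookup table has one, else the token itself.
        record[column] = STANDARD_FORMS.get(token, token)
    return record

print(standardize("123 Main Street"))
print(standardize("PO Box 12"))
```

Values whose pattern has no rule end up in the `Unhandled` column here, which is one simple way to see which patterns still need a rule.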
When it comes to the matching process,
matching is
nothing but comparing two records and checking whether they are duplicates
or not. Then we can use the Survive stage to keep one record and drop the other
record in the duplicate pair.
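As a rough illustration of compare-then-survive, the Python sketch below treats two records as duplicates when their normalized key columns agree, then keeps the more complete record. The record layout and the "most populated fields wins" survivorship rule are assumptions for this example, not what the Survive stage does by default.

```python
def normalize(value):
    """Upper-case and collapse whitespace so formatting differences do not matter."""
    return " ".join(value.upper().split())

def is_duplicate(a, b, keys):
    """Two records are duplicates here if every key column agrees after normalizing."""
    return all(normalize(a[k]) == normalize(b[k]) for k in keys)

def populated(record):
    """Number of non-blank fields in a record."""
    return sum(1 for v in record.values() if v.strip())

def survive(a, b):
    """Hypothetical survivorship rule: keep the more complete record, first one on a tie."""
    return a if populated(a) >= populated(b) else b

rec1 = {"name": "John Smith", "postcode": "AB1 2CD", "phone": ""}
rec2 = {"name": "john  smith", "postcode": "ab1 2cd", "phone": "555-0100"}

if is_duplicate(rec1, rec2, ["name", "postcode"]):
    print(survive(rec1, rec2))
```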
Match stages take a match specification as input; a match
specification can be created using the match specification designer.
A match specification tells us:
· Which keys we need to match on
· Which group-by columns divide the
total records into blocks, so that we operate on blocks first instead of on the
entire data set. Dividing the data into blocks is very important, because we
are not trying to identify only 100% matched records. Two records that match by 20%
may be good enough for us to decide they may be duplicates and need to
be reviewed by someone before confirming they are duplicates. So we say a 20%
match is of the clerical match type. Therefore, in order to identify which record
matches which, we first need to divide the data into blocks where we are likely to
find the duplicates.
· How many passes of the match process we need to run
before finalizing the match weight
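The blocking and clerical ideas above can be sketched in a few lines of Python. This is a simplification under stated assumptions: the score is just the fraction of match keys that agree (a stand-in for a real match weight), and the cutoff values, column names and function names are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

MATCH_CUTOFF = 0.8     # assumed: at or above this, treat the pair as a match
CLERICAL_CUTOFF = 0.2  # assumed: at or above this, send the pair for manual review

def block_records(records, block_key):
    """Group records into blocks that share the same value of the blocking column."""
    blocks = defaultdict(list)
    for record in records:
        blocks[record[block_key]].append(record)
    return blocks

def score(a, b, match_keys):
    """Fraction of match keys on which the two records agree."""
    return sum(1 for k in match_keys if a[k] == b[k]) / len(match_keys)

def classify_pair(pair_score):
    if pair_score >= MATCH_CUTOFF:
        return "match"
    if pair_score >= CLERICAL_CUTOFF:
        return "clerical"
    return "nonmatch"

def run_pass(records, block_key, match_keys):
    """One match pass: compare pairs only inside each block."""
    results = []
    for block in block_records(records, block_key).values():
        for a, b in combinations(block, 2):
            results.append((a["id"], b["id"], classify_pair(score(a, b, match_keys))))
    return results

records = [
    {"id": 1, "postcode": "AB1", "first": "JOHN", "last": "SMITH", "city": "LEEDS"},
    {"id": 2, "postcode": "AB1", "first": "JOHN", "last": "SMITH", "city": "LEEDS"},
    {"id": 3, "postcode": "AB1", "first": "JON", "last": "SMITH", "city": "YORK"},
    {"id": 4, "postcode": "ZZ9", "first": "JOHN", "last": "SMITH", "city": "LEEDS"},
]

for pair in run_pass(records, "postcode", ["first", "last", "city"]):
    print(pair)
```

Record 4 is never compared with the others because it falls into a different block. That is the trade-off of blocking, and it is why a further pass with a different blocking column (for example `last`) can pick up pairs the first pass missed.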
More details about QualityStage can be found at the links
below.