1) Define DataStage?
DataStage is a tool used to design, develop and execute applications that
populate multiple tables in a data warehouse or data mart. It is a program for
Windows servers that extracts data from source databases and loads it into
data warehouses. It has become an essential part of the IBM WebSphere Data
Integration suite.
2) Explain how a source file is populated?
We can populate a source file in many ways, such as by creating a SQL query in
Oracle or by using the Row Generator extract tool.
3) Name the command line functions to import and export the DS jobs?
To import the DS jobs, dsimport.exe is used
and to export the DS jobs, dsexport.exe is used.
4) What is the difference between DataStage 7.5 and 7.0?
In DataStage 7.5, many new stages were added for greater robustness and
smoother performance, such as the Procedure stage, Command stage and
Generate Report.
5) In DataStage, how can you fix the truncated data error?
The truncated data error can be fixed by using the environment variable
IMPORT_REJECT_STRING_FIELD_OVERRUN (listed in IBM's documentation as
APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS).
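As a rough illustration of what the environment variable named above changes, the sketch below contrasts silently truncating an over-long string field with rejecting the record. It is plain Python rather than DataStage code, and the function name `import_fixed_width` and its `reject_overruns` flag are hypothetical stand-ins for the import behaviour, assuming the variable causes over-long fields to be rejected instead of truncated.

```python
def import_fixed_width(values, max_len, reject_overruns=False):
    """Return (accepted, rejected) for string values read into a fixed-size field.

    reject_overruns is a hypothetical flag standing in for the environment
    variable: when False, over-long values are cut to max_len; when True,
    they are sent to the reject list instead.
    """
    accepted, rejected = [], []
    for value in values:
        if len(value) <= max_len:
            accepted.append(value)            # fits the field as-is
        elif reject_overruns:
            rejected.append(value)            # overrun: reject the record
        else:
            accepted.append(value[:max_len])  # overrun: silent truncation
    return accepted, rejected


names = ["Ann", "Bartholomew", "Cy"]
print(import_fixed_width(names, 5))
print(import_fixed_width(names, 5, reject_overruns=True))
```

With rejection switched on, the over-long value ends up in the reject list where it can be inspected, instead of being loaded in shortened form.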
6) Define Merge?
Merge means to join two or more tables. The tables are joined on the basis of
the primary key columns they have in common.
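The idea of joining tables on a shared primary key can be sketched in plain Python. This is only an illustration, not DataStage code: the helper `merge_on_key` and the sample tables are hypothetical, and the sketch assumes that rows without a match in the second table are dropped.

```python
def merge_on_key(master, update, key):
    """Join two lists of dict rows on a shared key column.

    Assumes key values are unique within `update`; master rows with no
    matching update row are dropped (an assumption of this sketch).
    """
    update_by_key = {row[key]: row for row in update}
    merged = []
    for row in master:
        match = update_by_key.get(row[key])
        if match is not None:
            merged.append({**row, **match})  # combine columns of both rows
    return merged


customers = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
cities = [{"id": 1, "city": "Oslo"}, {"id": 3, "city": "Rome"}]
print(merge_on_key(customers, cities, "id"))
```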
7) Differentiate between data file and descriptor file?
As the names imply, data files contain the data, and the descriptor file
contains the description of, or information about, the data in the data files.
8) Differentiate between DataStage and Informatica?
In DataStage, there is a concept of partitioning and parallelism for node
configuration, whereas Informatica has no such concept for node configuration.
Also, Informatica is more scalable than DataStage, while DataStage is more
user-friendly than Informatica.
9) Define Routines and their types?
Routines are collections of functions defined in DS Manager. They can be
called from the Transformer stage. There are three types of routines:
parallel routines, mainframe routines and server routines.
10) How can you write parallel routines in DataStage PX?
We can write parallel routines in C or C++ and build them with a C/C++
compiler. Such routines are also created in DS Manager and can be called from
the Transformer stage.
11) What is the method of removing duplicates without the Remove Duplicates stage?
Duplicates can be removed by using the Sort stage with the option
Allow Duplicates = False.
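The reason a sort can remove duplicates is that, once rows are ordered by key, duplicates sit next to each other and only the first row of each run needs to be kept. A minimal Python sketch of that idea follows; it is not DataStage code, and `sort_and_dedupe` and the sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_dedupe(rows, key_fields):
    """Sort rows by the key fields, then keep the first row of each key run."""
    key = itemgetter(*key_fields)
    ordered = sorted(rows, key=key)  # stable sort: input order kept within a key
    return [next(group) for _, group in groupby(ordered, key=key)]


rows = [
    {"id": 2, "value": "a"},
    {"id": 1, "value": "b"},
    {"id": 2, "value": "c"},  # duplicate key, dropped after sorting
]
print(sort_and_dedupe(rows, ["id"]))
```

Because Python's sort is stable, the row kept for each key is the one that appeared first in the input.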
12) What steps should be taken to improve DataStage jobs?
To improve the performance of DataStage jobs, we first have to establish
baselines. Secondly, we should not rely on only one flow for performance
testing. Thirdly, we should work in increments. Then we should evaluate data
skew, and isolate and solve the problems one by one. After that, we should
distribute the file systems to remove bottlenecks, if any. Also, we should not
include the RDBMS at the start of the testing phase. Last but not least, we
should understand and assess the available tuning knobs.
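One of the steps above, evaluating data skew, can be illustrated with a small Python sketch that counts how many rows a hash partitioner would send to each partition and compares the largest partition with the average. This is an assumption-laden illustration, not DataStage's partitioner: the helper names are hypothetical and CRC32 is used only because it gives a repeatable hash.

```python
import zlib
from collections import Counter


def partition_counts(keys, num_partitions):
    """Count rows per partition under a simple hash-modulo scheme."""
    counts = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        counts[zlib.crc32(str(key).encode()) % num_partitions] += 1
    return counts


def skew_ratio(counts):
    """Largest partition size divided by the mean; 1.0 means perfectly even."""
    mean = sum(counts.values()) / len(counts)
    return max(counts.values()) / mean if mean else 0.0


# Eight rows sharing one key all land in the same partition.
skewed = partition_counts(["same-key"] * 8, 4)
print(skew_ratio(skewed))  # 4.0
```

A ratio well above 1.0 suggests that one partition is doing most of the work, which is worth fixing before tuning anything else.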
13) Differentiate between Join, Merge and Lookup stage?
All three stages differ from each other in how they use memory, what they
require of their inputs and how they treat records. Join and Merge need less
memory than the Lookup stage.
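The memory difference can be pictured with two small Python generators. This is a sketch under stated assumptions, not a description of how the stages are implemented: the lookup variant holds the whole reference table in memory, while the join variant assumes both inputs are already sorted by key (with unique keys on the right) and keeps only the current rows. The names `lookup` and `sorted_join` are hypothetical.

```python
def lookup(stream, reference, key):
    """Lookup style: the entire reference table is loaded into a dict."""
    table = {row[key]: row for row in reference}
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            yield {**row, **match}


def sorted_join(left, right, key):
    """Join style: both inputs pre-sorted by key; only current rows are held."""
    right_iter = iter(right)
    current = next(right_iter, None)
    for row in left:
        # Advance the right input until it catches up with the left key.
        while current is not None and current[key] < row[key]:
            current = next(right_iter, None)
        if current is not None and current[key] == row[key]:
            yield {**row, **current}


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 4, "a": "z"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 4, "b": "r"}]
print(list(lookup(left, right, "id")))
print(list(sorted_join(left, right, "id")))
```

Both calls produce the same matched rows; the difference is that the first needs memory proportional to the reference table, while the second needs sorted inputs.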
14) Explain Quality stage?
Quality stage is also known as Integrity
stage. It assists in integrating different types of data from various sources.
15) Define Job control?
Job control can be best performed by using Job
Control Language (JCL). This tool is used to execute multiple jobs
simultaneously, without using any kind of loop.