Thursday 26 April 2018

DataStage Interview Questions

1) Define DataStage?

DataStage is a tool used to design, develop and execute applications that populate multiple tables in a data warehouse or data marts. It is a program for Windows servers that extracts data from databases and loads it into data warehouses. It has become an essential part of the IBM WebSphere Data Integration suite.

2) Explain how a source file is populated?

We can populate a source file in many ways, for example by running a SQL query in Oracle or by using the Row Generator stage.

3) Name the command line functions to import and export the DS jobs?

To import DS jobs, dsimport.exe is used; to export DS jobs, dsexport.exe is used.

4) What is the difference between DataStage 7.5 and 7.0?

DataStage 7.5 adds many new stages for more robustness and smoother performance, such as Procedure Stage, Command Stage and Generate Report.

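5) In DataStage, how can you fix the truncated data error?

The truncated data error can be fixed by setting the environment variable 'APT_IMPORT_REJECT_STRING_FIELD_OVERRUNS'. With it set, records whose string fields are longer than the declared length are rejected instead of being truncated.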
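The difference between truncating and rejecting over-long fields can be sketched in a few lines of Python. This is only an illustration of the behaviour as described for the environment variable named above, not DataStage code: the function `import_records` and its `reject_overruns` flag are hypothetical names made up for this example.

```python
def import_records(records, field, max_len, reject_overruns=False):
    """Split records into accepted and rejected lists.

    Hypothetical helper: models an import step where `field` has a
    declared maximum length of `max_len` characters.
    """
    accepted, rejected = [], []
    for rec in records:
        value = rec[field]
        if len(value) <= max_len:
            # The value fits the declared length, so keep it unchanged.
            accepted.append(rec)
        elif reject_overruns:
            # Over-long value: send the whole record to the reject list.
            rejected.append(rec)
        else:
            # Over-long value: keep the record but cut the field short.
            accepted.append({**rec, field: value[:max_len]})
    return accepted, rejected


rows = [{"city": "Pune"}, {"city": "Thiruvananthapuram"}]
print(import_records(rows, "city", 10))
print(import_records(rows, "city", 10, reject_overruns=True))
```

With rejection switched on, the over-long record shows up in the second list, so the problem becomes visible instead of the value being cut short without notice.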

6) Define Merge?

Merge means joining two or more tables. The tables are joined on the basis of the primary key columns that they have in common.
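As a rough illustration of joining tables on a shared key, here is a minimal Python sketch. It is not how DataStage implements the Merge stage; the function `merge_on_key` and the sample tables are invented for this example, and rows without a match are simply dropped, which is a simplification.

```python
def merge_on_key(master, update, key):
    """Combine rows from two tables that share the same key value."""
    # Index the second table by key (assumes the key is unique in it).
    update_by_key = {row[key]: row for row in update}
    merged = []
    for row in master:
        match = update_by_key.get(row[key])
        if match is not None:
            # Columns from both tables end up in one output row.
            merged.append({**row, **match})
    return merged


customers = [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Raj"}]
orders = [{"cust_id": 1, "total": 250}, {"cust_id": 3, "total": 90}]
print(merge_on_key(customers, orders, "cust_id"))
# prints [{'cust_id': 1, 'name': 'Ann', 'total': 250}]
```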

7) Differentiate between data file and descriptor file?

As the names imply, the data files contain the data, and the descriptor file contains the description of (information about) the data in the data files.

8) Differentiate between DataStage and Informatica?

DataStage has the concepts of partitioning and parallelism for node configuration, whereas Informatica has no such concepts for node configuration. Also, Informatica is more scalable than DataStage, while DataStage is more user-friendly than Informatica.

9) Define Routines and their types?

Routines are collections of functions defined in the DS Manager. They can be called from the Transformer stage. There are three types of routines: parallel routines, mainframe routines and server routines.

10) How can you write parallel routines in DataStage PX?

Parallel routines are written in C or C++ and compiled with a C or C++ compiler. Such routines are also created in the DS Manager and can be called from the Transformer stage.

11) How can duplicates be removed without the Remove Duplicates stage?

Duplicates can be removed by using the Sort stage with the option Allow Duplicates set to False.
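The idea behind removing duplicates through sorting can be sketched in Python: once the rows are sorted on the key, duplicates sit next to each other, so keeping one row per key is a single pass. The function name `sort_and_dedupe` and the sample rows are assumptions for illustration only, and keeping the first row per key is a choice made for this sketch.

```python
from itertools import groupby
from operator import itemgetter


def sort_and_dedupe(rows, key):
    """Sort rows on `key` and keep only the first row for each key value."""
    # sorted() is stable, so rows with equal keys keep their input order.
    ordered = sorted(rows, key=itemgetter(key))
    # After sorting, equal keys are adjacent; groupby walks them as groups.
    return [next(group) for _, group in groupby(ordered, key=itemgetter(key))]


rows = [{"id": 2, "v": "b"}, {"id": 1, "v": "a"}, {"id": 2, "v": "c"}]
print(sort_and_dedupe(rows, "id"))
# prints [{'id': 1, 'v': 'a'}, {'id': 2, 'v': 'b'}]
```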

12) What steps should be taken to improve the performance of DataStage jobs?

To improve the performance of DataStage jobs, we first have to establish baselines. Secondly, we should not rely on only one flow for performance testing. Thirdly, we should work in increments. Then we should evaluate data skew, and isolate and solve the problems one by one. After that, we should distribute the file systems to remove any bottlenecks. We should also leave the RDBMS out at the start of the testing phase. Last but not least, we should understand and assess the available tuning knobs.
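One of the steps above, evaluating data skew, can be illustrated with a small Python sketch. It hash-partitions a list of key values and compares the largest partition with the average partition size. The functions `partition_counts` and `skew_ratio`, the CRC32 hash and the sample keys are all assumptions chosen for this example; they do not reproduce DataStage's own partitioners.

```python
import zlib


def partition_counts(keys, num_partitions):
    """Count how many rows a simple hash partitioner sends to each partition."""
    counts = [0] * num_partitions
    for k in keys:
        # CRC32 is used only because it is deterministic across runs.
        p = zlib.crc32(str(k).encode("utf-8")) % num_partitions
        counts[p] += 1
    return counts


def skew_ratio(counts):
    """Largest partition size divided by the mean size (1.0 means even)."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean if mean else 0.0


# 90 rows share one key value, so one partition receives most of the data.
keys = ["A"] * 90 + list(range(10))
counts = partition_counts(keys, 4)
print(counts, round(skew_ratio(counts), 2))
```

A ratio close to 1.0 suggests the rows are spread evenly; a much larger value means one partition does most of the work while the others sit idle.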

13) Differentiate between Join, Merge and Lookup stage?

The three stages differ in how they use memory, in what they require of their inputs, and in how they treat records. Join and Merge need less memory than the Lookup stage, because Lookup holds its reference data in memory while Join and Merge work through sorted inputs.
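The memory difference can be made concrete with a Python sketch that contrasts the two approaches. It is an illustration only, not DataStage internals: `lookup_join` and `sorted_merge_join` are hypothetical names, and the merge version assumes both inputs are already sorted on the key and that key values are unique within each input.

```python
def lookup_join(stream, reference, key):
    """Lookup style: load the whole reference table into memory first."""
    table = {row[key]: row for row in reference}
    for row in stream:
        match = table.get(row[key])
        if match is not None:
            yield {**row, **match}


def sorted_merge_join(left, right, key):
    """Join/Merge style: walk two key-sorted inputs side by side.

    Only the current row of each input is held at any time.
    Assumes sorted inputs with unique keys in each.
    """
    left_it, right_it = iter(left), iter(right)
    l_row, r_row = next(left_it, None), next(right_it, None)
    while l_row is not None and r_row is not None:
        if l_row[key] < r_row[key]:
            l_row = next(left_it, None)
        elif l_row[key] > r_row[key]:
            r_row = next(right_it, None)
        else:
            # Keys match: emit the combined row and advance both inputs.
            yield {**l_row, **r_row}
            l_row, r_row = next(left_it, None), next(right_it, None)


left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 4, "a": "z"}]
right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 4, "b": "r"}]
print(list(lookup_join(left, right, "id")))
print(list(sorted_merge_join(left, right, "id")))
```

Both calls produce the same matched rows here; what differs is that the first builds a dictionary of the entire reference input, while the second needs sorted inputs but keeps almost nothing in memory.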

14) Explain Quality stage?

QualityStage is also known as Integrity stage. It assists in integrating different types of data from various sources.

15) Define Job control?

Job control is best performed by using Job Control Language (JCL). It is used to execute multiple jobs simultaneously, without using any kind of loop.
