Wednesday, September 19, 2012

Reset Sequence Generator Current Value through the Informatica Repository DB Schema

Have you faced a situation where an Informatica mapping fails because the Sequence Generator is inserting duplicate surrogate keys into the dimension?
The obvious fix is to set the Current Value of the Sequence Generator in the Informatica mapping so that next_value > MAX(surrogate_key_value).
What if we do not have access to the Informatica mapping? One way is to get into the Informatica repository DB and change it directly on the repository tables.

Step 1 - Connect to the Informatica repository schema

Step 2 - Find the Widget ID of the sequence generator
SELECT * FROM REP_WIDGET_INST WHERE INSTANCE_NAME LIKE '%<Sequence Generator Name>%' AND WIDGET_TYPE_NAME = 'Sequence';

Step 3 - Set the Next_value using the widget ID


UPDATE OPB_WIDGET_ATTR SET ATTR_VALUE = ATTR_VALUE + 100000000 WHERE ATTR_ID = 4 AND WIDGET_TYPE = 7 AND WIDGET_ID = <Widget_ID>;
COMMIT;
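The two steps above boil down to: find where the repository stores the sequence's next value, then push it safely above the dimension's current maximum surrogate key. Here is a minimal sketch of that idea using SQLite through Python, purely for illustration; the table and column names (seq_attrs, dim_customer, customer_key) are made up and do not match the real Informatica repository schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-ins for the repository attribute table and the dimension (hypothetical names).
cur.execute("CREATE TABLE seq_attrs (widget_id INTEGER, attr_id INTEGER, attr_value INTEGER)")
cur.execute("CREATE TABLE dim_customer (customer_key INTEGER)")
cur.execute("INSERT INTO seq_attrs VALUES (42, 4, 1000)")  # stored next_value fell behind
cur.executemany("INSERT INTO dim_customer VALUES (?)", [(5000,), (7500,)])

# Push the stored next_value safely above the dimension's max surrogate key.
(max_key,) = cur.execute("SELECT MAX(customer_key) FROM dim_customer").fetchone()
cur.execute(
    "UPDATE seq_attrs SET attr_value = ? WHERE widget_id = 42 AND attr_id = 4",
    (max_key + 1,),
)
conn.commit()

(next_value,) = cur.execute(
    "SELECT attr_value FROM seq_attrs WHERE widget_id = 42 AND attr_id = 4"
).fetchone()
print(next_value)  # 7501
```

The real repository query adds a large constant (100000000) instead of computing MAX(key) + 1; either way, the point is simply to land above every key already in the dimension.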

Tuesday, September 11, 2012

ETL: How to select one row for a set of columns in Informatica?

There was an Informatica design problem I had to solve; as part of the solution I had to find a way to select one row for a set of columns.
Here's the scenario:

Col1 Col2 Col3 Col4
A    1    a1   a11
A    1    a1   a11
A    2    a1   a11
A    2    a1   a11
B    2    b2   b22

I have to select one record for each distinct combination of values in Col1 and Col2.

Solution - Use an Aggregator transformation and select Col1 and Col2 as the group-by columns. An Aggregator transformation returns one record per group of the selected group-by columns: the last record, by design.

The output of the Aggregator transformation will be like this.


Col1 Col2 Col3 Col4
A    1    a1   a11
A    2    a1   a11
B    2    b2   b22

Note - A plain GROUP BY in the database returns aggregated column values rather than one complete source row, which is what makes this behaviour of the Aggregator handy (in the database you would need an analytic function such as ROW_NUMBER to get the same result).
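The Aggregator's "last record per group" behaviour can be modelled outside Informatica too. A minimal Python sketch of the same idea (this only illustrates the behaviour, it is not Informatica itself):

```python
rows = [
    ("A", 1, "a1", "a11"),
    ("A", 1, "a1", "a11"),
    ("A", 2, "a1", "a11"),
    ("A", 2, "a1", "a11"),
    ("B", 2, "b2", "b22"),
]

# Keep the last row seen for each (Col1, Col2) group, like an Aggregator
# with Col1 and Col2 as group-by ports and no aggregate expressions.
last_per_group = {}
for row in rows:
    last_per_group[(row[0], row[1])] = row

result = list(last_per_group.values())
print(result)
# [('A', 1, 'a1', 'a11'), ('A', 2, 'a1', 'a11'), ('B', 2, 'b2', 'b22')]
```

Because Python dicts preserve insertion order, the groups come out in the order they were first seen, matching the table above.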

Friday, August 17, 2012

Linux: How to find out the port number that DAC Server is using?

Have you faced issues when trying to restart the DAC server? One common reason is that a process is already listening on the DAC server's port (3141 by default).
The solution is to bring down the stale Java process on the Linux machine and then start the DAC server. For this we need to find the Java process and its PID; the command below helps locate it.

ps -ef | grep <user>

Where <user> is the user running the DAC server.
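After killing the process, it is worth confirming that nothing is still listening on the port before restarting. A small self-contained Python sketch of such a check (3141 is just the default DAC port, and the helper name port_in_use is my own):

```python
import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on a successful connection, i.e. a live listener.
        return s.connect_ex((host, port)) == 0

# Example: before restarting the DAC server, make sure its port is free.
if port_in_use(3141):
    print("Port 3141 is still in use; find and stop the old java process first.")
else:
    print("Port 3141 is free; safe to start the DAC server.")
```

The same check can of course be done from the shell with netstat or lsof; the sketch above just avoids depending on which of those tools is installed.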

Thursday, July 19, 2012

Steps to start an Execution Plan in Oracle DAC

What are the steps to follow while starting an Execution Plan in Oracle DAC?

Note: This is not a complete end-to-end guide to using Oracle DAC. These steps apply when most of the ETL development is done: all tasks are included in DAC, subject areas are created, and the execution plan is built. Development cycle and stage: bug fix / enhancements / UAT.

We need to check for configurations in DAC and in Informatica before starting the load.

DAC Configurations

           Setup Tab

  1. Check the Informatica server connections: are they pointing to the server that has the latest code and required fixes? If not, change the configuration and test the connection.
  2. Check the source DB and target DB connections: are they pointing to the required databases? If not, change the configuration and test the connection.
  3. Also check the tablespace used for creating indexes.
         Design Tab
  1. Check the date parameters. If you are doing a full load, check the initial extract date, snapshot dates etc. and set the required dates.
  2. Assemble the subject area if needed. Did you change anything at the task level without assembling the SA?
         Execution tab
  1. Regenerate the parameters and rebuild the execution plan if needed.
  2. Are you doing a full load? Then reset the sources: right-click on the EP and select the option to reset sources.
  3. Do not start the load yet; we still need to run some checks in Informatica.

Informatica Configuration
  1. Check the connections for the source and target DBs. Ask yourself these questions: do the connection names match the ones in DAC, and are they pointing to the same databases that DAC points to? If the answer to either is no, enter the correct details.
Execution plan kick-off
Now we are ready to start the execution plan. Go ahead and start it :-)

Friday, July 6, 2012

Configuration Tags in Oracle DAC

How can we control the inclusion and exclusion of tasks in a subject area or EP in DAC?

Say I have a task that should not be executed in one EP but should be included in another. How do I do this? The answer is configuration tags!

What is a configuration tag?
By Oracle docs definition - A configuration tag is an object that controls the inclusion of tasks in subject areas. When a task is tagged, it is not eligible to be included in the collection of tasks for any subject area, unless the tag is part of the subject area definition "Include Task" property.

In simple terms:
If we do not want a task (or tasks) in a SA, create a new configuration tag and add the task(s) to it. Then select the SA from which the tasks should be excluded and simply assemble it once again; the tasks will no longer be included in the SA.
To include the tagged tasks, select the subject area into which we want them, add the configuration tag to its "Include Task" property, and assemble the SA.

Monday, July 2, 2012

Steps to integrate a new task in DAC

  1. Add all tasks and synchronize the mappings.
  2. Add all source tables.
  3. Add all target tables and import the columns for all tables in a single shot.
  4. Create all indexes in the DB, then import them in one go.

Monday, May 7, 2012

General OBIA ETL Standards

This is an extract from my notes.

- Check the box "Fail parent if this task fails" on the session in the workflow.
- Check the box "Fail parent if this task does not run" on the session in the workflow.
- Make all sessions reusable.
- Remove the default values for $$DATASOURCE_NUM_ID and $$TENANT_ID in the mappings after your unit testing is complete.
- Keep the session parameter $PMSessionLogFile for the session log file.
- Remove hard-coded values in "Target Table Name" from the session.
- Remove hard-coded values in "Source Table Name" and "Owner Name" from the session.
- DAC will create and assign the following parameters and values at execution time:
  - PSFT source connection parameter is $DBConnection_OLTP
  - BAW target connection parameter is $DBConnection_OLAP

Assign the above parameters to the following connections manually after unit testing is done:
  - $Source in session properties
  - Source table connection in session mapping parameters
  - $Target in session properties
  - Target table connection in session mapping parameters

- Use Bulk mode as the target load type whenever the target tables are truncated. Usually most staging tables are always truncated, and all tables are truncated as part of a full load.
- Use Normal mode as the target load type for all tables that are not truncated as part of incremental loads.
- Set "Stop on errors" to 1 in sessions.
- Rename the source object groups to WC_OLAP and WC_OLTP if the source objects are not available OOTB. Copy them from the OOTB folder if they are already available.
- A few workflows are named like "_Full" and a few like "_full". One standard, "_Full", is recommended.
- Most of the mappings are pretty straightforward. Try to come up with mapplets whenever the mappings are complex.
- A separate workflow/session for the full load is required only when there is a difference, such as a different SQL override or a change in the load type (Bulk/Normal).
  Ex: the SDE Audience task is a staging mapping (always truncate) and there is no difference in the SQL override either, so a separate session/workflow is not required for the full load.

- A few sessions are still invalid in the workflows. Please refer to the attached screenshot.

Wednesday, March 28, 2012

Copy Oracle Table from one schema to another

Do you want to copy a table and its data from one Oracle schema to another?

It can be done by using the GRANT command. Grant all (or the required) privileges on the object (table), then use the CREATE TABLE command in the second schema.

By doing this, the table structure and the data will be transferred from the first schema to the second.

Example : 

SCHEMA1.TABLE1 is to be moved to SCHEMA2.

Connect to schema1 and execute the following command.
GRANT ALL ON SCHEMA1.TABLE1 TO SCHEMA2;


Note - Complete Grant command definition can be found here - 
http://docs.oracle.com/cd/B19306_01/server.102/b14200/statements_9013.htm#SQLRF01603 

Now connect to schema2 and execute the following command.
CREATE TABLE TABLE1 AS SELECT * FROM SCHEMA1.TABLE1;
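The copy-structure-and-data step can be sketched quickly outside Oracle as well. Here it is against SQLite through Python, purely for illustration (SQLite has no schemas or GRANTs, so only the CREATE TABLE ... AS SELECT half of the recipe is shown, and the table names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Source table standing in for SCHEMA1.TABLE1.
cur.execute("CREATE TABLE table1 (id INTEGER, name TEXT)")
cur.executemany("INSERT INTO table1 VALUES (?, ?)", [(1, "x"), (2, "y")])

# Copy structure and data in one statement, like CREATE TABLE ... AS SELECT in Oracle.
cur.execute("CREATE TABLE table1_copy AS SELECT * FROM table1")

rows = cur.execute("SELECT * FROM table1_copy ORDER BY id").fetchall()
print(rows)  # [(1, 'x'), (2, 'y')]
```

Note that in both Oracle and SQLite, CREATE TABLE AS SELECT copies the column structure and data but not indexes or constraints; those need to be recreated separately.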



Monday, March 12, 2012

SQL Query to remove duplicates

During ETL development we come across duplicate records regularly. To maintain uniqueness we need to remove the duplicates while retaining one record of each, so that we do not lose data. I found a simple query for this; here it is.

DELETE FROM table_name a
WHERE rowid > (
    SELECT MIN(rowid)
    FROM table_name b
    WHERE a.unique_column = b.unique_column
);
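The rowid trick carries over almost verbatim to SQLite, which also exposes a rowid, so the pattern can be sanity-checked with Python before running it against Oracle. The table and column names here are made up (SQLite does not allow an alias on the DELETE target, so the outer table is referenced by name instead of "a"):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
cur.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "ann"), (1, "ann"), (2, "bob"), (2, "bob"), (2, "bob"), (3, "cat")],
)

# Same idea as the Oracle query: for each customer_id, keep only the row
# with the smallest rowid and delete the rest.
cur.execute(
    """
    DELETE FROM customers
    WHERE rowid > (
        SELECT MIN(rowid) FROM customers b
        WHERE customers.customer_id = b.customer_id
    )
    """
)
conn.commit()

rows = cur.execute(
    "SELECT customer_id, name FROM customers ORDER BY customer_id"
).fetchall()
print(rows)  # [(1, 'ann'), (2, 'bob'), (3, 'cat')]
```

One caveat on the real query: it compares rows only on unique_column, so if two rows share that value but differ in other columns, only the one with the smallest rowid survives.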