Wednesday, June 26, 2024

AZURE DATA FACTORY Q&A

 1. What are the major components of ADF
Integration Runtime, Linked services, Dataset,Triggeres, Activities, Pipelines 
2. what are diff types of Integration runtime available.
IR provides compute infrastructure to execute pipeline activities to diff networks using linked services.
Selfhosted-- Connecting to on premises, Private network or virtual network.
Auto resolved or Azure IR-- Cloud based, Auto resolved is default one. Azure IR is developer crated and it can be customised.
SSIS IR-- This is the host the ssis project on cloud and schedule in Azure data factory.

3. what are linked services.
Linked services are storing of connection information and establishing connection information to data sources with the help of IR.
4. What are datasets.
Datasets are preparation of data with the help of LS based on activity
5. what are the activities used in the DF in the project.
Copy data activity-- Simply copying data from source to destination. 
Get Metadata activity-- to get the metadata information like filename in folder,file exist or not, size of file.
Stored procedure activity--to execute any sql commands in SP, to call SP.
Web activity--To send email notification with the help of logic apps,
lookup activity--to extract data from any source and use that output as input to other activity.
if condition activity--based on condition expression true or false.
Set variables--to assign any value to variable.
Switch activity--multiple conditional based execution of activities.
Data flows--to perform ETL.
Wait

6. Did you use mapping flows. What are data transformations.
    source--connecting to any kind of data sources supported by data factory.    
1. Derived column-- to derive new columns based on expressions.
    2. Conditional split--loading data by splitting based on condition.
    3. Join-performing join based on inner,left,full,right,cross joins
    4. Lookup.--comparing two columns based on common columns find matched and unmatched records.
    5. Select--filter of columns,restricting of incoming columns.
    6. Filter--filtering of data or rows.
    7. Sort--sorting of data based on asc or desc in order
    8.union--combining or appending of two datasource.
    9.Window--to perform rownumber,rank,denserank,expressions.
    10 Aggregate--to group data and perform aggregations like min,max,sum...
    11. Alter row--identifying which records need to be insert and update-perform upsert operations
    12. New branch--incoming data can be duplicated into multiple destinations
    13. Surrogate key--to generate continous key values.
Sink--Loading data into any destination supported by data factory.

7. scenario- from onpremise to cloud while loading data which are the activities used .
on premise- copy data activtity to blob storage  or data lake-- use data flow to cloud db.

data flow activity can be used only when cloud db in source and destination. 
8. Can we use for each loop activity inside another for each loop activity i.e nested loop.
NO , we have to call for each activity in execute pipeline under for each loop activity.
9. Can we use for each loop activity inside another for each loop activity i.e nested loop.

Friday, June 14, 2024

Blocking script

 𝘞𝘐𝘛𝘏 𝘤𝘵𝘦𝘉𝘓 (𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥, 𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘵𝘩𝘦𝘴𝘦) 𝘈𝘚

(𝘚𝘌𝘓𝘌𝘊𝘛 𝘴.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥, 𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘵𝘩𝘦𝘴𝘦 = 𝘹.𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘵𝘩𝘦𝘴𝘦 𝘍𝘙𝘖𝘔 𝘴𝘺𝘴.𝘥𝘮_𝘦𝘹𝘦𝘤_𝘴𝘦𝘴𝘴𝘪𝘰𝘯𝘴 𝘴
𝘊𝘙𝘖𝘚𝘚 𝘈𝘗𝘗𝘓𝘠 (𝘚𝘌𝘓𝘌𝘊𝘛 𝘪𝘴𝘯𝘶𝘭𝘭(𝘤𝘰𝘯𝘷𝘦𝘳𝘵(𝘷𝘢𝘳𝘤𝘩𝘢𝘳(6), 𝘦𝘳.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥),'') + ', ' 
       𝘍𝘙𝘖𝘔 𝘴𝘺𝘴.𝘥𝘮_𝘦𝘹𝘦𝘤_𝘳𝘦𝘲𝘶𝘦𝘴𝘵𝘴 𝘢𝘴 𝘦𝘳
       𝘞𝘏𝘌𝘙𝘌 𝘦𝘳.𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥 = 𝘪𝘴𝘯𝘶𝘭𝘭(𝘴.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥 ,0)
       𝘈𝘕𝘋 𝘦𝘳.𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥 <> 0
       𝘍𝘖𝘙 𝘟𝘔𝘓 𝘗𝘈𝘛𝘏('') ) 𝘈𝘚 𝘹 (𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘵𝘩𝘦𝘴𝘦)
)
𝘚𝘌𝘓𝘌𝘊𝘛 𝘴.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥, 𝘣𝘭𝘰𝘤𝘬𝘦𝘥_𝘣𝘺 = 𝘳.𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥, 𝘣𝘭.𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘵𝘩𝘦𝘴𝘦
, 𝘣𝘢𝘵𝘤𝘩_𝘵𝘦𝘹𝘵 = 𝘵.𝘵𝘦𝘹𝘵, 𝘪𝘯𝘱𝘶𝘵_𝘣𝘶𝘧𝘧𝘦𝘳 = 𝘪𝘣.𝘦𝘷𝘦𝘯𝘵_𝘪𝘯𝘧𝘰, *
𝘍𝘙𝘖𝘔 𝘴𝘺𝘴.𝘥𝘮_𝘦𝘹𝘦𝘤_𝘴𝘦𝘴𝘴𝘪𝘰𝘯𝘴 𝘴
𝘓𝘌𝘍𝘛 𝘖𝘜𝘛𝘌𝘙 𝘑𝘖𝘐𝘕 𝘴𝘺𝘴.𝘥𝘮_𝘦𝘹𝘦𝘤_𝘳𝘦𝘲𝘶𝘦𝘴𝘵𝘴 𝘳 𝘰𝘯 𝘳.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥 = 𝘴.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥
𝘐𝘕𝘕𝘌𝘙 𝘑𝘖𝘐𝘕 𝘤𝘵𝘦𝘉𝘓 𝘢𝘴 𝘣𝘭 𝘰𝘯 𝘴.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥 = 𝘣𝘭.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥
𝘖𝘜𝘛𝘌𝘙 𝘈𝘗𝘗𝘓𝘠 𝘴𝘺𝘴.𝘥𝘮_𝘦𝘹𝘦𝘤_𝘴𝘲𝘭_𝘵𝘦𝘹𝘵 (𝘳.𝘴𝘲𝘭_𝘩𝘢𝘯𝘥𝘭𝘦) 𝘵
𝘖𝘜𝘛𝘌𝘙 𝘈𝘗𝘗𝘓𝘠 𝘴𝘺𝘴.𝘥𝘮_𝘦𝘹𝘦𝘤_𝘪𝘯𝘱𝘶𝘵_𝘣𝘶𝘧𝘧𝘦𝘳(𝘴.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥, 𝘕𝘜𝘓𝘓) 𝘈𝘚 𝘪𝘣
𝘞𝘏𝘌𝘙𝘌 𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘵𝘩𝘦𝘴𝘦 𝘪𝘴 𝘯𝘰𝘵 𝘯𝘶𝘭𝘭 𝘰𝘳 𝘳.𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥 > 0
𝘖𝘙𝘋𝘌𝘙 𝘉𝘠 𝘭𝘦𝘯(𝘣𝘭.𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘵𝘩𝘦𝘴𝘦) 𝘥𝘦𝘴𝘤, 𝘳.𝘣𝘭𝘰𝘤𝘬𝘪𝘯𝘨_𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥 𝘥𝘦𝘴𝘤, 𝘳.𝘴𝘦𝘴𝘴𝘪𝘰𝘯_𝘪𝘥;