Wednesday, June 26, 2024

AZURE DATA FACTORY Q&A

1. What are the major components of ADF?
Integration runtimes, linked services, datasets, triggers, activities, and pipelines.
2. What are the different types of integration runtime available?
An integration runtime (IR) provides the compute infrastructure on which pipeline activities execute, and it connects to different networks through linked services.
Self-hosted IR -- connects to on-premises data stores or to a private/virtual network.
Azure IR -- cloud-based; the auto-resolve Azure IR is the default, while a developer-created Azure IR can be customised (for example, pinned to a specific region).
Azure-SSIS IR -- hosts SSIS projects in the cloud so their packages can be scheduled from Azure Data Factory.
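As an illustration, here is a minimal sketch of registering a self-hosted IR with the azure-mgmt-datafactory Python SDK; the subscription, resource group, factory, and IR names are all placeholders, and the model names assume a recent SDK version.

```python
# A minimal sketch, assuming azure-identity and azure-mgmt-datafactory
# are installed; all resource names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    SelfHostedIntegrationRuntime,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Register the self-hosted IR in the factory; the on-premises machine is
# attached to it later by installing the IR agent with an authentication key.
ir = client.integration_runtimes.create_or_update(
    "my-resource-group",
    "my-data-factory",
    "OnPremSelfHostedIR",
    IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(
            description="Connects to the on-premises network"
        )
    ),
)
print(ir.name)
```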

3. What are linked services?
Linked services store the connection information (endpoints, credentials) for data stores and compute services, and they establish the connection at run time with the help of an IR.
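For example, a linked service for Azure Blob Storage can be created with the Python SDK roughly like this; the connection string and all names are placeholders.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLinkedService,
    LinkedServiceResource,
    SecureString,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The linked service only stores connection information; the connection
# itself is made at run time through an integration runtime.
blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(
        connection_string=SecureString(
            value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
        )
    )
)
client.linked_services.create_or_update(
    "my-resource-group", "my-data-factory", "BlobStorageLS", blob_ls
)
```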
4. What are datasets?
Datasets describe the data an activity reads or writes (a table, file, or folder) inside a store that a linked service points to.
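Continuing the sketch above, a delimited-text dataset on that linked service might look like this; the container and file names are hypothetical.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    AzureBlobStorageLocation,
    DatasetResource,
    DelimitedTextDataset,
    LinkedServiceReference,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The dataset points at a concrete CSV file through the linked service
# created earlier ("BlobStorageLS").
csv_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="BlobStorageLS"
        ),
        location=AzureBlobStorageLocation(
            container="input", file_name="customers.csv"
        ),
        column_delimiter=",",
        first_row_as_header=True,
    )
)
client.datasets.create_or_update(
    "my-resource-group", "my-data-factory", "CustomersCsv", csv_ds
)
```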
5. What activities were used in Data Factory in the project?
Copy data activity -- copies data from a source to a destination (sink); see the sketch after this list.
Get Metadata activity -- gets metadata such as the file names in a folder, whether a file exists, or the size of a file.
Stored procedure activity -- calls a stored procedure, so any SQL wrapped in one can be executed.
Web activity -- calls an HTTP endpoint, for example to send an email notification through a Logic App.
Lookup activity -- reads data from a source and passes the output as input to other activities.
If condition activity -- branches based on whether a condition expression evaluates to true or false.
Set variable activity -- assigns a value to a pipeline variable.
Switch activity -- executes different activities depending on which case matches.
Data flow activity -- runs a mapping data flow to perform ETL.
Wait activity -- pauses the pipeline for a specified interval.
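Here is a minimal sketch chaining a Get Metadata activity into a Copy activity, reusing the CustomersCsv dataset from above plus a hypothetical output dataset CustomersCsvOut; names and SDK details are assumptions, not a definitive implementation.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency,
    CopyActivity,
    DatasetReference,
    DelimitedTextSink,
    DelimitedTextSource,
    GetMetadataActivity,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# First check the source file's metadata, then copy it to the sink dataset.
check = GetMetadataActivity(
    name="CheckSourceFile",
    dataset=DatasetReference(type="DatasetReference", reference_name="CustomersCsv"),
    field_list=["exists", "size"],
)
copy = CopyActivity(
    name="CopyCustomers",
    inputs=[DatasetReference(type="DatasetReference", reference_name="CustomersCsv")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="CustomersCsvOut")],
    source=DelimitedTextSource(),
    sink=DelimitedTextSink(),
    # Run the copy only after the metadata check succeeds.
    depends_on=[ActivityDependency(activity="CheckSourceFile",
                                   dependency_conditions=["Succeeded"])],
)
client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "CopyCustomersPipeline",
    PipelineResource(activities=[check, copy]),
)
```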

6. Did you use mapping flows. What are data transformations.
    source--connecting to any kind of data sources supported by data factory.    
1. Derived column-- to derive new columns based on expressions.
    2. Conditional split--loading data by splitting based on condition.
    3. Join-performing join based on inner,left,full,right,cross joins
    4. Lookup.--comparing two columns based on common columns find matched and unmatched records.
    5. Select--filter of columns,restricting of incoming columns.
    6. Filter--filtering of data or rows.
    7. Sort--sorting of data based on asc or desc in order
    8.union--combining or appending of two datasource.
    9.Window--to perform rownumber,rank,denserank,expressions.
    10 Aggregate--to group data and perform aggregations like min,max,sum...
    11. Alter row--identifying which records need to be insert and update-perform upsert operations
    12. New branch--incoming data can be duplicated into multiple destinations
    13. Surrogate key--to generate continous key values.
Sink--Loading data into any destination supported by data factory.
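The transformations themselves are authored inside the data flow (in ADF Studio or via its JSON definition); a pipeline then only invokes the flow. A hedged sketch, assuming a data flow named CustomersDataFlow already exists in the factory:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    DataFlowReference,
    ExecuteDataFlowActivity,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The pipeline only triggers the flow; the source, derived column, join,
# sink, etc. steps live inside the data flow definition itself.
run_flow = ExecuteDataFlowActivity(
    name="TransformCustomers",
    data_flow=DataFlowReference(
        type="DataFlowReference", reference_name="CustomersDataFlow"
    ),
)
client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "TransformPipeline",
    PipelineResource(activities=[run_flow]),
)
```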

7. Scenario: while loading data from on-premises to the cloud, which activities are used?
Use a Copy data activity (running over a self-hosted IR) to land the on-premises data in Blob Storage or Data Lake Storage, then use a Data flow activity to transform it and load the cloud database; see the sketch below.

A Data flow activity can be used only when both the source and the destination are cloud data stores.
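A sketch of that two-stage pattern, with hypothetical datasets OnPremOrders (on-premises SQL Server reached via the self-hosted IR) and StagedOrdersCsv, and a hypothetical data flow OrdersDataFlow:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ActivityDependency,
    CopyActivity,
    DataFlowReference,
    DatasetReference,
    DelimitedTextSink,
    ExecuteDataFlowActivity,
    PipelineResource,
    SqlServerSource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Stage 1: copy the on-premises SQL data (via the self-hosted IR behind the
# OnPremOrders dataset) into a staging CSV in Blob Storage / Data Lake.
stage = CopyActivity(
    name="StageOnPremData",
    inputs=[DatasetReference(type="DatasetReference", reference_name="OnPremOrders")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="StagedOrdersCsv")],
    source=SqlServerSource(),
    sink=DelimitedTextSink(),
)
# Stage 2: once staging succeeds, transform the file and load the cloud DB.
transform = ExecuteDataFlowActivity(
    name="LoadCloudDb",
    data_flow=DataFlowReference(type="DataFlowReference",
                                reference_name="OrdersDataFlow"),
    depends_on=[ActivityDependency(activity="StageOnPremData",
                                   dependency_conditions=["Succeeded"])],
)
client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "OnPremToCloudPipeline",
    PipelineResource(activities=[stage, transform]),
)
```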
8. Can we use a ForEach activity inside another ForEach activity, i.e. a nested loop?
No. Instead, place an Execute Pipeline activity inside the outer ForEach and have it call a child pipeline that contains the inner ForEach.
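A sketch of that workaround, assuming a child pipeline named InnerLoopPipeline that contains the inner ForEach:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    ExecutePipelineActivity,
    Expression,
    ForEachActivity,
    ParameterSpecification,
    PipelineReference,
    PipelineResource,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# The outer ForEach iterates over the outer list; each iteration calls the
# child pipeline, which holds the inner ForEach.
outer_loop = ForEachActivity(
    name="OuterLoop",
    items=Expression(value="@pipeline().parameters.outerItems"),
    activities=[
        ExecutePipelineActivity(
            name="CallInnerPipeline",
            pipeline=PipelineReference(type="PipelineReference",
                                       reference_name="InnerLoopPipeline"),
            wait_on_completion=True,
        )
    ],
)
client.pipelines.create_or_update(
    "my-resource-group", "my-data-factory", "NestedLoopPipeline",
    PipelineResource(
        activities=[outer_loop],
        parameters={"outerItems": ParameterSpecification(type="Array")},
    ),
)
```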
