How to deploy Azure Data Factory pipeline and its dependencies programmatically using PowerShell

Since ADF doesn’t provide a built-in way to automate deployment, you have to write a custom script and run it on a build server such as VSTS or TeamCity.

If you’re working interactively, you need to log in and select the corresponding subscription. If you run the script on a build server, the subscription should be selected for you automatically.

Login-AzureRmAccount
Select-AzureRmSubscription -SubscriptionName YourSubName
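
On a build server that doesn’t log you in automatically (the Azure PowerShell task in VSTS does), you can authenticate non-interactively with a service principal. A minimal sketch; the environment variable names are placeholders for however you store the credentials:

# Log in with a service principal (app ID + key) instead of interactively
$secureKey = ConvertTo-SecureString $env:ServicePrincipalKey -AsPlainText -Force
$credential = New-Object System.Management.Automation.PSCredential($env:ServicePrincipalAppId, $secureKey)
Login-AzureRmAccount -ServicePrincipal -Credential $credential -TenantId $env:TenantId
Select-AzureRmSubscription -SubscriptionName YourSubName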

Here are the parameters you can’t get programmatically, so you have to specify them explicitly. The last parameter is the path to the JSON files that represent the resources to be deployed: linked services, datasets, and pipelines.

param(
    [Parameter(Mandatory=$true)][string]$ResourceGroupName,
    [Parameter(Mandatory=$true)][string]$Location,
    [Parameter(Mandatory=$true)][string]$DataFactoryName,
    [Parameter(Mandatory=$true)][string]$InputFolderPath
)

First, create the factory itself. The -Force flag lets the script continue if the factory already exists. Then load it into a variable:

New-AzureRmDataFactory -ResourceGroupName $ResourceGroupName -Name $DataFactoryName -Location $Location -Force -ErrorAction Stop
$dataFactory = Get-AzureRmDataFactory -ResourceGroupName $ResourceGroupName -Name $DataFactoryName -ErrorAction Stop

The next three loops read the files based on the following naming convention:

  • Linked services start with LinkedService, e.g. LinkedService-SQL.json
  • Datasets start with Dataset, e.g. Dataset-Orders.json
  • Pipelines start with Pipeline, e.g. Pipeline-CopyOrders.json

First, create the linked services. If any of them references a gateway, create it as well. Since the gateway cmdlet doesn’t support the -Force flag, we use -ErrorAction Continue so the script continues if the gateway already exists:

$files = Get-ChildItem $InputFolderPath -Recurse -Include *.json -Filter LinkedService* -ErrorAction Stop
foreach ($file in $files)  
{
    Write-Output "Creating linked services from $($file.FullName)"
    New-AzureRmDataFactoryLinkedService -DataFactory $dataFactory -File $file.FullName -Force -ErrorAction Stop

    $json = Get-Content $file.FullName -Raw -ErrorAction Stop
    $svc = $json | ConvertFrom-Json
    $gwName = $svc.properties.typeProperties.gatewayName
    if ($gwName)
    {
        Write-Output "Creating gateway $($gwName) from $($file.FullName)" 
        New-AzureRmDataFactoryGateway -DataFactory $dataFactory -Name $gwName -ErrorAction Continue
    }
}

Then create datasets:

$files = Get-ChildItem $InputFolderPath -Recurse -Include *.json -Filter Dataset* -ErrorAction Stop
foreach ($file in $files)  
{
    Write-Output "Creating dataset from $($file.FullName)"
    New-AzureRmDataFactoryDataset -DataFactory $dataFactory -File $file.FullName -Force -ErrorAction Stop 
}

And finally, pipelines:

$files = Get-ChildItem $InputFolderPath -Recurse -Include *.json -Filter Pipeline* -ErrorAction Stop
foreach ($file in $files)  
{
    Write-Output "Creating pipeline from $($file.FullName)"
    New-AzureRmDataFactoryPipeline -DataFactory $dataFactory -File $file.FullName -Force -ErrorAction Stop 
}
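
To double-check the result, you can list what ended up in the factory. A minimal sketch, not part of the original script:

# Optional: print the linked services, datasets, and pipelines the factory now contains
Get-AzureRmDataFactoryLinkedService -DataFactory $dataFactory
Get-AzureRmDataFactoryDataset -DataFactory $dataFactory
Get-AzureRmDataFactoryPipeline -DataFactory $dataFactory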

That’s it. By this time all pipelines should be deployed, verified, and started. Happy data movement!


44 Responses to How to deploy Azure Data Factory pipeline and its dependencies programmatically using PowerShell

  1. Arunachalam S says:

    Hello,
    I need to do continuous integration using VSTS and use this PowerShell script for deploying my linked services, datasets, pipelines, etc. But in the PowerShell script, what input folder path do I need to specify, since my ADF files are in VSTS? I tried giving my local system path and it threw an error.

    • abatishchev says:

      First you need to run msbuild on your ADF project (adfproj); this will merge the environment configs and produce the respective folders, e.g. bin\Debug\Dev. Then take the content of Dev and deploy it using the PowerShell script. Its path is one of the input parameters. See the sketch below.
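
      A minimal sketch of that build step; the project path is a placeholder, adjust it to your solution layout:

      # Validates the JSON and merges each environment config (Dev.json, Prod.json, ...)
      # into the linked services, datasets and pipelines under bin\Debug\<Environment>
      msbuild .\ADFAutomation\ADFAutomation.adfproj /p:Configuration=Debug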

  2. Arunachalam S says:

    This input folder path I can specify if I’m deploying manually, but since I’m doing this through VSTS using PowerShell and building continuous deployment, I’m not sure how it will pick up the ADF files. Any pointers on this would help.

    • abatishchev says:

      There is no difference between building locally and using VSTS. It just executes commands in the given order, which you configure using a web UI. The path is on a build agent, otherwise it’s all the same. This is the script I’m using at work to build and deploy with the same VSTS you use.

      • Arunachalam S says:

        Thanks for your reply. This is the PowerShell script I’m using for deploying from my local PowerShell:

        foreach($file in Get-ChildItem "C:\Users\a.sivananthan\Documents\Visual Studio 2015\Projects\ADFAutomation2018\ADFAutomation2018\" -filter "LinkedService")
        {
          New-AzureRmDataFactoryLinkedService -ResourceGroupName "myresource1" -DataFactoryName "ADFAutomationjan2018" -Name $file.BaseName -File $file.FullName -Force | Format-List
        }
        foreach($file in Get-ChildItem "C:\Users\a.sivananthan\Documents\Visual Studio 2015\Projects\ADFAutomation2018\ADFAutomation2018\" -filter "DataSet")
        {
         New-AzureRmDataFactoryDataset -ResourceGroupName "myresource1" -DataFactoryName "ADFAutomationjan2018" -Name $file.BaseName -File $file.FullName -Force | Format-List
        }
        foreach($file in Get-ChildItem "C:\Users\a.sivananthan\Documents\Visual Studio 2015\Projects\ADFAutomation2018\ADFAutomation2018\" -filter "Pipeline")
        {
         New-AzureRmDataFactoryPipeline -ResourceGroupName "myresource1" -DataFactoryName "ADFAutomationjan2018" -Name $file.BaseName -File $file.FullName -Force | Format-List
        }
        

        In this, if I need to do the same thing using VSTS continuous deployment, what should the path be? How will I find the path on the build agent? I’m struggling to find this out as I’m new to VSTS and Azure.

  3. Arunachalam S says:

    I’m not sure about building my project using MSBuild. Do you have some links which provide steps for this?

    • abatishchev says:

      Use the dedicated task, point it to either the sln or the adfproj, not sure which one. The only caveat is that you need to check in (put under source control) the ADF build targets file which is installed by VS somewhere under %LocalAppData%.

  4. Arunachalam S says:

    Hello, this is my build log file:

    2018-02-22T06:58:18.1439156Z ##[section]Starting: Build solution ***.sln
    2018-02-22T06:58:18.1640432Z ==============================================================================
    2018-02-22T06:58:18.1640649Z Task : Visual Studio Build
    2018-02-22T06:58:18.1640831Z Description : Build with MSBuild and set the Visual Studio version property
    2018-02-22T06:58:18.1640961Z Version : 1.126.0
    2018-02-22T06:58:18.1641080Z Author : Microsoft Corporation
    2018-02-22T06:58:18.1641237Z Help : More Information
    2018-02-22T06:58:18.1641378Z ==============================================================================
    2018-02-22T06:58:21.2726403Z ##[command]”D:\a_tasks\VSBuild_71a9a2d3-a98a-4caa-96ab-affca411ecda\1.126.0\ps_modules\MSBuildHelpers\vswhere.exe” -version [15.0,16.0) -latest -format json
    2018-02-22T06:58:22.2161642Z ##[command]”C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\MSBuild\15.0\Bin\msbuild.exe” “D:\a\1\s\ADFCICD1402.sln” /nologo /nr:false /dl:CentralLogger,”D:\a_tasks\VSBuild_71a9a2d3-a98a-4caa-96ab-affca411ecda\1.126.0\ps_modules\MSBuildHelpers\Microsoft.TeamFoundation.DistributedTask.MSBuild.Logger.dll”;”RootDetailId=8c7618ee-ebf7-4b9d-8642-4b0a294a2bd9|SolutionDir=D:\a\1\s”*ForwardingLogger,”D:\a_tasks\VSBuild_71a9a2d3-a98a-4caa-96ab-affca411ecda\1.126.0\ps_modules\MSBuildHelpers\Microsoft.TeamFoundation.DistributedTask.MSBuild.Logger.dll” /p:VisualStudioVersion=”15.0″ /p:_MSDeployUserAgent=”VSTS_7d845707-12c0-4a8d-8e30-4b61d5b5ca09_build_16_95″
    2018-02-22T06:58:28.3918074Z Building the projects in this solution one at a time. To enable parallel build, please add the “/m” switch.
    2018-02-22T06:58:32.0398442Z Build started 2/22/2018 6:58:31 AM.
    2018-02-22T06:58:37.0086183Z Project “D:\a\1\s\ADFCICD1402.sln” on node 1 (default targets).
    2018-02-22T06:58:37.0095953Z ValidateSolutionConfiguration:
    2018-02-22T06:58:37.0096347Z Building solution configuration “Debug|Any CPU”.
    2018-02-22T06:58:37.0109589Z ValidateProjects:
    2018-02-22T06:58:37.0109996Z The project “ADFCICD1402” is not selected for building in solution configuration “Debug|Any CPU”.
    2018-02-22T06:58:45.4896659Z Project “D:\a\1\s\ADFCICD1402.sln” (1) is building “D:\a\1\s\ADFAutomation\ADFAutomation.pssproj” (2) on node 1 (default targets).
    2018-02-22T06:58:45.4897294Z CleanupEmptyRefsFolder:
    2018-02-22T06:58:45.4898459Z Directory “D:\a\1\s\ADFAutomation\bin\Debug\” doesn’t exist. Skipping.
    2018-02-22T06:58:45.5234134Z Done Building Project “D:\a\1\s\ADFAutomation\ADFAutomation.pssproj” (default targets).
    2018-02-22T06:58:45.5874974Z Done Building Project “D:\a\1\s\ADFCICD1402.sln” (default targets).

    Do I need to refer to this path "D:\a\1\s\ADFAutomation\" in my PowerShell script?

    Correct me if I’m wrong. Thanks for your help.

      • Arunachalam S says:

        Thanks. I tried this yesterday but with no luck. See if you can provide any help in this.

        foreach($file in Get-ChildItem "$(Build.ArtifactStagingDirectory)" -filter LinkedService*)
        {
            New-AzureRmDataFactoryLinkedService -ResourceGroupName "ADFAutomationResource" -DataFactoryName "ADFCICD190218" -Name $file.BaseName -File $file.FullName -Force | Format-List
        }
        

        I took the MSBuild output of the ADF files and used Copy Files to copy it into “Build.ArtifactStagingDirectory”, and then published the artifacts.
        I have this PowerShell script as another project in my solution and I’m trying to use the path Build.ArtifactStagingDirectory, but it throws an error stating:

        2018-02-23T06:56:44.5596446Z ##[error]The term ‘Build.ArtifactStagingDirectory’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is.

        Please let me know how to proceed in this case. Thanks.

      • abatishchev says:

        Don’t put VSTS variables into the PowerShell script’s body as it’s not aware of them. Put them in when you invoke the script in the VSTS task. That’s it, see the initial post: the script uses the path from a parameter, and the parameter is populated from the task.

        Task type: Azure PowerShell
        Azure Connection Type: Azure Resource Manager
        Script Type: Script File Path
        Script Path: $(System.DefaultWorkingDirectory)\Daily\drop\AzureDataFactory\Deploy\Deploy-AzureDataFactory.ps1
        Script Arguments: -ResourceGroupName Default-Factory-WestUS -Location WestUS -DataFactoryName TestFactory1 -InputFolderPath     $(InputFolderPath)
        

        Note that the path to the script is adjusted.

  5. Arunachalam S says:

    Thanks for the guidance. Please let me know if you can help in this.

    I need to do continuous integration and deployment for my Azure Data Factory. For this, in a Visual Studio solution I have two projects: one for the ADF JSON files (linked services, datasets, etc.) and another with the PowerShell script for deploying this ADF into an Azure subscription.

    Steps followed –

    Took the MSBuild output of the ADF code and used a Copy Files task to copy it into $(Build.ArtifactStagingDirectory). Used a Publish Artifacts task to publish it in VSTS.

    Published the PowerShell script artifacts as a separate build.

    In my release I have an Azure PowerShell script which will invoke these ADF files and deploy them into the Azure subscription. I’m using “Build.ArtifactStagingDirectory” to refer to my ADF files. But I’m getting the below error:
    The term ‘Build.ArtifactStagingDirectory’ is not recognized as the name of a cmdlet, function, script file, or operable program

    foreach($file in Get-ChildItem "$(Build.ArtifactStagingDirectory)" -filter "*LinkedService*") { New-AzureRmDataFactoryLinkedService -ResourceGroupName "ADFAutomationResource" -DataFactoryName "ADFCICD190218" -Name $file.BaseName -File $file.FullName -Force | Format-List }

    Let me know how to proceed in this case as there aren’t sufficient links explaining this.

  6. Arunachalam S says:

    Thanks for your reply. These are the first few lines of my PowerShell script:

    param(
        [Parameter(Mandatory=$true)][string]$InputFolderPath
    )
    foreach($file in Get-ChildItem $InputFolderPath -filter "*LinkedService*")
    {
      New-AzureRmDataFactoryLinkedService -ResourceGroupName "ADFAutomationResource" -DataFactoryName "ADFCICD190218" -Name $file.BaseName -File $file.FullName -Force | Format-List
    }
    

    And in the Script Arguments I have specified the below:

    -InputFolderPath  $Build.ArtifactStagingDirectory
    

    But it throws me an error stating:

    Cannot bind argument to parameter ‘InputFolderPath’ because it is an empty string.

    I have tried other options as well for the Script Arguments, such as "D:\a\1\a\ADFCICD1402\", with no luck. Looking forward to your help with this. I’m not able to find any links which explain this. Sorry for the trouble.

    • abatishchev says:

      It should be $(Build.ArtifactStagingDirectory) from VSTS web UI.

      • Arunachalam S says:

        Thanks. Still no luck, getting the same error:

        The term ‘Build.ArtifactStagingDirectory’ is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.

        Now I have my ADF code and PowerShell scripts in a single build, and in the release I’m trying to invoke this PowerShell script with the argument -InputFolderPath $(Build.ArtifactStagingDirectory). Any help in this regard will be great. Thanks.

        In the build, these are the steps I followed:
        MSBUILD
        Copy files – source folder – $(Build.SourcesDirectory) contents ** target folder – $(Build.ArtifactStagingDirectory)
        Publish artifacts – Path to publish – $(Build.ArtifactStagingDirectory)

        My script.ps1 is available along with the ADF code.

  7. Ivnhoe says:

    Where can I get detailed help for CI through VSTS for Azure Data Factory? I need the details on CI, not CD.

    • abatishchev says:

      Do you mean building ADF on VSTS? I was using a plain msbuild task, that’s all that was available at that time, a year ago. What version are you using, v1 or v2?

  8. Ivnhoe says:

    I want CI for the JSON files (ADF) which are developed in Visual Studio 2015 by developers. These should go to the artifacts folder in VSTS. I tried MSBuild but failed. Now I’m just using the copy task, is this sufficient? I thought CD would take care of deployment and that in the case of Data Factory a build (MSBuild) task isn’t needed, unlike U-SQL files. Please suggest.

    • abatishchev says:

      Please note that the ADF version is very important as there were significant changes, basically a rewrite. I can speak only about v1. Also I have nothing to do with the ADF team, so I speak only from my experience with ADF.

      When you say it failed, what does that mean? What was the error? Did you try locally or in VSTS? Try locally first and let me know.

      Also please note that the adfproj references a targets file from your %LocalAppData% or somewhere like that. For it to work in VSTS you have to put it under source control alongside your project, or create a NuGet package and reference it from the packages folder.

      Why won’t just copying work? Because you likely have environment configuration files, and some settings like connection strings vary from environment to environment. So you need to merge the configs into the linked services, and that’s what msbuild will do. See the sketch below.
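
      A rough sketch of what such a config might look like. I’m assuming the ADF v1 Visual Studio config format here, and the file name and connection string are placeholders: an environment config such as Dev.json overrides properties of the named JSON file by JSON path.

      {
          "LinkedService-SQL.json": [
              {
                  "name": "$.properties.typeProperties.connectionString",
                  "value": "Data Source=dev-sql.database.windows.net;Initial Catalog=Orders;..."
              }
          ]
      }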

  9. Ivnhoe says:

    I used the copy task instead of MSBuild and it works now. But I have 2 concerns here:
    1. It just copies the artifacts (JSON files) of the ADF (version 1) project to the artifact folder, without actually building anything, so it won’t be possible to catch syntactical errors in the JSON files.
    2. Is it OK to use only the copy activity during the build instead of MSBuild?

    • abatishchev says:

      If you have only one environment, and therefore keep all settings such as connection strings inside the linked service files, then you don’t need the msbuild task, because as I said all it does is merge the environment configs into the actual files per environment, e.g. Dev, Test, Prod.
      If that’s the case, the only thing you need to do is run the PowerShell script to deploy the files you’ve got directly.

      • Ivnhoe says:

        I have multiple environments. I want to make sure that a proper, syntactically valid JSON file goes for deployment, so I want to build or parse before deploying to validate the JSON. What do you suggest?

      • abatishchev says:

        MSBuild does both validation and environment config merging. You can see this by running it locally, i.e. msbuild MyADF\MyADF.adfproj, then replicate that as a step in the VSTS build.

  10. Arunachalam S says:

    Thanks abatishchev for your help on this, really appreciate it. I was able to resolve the problem after adding this to my code:

    $ScriptFolder = $PSScriptRoot
    Write-Output "PSScriptRoot is: $PSScriptRoot"
    $InputFolderPath = $PSScriptRoot

  11. Arunachalam S says:

    Hello, have you done CI/CD for Azure Data Lake U-SQL? If yes, can you provide some links for it?

    • abatishchev says:

      Merge all your scripts into one file and then execute this PowerShell command: Submit-AzureRmDataLakeAnalyticsJob -Account $AnalyticsAccountName -Name $scriptName -ScriptPath $file -DegreeOfParallelism $DegreeOfParallelism. Otherwise it might take hours and cost hundreds of dollars to deploy one object per script per job. However, beware of dependencies: you may need to deploy tables first, then types, then procs. See the sketch below.
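
      A minimal sketch of that approach; the account name, folder layout, and degree of parallelism are placeholders, and ordering the merged scripts (tables first, then types, then procs) is left to you:

      $AnalyticsAccountName = "myadlaaccount"
      $DegreeOfParallelism = 10
      $merged = Join-Path $env:TEMP "deploy-all.usql"

      # Concatenate all U-SQL scripts into a single file; order them yourself
      # so that dependent objects (tables, types, procs) come in the right sequence
      Get-ChildItem .\Scripts -Recurse -Include *.usql | Get-Content | Set-Content $merged

      Submit-AzureRmDataLakeAnalyticsJob -Account $AnalyticsAccountName -Name "Deploy-USqlObjects" -ScriptPath $merged -DegreeOfParallelism $DegreeOfParallelism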

      • Arunachalam S says:

        Thanks. Have you built a U-SQL project? I’m facing build issues while building it using VSBuild. Have you faced this kind of error before while building U-SQL code:

        [error]C:\Users\a.sivananthan\AppData\Roaming\Microsoft\DataLake\MsBuild\1.0\Usql.targets(33,5): Error MSB4062: The “Microsoft.Cosmos.ScopeStudio.VsExtension.CompilerTask.USqlCompilerTask” task could not be loaded from the assembly Microsoft.Cosmos.ScopeStudio.VsExtension.CompilerTask. Could not load file or assembly ‘Microsoft.Cosmos.ScopeStudio.VsExtension.CompilerTask’ or one of its dependencies. The system cannot find the file specified. Confirm that the declaration is correct, that the assembly and all its dependencies are available, and that the task contains a public class that implements Microsoft.Build.Framework.ITask.

      • abatishchev says:

        Do you get this error locally or in VSTS? If the latter, it means that you didn’t bring the targets file and its dependencies into your source code and you’re relying on it being installed on the build agent. But it’s not.

  12. Arunachalam S says:

    Thanks abatishchev. I’m able to build the U-SQL code now. Thanks for your reply.

  13. Arunachalam S says:

    While deploying this release into the Azure subscription I’m getting the below error: “The user is not authorised to perform this operation on storage.” What kind of access do I need to give to resolve this issue?

  14. Arunachalam S says:

    Thanks abatishchev. I gave Reader access to the app ID I created.

  15. Hi abatishchev,

    Scenario:
    In my ADF project, I have multiple pipelines, linked services, and datasets.
    In the PowerShell script, the -Force switch means existing linked services, datasets and pipelines will be replaced without a prompt.

    Problem
    I want only one pipeline and its datasets to be deployed to ADF using PowerShell. Is it possible?
    Basically, it should compare the changes in the linked services, datasets and pipelines and then deploy only the changed objects to ADF.

    Reason
    I don’t want all my other pipelines, linked services and datasets to be re-deployed again, as I am not working on them.
