Azure Databricks Notebooks Deployment Using GitHub Actions (CI/CD)

M Karthik
2 min read · Jul 8, 2024


Deploying Azure Databricks notebooks using GitHub Actions (Continuous Integration/Continuous Deployment) involves automating the process of pushing notebooks from a GitHub repository to Azure Databricks.

Here’s a step-by-step description of how you can set this up:

Prerequisites:

  1. Azure Databricks Instance: Ensure you have an Azure Databricks workspace set up.
  2. GitHub Repository: Have a GitHub repository where your Databricks notebooks are stored.
  3. Azure Service Principal: Create a service principal in Azure to authenticate GitHub Actions with Azure.

Steps:

1. Set up Azure Service Principal:

Create a service principal in Azure using the Azure CLI:

az ad sp create-for-rbac --name ServicePrincipalName --role contributor \
--scopes /subscriptions/{subscription-id}/resourceGroups/{resource-group} \
--sdk-auth

Save the JSON output securely. This will be used in GitHub Actions secrets.
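
For reference, the --sdk-auth output is a JSON object of roughly this shape (values redacted here; the exact set of endpoint fields can vary):

```json
{
  "clientId": "<client-id>",
  "clientSecret": "<client-secret>",
  "subscriptionId": "<subscription-id>",
  "tenantId": "<tenant-id>",
  "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
  "resourceManagerEndpointUrl": "https://management.azure.com/"
}
```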

2. Configure GitHub Secrets:

In your GitHub repository, go to Settings > Secrets and variables > Actions.
Add the following secrets:

  • AZURE_CREDENTIALS: the JSON output of the Azure service principal.
  • DATABRICKS_HOST: the URL of your Azure Databricks workspace.
  • DATABRICKS_TOKEN: a personal access token generated from your Databricks workspace user settings.

Note that the sample workflow in this article authenticates with individual secrets (AKV_CLIENT_ID, AKV_CLIENT_SECRET, TENANT_ID) and repository variables (keyVaultName, ADBTokenSecretName, ADBUrl), so create whichever set of secrets and variables your workflow actually references.
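
The sample workflow later in this article actually reads the Databricks token from Azure Key Vault rather than directly from a GitHub secret. If you follow that pattern, store the token in Key Vault with the Azure CLI; the vault and secret names below are placeholders:

```shell
# Store the Databricks personal access token as a Key Vault secret
# (vault name "my-keyvault" and secret name "adb-token" are placeholders)
az keyvault secret set \
  --vault-name my-keyvault \
  --name adb-token \
  --value "<databricks-personal-access-token>"
```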

3. Create GitHub Actions Workflow:

Create a workflow file (e.g., .github/workflows/AZ_YourProjectName_ADB_Non_Prod_Deployment.yml) in your repository.

name: AZ_YourProjectName_ADB_Non_Prod_Deployment

on:
  # Triggers the workflow on push events, but only for branches matching "**_develop"
  # and only when a Deploy/Variables.ps1 file changes
  push:
    branches: [ "**_develop" ]
    paths:
      - '**/Deploy/Variables.ps1'

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:

Build Pipeline

jobs:
  # This workflow contains multiple jobs: "build", plus release jobs to the Dev and Test environments
  build:
    name: AZ_YourProjectName_ADB_Non_Prod_Build
    runs-on: ubuntu-latest
    steps:
      # Checks out your repository under $GITHUB_WORKSPACE, so the job can access it
      - uses: actions/checkout@v3

      - name: Upload a Build Artifact
        uses: actions/upload-artifact@v3.1.2
        with:
          name: Build_Artifacts_NonProd
          path: |
            .
            !**/.git/*
            !**/.github/*
            !**/Common_Functions/*
            !**/Common/*
            !**/README.md

Release Pipeline

  Deploy_Dev:
    name: AZ_YourProjectName_ADB_Dev_Release
    needs: build
    runs-on: ubuntu-latest
    environment: Dev
    steps:
      - name: Download Build Artifact
        uses: actions/download-artifact@v3
        with:
          name: Build_Artifacts_NonProd
          path: .

      - name: Load the workspace path for ADB notebooks from the Variables.ps1 file in the Deploy folder
        run: Get-Content /home/runner/work/AZ_YourProjectName_ADB/AZ_YourProjectName_ADB/Deploy/Variables.ps1 >> $Env:GITHUB_ENV
        shell: pwsh

      - name: Connect to Azure Account and Select Subscription
        run: |
          az login --service-principal --username ${{ secrets.AKV_CLIENT_ID }} --tenant ${{ secrets.TENANT_ID }} --password ${{ secrets.AKV_CLIENT_SECRET }}
        shell: bash

      - name: Get KeyVault Secret
        uses: Azure/get-keyvault-secrets@v1
        with:
          keyvault: ${{ vars.keyVaultName }}
          secrets: ${{ vars.ADBTokenSecretName }}
        id: Secret

      - name: Install Databricks CLI
        uses: microsoft/install-databricks-cli@v1.0.0

      - name: Upload Notebooks to Databricks Workspace in Dev Environment
        uses: microsoft/databricks-import-notebook@v1.0.0
        with:
          # Databricks host
          databricks-host: ${{ vars.ADBUrl }}
          # Databricks token
          databricks-token: "${{ steps.Secret.outputs[format('{0}', vars.ADBTokenSecretName)] }}"
          # LOCAL_NOTEBOOKS_PATH
          local-path: '/home/runner/work/AZ_YourProjectName_ADB/AZ_YourProjectName_ADB/${{ env.path }}'
          # REMOTE_NOTEBOOK_PATH
          remote-path: '/Shared/${{ env.path }}'
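
Note that the Get-Content step above pipes Variables.ps1 straight into $GITHUB_ENV, which expects plain name=value lines rather than PowerShell syntax. A minimal Deploy/Variables.ps1 for this workflow might therefore contain just the following (the folder name is a placeholder):

```
path=YourProjectNotebooks
```

The path value then drives both local-path and remote-path in the notebook import step.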

4. Commit and Push:

  • Commit the workflow file (AZ_YourProjectName_ADB_Non_Prod_Deployment.yml) to your GitHub repository and push it to the develop branch.

5. Verify Deployment:

  • GitHub Actions will automatically run the workflow when a matching change (here, to a Deploy/Variables.ps1 file) is pushed to the develop branch from your feature branch.
  • Check the Actions tab in your GitHub repository for the status of the workflow run.
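
Optionally, you can also confirm the import from a terminal with the Databricks CLI (the subfolder under /Shared is a placeholder for whatever path your Variables.ps1 sets):

```shell
# List the notebooks imported under /Shared
# (assumes DATABRICKS_HOST and DATABRICKS_TOKEN are set in the environment)
databricks workspace ls /Shared/YourProjectNotebooks
```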

By following these steps, you can automate the deployment of Azure Databricks notebooks using GitHub Actions, enabling a streamlined CI/CD pipeline for your data workflows.
