Extracting Microsoft Graph data with Data Factory

Microsoft Graph is the gateway to data and intelligence in Microsoft 365 and comes with REST APIs so we can extract data from a long range of Microsoft cloud services like Microsoft 365 core services and Dynamics 365 Business Central.

In this blog post I will show you how to extract Users, Groups and Licenses as these can be useful to enrich the data from the Power BI Activity log and the Power BI Metadata (WorkspaceInfo / Scanner API).

If you have read my other blog posts on extracting data from the Power BI REST API’s you may recall, that the authentication can be handle quite nicely with basically giving Azure Data Factory the access. Unfortunately this option is not possible with the Graph REST API’s. Here you have to create an App Registration (Service Principal).

Create a App Registration in Azure AD

This might be a little tricky, as you probably don’t have the necessary permissions yourself. In this case you need to find an administrator who can help you with this part. This is the steps to go trough in the Azure Active Directory in the Azure Portal:

  1. Go to “App Registrations” select “New App” and leave the default options.
  2. Go to “Certificates & secrets” and generate a “New client secret”. Save the Secret Value – this is your only chance!
  3. Go to “API permissions” and Add a permission. Select Microsoft Graph and Application permissions. Select “Directory.Read.All” and add the permission. Lastly “Grant admin consent for…” and then turn the status green.
  4. Save the Application (client) ID and Directory (tenant) ID from the overview page of the newly created App registration (Service Principal)

Create a Linked Service

With the Tenant Id, App Id and Secret it’s now time to create a Linked Service in Data Factory of the REST type. Use this as the values

  • Name: LS_REST_Graph
  • Base URL: https://graph.microsoft.com/v1.0/
  • Authentication type: AAD Service Principal
  • Service principal ID: [Application (client) ID]
  • Service principal key: [Secret]
  • Tenant: [Directory (tenant) ID]
  • AAD resource: https://graph.microsoft.com

Extract the data with a Pipeline

As always I have created the Pipeline as a template, free to download and use. I created the pipeline in Azure Data Factory, but you can also import it as an Integration Pipeline in Azure Synapse Analytics.

The template has Azure Data Lake Storage set as the destination (sink) so you also need to setup a Linked Service for this if you don’t have one already. File system (container), folder and filename can be changed for your own needs. You can of course also completely change and use another type of destination.

You get four different copy activities. Security Groups, Users and SubscribedSkus is straight forward, while copying the members of the groups is a little more tricky at this requires looping through the groups. As you can have many many groups and might only be interested in few them, I have made an option to filter on the name and then only take a subset of the groups. I made a parameter “GroupsStartWith” where you can define your filter. The filter is used in an expression, that can be adjusted to your needs:

@startswith(item().displayName, pipeline().parameters.GroupsStartswith)

You can also entirely remove the filter if you want to have users from all groups. Just be aware that it could take some time to run the pipeline and potentially cost you some more money.

You might also be lucky and don’t need the looping at all. Because you can also get the groups with the “Copy Users” activity using the “expand=memberOf” property on the URL. Just be aware that you will “only” get the first 20 groups a user is member of… It might be totally fine in your AAD tenant, but you won’t get a warning if your missing data. You have to do your own tests.

After harvesting this valuable data it’s now your turn to transform it into useful information. Combining the data with the Power BI Activity Log will show who of your users is using their Power BI Pro license and maybe more importantly – who is not using it. The group membership can be used to see the actual users, who have access to Power BI artifacts and not only the name of the security groups. Be aware that groups can be nested, so you might have to make some funky join logic to get all the way to the users in the groups.

Leave a Reply

Your email address will not be published.