Skip to content

Extracting Power BI metadata with Data Factory (part 2)

You can now get the list of users as part of the WorkspaceInfo API. You just extend the PostWorkspaceInfo API call with getArtifactUsers=True. Much easier and also give you user details on datasets, reports etc.

UPDATE: 2023-01-03

This is a follow up on the blog post describing how to call the WorkspaceInfo API’s to get most important metadata from your Power BI tenant. Unfortunately these API’s don’t returns it all, so you also have to call two other API’s to get:

  • List of users in the Workspaces and their access level
  • List of Capacities

As always, I have shared the pipelines as templates in my GitHub repo: https://github.com/justBlindbaek/PowerBIMonitor

Let’s start with the most simple – the Get Capacities As Admin to get a list of capacities in your organization. If you have any? It will return both Premium, Embedded and the global Premium Per User capacity. You can then match the id with the capacityid you get in the result from the WorkspaceInfo API and/or Activity Events.

We will set up a Copy activity with a REST source dataset using the Linked Service already created. The URL should point to: https://api.powerbi.com/v1.0/myorg/admin/capacities?$expand=tenantKey. You can skip the last expand part, if you don’t use a encryption key or just keep it to be futherprof. As the destination (sink) you can configure Data Lake, SQL DB or any other service of your choice.

To get the users in the workspace you will have to call the “old” Get Groups As Admin API, that returns a list of workspaces in your organization with the option to expand related entities. In this case we will expand the users, but you can also expand reports, datasets etc. This was how we got all the metadata before we got the WorkspaceInfo API. The Groups API can “only” return 5000 workspaces in one request and does not support built in pagination. So we will have to construct the pagination yourself – in contrast to the WorkspaceInfo and Activity Event API’s. Many organizations have way less than 5000 workspaces, but I’m creating this solution to fit everyone ?

First up is a Web activity to call the API to get the total number of workspaces. Then we create a simple array in a variable for every 5000 workspace and then we pass this information to a ForEach activity that contains a Copy activity.

The API can only return the users from V2 workspaces, so the first API request include that as a filter and also, we don’t need the actual list of workspaces. Only the total number of workspaces, that we get back in the @odata.count field of the JSON. This gives us this URL to call with a GET method: https://api.powerbi.com/v1.0/myorg/admin/groups?$expand=users&$filter=type eq 'Workspace'&$top=1

Next step is creating an array with a row for every 5000 workspaces. I’m definitely not a master of building up dynamic content with the use of expressions, but I managed to get this working as the value of the variable: @range(1,div(sub(add(activity('Get number of workspaces').output['@odata.count'], 5000), 1), 5000))

Lastly we pass the variable into the “Items” of as foreach activity with a copy data activity. The destination (sink) can be your own choice, but most people will probably choose either a SQL DB or a Data Lake. The really interesting part of the copy activity is the URL of the source, where I reuse the generic dataset created earlier.

We call the exact same API as we started with, but now pass in different values for the “top” and “skip” parameters. The variable just returns 1,2,3 and so forth which means we have to do some basic math to get the right values for each call. Again I’m just glad that I got it working, because this is not one of my special skills: groups?$expand=users&$filter=type eq 'Workspace'&$top=@{mul(item(), 5000)}&$skip=@{sub(mul(item(), 5000), 5000)}

Now you might think this is the end in this series of blog posts? It could be, as we have now managed to extract all the most important Power BI tenant data. But what if we could combine these data with information from Microsoft 365? Like getting metadata on all the users, security groups and also license information? Would that be cool? Then look forward to the next blog post ?

2 thoughts on “Extracting Power BI metadata with Data Factory (part 2)”

  1. Pingback: Extracting Power BI metadata with Data Factory (part 1) – justB smart

  2. Pingback: Extracting Microsoft Graph data with Data Factory – justB smart

Leave a Reply

Your email address will not be published. Required fields are marked *