Extracting the Power BI activity log with Data Factory

If you want to follow along and understand how your Power BI tenant is used, it’s crucial to get your hands on all the activity events in the Power BI Audit Log. You basically have two options. The first is the unified audit log in the Microsoft 365 Admin Center, which you can access via a link from the Admin portal: https://app.powerbi.com/admin-portal/auditLogs

This is fine if you want to search for specific events, but not if you want to get a full overview. Also, the audit log requires some hefty permissions that most users don’t have. Furthermore, you only get 90 days of history.

Thankfully we have a better option, as we can extract the data using either the REST API or a PowerShell cmdlet. This way we can build up the history ourselves and make reports on all the data. Microsoft and community members have already created nice and powerful solutions along these lines.

So you might ask yourself: why another solution? Great question indeed! My quick answer: because I want to do it in a simpler way, and I want to use my preferred tool for data extraction and ingestion: Azure Data Factory. This way I can easily set up a schedule and ingest the data wherever needed – Data Lake Storage, a SQL database or any of the other 80+ supported destinations (sinks).

This option is, in my opinion, superior, as you can also handle the authentication quite easily with the new “Enable service principal authentication for read-only admin APIs” setting that’s currently in preview, and you don’t need to create and manage an App Registration.

  1. You start by creating a new Security Group in Azure Active Directory
  2. Then you add the “Managed identity” (service principal) of your Data Factory resource as a member of the security group
  3. And finally you add the security group to the Admin API tenant setting in the Power BI admin portal

With these three steps handled, you now have full access to call and read all the read-only Power BI admin REST APIs from your Data Factory resource using the managed identity (MSI) authentication method. Easy, right? The next step is to create a Linked Service of the type REST with these settings (see the JSON sketch right after the list):

  • Base URL: https://api.powerbi.com/v1.0/myorg/admin/
  • Authentication type: Managed Identity
  • AAD resource: https://analysis.windows.net/powerbi/api
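Here is that sketch in JSON form; the Linked Service name is just a placeholder of mine, and the properties simply mirror the three settings above:

  {
    "name": "LS_PowerBI_Admin_REST",
    "properties": {
      "type": "RestService",
      "typeProperties": {
        "url": "https://api.powerbi.com/v1.0/myorg/admin/",
        "enableServerCertificateValidation": true,
        "authenticationType": "ManagedServiceIdentity",
        "aadResourceId": "https://analysis.windows.net/powerbi/api"
      }
    }
  }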

And finally, create a pipeline that calls the “Get Activity Events” REST API with the right set of URI parameters (startDateTime and endDateTime) and handles the continuationToken. This is fortunately also pretty easy if you already have some basic Data Factory skills. The API only supports extracting up to one day of data per request, so I have created a parameter called “DaysToExtract” and start with a ForEach activity. If I set the parameter to 1, it will extract yesterday’s data; setting it to 30 will extract the last 30 days. That is unfortunately the maximum you can get out of this API, which is why it’s extremely important that you build up the history yourself.
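For reference, the request the pipeline ends up making for a single day looks like this (the date is just an example, and note that the timestamp values must be wrapped in single quotes):

  GET https://api.powerbi.com/v1.0/myorg/admin/activityevents?startDateTime='2021-02-01T00:00:00.000Z'&endDateTime='2021-02-01T23:59:59.999Z'

The response contains an activityEventEntities array plus a continuationUri/continuationToken when there is more data to fetch for that day.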

The ForEach activity is set up with this as the Items in the settings: @range(1, pipeline().parameters.DaysToExtract), which creates an array that the activity can loop over. Inside, we just need a plain and simple Copy activity with a REST source dataset that points to the Linked Service. I have created the dataset with a parameter for the Relative URL, so I can reuse it to call other admin APIs.
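In the pipeline JSON, the ForEach configuration boils down to something like this sketch, with sequential execution turned off so the days can be fetched in parallel (the batch count is just an example value):

  "typeProperties": {
    "items": {
      "value": "@range(1, pipeline().parameters.DaysToExtract)",
      "type": "Expression"
    },
    "isSequential": false,
    "batchCount": 20
  }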

The Relative URL parameter value is set up with dynamic content that builds the startDateTime and endDateTime URL parameters from the current item of the array: activityevents?startDateTime='@{formatDateTime(getPastTime(item(), 'Day'), 'yyyy-MM-dd')}T00:00:00.000Z'&endDateTime='@{formatDateTime(getPastTime(item(), 'Day'), 'yyyy-MM-dd')}T23:59:59.999Z'
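The reusable dataset itself is tiny. A sketch of how it could be defined, with the parameter simply passed through to the relative URL (the names are my own placeholders, matching the Linked Service sketch above):

  {
    "name": "DS_PowerBI_Admin_REST",
    "properties": {
      "linkedServiceName": {
        "referenceName": "LS_PowerBI_Admin_REST",
        "type": "LinkedServiceReference"
      },
      "type": "RestResource",
      "parameters": {
        "RelativeUrl": { "type": "string" }
      },
      "typeProperties": {
        "relativeUrl": {
          "value": "@dataset().RelativeUrl",
          "type": "Expression"
        }
      }
    }
  }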

The continuationToken is handled with a simple AbsoluteUrl “Pagination rule”, and finally the source is enriched with two additional columns to help with lineage, so you can couple the files/rows back to when and how they were extracted.
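In the Copy activity source JSON, those two things could look roughly like this. I’m assuming the absolute URL of the next page is read from the continuationUri property in the response body, and the two lineage column names are just examples of what you could capture:

  "source": {
    "type": "RestSource",
    "paginationRules": {
      "AbsoluteUrl": "$.continuationUri"
    },
    "additionalColumns": [
      {
        "name": "ExtractionRunId",
        "value": { "value": "@pipeline().RunId", "type": "Expression" }
      },
      {
        "name": "ExtractionTimeUTC",
        "value": { "value": "@utcnow()", "type": "Expression" }
      }
    ]
  }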

Next up is deciding on your Sink (destination), where I chose an Azure SQL database because it gives me some flexibility to analyze the data straight away and it’s easy to import into a Power BI dataset. The downside is that I need to set up the mapping and therefore figure out all the possible attributes of the activityEventEntities in the returned JSON. I have identified 53, but as they aren’t documented anywhere, I might have missed some, and new entities will possibly appear in the future that I will then fail to extract… To overcome this, another solution is to store the entire JSON document in the Azure SQL database or in blob storage / data lake storage. I will leave it up to you to choose your preferred solution.
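To give an idea of what the mapping looks like when going to Azure SQL, here is a trimmed-down sketch with just a handful of the attributes. The collection reference points at the activityEventEntities array, and the sink column names are my own choice:

  "translator": {
    "type": "TabularTranslator",
    "collectionReference": "$.activityEventEntities",
    "mappings": [
      { "source": { "path": "Id" }, "sink": { "name": "Id" } },
      { "source": { "path": "CreationTime" }, "sink": { "name": "CreationTime" } },
      { "source": { "path": "Activity" }, "sink": { "name": "Activity" } },
      { "source": { "path": "UserId" }, "sink": { "name": "UserId" } },
      { "source": { "path": "ItemName" }, "sink": { "name": "ItemName" } }
    ]
  }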

I hear a lot of people claim that Data Factory is expensive. And yes, it can be, but not always and certainly not in this case. If you run this pipeline with DaysToExtract set to 30, it will do 31 activity runs and consume 0.5000 DIU-hours, for a total price of around $0.15. The pipeline is also very quick and completes in around 30 seconds, as it calls the API in parallel.
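For context, using the list prices for the Azure integration runtime at the time of writing (roughly $1 per 1,000 orchestration activity runs and about $0.25 per DIU-hour of data movement, varying slightly by region), the math works out to approximately 31 × $0.001 ≈ $0.03 for the activity runs plus 0.5 × $0.25 ≈ $0.13 for the data movement, or around $0.15 in total.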

If you only extract one day of data, which is the normal use case when running the pipeline nightly, then you will have 2 activity runs and 0.0167 DIU-hours. This translates to $0.005 per run.

I have saved my pipeline as a custom template – free for you to use and modify as you like 😁 To import it, use the “Pipeline from template” functionality and then click either “Use local template” or “Import template”, depending on whether you have connected your data factory to a code repository. The template includes both the pipeline and the two datasets, so you only need to create the linked services yourself.

https://github.com/justBlindbaek/PowerBIMonitor/blob/main/DataFactoryTemplates/PL_PowerBIGetActivityEvents.zip

It’s uploaded to a GitHub repo that I plan to extend with further pipelines that extract valuable data from the Power BI REST APIs. The next interesting part is all the artifacts you can get from the new WorkspaceInfo API. This includes lineage and endorsement details that aren’t available in any other API, as well as lists of workspaces, reports, etc. Hopefully I will have a blog post ready soon – the pipeline is already created 😉

UPDATE 2021-02-23

By popular demand, I have now created an additional pipeline with Azure Data Lake Storage as the destination, saving the data as JSON in a hierarchical folder structure. Download the pipeline template here: https://github.com/justBlindbaek/PowerBIMonitor/blob/main/DataFactoryTemplates/PL_PowerBIGetActivityEvents_ADLS.zip
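As a sketch of the idea (container, folder and file names are placeholders of mine, and the actual template may organise things slightly differently), the JSON sink dataset can take a date parameter and use it in both the folder path and the file name:

  {
    "name": "DS_ADLS_ActivityEvents",
    "properties": {
      "linkedServiceName": {
        "referenceName": "LS_DataLake",
        "type": "LinkedServiceReference"
      },
      "type": "Json",
      "parameters": {
        "Date": { "type": "string" }
      },
      "typeProperties": {
        "location": {
          "type": "AzureBlobFSLocation",
          "fileSystem": "powerbi",
          "folderPath": {
            "value": "activityevents/@{formatDateTime(dataset().Date, 'yyyy/MM/dd')}",
            "type": "Expression"
          },
          "fileName": {
            "value": "activityevents_@{formatDateTime(dataset().Date, 'yyyyMMdd')}.json",
            "type": "Expression"
          }
        }
      }
    }
  }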

2 thoughts on “Extracting the Power BI activity log with Data Factory”

  1. Excellent blog and tutorial. I am not experienced with ADF, and I was able to create a working pipeline following your instructions. My wishlist is a way to define the output JSON file name so that it reflects the date of activity covered in the file. This will help with managing the JSON files in the data lake and avoid duplicate files if the pipeline is run multiple times. Would this be possible to include in your template? FYI, I am going with the data lake option to store the entire log data, to be cleaned and filtered later.
