Extracting the Power BI activity log with Data Factory

If you want to follow along and get an understanding how your Power BI tenant is used, then it’s crucial to get your hands in all the activity events in the Power BI Audit Log. You basically have two options. First is the unified audit log in the Microsoft 365 Admin Center that you can access with a link from the Admin portal: https://app.powerbi.com/admin-portal/auditLogs

This is fine if you want to search for specific events, but not if you want to get a full overview. Also the audit log require some hefty permissions that most users don’t have. Furthermore you only have 90 days of history.

Thankfully we have a better option, as we can extract the data using either the REST API or PowerShell cmdlet. This way we can build up the history yourself and make reports on all the data. Microsoft and community members have created nice and powerful solutions. Here are a couple of examples:

So you might ask yourself: Why another solution? Great question indeed! My quick answer: Because I want to do it more simple and I want to use the prefered tool for data extraction and ingestion: Azure Data Factory. This way I can easily set up a schedule and ingest the data where needed – Data Lake Storage, SQL database or any of the other +80 destinations (sinks) supported.

This option is, in my opinion, superior as you can also handle the authentication quite easily with the new “Enable service principal authentication for read-only admin APIs” that’s currently in preview + you don’t need to create and handle a App Registration.

  1. You start by creating a new Security Group in Azure Active Directory
  2. Then you add the “Managed identity” (service principal) of your Data Factory ressource as a member of the security group
  3. And finally you add the security group to the Admin API tenant setting in the Power BI admin portal

With these three steps handled, you now have full access to call and read all the Power BI REST APIs from your Data Factory ressource using the MSI authentication method. Easy? Next step is then to create a Linked Service of the type REST with these settings:

  • Base URL: https://api.powerbi.com/v1.0/myorg/admin/
  • Authentication type: Managed Identity
  • AAD resource: https://analysis.windows.net/powerbi/api

And finally create a pipeline that calls the “Get Activity Events” REST API with the right set of URI parameters (startDateTime and endDateTime) + handle the continuationToken. This is fortunately also pretty easy if you already have some basic Data Factory skills. The API only supports extracting up to one day of data per request, so I have created a parameter called “DaysToExtract” and start with a ForEach activity. If I set the parameter to 1, it will extract yesterday’s data and setting it to 30 will extract the last 30 days. This is unfortunately the maximum you can get out of this API and therefore extremely important that you build up the history yourself.

The ForEach activity is setup with this as the Items in the settings: @range(1,pipeline().parameters.DaysToExtract) as it creates an array that the activity can loop over. Inside we just need a plain simple Copy Activity with a source dataset of REST that points to the Linked Service. I have created a dataset with a parameter to the Relative URL, so I can reuse it to call other AdminAPI’s

The value is setup with dynamic content to handle the startDateTime and endDateTime URL parameters based on the pipeline parameter, that comes from the array: activityevents?startDateTime='@{formatDateTime(getPastTime(int(item()), 'Day'), 'yyyy-MM-dd')}T00:00:00.000Z'&endDateTime='@{formatDateTime(getPastTime(int(item()), 'Day'), 'yyyy-MM-dd')}T23:59:59.999Z'

The continuationToken is handled with a simple AbsoluteUrl “Pagination rule” and finally the dataset is enriched with two additional columns to help with lineage, so you can couple the files/rows back to when and how it was extracted.

Next up is deciding your Sink (destination) where I choose a Azure SQL database because it gives we some flexibility to analyze the data straight away and it’s easy to import into a Power BI dataset. The downside is that I need to setup mapping and then need to figure out all the possible activityEventEntities in the returned JSON. I have identified 53 but as it’s not documented anywhere, I might have missed some and there will possible come new Entities in the future, that I will then miss to extract… To overcome this, another solution is storing the entire JSON document in the Azure SQL database or a blob storage / data lake storage. I will let it be up to you to choose your preferred solution.

I hear a lot of people claim that Data Factory is expensive. And yes, it can be, but not always and certainly not in this case. If you run this pipeline with DaysToExtract set to 30, it will do 31 activity runs and 0.5000 DIU-hours. This is a total price of around $0.15. This pipeline is also very quick and runs at around 30 seconds as it calls the API in parallel.

If you only extract one day of data, which is the normal use case when running the pipeline nightly, then you will have 2 activity runs and 0.0167 DIU-hours. This translates to $0.005 per run.

I have saved my pipeline as a custom template – free for you to use and modify as you like ? To import, you will have to use the “Pipeline from template” functionality and then click on either “Use local template” or “Import template” depending on, if you have connected your data factory to a code repository. The template includes both the pipeline and the two datasets, so you will only need to create the linked services yourself.

https://github.com/justBlindbaek/PowerBIMonitor/blob/main/DataFactoryTemplates/PL_PowerBIGetActivityEvents_ASQL.zip

It’s uploaded to a GitHub repo, that I plan to extend with further pipelines that extract valuable data from the Power BI REST API’s. The next interesting part is all the artifacts you can get from the new WorkspaceInfo API. This includes Lineage and Endorsements details that’s not available in any other API as well as list of workspaces, reports etc. Hopefully I will have a blogpost ready soon – the pipeline is already created ?

UPDATE 2021-02-23

After popular demand, I have now created an additional pipeline with Azure Data Lake Storage as the destination. Saving as JSON in a hierarchical folder structure. Download the pipeline template here: https://github.com/justBlindbaek/PowerBIMonitor/blob/main/DataFactoryTemplates/PL_PowerBIGetActivityEvents_ADLS.zip

17 thoughts on “Extracting the Power BI activity log with Data Factory”

  1. Excellent blog and tutorial. I am not experienced with ADF and I am able to create a working pipeline following your instruction. My wishlist is a way to define the output json file name so that it reflects the date of activity covered in the file. This will help with managing the json files in the data lake and avoid duplicated files if the pipeline is run multiple times. Would this be possible to include in your template? FYI I am going with the data lake option to store entire log data, to be cleaned and filtered later.

  2. Pingback: Extracting Power BI metadata with Data Factory (part 1) – justB smart

  3. Hi Followed this blog very clear and well explained. I am pretty new to ADF, and first time using a rest API as a data source, I have it sort of working. ADF is sinking data into my SQL server but only fields

    continuationUri,continuationToken,lastResultSet,ADF_PipelineRunld,ADF_PipelineTriggerTime.

    I think the source json activityEventEntities is empty as this isn’t returned, I have looked in the tutorial and don’t think I have missed anything, is their something else I need to do ?

    many thanks

    1. I was actually also struggling with this in the start. It was me not getting the the format correct on the Admin API URL when passing in the startDateTime and endDateTime URL parameters. But I don’t suppose you have change anything in this?

      Try changing it to this: activityevents?startDateTime=’@{formatDateTime(getPastTime(int(item()), ‘Day’), ‘yyyy-MM-dd’)}T00:00:00.000Z’&endDateTime=’@{formatDateTime(getPastTime(int(item()), ‘Day’), ‘yyyy-MM-dd’)}T23:59:59.999Z’. I have just put in a int() cast, that means you can use the “Preview data” feature on the source dataset. Then you can see what is returned from the REST call without needing to debug or trigger the pipeline. This could help you debug.

  4. Pingback: T-SQL Tuesday #137 - Jupyter Notebooks for the win? - Benni De Jagere

  5. Trutz-Sebastian Stephani

    Great blog article. Tried using the data factory before but couldn’t get the connection right. This helps a lot.

  6. Thank you so much for sharing this, really useful. I have a question here, when we are accessing data through this api, is it going to pull data out of Power BI activity log or Unified audit log? And, do we require permissions at Tenant level for this or is it fine if we create a security group and add this group in Power BI admin portal Tenant settings? Thank you Again.

    1. Hi Ananth. This solution uses the Power BI Activity log, so it’s enough with the security group added in the Power BI Admin Tenant settings as you suggest.

  7. Thanks a lot for the post. I am trying to understand the data, when the Activity=ViewReport would you or anyone else know what it means when ActivityID=00000000-0000-0000-0000-000000000000 when the RequestID is filled in with a lot of numbers.

  8. Carley Howell

    Great blog, managed to get this working to bring in 30 days of data then working on a trigger to load daily data 🙂 Thank you!

    Quick question – is there a way to amend this to bring in a specific time period of 30 days, instead of only the past 30 days? e.g. if I want to go back a bit further to get historical data?

    Thanks 🙂

  9. Anyone got any suggestions for consuming the resulting log files? I’ve managed to produce them and place them in an ADLS store. I can download one of them and “open” it in PBI-desktop, although PBI does a few steps automatically.
    However, if I try to consume a log file from it’s ADLS location Power BI complains that the file is malformed. It does not seem that you have any control of the API output :/

    regards

    Emil

Leave a Reply to Trutz-Sebastian Stephani Cancel reply

Your email address will not be published. Required fields are marked *