In our modern era, where data holds immense significance, the ability to observe and respond promptly to live data streams becomes indispensable. In this blog post, I will introduce a technique for constructing a representative real-time Semantic model [dataset] in Power BI using Power Automate. This method is specifically tailored to ensure efficient scenario testing. I will delve into two approaches to accomplish this task: Firstly, I will explore utilizing a streaming semantic model within Power BI, constantly updating it with no less than one fresh record every second. Secondly, I will escalate the pace by integrating a SQL database, consistently pushing 2 to 3 records per second. Subsequently, I will establish a connection between the database and a Power BI report employing DirectQuery.
Enhance your Power BI skills and transform your data workflow with practical insights and step-by-step instructions provided in this guide, whether you're looking to refine your data strategy.
Video:
Here's what I'll cover in this blog to guide you through each step of the process:
Overview of the data source's structure
Real-time streaming in Power BI allows you to push data instantly into a dashboard, enabling live updates without the need for manual refreshes. This capability is crucial for scenarios where real-time data is essential, such as monitoring financial markets, tracking website analytics, or even overseeing manufacturing processes.
In this solution, I will us the structure of the 'Microsoft Financial Sample Excel file' as the foundation for our data source.
That file has 16 columns but I will use key columns such as 'Segment', 'Country', 'Product', 'Units Sold', 'Discounts', 'Sales', 'COGS', and 'Date and Time'. These selected metrics offer a diverse selection of columns with different type of values.
Building an Scheduled Cloud Flow and Designing Variables for Text Columns in Power Automate
In Power Automate, click on Create button and select Scheduled Cloud Flow.
Name the flow, choose today and the current time as the starting date, and set it to repeat every 180 seconds (I will explain how to set that repeat time in this blog) and click on Create.
(I still prefer the classic designer format, so after creating the flow, I switched to it.)
In this flow and method, the records will be selected randomly from a range of data that needs to be defined by using variables for columns that have Text values, such as Segment, Country, and Product. For columns with number values, we can use RAND formula with the following syntax:
rand(minValue:integer,maxValue :integer)
To define the values for Segment, Country, and Product columns, add three variables to the flow in parallel. For each of those three variables, add a name and choose Array as the type, like this:
In the next step, the values for each variable need to be established. However, before doing that, it is important to understand the advantages of using variables with array values in this scenario. In our methodology, the values need to be selected randomly for each column. This requires defining an array format list of values for the text columns. We can then choose a random value from this list.
To illustrate how to select a value from an array variable, consider this example: Suppose we have a variable named 'VSegment'. To select the first item from this array, we use the formula variables('VSegment')[0], where [0] denotes the first item in the array. Similarly, to pick the third item, the formula would be variables('VSegment')[2].
Now, to randomly select a value from this array, we first need to determine the total number of items in it (remember, counting starts from 0). Suppose our array 'VSegment' contains 5 items. To select a value at random, we combine this with the RAND formula like: variables('VSegment')[rand(0, 4)]. This method efficiently randomizes the selection of values from our predefined list, adding dynamism to our data set.
To form the Array, you have the option to utilize values from the Finance Excel sheet or define them yourself. For example, you might choose specific country names you wish to include in your Semantic model [dataset], such as Germany, Italy, France, Japan, etc. A convenient way to transform these into a JSON array is by using online 'Convert to JSON' tools.
For instance, let's take a look at the Finance file which includes values like Government, Midmarket, Channel Partners, Enterprise, and Small Business. To convert these into a JSON array, you would proceed as follows:
[
"Government",
"Midmarket",
"Channel Partners",
"Enterprise",
"Small Business"
]
For the Product and Country variables, I used the following arrays
Product | Country |
[ "Phone", "Laptop", "Ram", "CPU", "Keyboard", "Mouse", "Power", "Graphic Card", "Monitor", "Motherboard", "Fan" ] | [ "Egypt", "Italy", "Brazil", "Columbia", "United States of America", "Germany", "Australia", "Mexico", "Argentina", "China", "India", "Russia", "Canada", "France", "South Africa", "Japan", "Spain", "Denmark", "Iran", "South Africa" ] |
As a result, the flow will be as follows:
Creating and Continuously Updating a Streaming Semantic model in Power BI with New Records
Create a Streaming Semantic model [Dataset] in your Power BI service, ensuring the column names and data types match your requirements. It's important to enable 'Historic data analysis' for saving data. For detailed guidance on configuring streaming datasets and tips on restarting them, I recommend reading this blog post: Restart Power BI Hybrid Dataset with Power Automate and APIs.
Once your streaming Semantic model is ready, the next step in Power Automate is to integrate an action for adding rows to this dataset. Look for the action titled 'Add rows to a dataset' within the Power BI actions. When setting up this action, select your workspace, the Streaming Semantic model, and the table name, which is typically 'RealTimeData'. Upon selection, the column names will appear within the action, allowing you to define the values for each column.
The Semantic model has 8 columns with different type of values, so to assign a random value to each column in this action, you can set it up as follows:
The variables related to the text value columns have already been added into the flow. As previously mentioned, the following formula can be used for those columns:
Segment: variables('VSegment')[rand(0, 4)]
Country: variables('VCountry')[rand(0, 19)]
Product: variables('VProduct')[rand(0, 10)]
For those 4 columns with Number values, the following formula can beused: Unit_Sold: rand(121, 3521) --You can set any numbers in rand formula
Discounts: rand(0, 10000) --You can set any numbers in rand formula Sales: rand(12865, 125369) --You can set any numbers in rand formula
COGS: rand(100000, 200000) --You can set any numbers in rand formula
For the 'Date' column, my aim is to insert the current date and time 'AUS Eastern Standard Time' zone, so the following formula can be used for that column:
Date: convertFromUtc(utcNow(), 'AUS Eastern Standard Time', 'u') --You can change the time zone to any time zone
It will look like this when everything is set up:
The flow is ready to use but the only issue is that we are allowed to recur at most once every '60' seconds. To fix this issue we need to add a loop to the flow.
Implementing a Loop to Generate Records Every Second
Creating a loop for this flow using the 'Add rows to a dataset' action is little bit tricky because of many limitations on that action and Power Automate items.
Clik here to see the details about some limitations of the streaming Semantic model [Dataset], including the limitation that you cannot add more than 100 rows to the Power BI streaming semantic model in less than a minutue using that Power Automate action. The other limitations are using the Recurrece in less than 60 seconds in Power Automate, and using the delay action to wait 1 second (to avoid adding more than 100 rows in a minute), The provided 'Wait' action 'interval' value '1' 'Second' is not supported for the 'Consumption' SKU. The value must be between '5' seconds and '30' day(s).
Given these constraints, I've developed a solution that may seem somewhat unconventional but it works very well 😉.
First, add an "Initialize variable" action before 'Add rows to dataset', set the name as ForLoop, type as Array and add a sample array with 120 items to the value section.
Here is the file to download the sample array:
After the ForLoop action, add a 'Apply to each' action, and set ForLoop as the output from previous steps:
Move the 'Add rows to a dataset' into the Apply to each action:
The flow is now set up, but a challenge remains: the number of rows being added to the semantic model per second and per minute exceeds the allowed limit. As a result, the flow will be on hold after approximately 95 to 100 rows are added
To resolve this issue, some actions need to be added in order to save some time and reduce the number of rows to be added to the semantic model. As I mentioned before, we can't use the Delay action because the minimum value for that is 5 second and I want to add at least 1 record to our real time semantic model.
Before Apply to each, add a new variable and name it ForLoopNumber, set its type as Integer, and set its value to 1.
Add a set variable action inside Apply to each action after the 'Add rows to a dataset' action and select the ForLoopNumber variable and set the value to 1, and after that add a 'Do until' action and set that to stop when the ForLoopNumber is equal to 0 and inside that 'Do Until' action add 'Decrement variable' action to reduce 1 from ForLoopNumber, like this:
Though it might seem strange at first, it will fix all of those issues and limitations, and I will update this blog once I find a better solution.
Important Note:
To determine the ideal interval for the Recurrence flow trigger, run the flow and note the duration of each execution. Based on this timing, you can adjust the interval to prevent any concurrent runs and facing with any limitations.
In my case, each run takes around 90 to 110 seconds, so I added 120 seconds as the interval for the triger.
This method with streaming semantic model is great for testing scenarios which need real time streaming data, like creating a dashboard:
Update the flow to add more than 1 rows per sec into a SQL database
you might need a SQL database with a real-time table capable of receiving multiple new records per second for senarios that you want a Power BI report with direct query connection. If this is the case, we can use the previous version of the flow and take out the parts that we added because of the limits of the streaming semantic model and replace the 'Add rows to dataset' with SQL 'Insert row (V2)' action.
In order to use that SQL 'Insert row (V2)' action, you need a proper license because that's a premium action. You also need a table in your database that has the same column names and types of values (those 8 columns I picked from the Finance excel file. For the loop, you can still using the 'Apply to each' action and add that 'Insert row (V2)' action into it, or use the number variable and 'Do until' action (I used the Do until method).
The Flow for that will be like this:
And you can connect that database to a report with direct query connectivity mode, and use "Power BI Visuals Auto-Refresh with new Browser Extension" to refresh the visuals of that report automaticaly like this:
Conclusion
In this blog post, I guide you through the process of creating a dynamic, real-time streaming semantic model in Power BI using Power Automate. From setting up streaming semantic models to integrating with a SQL database for more complex scenarios using different range of techniques. Whether testing scenarios with Power BI dashboards using live data or exploring advanced data handling with SQL live databases, these methods can be helpful. Remember, the key to success in this dynamic field is experimentation and adaptation. So, I encourage you to try these techniques, tweak them to fit your specific needs, and let me know if you have any comments or questions.
Vahid
Comments