Following a wide demand, I also wrote a post about getting data from Excel files in a folder. You can read it here.
It seems that lately Excel users can be separated into two groups. One group that gets the new Excel (2010 or 2013) but still uses the application as if it’s Excel 2003 and those are users that may have heard of the whole Power BI thing but think that the whole Power BI story can only benefit certain people. That is not true. Power BI is all in all a set of brilliant commands. This post will show you only one of them and that is a Get Data From Folder command in Power Query. But it will also show you how Get Data From Folder command in Power Query can make your life easier. Take the time and read this post and think about how to implement this in your work. Trust me, you will not regret it! Here’s what this post will talk about.
If you get new data periodically in any form (txt, csv, xlsx…) then you know that what that means is that every time you have to go through the same process to get that data into Excel. Now this will be reduced to copying that file to a certain folder and clicking refresh in Excel. It doesn’t get much better than that. But first, let’s explain what is Power Query.
Power Query is an Excel Add-In that was created to help you get data into Excel or Power Pivot. It’s part of Microsoft Power BI and is even incorporated into the new Microsoft Power BI Designer. Everything you read about in this post refers to Excel 2010 and Excel 2013 for which Power Query is available. And now for the Get data from folder command.
When I first found this command, my enthusiasm about Excel doubled over night (and that is saying a lot). Here’s where it can really help you. Working with Excel will sooner or later get you in a position where you will be repeating one and same actions periodically. In the beginning that is great, but in time you would much rather be creating great dashboards and charts and not repeating the same imports or copying over and over again. Before Power Query was “born”, the only tool at your disposal for easing of the process was VBA. You could write a macro that did the import and accounted for all specifics in your data. But now, things got even better. What follows is a nice example of that.
Let’s say that every month, we get a new txt file that is just an export of data that we want in Excel so we can analyze it. This is how those txt files look like.
So every month we have to remove the top six rows, correctly import the columns and add them to previously imported data and analysis. Here’s how you do it with Get data from folder command in Power Query.
On the Power Query tab select From File and then From Folder. This will open a Choose a folder dialog box, where you can choose any folder you like (even folders on SharePoint).
Once you choose the folder, you get what will soon become your favorite window in Excel. It’s the Query Editor Window. And here the fun begins.
The above picture also shows two buttons we will use magnified, since one hides the dropdown menu of most used commands in Power Query. And we will use many of them during this process. But the other is even more important in this first stage and that is the double down arrow in the Content column.
The reason we need this button is that at this point, we are seeing the list of files in the folder but not the content of those files. But pressing this button will show exactly that.
Here’s what we get after pressing it.
And now the fun begins. We will do some data transforming by mostly using the dropdown menu button we talked about before.
First off, we will delete the top six rows. So from the dropdown menu we choose Remove Top Rows…
As the three dots at the end of the command suggest, you get a dialog box, and all that it needs from you is the number of top rows, you wish to remove (in our case that is 6).
After that we will choose a simple Use First Row As Headers command.
Now if you can imagine, this took care of the first six rows from the first txt file. But the same six rows from other txt files in that folder are still there, appended bellow. So the next step must be removing those. We will be using the filter in the name column to do that. First a little trick. At the beginning, the filter only shows you a limited amount of records and you must use the Load more command shown below.
Now we remove the blanks (since all rows contain the name, than these must be the six rows at the beginning of all txt files)…
…and then remove the rows containing Name (these are the header rows of other txt files)
Now we have the data we need, we just have to tell Excel what that data is. So we set the data types of the columns. So we select the column and go to Home/Data Type and select the appropriate data type.
With that done, we attend to the specifics in our data. In the City column, a question mark has replaced the apostrophe so we should revert those changes. We will use the Transform/Replace Values command.
The dialog box will remind you of a classic Find and Replace Window.
And this is what you get
Now we will use the Close & Load command.
Ok, so now we did all that we do every time we get a txt file. More or less a same amount of work, so the question is, what we have gained or what is so special about this.
First and foremost, at this point we have imported four txt files at once and now have 4000 rows. But now we just got the new txt file with new data.
All we need to do is to copy that file to the same folder as the others.
And just right click the data in Excel and choose Refresh
And right away we get 1000 new rows from the new txt.
So Power Query will take all the files in that folder and repeat all the steps that we have defined before. Brilliant. But there’s more. What if we get a txt file that needs a new rule or a change in the old set of rules? All we need to do is to right click the Query on the right and choose Edit and we are back in the Query editor and on the right you can delete or modify any of the steps that we defined in the original query.
Pure Brilliance. But if the new txt file would be totally different than the previous ones, you could define a new query for that one and then use the Append command to add the new data to our old Query.
Now do you believe me, that this is truly a game changer. Just imagine. You only do the hard work once and then it’s just a matter of refreshing your data. And that is why its called POWER BI and POWER QUERY 🙂