Excel Unplugged

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

I recently posted on how to make a process of getting data from a folder bulletproof. You can read it here. Whereas that post was about getting data from multiple TXT files within one folder, this one will tell you how to  get data from folder full of Excel files. There is a key difference here since Excel files are different than TXT files in a way that they can contain multiple Sheets and Tables. So the final step of Get Data From Folder will have to use a function of more than just one parameter.

This post is obviously not regular, since it’s not Tuesday (now I feel like Julianne Moore in Laws of Attraction) J but it’s a spinoff of a post I wrote in February of 2015 called Get Data From Folder with Power Query. Almost all comments referring to the mentioned blog post asked for exactly this, how do I do it with Excel files. This requires a few extra techniques and some high level Power Query stuff, so I decided that rather than trying to fit it all in a comment section of that post, I would rather do an incognito blog post.

I will draw inspiration from the last comment posted bellow the post mentioned above that goes like this

Is there any chance to pull the values from a specific range of equally looking Excel files (all forms) and list them?

Example: The cell C9 of all forms contains one product characteristic. This characteristic shall be now listed as “Characteristic A”.

We start with a folder of Excel files (I also planted a txt document in there just for fun)

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

That look like this

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

All Files have a different number in cell C9 of a sheet called Special Sheet. If you want to follow along, you can get the files here. Unzip all files to one folder and remember the folder location since we will need that further along the line.

And just for reference, and so we can keep our bearing at all times, here are all the steps needed for this to work

  1. We must create a Query based on one of our files that will “teach” Excel how to work with our Excel Files (how to extract the right value from the desired Sheet).
  2. Next we turn that Query into a Power Query function that will take two parameters (the filepath and sheet name)
  3. That is followed by creating a new query using the Get Data From Folder command and getting to our Excel files and the right sheets by using filters
  4. To wrap things up, we use our function on the right Sheets of the right files and create the desired column.

The Base Query

So we start in a blank Excel file and launch the Power Query (Data/New Query in Excel 2016)/From File/From Excel (From Workbook in Excel 2016)

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

 

Then find the folder and choose any of the Excel files. I will choose File1.xlsx, so if you chose differently some steps may differ in view.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Then choose the Special Sheet in the Navigator followed by the Edit command on the bottom right.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

And now the fun begins J

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

So just to keep the goal in mind, we need Cell C9 which at this point is number 64 in column Column3. We will achieve this by pressing the fx button in the formula bar (if you don’t see the formula bar, you can turn it on by ticking the formula bar on the View tab)

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

And then adding the [Column3]{8} to what was there by default.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

After you enter this, you will get the desired value. The syntax is classic M, but the 8 might take you by surprise, since it was obvious that this value was in row 9. To understand this, you should know that Power Query has a 0 based system. This means that 0 parameter gives you the first value, 1 gives you the second and so on…

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Now that we have the value we need, we’re just going to rename the last step (not necessary, but will make things easier). This is done by right clicking the last step in the APPLIED STEPS window on the right and selecting Rename. The name I will give to this step is Value.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

To wrap this Base Query up, we choose the fx again and this time we write a simple formula

= Table.FromValue(Value)

Where Value in the parenthesis points to the name of the last step. If you didn’t change that, you must input the name of the last step that you see in the APPLIED STEPS.

At this point you should see this.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

This last step completes the Base Query. Don’t close the Query we created because we’re going to need it in step two, where we’re going to change this Base Query into a Power Query function.

Creating the import function

This brings us to phase 2, creating the function with two parameters, filepath and Sheet Name. To start this off, we go to View/Advanced Editor and this will open up the Advanced Editor Window.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Since our goal is to create a function that could be applied to all files, we need to remove the specific data and make it general. This specific data is highlighted below.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Ok, the first thing we do, is to add a blank row above the let command and write

(filepath,SheetName)=>

So Parenthesis, the name of the first parameter followed by comma and the name of the second parameter and closed parenthesis. Then an equal sign followed by the “greater-than” sign.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Then we need to change the filepath in line three and the name of the sheet with the parameters we named. This part is CASE SENSITIVE!!!!! So be careful to replace the parts correctly.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Now press Done in the Advanced Editor Window, and just change the name of the Query to fnImportExcel. Now you should see this:

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

After completing that, you choose Home/Close and Load,

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

You will see the function you created

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

This brings us to Step 3.

Get Data From Folder To get Excel Files

So we’re still within the same Workbook, and we’re going to make another Query to get data from folder. This time we choose Power Query (Data/New Query in Excel 2016)/From File/From Folder

And then choose the folder where you have your Excel files and choose Ok. This is what you get:

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

At this point the Get Data From Folder lists all the files within a selected folder.

First off, we need to make sure, that all further steps will only be applied Excel files as input. Remember the txt file from the beginning. It’s time to say farewell to it. To do this we do two things. First we select the Extension column and choose Transform/Format/UPPERCASE

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

This looks like a redundant step, but it’s needed to be able to process xlsx and XLSX equally. This is followed by Filtering the Extension Column to only keep the Excel Files. So we choose The Filter, then select Text Filters and Contains. Now we write XLS and with this we enable the Query to keep the xls, xlsx, xlsm and all other Excel formats.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

So now we get:

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Even if this last process seems redundant at this point, keep in mind that we want to create a query that will work in the future. Remember that Get Data From Folder will list all the files in that folder and all the steps that are about to come, will only work on Excel files. So id someone was to put a jpeg of themselves on vacation within our folder, the Get Data From Folder will pick that up. But the steps we just did will make it disappear. I wrote a whole Post on making your Get Data From Folder queries bulletproof. You can read it here.

Now you select the Name and Folder Path columns, right click on one of them and choose Remove Other Columns.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

At this point we have

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Now we need to create a filepath column that will concatenate the two columns. Go to Add
Column/Add Custom Column

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

We name the column filepath and write a simple formula same as in Excel.

= [Folder Path]&[Name]

You can get the Column Names into the formula by simply clicking them in the Available Columns list on the right. (Try to ignore the strangest of phenomenon, where the dialog box is half English and half Slovenian).

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

At this point we can get rid of all columns except for filepath. We do this the same as before, i.e. right clicking the filepath column and choosing Remove Other Columns. Now for a bit of stardust, we choose Add
Column/Add Custom Column again and this time we write a formula

= Excel.Workbook(File.Contents([filepath]))

So we insert the filepath column into the Excel.Workbook(File.Contents()). This will give you the following column

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

And by clicking the double headed arrow next to the Column name you get a few columns to choose from. We need columns…

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Name and Kind, so deselect all others and also untick the Use original column name as Prefix. After pressing Ok you should see

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Now this is great, we see all the sheets in those workbooks and the names of those Sheets. You can easily guess what comes next. We apply the filter to the Name Column selecting only those called “Special Sheet”. Now we’ve got everything we need. We have the function that requires filepath and SheetName and now we have that data for all Excel files in the folder.

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

Now for the last time we choose Add
Column/Add Custom Column and add a special formula

= fnImportExcel([filepath],[Name])

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

This means we will use the function we created on the two parameters and we get

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

And after pressing the two headed arrow, we only have the one column called value…

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

…that gives us exactly what we need:

USE POWER QUERY’S GET DATA FROM FOLDER TO GET DATA FROM MULTIPLE EXCEL FILES

So the value of cell C9 on the Special Sheet of those files in the folder. Purely brilliant! At this point we could get rid of all the columns we don’t need, and then load the data to wherever we want.

Comments 17

  1. MF says:

    Although I haven’t had a chance to try this out… thanks so much for showing what Power Query could do! 🙂
    btw, seems there is an typo in “just to keep the goal in mind, we need Cell C7 which at this point is number 64 in column Column3. “… Should Cell C7 be Cell C9? ;p

    1. Thanks for pointing that out… It was a typo and it was corrected.

  2. Vasyl Melnychuk says:

    Thank you, Gašper, for this lovely article!

    I used it for loading both structured Excel tables and raw Excel files from different folders simultaneously. It took slightly different technique, but 98% of your info was used!

    Well done!

  3. Sam says:

    Hi, Gasper,
    Thank you for this.
    Question: your article is pulling a specific cell, is there a way to pull a specific range?
    For example: B8:C9 in Excel term.

    Regards
    Sam

    1. Hi Sam,
      The easiest way is to add an index column (Add Column/Add Index Column) and use a filter to get the desired rows and just remove unwanted columns.

  4. Trevor says:

    Hi,
    I have files organized by folder. I have these in a dynamic table so that the address can be easily created. I need to iterate over this list and query the files (in the different folders, for previous months) for the same tables. Dumping them into the same folder will not work.

    Is it possible to iterate over the folders, pull up the file and integrate into a query?

    1. If you do have the list of file paths somewhere in Excel, and you have created the PQ function that knows how to work with these files, you can instead of using a Get Files From Folder command, use a simple Get Data From Excel (From Workbook), go through your Excel list of files, apply any filters you need for the desired import and then use your function (in a custom column) on the remaining files…

  5. Michael P says:

    Is there a way to pull multiple cells from a sheet and create a table with all the values in that?

    1. You could by naming those cells, using Excel.CurrentWorkbook() and careful naming that could allow you to Append or Merge the gathered data from cells, but I don’t know if it’s worth the effort.

  6. Kapil says:

    Hey Jasper, do you think if we create query and use the option ‘From folder’ we will get the same result ?

    1. That would more or less depend on the data that you have but the steps you would take would more or less be the same.

  7. Richard Stone says:

    Hi there Gašper,

    Just found your article and have been working through it using your sample files. Despite ensuring that I have followed every step correctly I get an error when I try to run the last custom column – the one that calls the function.

    I am using PQ 2016.

    Any ideas?

    1. I can repeat all the steps no problem. The only think where our Power Query versions could differ is if you are using Office 2016 Volume License since those do not get the updates that regular Office 2016 or Office 365 get.

      But do check the base query again. If it works on all the files.

  8. Radovan Kubis says:

    Hi I am trying to pull data from one folder – the workbooks though don’t have identical structure meaning there is always only one sheet but with different name. Is there a function (or can you share formula ) that would do this job?

    Thank you

    1. Radovan, the fact that the Sheets have different names is covered by SheetName being a parameter in the Query. Please read the whole post and you will see that SheetName is not hard-coded in our function and therefor can be anything.

  9. Eddy says:

    Thank you very much for putting time and effort to share what you can do in Power Query.
    A great post which helps me a lot.
    Of course, since this post was written, some things in PQ were improved and so some thing can be even easier done now. E.g. merging by selecting columns in the order the have to be merged and then right click oen of the columns and choose ‘merger columns’. But this post’s still stands strong!
    Kind regards.

    1. Thanks Eddy. Appreciate the comment!

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.