Using Excel filter to analyse a subset of the dataset

Thursday, 14 August, 2008

While answering a customer e-mail earlier this week, it occurred to us that some Analyse-it users don’t know how to filter data with Analyse-it. Filtering lets you restrict analysis to just a subset of your dataset, for example females only, or females over 40 years old, and is invaluable for exploratory data analysis.

How does Analyse determine what to analyse?

To let you analyse a subset of your dataset, Analyse-it only extracts the visible data from the Excel worksheet. Analyse-it ignores observations on hidden rows, and does not include them in analysis.

You can manually hide rows on the worksheet to exclude them from analysis. Simply highlight the worksheet rows to hide, right-click over the selection, then choose Hide from the pop-up menu.

More often though you want to filter to specific criteria, such as Females only, or females over age 40. Rather than hide the rows containing those cases, it’s easier to use Excel’s built-in filter. Called AutoFilter until Excel 2007, filter lets you choose the values to filter to, such as Male or Female, or use a simple boolean expression, such as values greater-than 40. Excel then hides the rows that don’t match the criteria – excluding them from analysis by Analyse-it.

Using filter to analyse a subset

You can use filter with any Analyse-it dataset in list format. List format simply means each variable is in a separate column. The top-most cell contains the variable name, and the observations follow in the column below. See the screenshot below for an example.

You can enable filter on any list dataset:

  1. Click a cell in the dataset. 
  2. Click Filter on the Analyse-it toolbar to enable filter on the dataset. Click Filter again if you want to remove the filter, and show all the data in the dataset.

    Analyse-it toolbar filter button
  3. Click the drop-down button to the right of the variable you want to filter. You can choose individual values to filter to, or use a boolean condition.

    For example, to show just females, click the drop-down button next to Sex, tick the checkbox alongside Female, and then click OK.

    Filter by sex=female

    To filter to cases over 40 years old, click the drop-down button next to Age, then choose Number Filters, Greater Than, enter 40, and then click OK.

    Filter age greater than 40
  4.  Any subsequent analysis with Analyse-it will analyse only the visible rows.

You can use filter on any variable in the dataset – not just the variables you’re analysing. In the above example, you could filter the dataset to females over 40 years old and then use the Mann-Whitney test to compare salary for those with Higher and Standard education.

The only limitation, imposed by Excel, is you can only use filter on one dataset at a time on each worksheet. This isn’t a problem if you use separate worksheets for each dataset.

Using filter to choose the groups to test

You can use filter to choose the groups to analyse in two-sample tests such as the Independent t-test, F-test, or Mann-Whitney test. Each test expects your dataset to contain two groups. When there are three or more, and you want to analyse just two groups, you can use filter so Analyse-it only finds the two groups in which you’re interested.

As an example, imagine you’ve measured a sample of widget weights from three production lines: A, B & C. Let’s assume the operator on production line B weighed the wrong widgets, making the data useless for comparison with A & C. Rather than abandon the analysis you might decide to compare the weights from lines A & C only:

  1. Select a cell in the dataset 
  2. Click Filter on the Analyse-it toolbar to enable filter on the dataset.
  3. Click the drop-down button next to Production line, tick A, C, clear B, and then click OK.

    Filter to A vs C
  4. Choose Compare Groups > Mann-Whitney from the Analyse-it toolbar, and then choose to analyse Weight by Factory. Analyse-it recognises you want to compare just the two production lines, A vs. C, as you can see in the Alternative Hypothesis below:

    Mann Whitney test in Analyse-it
  5. Because of the filter, Analyse-it will only compare the weights for production line A vs. C, ignoring the data hidden for line B.

In Analyse-it 3 we’ve choosing the groups to analyse easier. Instead of using filter you can choose the groups to analyse when you run the test. Simply click the Filter button alongside the factor, then choose the groups to analyse:

Analyse-it 3 filter to A vs C

What happens when updating a report?

When we developed Analyse-it, the question arose of what to do when updating a report. Should the same subset of data be analysed, using the same filter criteria to select what's analysed? Or should the dataset be analysed as it currently stands, using whatever filter is in effect at the time.

At the time we had no choice. Microsoft Excel didn’t provide programmatic access to the filter currently in effect. There was no way to see if the data was filtered, never mind how to get or set the filter in effect. We had no choice than to make Analyse-it work with the subset of data currently visible.

All versions of Analyse-it up to now have worked the same. When you click Refresh report, Analyse-it repeats the analysis with the data visible in the dataset. The benefit is you can analyse a subset of data, change the filter, and then update the analysis to see the effect. The downside is most users expect update to repeat analysis with the same data – taking account of new or changed data – but still using the same filter criteria.

The latest versions of Microsoft Excel do let add-ins like Analyse-it access the current filter settings. In Analyse-it 3, we’re planning to save the filter in effect when you run a test, and then let you choose if you want to re-apply it when updating a report. We’ve yet to resolve some issues, but generally think this approach is best. What do you think?

2 comments / Post comment Add to: del.icio.us del.icio.us / digg it! digg it! / Stumble upon Stumble upon / Subscribe to RSS feed Subscribe to RSS feed

Comments

Good post, thanks for the tips. Your screen shots are very helpful and track along with the text nicely. Very easy to read. I agree that being able to re-apply the filter when updating your reports is extremely important, especially if you do alot of filtering and are producing lots of reports.

I would like to mention that I am quite concerned about this problem: "The only limitation, imposed by Excel, is you can only use filter on one dataset at a time on each worksheet."

This limitation is extremely troublesome for large complex datasets. For example, I am routinely looking at 50,000-60,000 row datasets with 3-20 continuous variables, and another 5+ nominal catagories for each row. I know that you state, "This isn’t a problem if you use separate worksheets for each dataset," but it is a pain to make a new worksheet every time I need to graph the data in another fashion.

Nice post, keep up the good work. Let me know if you find a way to filter multiple catagories!!
Thanks for your kind comments re: the blog Mark. We're trying to provide useful tips and insight into features built-in to Excel and how they can be used with Analyse-it. So far the blog is proving very popular and is appreciated by a lot of users.

One thing I'd like to clarify -- when we say you can only apply filter to a single dataset on each worksheet, each filter does allow you to set criteria to filter multiple columns. In your reply it sounds like you misunderstood that point? We do not recommend you create copies of your worksheet, as that's likely to introduce inconsistencies between the copies when make a change on one worksheet but not on the copies.

Admittedly Excel is not strong on filtering. One weakness, especially for someone like yourself who has to filter the data in many ways, is that Excel doesn't let you save filters so you can re-apply them in future. Instead you have to setup the filter criteria for each column each time you want to re-apply it. We are considering addressing this limitation in Analyse-it 3.0, or a future update though. How soon we offer this feature depends on how much demand there is for it.

Post a comment

Let us know your thoughts and opinions on this post. Did you like the post, or was it too complex, or maybe too simplistic? You feedback helps improve both the blog and Analyse-it.

Name
E-mail address
Your e-mail address will not be published but is required to post a comment. We guarantee your privacy.
Comment