Analyse-it
You are viewing part of the Analyse-it for Microsoft Excel user-guide. When you have read this topic we recommend you view contents of the user-guide or you can read more about Analyse-it, our statistical analysis software for Microsoft Excel.

Summary

Summary presents a statistical and visual overview of a sample. A histogram and a combined dot-, box-, mean-, percentile- and SD- plot give a visual summary and statistics such as the mean, standard deviation skewness, kurtosis and median, percentiles summarise the sample numerically.

Normality of the distribution of the sample can be visually assessed with the histogram, or normal quantile plot or statistically using a normality test.

The requirements of the test are:  



Arranging the dataset

Data in existing Excel worksheets can be used and should be arranged in a List dataset layout. The dataset must contain a continuous scale variable.    

When entering new data we recommend using New Dataset to create a new 1 variable dataset ready for data entry.

Using the test

To start the test:

  1. Excel 2007:
    Select any cell in the range containing the dataset to analyse, then click Describe on the Analyse-it tab, then click Summary
  2. Excel 97, 2000, 2002 & 2003:
    Select any cell in the range containing the dataset to analyse, then click Analyse on the Analyse-it toolbar, click Describe then click Summary.

  3. Click Variable and select the variable to analyse.
  4. Tick Parametric - Mean, SD, SE to show parametric statistics.
  5. Tick Non-parametric - Median, Percentiles to show non-parametric statistics.
  6. Click OK to run the test.

The report shows the number of observations analysed and summary statistics.

A frequency histogram, box plot, and mean plot are shown in addition to a normal quantile plot and Shapiro-Wilk normality test (see below).

The mean is a measure of the central location of the sample and the standard deviation is a measure of the dispersion of observations. The shape of the distribution is described by the skewness, a measure of the asymmetry, and kurtosis, a measure of the peakedness.

The median is a measure of the central location of the sample with half the observations above and half below the median. The percentile table shows the minimum, maximum and quartiles in addition to any other percentiles shown on the percentile plot (see below).

METHOD Percentiles are calculated using Tukey's method which approximates the percentiles as (i - 1/3) / (n + 1/3) (see [4] and [5]).

Confidence intervals are calculated for the mean, median and standard deviation.

To change the confidence interval:

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Enter Confidence interval to calculate for the mean, median and standard deviation. The level should be entered as a percentage, between 50 and 100, without the % sign.
  3. Click OK.

Customising the frequency histogram

The frequency histogram shows the distribution of the sample. The bins used are chosen automatically, based on the number and range of the observations, or can be entered manually.

To change the bins used by the histogram:

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Click Histogram bins and select Fixed.
  3. Enter Start of the first bin, Bin count, and Width of the bins. The bins must be sufficient for every observation to be classified into a bin, with no observations lying outside.
  4. Click OK.

Normality can be visually assessed by comparing the height of the frequency histogram bars to a normal curve.

To show the normal curve overlay:

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Tick Overlay Normal distribution
  3. Click OK.

Examining the observations with a dot plot

Dot plots show the observations to allow visual assessment of the distribution and clustering of observations, and to spot possible outliers or data entry errors. Observations are jittered (Y axis) to minimise overlapping points.

To show a dot plot:

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Tick Dot plots
  3. Click OK.

Customising the box and percentile plots

Box and percentile plots show the non-parametric central tendency, dispersion and distribution shape of the sample. Box plot styles vary between publications with the most common styles differing mainly in how the whiskers are drawn.

The box plot styles are:

 

To change the box plot:

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Click Box plot then select Skeletal or Outlier
  3. Click Style then select Basic, Notched, or Notched / Basic. Notched / Basic shows a notched box plot when the median confidence interval is within the quartiles, otherwise reverts to a basic box plot to avoid an ugly plot with the median notch extending beyond the quartiles.
  4. Click OK.

To hide box plots:

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Click Box plot then select None
  3. Click OK.

Percentile plots (see below) show the range within which a percentage of the observations lie. The calculated percentiles are also shown in the percentile table.

To change the percentiles plot:

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Click Percentile plot then select None, 80% of distribution, 90% of distribution, 95% of distribution or 99% of distribution.
  3. Click OK.

Customising the mean and SD plots

Mean and SD plots show the parametric central tendency and dispersion.

The mean plot (see below) shows the mean as a vertical line, and optionally, the confidence interval for the mean as a diamond shape.

 To change the mean plot: 

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Click Mean plot then select Mean line or Mean + CI diamond.
  3. Click OK.

SD plots (see below) are similar to non-parametric percentile plots, but show the parametric dispersion of the sample.

 To change the SD plot: 

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Click Std Deviation plot then select ±1 SD, ±2 SD,±3 SD or 80%, 90%, 95% or 99% of distribution.
  3. Click OK.

 To hide the mean and/or SD plot:  

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Click Mean plot then select None.
  3. Click Std Deviation plot then select None.
  4. Click OK.

Assessing normality

Normality can be visually assessed from the frequency histogram, or a Normal Quantile plot and a statistical hypothesis test can be used.

The normality tests available are:

The normality test statistic and hypothesis test are shown. The p-value is the probability of rejecting the null hypothesis, that the sample is from a normally distributed population, when it is in fact true. A significant p-value implies that the sample is from a non-normally distributed population.

The Normal quantile plot shows the observations of the sample against the expected normal quantile. The expected quantile is the number of SDs from the mean where such an observation would be expected to lie in normal distribution with the sample mean and standard deviation. When the sample is normally distributed the points will form a straight-line. Deviation from the line indicates non-normality.

To perform a Normality test and show the Normal Quantile plot: 

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Click Normality test then select Shapiro-Wilk, Anderson-Darling or Kolmogorov-Smirnov.
  3. Click OK.

 To hide the Normality test and Normal Quantile plot: 

  1. If the Summary statistics dialog box is not visible click Edit on the Analyse-it tab/toolbar.
  2. Click Normality test then select None.
  3. Click OK.

References to further reading

  1. Handbook of Parametric and Non-Parametric Statistical Procedures (3rd edition)
    David J. Sheskin, ISBN 1-58488-440-1 2003.
  2. Goodness of Fit Techniques
    Ralph D'Agostino, Michael Stephens, ISBN 0-8247-7487-6 1986.
  3. Approximating the Shapiro-Wilk W-test for non-normality

    Royston P, Journal Statistics and Computing, Vol 2 No. 3 1992; 117-119.
  4. Some Implementations of the Boxplot
    Michael Frigge, David C. Hoaglin, Boris Iglewicz, The American Statistician Vol 41, No. 1 1989; 50-55.
  5. Sample Quantiles in Statistical Packages
    Rob J. Hyndman, Yanan Fan. The American Statistician, Vol. 50, No. 4 1996, 361-365.



Part of the Analyse-it for Microsoft Excel user guide. For more information about Analyse-it, the statistics add-in for Microsoft Excel, or to download a free 30-day trial please visit http://www.analyse-it.com/. © Analyse-it Software, Ltd. Analyse-it® is a registered trademark. Microsoft® Excel® is a registered trademark of Microsoft.