Extension methods must be contained in a static class that is neither generic nor nested. This solution uses the following declaration for its StatisticsExtensions class:
using System;
using System.Collections.Generic;
using System.Linq;
public static class StatisticsExtensions
{
    ...
}
Extension methods must also be declared static; this solution makes them public as well. The first parameter must be marked with the this keyword to indicate that it is the object being extended.
The following code shows the TruncatedMean extension method:
// Return the truncated mean of an IEnumerable of numbers.
// Set discardNumber to the number of values to discard at the
// top and bottom. For example, set discardNumber = 5 to
// discard the 5 largest and 5 smallest values.
public static double TruncatedMean<T>(this IEnumerable<T> values,
    int discardNumber)
{
    // Convert the values into an enumerable of doubles.
    IEnumerable<double> doubles =
        values.Select(value => Convert.ToDouble(value));
    double[] doubleArray = doubles.ToArray();
    // Sort the doubles.
    Array.Sort(doubleArray);
    // Find the values that we want to use.
    int minIndex = discardNumber;
    int maxIndex = doubleArray.Length - 1 - discardNumber;
    // Make sure the discard count is valid and leaves at least one value.
    int numRemaining = maxIndex - minIndex + 1;
    if (discardNumber < 0 || numRemaining < 1)
        throw new ArgumentOutOfRangeException(nameof(discardNumber),
            "discardNumber must be non-negative and leave at least one value.");
    // Copy the desired items into a new array.
    double[] remainingItems = new double[numRemaining];
    Array.Copy(doubleArray, minIndex, remainingItems, 0, numRemaining);
    // Calculate and return the truncated mean.
    return remainingItems.Average();
}
This method has a generic type parameter, T, between its name and its parameter list. The first parameter has type IEnumerable<T>, so the method extends that type. Because arrays and List<T> both implement IEnumerable<T>, the method applies to both arrays and lists.
The method's second parameter indicates how many of the largest values and how many of the smallest values to remove before computing the truncated mean.
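To show why a parameter of type IEnumerable<T> accepts both arrays and lists, here is a minimal sketch written as a C# 9 top-level program. CountAbove is a hypothetical helper, not part of StatisticsExtensions; because it is a local function it cannot use the this modifier, so it is called with ordinary method syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generic helper with the same first-parameter type as the
// extension methods: it counts values greater than a threshold.
int CountAbove<T>(IEnumerable<T> values, double threshold) =>
    values.Count(value => Convert.ToDouble(value) > threshold);

int[] intArray = { 1, 5, 10 };
List<double> doubleList = new List<double> { 0.5, 7.5, 12.0 };

// T is inferred as int here, and the source is an array.
int arrayResult = CountAbove(intArray, 4);
// T is inferred as double here, and the source is a List<double>.
int listResult = CountAbove(doubleList, 4);

Console.WriteLine($"{arrayResult} {listResult}");
```

The compiler infers T from the argument, so the same method body works for any element type that Convert.ToDouble can handle.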
Even if the input values are integers, their mean might not be an integer, so the method returns a double.
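To discard the largest and smallest items, the method must sort the inputs, and it must then average the values that remain. C# cannot perform arithmetic on values of an unconstrained generic type T, so the code uses the LINQ Select method to convert each item into a double. It then calls ToArray to copy the results into an array of double and sorts that array.
If a value cannot be converted into a double, Convert.ToDouble throws an exception when ToArray enumerates the query: for example, an InvalidCastException for an object that doesn't implement IConvertible, or a FormatException for a string that doesn't contain a number.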
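The conversion step relies on Convert.ToDouble(object). This small sketch, a C# 9 top-level program with made-up sample values, shows a few cases it can hit, including two that are easy to miss: a null value silently becomes 0, and because Select is deferred, a bad value only throws when the query is enumerated.

```csharp
using System;
using System.Linq;

double fromInt = Convert.ToDouble((object)42);        // boxed int converts normally
double fromDecimal = Convert.ToDouble((object)2.5m);  // decimal implements IConvertible
double fromNull = Convert.ToDouble((object)null);     // null converts to 0

object[] mixed = { 1, new object(), 3 };
// Select is deferred, so no conversion happens on this line.
var lazy = mixed.Select(value => Convert.ToDouble(value));

string outcome;
try
{
    // ToArray enumerates the query, so the conversions run here.
    double[] converted = lazy.ToArray();
    outcome = $"converted {converted.Length} values";
}
catch (InvalidCastException)
{
    // System.Object does not implement IConvertible.
    outcome = "InvalidCastException";
}

Console.WriteLine($"{fromInt} {fromDecimal} {fromNull} {outcome}");
```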
Next, the code calculates the indices of the first and last items that it should keep when it discards the largest and smallest values. It uses Array.Copy to copy those values into a new array, uses the Average LINQ extension method to calculate the mean of the remaining values, and returns the result.
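For comparison, the same truncation can be expressed with LINQ's OrderBy, Skip, and Take instead of Array.Sort and Array.Copy. This is a sketch, not the solution's code: TruncatedMeanLinq is a hypothetical local function in a C# 9 top-level program, and the data set is invented to show how one outlier distorts the ordinary mean.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical alternative: sort, skip the smallest discardNumber values,
// take the middle ones, and average them.
double TruncatedMeanLinq(IEnumerable<double> values, int discardNumber)
{
    double[] sorted = values.OrderBy(v => v).ToArray();
    int numRemaining = sorted.Length - 2 * discardNumber;
    if (discardNumber < 0 || numRemaining <= 0)
        throw new ArgumentOutOfRangeException(nameof(discardNumber));
    return sorted.Skip(discardNumber).Take(numRemaining).Average();
}

double[] data = { 100, 3, 1, 4, 2 };
double plainMean = data.Average();                // (100 + 3 + 1 + 4 + 2) / 5 = 22
double trimmedMean = TruncatedMeanLinq(data, 1);  // mean of {2, 3, 4} = 3
Console.WriteLine($"Mean: {plainMean}, truncated mean: {trimmedMean}");
```

Discarding just one value from each end removes the outlier 100, which is the point of a truncated mean.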
This extension method takes, as its second parameter, the number of largest and smallest values that it should discard. The following overloaded version of the method takes a discard fraction as a parameter instead:
// Return the truncated mean of an IEnumerable of numbers.
// Set discardFraction to the fraction of values to discard at the
// top and bottom. For example, set discardFraction = 0.05 to
// discard the largest 5% and the smallest 5% of the values.
public static double TruncatedMean<T>(this IEnumerable<T> values,
    double discardFraction)
{
    // Calculate the number of items to remove at the top and bottom.
    int discardNumber = (int)(values.Count() * discardFraction);
    // Invoke the previous version of TruncatedMean.
    return TruncatedMean(values, discardNumber);
}
This method multiplies the number of values by the discard fraction and casts the result to int, which rounds down, to get the number of values to discard from each end. It then invokes the previous version of the method.
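The conversion from a fraction to a count can be checked in isolation. This sketch uses a hypothetical local function, DiscardCount, in a C# 9 top-level program; its range check is an assumption added for illustration and is not part of the overload shown above.

```csharp
using System;

// Hypothetical helper mirroring the overload's arithmetic:
// (int)(count * fraction) truncates toward zero.
int DiscardCount(int count, double discardFraction)
{
    // Assumed guard: discarding half or more of the values from each end
    // would leave nothing to average.
    if (discardFraction < 0 || discardFraction >= 0.5)
        throw new ArgumentOutOfRangeException(nameof(discardFraction));
    return (int)(count * discardFraction);
}

int count20 = DiscardCount(20, 0.1);  // 20 * 0.1 = 2 values from each end
int count19 = DiscardCount(19, 0.1);  // 1.9 truncates to 1
int count5 = DiscardCount(5, 0.1);    // 0.5 truncates to 0
Console.WriteLine($"{count20} {count19} {count5}");
```

One consequence of having both overloads: an int argument is an exact match for the int version, so a call such as values.TruncatedMean(1) discards one value from each end rather than treating 1 as a fraction. Also note that the fraction overload enumerates values twice (once in Count and once inside the int overload), so a lazily evaluated query runs twice.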
The following code shows the Median extension method:
// Return the median of an IEnumerable of numbers.
public static double Median<T>(this IEnumerable<T> values)
{
    // Convert into an enumerable of doubles.
    IEnumerable<double> doubles =
        values.Select(value => Convert.ToDouble(value));
    double[] doubleArray = doubles.ToArray();
    // The median of an empty sequence is undefined.
    if (doubleArray.Length == 0)
        throw new InvalidOperationException("The sequence contains no values.");
    // Sort the doubles.
    Array.Sort(doubleArray);
    // Calculate and return the median.
    int numValues = doubleArray.Length;
    if (numValues % 2 == 1)
    {
        // There is an odd number of values.
        // Return the middle one.
        return doubleArray[numValues / 2];
    }
    // There is an even number of values.
    // Return the mean of the two middle values.
    double value1 = doubleArray[numValues / 2 - 1];
    double value2 = doubleArray[numValues / 2];
    return (value1 + value2) / 2.0;
}
In order to find the value in the middle of the others, the method must sort the values. To do that, the method converts the values into an array of double and then sorts it, just like the first version of the TruncatedMean method did.
Next, if the resulting array contains an odd number of values, the method calculates the index of the middle value and returns that value.
If the double array contains an even number of values, the method calculates the indices of the two middle values and returns the average of those values.
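The index arithmetic is easiest to see on small inputs. This sketch restates it in a hypothetical local function, MedianOfSorted, inside a C# 9 top-level program; the sample arrays are invented.

```csharp
using System;
using System.Linq;

// Hypothetical restatement of Median's index arithmetic on a sorted array.
double MedianOfSorted(double[] sorted)
{
    if (sorted.Length == 0)
        throw new InvalidOperationException("The median of an empty sequence is undefined.");
    int n = sorted.Length;
    if (n % 2 == 1)
        return sorted[n / 2];                          // odd: the single middle index n / 2
    return (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;  // even: indices n / 2 - 1 and n / 2
}

double[] odd = new double[] { 7, 1, 3 }.OrderBy(v => v).ToArray();     // {1, 3, 7}
double[] even = new double[] { 8, 2, 6, 4 }.OrderBy(v => v).ToArray(); // {2, 4, 6, 8}

double oddMedian = MedianOfSorted(odd);    // index 3 / 2 = 1, so the median is 3
double evenMedian = MedianOfSorted(even);  // indices 1 and 2, so (4 + 6) / 2 = 5
Console.WriteLine($"{oddMedian} {evenMedian}");
```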
The following code shows the Modes extension method, which finds the values' modes:
// Return the mode(s) of an IEnumerable of values.
public static List<T> Modes<T>(this IEnumerable<T> values)
{
    // Make a dictionary to hold value counts.
    Dictionary<T, int> counts = new Dictionary<T, int>();
    // Count the values.
    foreach (T value in values)
    {
        if (!counts.ContainsKey(value))
            counts.Add(value, 1);
        else
            counts[value]++;
    }
    // An empty input has no modes (Max throws on an empty collection).
    if (counts.Count == 0) return new List<T>();
    // Find the largest count.
    int largestCount = counts.Values.Max();
    // Find the value(s) with that count.
    List<T> modes = new List<T>();
    foreach (KeyValuePair<T, int> pair in counts)
        if (pair.Value == largestCount) modes.Add(pair.Key);
    return modes;
}
This method creates a dictionary to hold counts for the values. The dictionary's keys are the original values, and the associated values are the counts.
After it creates the dictionary, the code loops through the values. When it comes to a value that is not already in the dictionary, the code adds it to the dictionary, setting its initial count to 1.
If the dictionary already contains a value, then the code increments its count.
After it has counted all of the values, the code uses the Max LINQ extension method to find the largest count.
The code then loops through the key/value pairs in the dictionary. If a pair's count equals the largest count, the code adds that pair's key (the original value) to the list of modes.
After it has processed all of the pairs, the method returns the modes list.
Because this method never converts the values into doubles, it returns the values that occur most often even if they are non-numeric. For example, if the values are names, the method returns the names that occur most often.
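The same counting can also be done with LINQ's GroupBy, which may be a useful comparison. This is an alternative sketch, not the solution's code: ModesByGrouping is a hypothetical local function in a C# 9 top-level program, and the names are made up. GroupBy yields groups in the order in which each key first appears, so the modes come back in first-occurrence order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical alternative: group equal values, then keep the keys whose
// group size equals the largest group size.
List<T> ModesByGrouping<T>(IEnumerable<T> values)
{
    var groups = values.GroupBy(v => v).ToList();
    if (groups.Count == 0) return new List<T>();  // no values, no modes
    int largestCount = groups.Max(g => g.Count());
    return groups.Where(g => g.Count() == largestCount)
                 .Select(g => g.Key)
                 .ToList();
}

string[] names = { "Ann", "Bob", "Ann", "Cy", "Bob", "Di" };
List<string> nameModes = ModesByGrouping(names);  // "Ann" and "Bob" each occur twice
Console.WriteLine(string.Join(" ", nameModes));
```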
The following code shows the final method in the StatisticsExtensions class, StdDev:
// Return the standard deviation of an IEnumerable of numbers.
//
// If asSample is true, evaluate as a sample.
// If asSample is false, evaluate as a population.
public static double StdDev<T>(this IEnumerable<T> values,
    bool asSample = false)
{
    // Convert into an array of doubles so each value is converted once
    // instead of once per enumeration.
    double[] doubles =
        values.Select(value => Convert.ToDouble(value)).ToArray();
    // Get the number of items and the mean.
    int numValues = doubles.Length;
    double mean = doubles.Average();
    // Get the sum of the squares of the differences between
    // the values and the mean.
    var squaresQuery =
        from value in doubles
        select (value - mean) * (value - mean);
    double sumOfSquares = squaresQuery.Sum();
    // Return the appropriate type of standard deviation.
    if (asSample)
        return Math.Sqrt(sumOfSquares / (numValues - 1));
    return Math.Sqrt(sumOfSquares / numValues);
}
This method converts the values into doubles, as the other methods do. It then gets the number of values and their mean.
Next, the code makes a LINQ query that selects the square of the difference between each value and the mean, and it uses the Sum method to add up those squared differences.
Finally, the method divides that sum by one less than the number of values for a sample standard deviation, or by the number of values for a population standard deviation, and returns the square root of the result.
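To make the two divisors concrete, this sketch applies the same arithmetic to a small invented data set in a C# 9 top-level program. It uses the method-syntax Sum overload with a selector, which is equivalent to the query-syntax version in StdDev.

```csharp
using System;
using System.Linq;

// Population: sqrt(sum((x - mean)^2) / n)
// Sample:     sqrt(sum((x - mean)^2) / (n - 1))
double[] xs = { 2, 4, 4, 4, 5, 5, 7, 9 };
int n = xs.Length;                                            // 8
double mean = xs.Average();                                   // 40 / 8 = 5
double sumOfSquares = xs.Sum(x => (x - mean) * (x - mean));   // 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32
double populationSd = Math.Sqrt(sumOfSquares / n);            // sqrt(32 / 8) = 2
double sampleSd = Math.Sqrt(sumOfSquares / (n - 1));          // sqrt(32 / 7), about 2.138
Console.WriteLine($"population = {populationSd}, sample = {sampleSd:0.000}");
```

With only one value, the sample formula divides 0.0 by 0, which produces double.NaN rather than throwing an exception.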
Now the main program can use extension methods to calculate statistical values. For example, it uses the following code to display the median of the values in the array named valuesArray:
arrayMedianTextBox.Text = valuesArray.Median().ToString("0.00");
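The "0.00" custom format string always shows two digits after the decimal point, rounding as needed. This sketch, a C# 9 top-level program with a hypothetical median value, shows that the decimal separator comes from the current culture unless a format provider is supplied.

```csharp
using System;
using System.Globalization;

double median = 2.5;  // hypothetical value, e.g. the median of {1, 2, 3, 4}

// With an explicit provider, the output does not depend on the machine's culture.
string invariantText = median.ToString("0.00", CultureInfo.InvariantCulture);     // "2.50"
string roundedText = (2.0 / 3.0).ToString("0.00", CultureInfo.InvariantCulture);  // "0.67"

// Without a provider, ToString uses CultureInfo.CurrentCulture, so some
// systems display a comma as the decimal separator.
string currentText = median.ToString("0.00");
Console.WriteLine($"{invariantText} {roundedText} {currentText}");
```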
The following code shows a useful technique that the program uses to display the mode, which is a list of values:
arrayModeTextBox.Text = string.Join(" ",
valuesArray.Modes().ConvertAll(i => i.ToString()));
This statement calls the Modes extension method to get the modes. It then uses the List<T>.ConvertAll method, which is an instance method of List<T> rather than a LINQ extension method, to convert the list of mode values into a list of strings. Finally, it uses string.Join to combine those strings into a single string with the values separated by spaces.
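Here is the same formatting step in isolation, as a C# 9 top-level program. The list of modes is hypothetical and stands in for the result of valuesArray.Modes(). The sketch also shows that string.Join has a generic overload, Join<T>(string, IEnumerable<T>), which calls ToString on each element itself, so the ConvertAll step is optional.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for valuesArray.Modes().
List<int> modes = new List<int> { 3, 7 };

// ConvertAll builds a new List<string>; string.Join inserts the separator.
string joined = string.Join(" ", modes.ConvertAll(i => i.ToString()));

// The generic overload converts each element with ToString.
string joinedDirectly = string.Join(" ", modes);

Console.WriteLine(joined);
Console.WriteLine(joinedDirectly);
```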
Download the StatisticalFunctions example solution to see additional details, such as how the program uses labels to build its histogram.