Stats Vocabulary

IB Math SL statistics is not that hard, but there is a fair amount of vocab to learn. Below are some of the best definitions I could find on the interwebs.

### General Concepts

Population - This is the ENTIRE collection from which we may collect data. Even if you can't find them. Can't count them. A population is everything you are trying to describe.

Sample - A selected sub-group from the population. This is generally the group that data has been collected from and is only part of the population.

Random Sample - Pretty much what it sounds like. The sample of the population is chosen with no bias, i.e. randomly, so every member of the population has the same chance of being picked. This is easy to say, but in practice very, very hard to do.
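If you want to see what "chosen randomly" looks like, here is a minimal sketch in Python. The population (student ID numbers 1 to 500) and the sample size of 20 are made up for illustration, and the seed is only there so the example gives the same result each time.

```python
import random

# Hypothetical population: ID numbers for 500 students (an assumption for this example).
population = list(range(1, 501))

# A fixed seed makes the example repeatable; drop it for a fresh sample each run.
rng = random.Random(42)

# Pick 20 different students, each one equally likely to be chosen.
sample = rng.sample(population, 20)

print(sorted(sample))
```

The hard part in real life is not the picking, it is getting a complete list of the population to pick from in the first place.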

Discrete Data - Data that can only take on particular values. For example, the number of children in a household. You can't have 1.7 children in a house… despite that being the national average in the States. You can have 1 or 2, but nowhere in between.

Continuous Data - Data that can take on any value within a range. For example, the height of people. It would be very strange if height data were discrete - people in a growth spurt would suddenly lurch from 120 cm to 121 cm. If we were collecting height data we would expect to see any value in a range of numbers.

Quartile(s) - Quartiles refer to the idea of "breaking up the data" into quarters once it has been put in order. The boundary that separates the first quarter from the second quarter is called the "1st quartile" and can be referred to as the 25th percentile. The "2nd quartile" is the 50th percentile (this is the median). The "3rd quartile" is the 75th percentile. This is frequently used in context with box-and-whisker diagrams, but not always.

Interquartile Range - The IB loves the IQR. Just loves it. The IQR is the 3rd quartile minus the 1st quartile, $Q_3 - Q_1$, so it tells you how spread out the middle half of the data is. This is used to define outliers.

Outlier - Data that is strange. Odd. Think of that kid who sits in the back with a booger collection. The IB defines an outlier as a data value that is more than 1.5 times the interquartile range (IQR) from the nearest quartile. In other words, anything below $Q_1 - 1.5 \times IQR$ or above $Q_3 + 1.5 \times IQR$.
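Here is a rough sketch in Python of the quartile, IQR, and outlier definitions above. The data set is made up, and `statistics.quantiles` uses just one of several quartile conventions (your calculator may give slightly different quartiles on other data sets), so treat this as an illustration rather than the official IB method.

```python
import statistics

# Made-up data set, already in order.
data = [2, 4, 5, 7, 8, 9, 10, 12, 30]

# n=4 splits the data into quarters and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)

# The IQR is the 3rd quartile minus the 1st quartile.
iqr = q3 - q1

# Anything more than 1.5 * IQR beyond the nearest quartile counts as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1:", q1, "Q2:", q2, "Q3:", q3)
print("IQR:", iqr)
print("Outliers:", outliers)
```

For this data Q1 is 4.5 and Q3 is 11, so the IQR is 6.5 and the upper boundary is 11 + 9.75 = 20.75. The value 30 is past that, so it is the only outlier.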

Upper and Lower Interval Boundaries - Basically the values that mark the ends of an interval. Say the data was grouped into 3 groups (or classes) with values from 0-100, 101-200 and 201-300. Then 101 would be the lower interval boundary for the second interval and 200 would be its upper interval boundary.

Mean - This is essentially the average: add up all the values and divide by how many there are. We can talk about the mean of the population or the mean of the sample. In general the mean of a random sample is used as an estimate of the population mean.

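Median - This is the "middle" number in a data set once the values are put in order. For example 1,1,2,7,100. The median is 2. If there is an even number of values, the median is the mean of the two middle values.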

Mode - The mode is the most common or frequent number. In the example above the mode is 1.

Modal Class - If data is grouped into classes (or bins) then the class with the most data in it is the "modal class." Same idea as the "mode" just in reference to grouped data.
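As a small sketch of finding a modal class in Python: the scores below are invented, and the three classes (0-100, 101-200, 201-300) are just example classes picked for this illustration. The helper `class_of` is a made-up name, not something from a library.

```python
from collections import Counter

# Invented scores and example classes, for illustration only.
scores = [12, 45, 101, 150, 155, 199, 160, 250, 299, 130]
classes = [(0, 100), (101, 200), (201, 300)]

def class_of(value):
    """Return the (lower, upper) class that contains the value."""
    for low, high in classes:
        if low <= value <= high:
            return (low, high)
    raise ValueError(f"{value} does not fit in any class")

# Count how many scores land in each class.
counts = Counter(class_of(s) for s in scores)

# The modal class is the class with the highest count.
modal_class, frequency = counts.most_common(1)[0]

print("Modal class:", modal_class, "with", frequency, "values")
```

Here six of the ten scores fall in 101-200, so that is the modal class.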

Range - This is the difference between the largest value and the smallest value in a data set. Again, for the example above the range is 100 - 1 = 99.
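To check the mean, median, mode and range for the example data set 1,1,2,7,100, here is a quick sketch using Python's built-in `statistics` module. The variable names are my own.

```python
import statistics

data = [1, 1, 2, 7, 100]

mean = statistics.mean(data)        # sum of the values divided by how many there are
median = statistics.median(data)    # middle value once the data is in order
mode = statistics.mode(data)        # most frequent value
data_range = max(data) - min(data)  # largest value minus smallest value

print("Mean:", mean)
print("Median:", median)
print("Mode:", mode)
print("Range:", data_range)
```

Notice that the mean (111 / 5 = 22.2) gets dragged way up by the 100, while the median stays at 2. That is why the median is often the better "typical value" when there is an outlier.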

Variance - A measure of how spread out the data is around the mean. Specifically, it is the average of the squared distances of each value from the mean. Basically if the variance is small then the data is clumped together - i.e. the values are similar. If the variance is large the data is spread out - i.e. the values are dissimilar. The variance is given the mathematical symbol of $\sigma^2$.

Standard Deviation - Like the variance, the standard deviation is a measure of how far the values in the data set are from the mean. It is a "measure of spread" (or dispersion), not a "measure of central tendency" like the mean, median and mode. The standard deviation is simply the square root of the variance - thus is given the mathematical symbol of $\sigma$. Because of the square root, it is in the same units as the data.
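Here is a minimal sketch of the variance and standard deviation worked out by hand in Python, then checked against the built-in `statistics` module. The data set is made up, and this treats the data as the whole population (the $\sigma$ versions). Note that `statistics.variance` and `statistics.stdev` are the sample versions, which divide by $n - 1$ and give slightly larger numbers.

```python
import math
import statistics

# Made-up data, treated as the entire population.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Step 1: find the mean.
mean = sum(data) / len(data)

# Step 2: square each value's distance from the mean.
squared_distances = [(x - mean) ** 2 for x in data]

# Step 3: the variance is the average of those squared distances.
variance = sum(squared_distances) / len(data)

# Step 4: the standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print("Mean:", mean)
print("Variance:", variance)
print("Standard deviation:", std_dev)

# The library's population versions should agree with the hand calculation.
print("Check:", statistics.pvariance(data), statistics.pstdev(data))
```

For this data the mean is 5, the variance is 4 and the standard deviation is 2.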

What did I miss? Any more you'd like? Leave a comment below and I'll get on it.