Statistics Representations

Statistics can be represented in many ways. Below are the most common representations in IB Math SL.

Frequency Distributions

That's basically a fancy name for a table of data. The one difference is that rather than simply listing all the data, which might take A LOT of space, the table lists the values of the data and the frequency, or how often each value occurs.

Number of Siblings | 0 | 1 | 2 | 3 | 4
Frequency          | 3 | 8 | 5 | 2 | 1

From this data table we can't immediately see how many data points there are. But if we add the values in the frequency row we can see that there are $3+8+5+2+1=19$ data points.

It's not uncommon to be given a table like this and asked to calculate the mean. There is an easy(er) way and a hard way. Let's do the easy one. The IB gives the somewhat confusing formula for the mean of a set of data as:

(1)
\begin{align} \overline{x}=\frac{\sum_{i=1}^n f_i x_i}{\sum_{i=1}^n f_i} \end{align}


Let's look at that one piece at a time. The bottom is simply the total number of data points, i.e. the sum of the frequencies.

The top is simply the sum of each value multiplied by its frequency. This gives the same total as adding up all the data points one by one, which would have taken much longer.

We can then write:

(2)
\begin{align} \overline{x}=\frac{0 \cdot 3 + 1 \cdot 8 + 2 \cdot 5 + 3 \cdot 2 + 4 \cdot 1}{3+8+5+2+1} =\frac{28}{19} \end{align}

This method is particularly useful if the frequencies of the data are much higher…
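If you want to check the arithmetic, here is a minimal Python sketch of the same calculation. The names `values`, `frequencies` and `mean_from_frequencies` are made up for this example; they are not IB notation.

```python
from fractions import Fraction

# Frequency table from above: number of siblings and how often each occurs.
values = [0, 1, 2, 3, 4]
frequencies = [3, 8, 5, 2, 1]


def mean_from_frequencies(values, frequencies):
    """Return sum(f * x) / sum(f) as an exact fraction."""
    total = sum(f * x for x, f in zip(values, frequencies))
    count = sum(frequencies)
    return Fraction(total, count)


mean = mean_from_frequencies(values, frequencies)
print(sum(frequencies))        # 19
print(mean)                    # 28/19
print(round(float(mean), 2))   # 1.47
```

The same function works unchanged when the frequencies are in the hundreds, which is where listing every data point by hand stops being an option.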

Frequency Histograms (with equal class intervals)

This is essentially a graph of the frequency distribution. Data can be put into groups (also called bins or classes) to simplify the analysis. For example, the data above could be grouped like so:

Number of Siblings | 0-1 | 2-3 | 4+
Frequency          | 11  | 7   | 1

While not particularly useful given the limited data set, this can be very helpful if the data were, say, height or weight… Think about it. Continuous vs. discrete.
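The grouping itself is just adding up the frequencies that fall in each class. A rough Python sketch, assuming the three classes from the table above (the helper `class_label` is a made-up name for this example):

```python
# Ungrouped frequency table: number of siblings and how often each occurs.
values = [0, 1, 2, 3, 4]
frequencies = [3, 8, 5, 2, 1]


def class_label(x):
    """Map a sibling count to the class used in the grouped table."""
    if x <= 1:
        return "0-1"
    if x <= 3:
        return "2-3"
    return "4+"


# Add each frequency to the running total for its class.
grouped = {}
for x, f in zip(values, frequencies):
    label = class_label(x)
    grouped[label] = grouped.get(label, 0) + f

print(grouped)  # {'0-1': 11, '2-3': 7, '4+': 1}
```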

This data can then be graphed.

[Figure: frequency histogram of the grouped number-of-siblings data]

I just noticed the classes are not equal intervals… Oops. The 4+ class is not the same width as the other two. I think you can deal with it.

Frequency Histograms with TI-84

It's a bit unclear if you'll have to do this with your GDC on an IB test, but just in case…

Note that this video does not create a histogram with frequencies. However! When you turn the stat plot on and tell the GDC which list to plot, the option at the bottom is the frequency. The frequencies can be entered as a second list; each value and its corresponding frequency must be in the same row of their lists.

Box-and-Whisker

These are awesome? Your GDC can produce them. You may have to sketch them or at least read them. I think the picture below is self-explanatory.

[Figure: box-and-whisker diagram]

If you don't like that one, try one of these.
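A box-and-whisker diagram is drawn from five numbers: the minimum, the lower quartile, the median, the upper quartile and the maximum. Here is a minimal Python sketch that finds them for the number-of-siblings data. It assumes one common convention for quartiles (the median of each half, leaving out the middle value when the number of data points is odd); other conventions exist and can give slightly different answers. The function name `five_number_summary` is made up for this example.

```python
from statistics import median

# Frequency table for the number-of-siblings data.
values = [0, 1, 2, 3, 4]
frequencies = [3, 8, 5, 2, 1]

# Expand the table back into the full sorted list of 19 data points.
data = [x for x, f in zip(values, frequencies) for _ in range(f)]


def five_number_summary(sorted_data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    n = len(sorted_data)
    half = n // 2
    lower = sorted_data[:half]
    # Skip the middle value when n is odd so both halves have the same size.
    upper = sorted_data[half + 1:] if n % 2 else sorted_data[half:]
    return (
        sorted_data[0],
        median(lower),
        median(sorted_data),
        median(upper),
        sorted_data[-1],
    )


print(five_number_summary(data))  # (0, 1, 1, 2, 4)
```

So for this data the box would run from 1 to 2, with the median line sitting on the left edge of the box and the whiskers reaching out to 0 and 4.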

Box and Whisker on a TI-84

Here's a 4-minute video showing you how to create a box-and-whisker diagram.

Cumulative Frequency Graphs

Call me a big nerd, but I love these. Not sure why. Let's return to the very "real world" data of number of siblings from above. (The term "real world" makes me want to puke every time I hear it… but let's not get sidetracked into something more interesting than statistics?)

This time I'm going to add a third row: the cumulative frequency, or the "running total."

Number of Siblings   | 0 | 1  | 2  | 3  | 4
Frequency            | 3 | 8  | 5  | 2  | 1
Cumulative Frequency | 3 | 11 | 16 | 18 | 19

Now if we graph cumulative frequency vs. number of siblings we have a "cumulative frequency graph."
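The cumulative frequency row is just a running total of the frequency row. A short Python sketch, using `itertools.accumulate` from the standard library (the variable names are made up for this example):

```python
from itertools import accumulate

# Frequencies for 0, 1, 2, 3 and 4 siblings.
frequencies = [3, 8, 5, 2, 1]

# Each entry is the sum of all frequencies up to and including that column.
cumulative = list(accumulate(frequencies))

print(cumulative)  # [3, 11, 16, 18, 19]
```

The last entry is always the total number of data points, which is a quick way to check a cumulative frequency table.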

[Figure: cumulative frequency graph of the number-of-siblings data]

While that one's not so pretty… The IB ones tend to have the more typical "S" shape as shown below. Note that in this CFG the quartiles are clearly marked.

[Figure: S-shaped cumulative frequency graph with the quartiles marked]

An IB trick seems to be giving CFGs with a total number of data points just over 100, say 120, and then asking the student to mark the quartiles. It's all too easy to mark the quartiles at 25, 50 and 75… Which would be wrong. If the total number of data points is 120 then the 1st quartile is at 30, the 2nd at 60 and the 3rd at 90. So use your brain.
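Put another way, the quartiles sit at one quarter, one half and three quarters of the total, whatever that total is. A tiny Python sketch (the function name `quartile_positions` is made up for this example):

```python
def quartile_positions(n):
    """Cumulative-frequency heights at which to read Q1, Q2 and Q3 off a CFG."""
    return (n / 4, n / 2, 3 * n / 4)


print(quartile_positions(120))  # (30.0, 60.0, 90.0)
print(quartile_positions(100))  # (25.0, 50.0, 75.0)
```

Only when the total is exactly 100 do the positions land on 25, 50 and 75.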

Also, when we make a CFG we have in some ways assumed that the data is continuous rather than discrete. If you have discrete data and calculate the quartiles directly, then draw a CFG and read the quartiles off the graph, your answers will generally not agree, because of the assumption of continuity.
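To see the disagreement with actual numbers, here is a rough Python sketch using the number-of-siblings data. It assumes the CFG is drawn by plotting each value against its cumulative frequency and joining the points with straight lines, and that the median is read off at half the total; a hand-drawn smooth curve would give a slightly different reading. The function name `read_from_cfg` is made up for this example.

```python
from itertools import accumulate
from statistics import median

values = [0, 1, 2, 3, 4]
frequencies = [3, 8, 5, 2, 1]
cumulative = list(accumulate(frequencies))
n = cumulative[-1]  # total number of data points


def read_from_cfg(height):
    """Read a value off the CFG at a given cumulative frequency (straight-line joins)."""
    points = list(zip(values, cumulative))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= height <= c1:
            return x0 + (height - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("height is outside the plotted range")


# Median straight from the discrete data: the middle of the sorted list.
data = [x for x, f in zip(values, frequencies) for _ in range(f)]
discrete_median = median(data)

# Median read off the graph at half the total.
graph_median = read_from_cfg(n / 2)

print(discrete_median)  # 1
print(graph_median)     # 0.8125
```

Nobody has 0.8125 siblings, which is the continuity assumption showing through.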


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License