# 1.9. Represent, analyse and interpret data to investigate real-life and work problems


After completing this section, the learner will be able to represent, analyse and interpret data using various techniques to investigate real-life and work problems, by successfully completing the following:

- Ensure that graphical representations and numerical summaries are consistent with the data, are clear and appropriate to the situation and target audience
- Compare different representations of aspects of the data to take a position on the issue
- Ensure that calculations and the use of statistics are correct and appropriate to the problem
- Justify interpretations of statistics and apply them to answer questions about the problem
- Discuss new questions that arise from the modelling of the data

In this section we will learn how to represent, analyse and interpret the data we have collected in order to investigate real-life and work problems, such as:

- Determining trends in societal issues such as crime and health
- Identifying relevant characteristics of target groups such as age, range, gender, socio-economic group, cultural belief and performance
- Considering the attitudes or opinions of people on issues

By studying patterns and making calculations, we will be able to draw conclusions, make decisions and predict trends.

We will also keep in mind that information can be unreliable and the resultant representations, such as graphs, can be distorted, resulting in inaccurate interpretations.

**Graphical representations and numerical summaries**

We now come to Step 4 of your research process:

**Step 4: Represent data**

It is necessary for you to use tables, graphs or charts to show your reader what your data is saying.

This step also includes calculating certain values (called statistical measures or simply “statistics”). Ensure that any calculations you do and any statistics you use are correct and appropriate to the problem you have chosen to research. Ensure, too, that you represent your data in a way that is appropriate to your audience.

Employees attending a meeting don’t want to be bored with lots of numbers that they may not understand, but show them a graph and they most probably will understand it. A performance graph with a line that rises over time will make the employees very happy, because it shows clearly and unambiguously that performance has improved.

Of course, we have to ensure that graphical representations and numerical summaries are consistent with the data, and are clear and appropriate to the situation and our target audience.

*Techniques for statistically modelling a situation*

Techniques for statistically modelling a situation include:

- Tables, graphs or charts
- Histogram
- Frequency polygon
- Stem-and-leaf plot

Once you have collected your raw data, you need to represent it in a diagram.
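As an illustration of one of the techniques listed above, the sketch below builds a stem-and-leaf plot in Python. The test scores are invented for this example, and the function name `stem_and_leaf` is our own; it assumes whole-number values and uses the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (tens) and leaf (units digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)


# Hypothetical test scores, invented for illustration only.
scores = [52, 67, 71, 58, 63, 75, 69, 81, 64, 77, 66, 90]

plot = stem_and_leaf(scores)
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{stem} | {leaves}")
```

Each printed row shows one stem followed by its leaves, so the longest row immediately shows where most of the scores are bunched.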

*Represent a normal distribution of data*

A normal distribution of data means that most of the examples in a set of data are close to the “average,” while relatively few examples tend to one extreme or the other.

Let’s say you are researching nutrition. You need to look at people’s typical daily calorie consumption. Like many measurements taken from a large group of people, the numbers for typical consumption will probably turn out to be approximately normally distributed. That is, for most people, their consumption will be close to the mean, while fewer people eat a lot more or a lot less than the mean.

When you think about it, that’s just common sense. Not that many people are getting by on a single serving of kelp and rice, or on eight meals of steak and milkshakes. Most people lie somewhere in between these extremes.

If you plotted normally distributed data on a graph, it would form a symmetrical, bell-shaped curve that is highest at the mean and tapers off towards both extremes.

The **x**-axis (the horizontal one) is the value in question… calories consumed, dollars earned or crimes committed, for example. And the **y**-axis (the vertical one) is the number of data points for each value on the **x**-axis… in other words, the number of people who eat **x** calories, the number of households that earn **x** dollars, or the number of cities with **x** crimes committed.
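To see how the two axes relate, here is a minimal Python sketch that counts how many people fall into each calorie range and prints the counts as a simple text histogram. The calorie figures, the bin width of 500 and the function name `bin_counts` are all assumptions made for this example.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((value // bin_width) * bin_width for value in values)
    return dict(sorted(counts.items()))


# Hypothetical daily calorie intakes for 15 people (illustrative only).
calories = [1450, 1800, 1950, 2050, 2100, 2150, 2200, 2250,
            2300, 2350, 2450, 2500, 2700, 2900, 3300]

histogram = bin_counts(calories, 500)
for lower, count in histogram.items():
    # The range is the x-axis value; the number of # marks is the y-axis value.
    print(f"{lower}-{lower + 499}: {'#' * count}")
```

With this invented data the longest row is the middle range, and the rows get shorter towards both ends, which is the rough shape you would expect from normally distributed data.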

Now, not all sets of data will have graphs that look this perfect. Some will have relatively flat curves, others will be pretty steep. Sometimes the mean will lean a little bit to one side or the other. But all normally distributed data will have something like this same “bell curve” shape.

The **standard deviation** (see Module 1) is a statistic that tells you how tightly all the various examples are clustered around the mean in a set of data. When the examples are pretty tightly bunched together and the bell-shaped curve is steep, the standard deviation is small. When the examples are spread apart and the bell curve is relatively flat, that tells you there is a relatively large standard deviation.
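The sketch below compares two small, invented data sets that have the same mean but are clustered differently. It assumes the population form of the standard deviation (dividing by the number of values); if Module 1 used the sample form, the numbers would differ slightly. The function name `population_std_dev` is our own.

```python
import math
import statistics


def population_std_dev(values):
    """Square root of the average squared distance from the mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(value - mean) ** 2 for value in values]
    return math.sqrt(sum(squared_deviations) / len(values))


# Two hypothetical data sets, both with a mean of 50.
tight = [48, 49, 50, 51, 52]    # values bunched closely around the mean
spread = [30, 40, 50, 60, 70]   # values spread far from the mean

print(f"tight:  {population_std_dev(tight):.2f}")
print(f"spread: {population_std_dev(spread):.2f}")

# Python's standard library gives the same result via statistics.pstdev.
print(math.isclose(population_std_dev(tight), statistics.pstdev(tight)))
```

The tightly bunched set produces the smaller standard deviation, even though both sets share the same mean.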

Computing the value of a standard deviation is complicated, as you saw, but let’s look at a graphical representation of a standard deviation:

One standard deviation away from the mean in either direction on the horizontal axis (the red area on the above graph) accounts for somewhere around 68% of the people in this group. Two standard deviations away from the mean (the red and green areas) account for roughly 95% of the people. And three standard deviations (the red, green and blue areas) account for about 99.7% of the people.
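These percentages can be checked with Python's standard library. The sketch below uses `statistics.NormalDist` to work out what share of a perfectly normal distribution lies within one, two and three standard deviations of the mean; real data will only match these figures approximately. The function name `share_within` is our own.

```python
from statistics import NormalDist


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    standard = NormalDist(mu=0, sigma=1)
    return standard.cdf(k) - standard.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The result does not depend on the mean or the size of the standard deviation, which is why the same percentages apply to calories, incomes or test scores alike, as long as the data is normally distributed.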

If this curve were flatter and more spread out, the standard deviation would have to be larger in order to account for those 68% or so of the people. So that’s why the standard deviation can tell you how spread out the examples in a set are from the mean.

Why is this useful? Here’s an example: If you are comparing test scores for different schools, the standard deviation will tell you how diverse the test scores are for each school.

Let’s say Springs High has a higher mean test score than Benoni High. Your first reaction might be to say that the kids at Springs are smarter.

But a bigger standard deviation for one school tells you that there are relatively more kids at that school scoring toward one extreme or the other. By asking a few follow-up questions you might find that, say, Springs’ mean was skewed up because the school district sends all of the gifted education kids to Springs. Or that Benoni’s scores were dragged down because students who recently have been “mainstreamed” from special education classes have all been sent to Benoni.

In this way, looking at the standard deviation can help point you in the right direction when asking why information is the way it is.
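As a rough illustration of the school comparison, the sketch below summarises two sets of invented test scores. The scores and the helper name `summarise` are assumptions for this example and are not real results from either school; the standard deviation is the population form.

```python
import statistics


def summarise(scores):
    """Return the mean, median and population standard deviation of the scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "std_dev": statistics.pstdev(scores),
    }


# Hypothetical scores: Springs has a small group of very high scorers.
springs = [55, 58, 60, 62, 65, 92, 95, 97]
benoni = [60, 62, 64, 65, 66, 68, 70, 73]

for school, scores in (("Springs", springs), ("Benoni", benoni)):
    summary = summarise(scores)
    print(f"{school}: mean={summary['mean']:.1f}, "
          f"median={summary['median']:.1f}, std_dev={summary['std_dev']:.1f}")
```

In this invented data Springs has the higher mean but also a much larger standard deviation and a lower median than Benoni, which is the kind of signal that should prompt follow-up questions before concluding that one group of learners is stronger.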