Skittles Term Project

Dania Garcia

2014

When purchasing a 2.17-ounce bag of Skittles, do you ever ask yourself whether the bag contains an equal number of candies of each color? Does every bag in the box contain the same number of candies? After reading this report, you will be able to answer those questions. My goal is to show, through graphs and calculations, that not every Skittles bag is the same.

To complete my goal, I purchased a 2.17-ounce bag of Skittles, as did 24 other classmates. I grouped my candies by color and counted the totals. I sent this information to my instructor, who compiled the class data into a spreadsheet. The spreadsheet shows the entire class's totals of Skittles broken down by color.

The data below is from the candy I purchased. As you can see, my bag did not contain an equal number of candies of each color. So, if your favorite color is red and your least favorite is yellow or orange, you can keep the red ones and share your least favorite color.

To show the proportions of colors in my bag, I created two charts. The pie chart shows the relative proportion of each color. For example, red and purple are my favorite Skittles. Comparing their proportions, red makes up 21/63 ≈ 33% of my bag and purple makes up 8/63 ≈ 13%, so my bag had about 21 percentage points more red candies than purple candies.

Data from personal Skittles Bag:

Red candies / Orange candies / Yellow candies / Green candies / Purple candies / Total
21 / 14 / 10 / 10 / 8 / 63

The Pareto chart displays the same counts ordered from largest to smallest. It clearly shows that my bag contains more red candies than any other color.

Categorical Data:

Twenty-five students participated in this project, and the total number of Skittles was 1511. This raises the question of what proportion of the sample each color makes up, and how different my bag of Skittles is from my classmates' bags. Reviewing the data, I can see that the colors are not equally represented within each bag.

Bag / RED / ORANGE / GREEN / YELLOW / PURPLE / Candies per Bag
1 / 15 / 11 / 15 / 15 / 5 / 61
2 / 14 / 16 / 11 / 14 / 4 / 59
3 / 15 / 16 / 10 / 11 / 8 / 60
4 / 16 / 9 / 10 / 8 / 17 / 60
5 / 13 / 17 / 13 / 17 / 17 / 77
6 / 8 / 11 / 10 / 25 / 9 / 63
7 / 9 / 9 / 13 / 11 / 16 / 58
8 / 16 / 11 / 11 / 9 / 14 / 61
9 / 13 / 10 / 11 / 2 / 18 / 54
10 / 16 / 19 / 7 / 12 / 7 / 61
11 / 11 / 15 / 13 / 10 / 14 / 63
12 / 6 / 9 / 17 / 16 / 11 / 59
13 / 21 / 14 / 10 / 10 / 8 / 63
14 / 9 / 10 / 19 / 8 / 7 / 53
15 / 11 / 7 / 6 / 19 / 17 / 60
16 / 10 / 11 / 16 / 13 / 8 / 58
17 / 9 / 11 / 17 / 10 / 13 / 60
18 / 19 / 10 / 11 / 9 / 6 / 55
19 / 10 / 10 / 19 / 8 / 12 / 59
20 / 13 / 7 / 12 / 15 / 14 / 61
21 / 12 / 11 / 18 / 8 / 11 / 60
22 / 16 / 9 / 11 / 19 / 8 / 63
23 / 13 / 14 / 9 / 11 / 14 / 61
24 / 14 / 15 / 12 / 15 / 7 / 63
25 / 12 / 10 / 15 / 11 / 11 / 59
Column Totals / 321 / 292 / 316 / 306 / 276 / 1511

I created a pie chart to show the proportion of each color in the total class sample, so that I could compare the pie chart for my own bag to the pie chart for the class totals. Red candies make up the largest percentage in both. However, the class data in the pie chart below is much more evenly distributed than the data in my own bag's chart, and the Pareto chart reflects the same pattern. As a result, not every Skittles bag has an equal number of candies of each color.
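The class proportions behind the pie chart, and the largest-to-smallest order used by the Pareto chart, can be reproduced from the Column Totals row. This is a minimal sketch that prints a small table instead of drawing the charts; the variable names are illustrative only.

```python
# Class color totals taken from the "Column Totals" row of the class table.
column_totals = {
    "Red": 321,
    "Orange": 292,
    "Green": 316,
    "Yellow": 306,
    "Purple": 276,
}

total = sum(column_totals.values())  # 1511 candies in all 25 bags

# Proportion of each color in the combined class sample (what a pie chart shows).
proportions = {color: count / total for color, count in column_totals.items()}

# A Pareto chart lists the categories from largest to smallest count.
pareto_order = sorted(column_totals, key=column_totals.get, reverse=True)

for color in pareto_order:
    print(f"{color:<7} {column_totals[color]:>4}  {proportions[color]:.1%}")
```

With these totals, red is the largest share at about 21.2% and purple the smallest at about 18.3%, all fairly close to the 20% each color would have if the five colors were equally common.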

Quantitative Data

The data collected from the 25 students totaled 1511 Skittles. The five-number summary shows how spread out the number of candies per bag is. To illustrate this information, I created a frequency histogram and a boxplot. The histogram is skewed right: most bag totals are clustered on the left, and the tail extends to the right toward the bag with 77 candies. This pulls the mean slightly above the median, so most of the class totals lie to the left of the mean.

My own bag contained 63 candies, which lies in the upper half of the data, above the third quartile of 62. The boxplot shows the five-number summary and several outliers. I expected outliers after reviewing the data sheet, because some bags contained noticeably fewer or more candies than the mean number per bag.

Skittles Sample Five-Number Summary:

Column / n / Mean / Std. dev. / Min / Q1 / Median / Q3 / Max
Candies per bag / 25 / 60.44 / 4.36 / 53 / 59 / 60 / 62 / 77
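The summary statistics and the boxplot outliers can be checked directly from the 25 bag totals. This sketch assumes the median-of-halves quartile rule (the middle value is left out of both halves when n is odd), which reproduces Q1 = 59 and Q3 = 62; software that uses a different quartile method may report slightly different quartiles. It also assumes the boxplot flags outliers with the common 1.5 × IQR rule, which may not be exactly the rule the graphing tool used.

```python
import statistics

# Candies per bag for bags 1-25, copied from the "Candies per Bag" column.
bag_totals = [61, 59, 60, 60, 77, 63, 58, 61, 54, 61, 63, 59, 63,
              53, 60, 58, 60, 55, 59, 61, 60, 63, 61, 63, 59]

def five_number_summary(data):
    """Min, Q1, median, Q3, max using the median-of-halves rule."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # For odd n, skip the middle value so it is not part of either half.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

low, q1, med, q3, high = five_number_summary(bag_totals)
mean = statistics.mean(bag_totals)   # 1511 / 25
sd = statistics.stdev(bag_totals)    # sample standard deviation (divides by n - 1)

# Boxplot fences under the assumed 1.5 x IQR rule.
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sorted(x for x in bag_totals if x < lower_fence or x > upper_fence)

print(f"n={len(bag_totals)} mean={mean:.2f} sd={sd:.2f}")
print(f"min={low} Q1={q1} median={med} Q3={q3} max={high}")
print(f"fences=({lower_fence}, {upper_fence}) outliers={outliers}")
```

Under these assumptions the fences are 54.5 and 66.5, so the bags with 53, 54, and 77 candies are flagged as outliers.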

Reflection

The Skittles Project involves both categorical and quantitative data. Categorical data consist of names or labels, such as candy colors. Pie charts and bar graphs are used to represent categorical data: pie charts are good for showing the proportion of each category, and bar charts let you quickly identify the important categories and compare their relative sizes. Quantitative data can be either discrete or continuous. The number of Skittles per bag is discrete data, while the weight of a Skittles bag is continuous data. Graphs used to represent quantitative data include frequency histograms and dotplots. A frequency histogram shows how many data values fall within each interval, or class. A dotplot represents each data value as a dot, with the dots stacked above a horizontal scale.

It is important to know what type of data is being collected and how it is going to be graphed. Pie charts are best for categorical data, not for quantitative data. Quantitative data are numerical counts or measurements, and this type of information cannot be visualized very well on a bar graph. Similarly, categorical data should not be graphed on a scatterplot, because scatterplots are used to show a relationship between two quantitative variables.

Ultimately, it is very important to know and understand what type of data you are collecting. Knowing how you want the data represented on a graph makes the information more understandable.

Confidence intervals are used to estimate a population parameter from the sample information gathered. For example, I want to estimate the proportion of yellow Skittles among all Skittles using the sample data collected by the class. In order to construct a confidence interval, certain conditions need to be met. First, the sample must be a simple random sample. Second, to estimate a proportion, the sample must meet the binomial distribution conditions, with at least 5 successes and at least 5 failures. Third, to estimate a mean, the population must be normally distributed or the sample size must be greater than 30; to estimate a standard deviation, the population must be normally distributed. The margin of error is the maximum likely difference, “plus or minus,” between the sample statistic and the population parameter. Three things affect the margin of error: sample size, confidence level (the proportion of times the interval-building method would capture the population parameter over many samples), and standard deviation.

The formulas for the three confidence interval estimates below use symbols that correspond to what is being estimated:

Name of Measure / Symbol for Sample Statistic / Symbol for Population Parameter
Mean / $\bar{x}$ / $\mu$
Standard Deviation / $s$ / $\sigma$
Proportion / $\hat{p}$ / $p$
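For reference, these symbols appear in the standard textbook interval formulas sketched below. Here $\chi^2_R$ and $\chi^2_L$ are the right-tail and left-tail chi-square critical values with $n - 1$ degrees of freedom, and the interval for $\sigma$ assumes a normally distributed population.

```latex
% Proportion: z interval
\hat{p} - E < p < \hat{p} + E, \qquad E = z_{\alpha/2}\sqrt{\frac{\hat{p}\,\hat{q}}{n}}, \quad \hat{q} = 1 - \hat{p}

% Mean with sigma unknown: t interval, df = n - 1
\bar{x} - E < \mu < \bar{x} + E, \qquad E = t_{\alpha/2}\,\frac{s}{\sqrt{n}}

% Standard deviation: chi-square interval, df = n - 1
\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}
```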

Confidence Interval Estimates

Construct a 99% confidence interval for the true proportion of yellow candies.

I am 99% confident that the interval from 0.1764 to 0.2296 contains the true proportion of yellow candies.
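The proportion interval can be recomputed from the yellow count with Python's standard library. This is a sketch assuming the usual z interval for a proportion; it keeps $\hat{p}$ unrounded, so it prints roughly 0.1759 to 0.2291. The reported limits of 0.1764 to 0.2296 are what you get by rounding $\hat{p}$ to 0.203 before adding and subtracting $E \approx 0.0266$.

```python
import math
from statistics import NormalDist

yellow = 306        # yellow candies in the class sample (Column Totals row)
n = 1511            # all candies counted by the class
confidence = 0.99

p_hat = yellow / n
q_hat = 1 - p_hat
# z_{alpha/2}: the z value that leaves (1 - confidence) / 2 in each tail, about 2.576.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * math.sqrt(p_hat * q_hat / n)   # margin of error E

lower, upper = p_hat - margin, p_hat + margin
print(f"p_hat={p_hat:.4f} E={margin:.4f} interval=({lower:.4f}, {upper:.4f})")
```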

Construct a 95% confidence interval estimate for the true mean number of candies per bag.

One can be 95% confident that the interval from 58.64 to 62.24 contains the true value of $\mu$, the mean number of candies per bag.
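A t interval is used here because the population standard deviation is unknown. The sketch below assumes the standard t-table value $t_{0.025} = 2.064$ for 24 degrees of freedom, hard-coded so that only the standard library is needed; it reproduces the limits 58.64 and 62.24.

```python
import math
import statistics

# Candies per bag for bags 1-25, from the class table.
bag_totals = [61, 59, 60, 60, 77, 63, 58, 61, 54, 61, 63, 59, 63,
              53, 60, 58, 60, 55, 59, 61, 60, 63, 61, 63, 59]

n = len(bag_totals)
x_bar = statistics.mean(bag_totals)
s = statistics.stdev(bag_totals)   # sample standard deviation

# t_{alpha/2} for 95% confidence with df = n - 1 = 24, read from a t table.
t_crit = 2.064

margin = t_crit * s / math.sqrt(n)
print(f"x_bar={x_bar:.2f} s={s:.2f} E={margin:.2f} "
      f"interval=({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```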

Construct a 98% confidence interval estimate for the standard deviation of the number of candies per bag.

Based on the results, we have 98% confidence that the limits of 3.25 and 6.48 contain the true value of $\sigma$, the standard deviation of the number of candies per bag.
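The standard deviation interval uses chi-square critical values instead of z or t. This sketch hard-codes the chi-square table values for 24 degrees of freedom with 0.01 in each tail (42.980 and 10.856), which is an assumption about the table used. It prints an upper limit of 6.48 and a lower limit of about 3.26, which agrees with the reported 3.25 to within 0.01.

```python
import math
import statistics

# Candies per bag for bags 1-25, from the class table.
bag_totals = [61, 59, 60, 60, 77, 63, 58, 61, 54, 61, 63, 59, 63,
              53, 60, 58, 60, 55, 59, 61, 60, 63, 61, 63, 59]

n = len(bag_totals)
s = statistics.stdev(bag_totals)

# Chi-square critical values for 98% confidence, df = n - 1 = 24 (from a table).
chi2_right = 42.980   # leaves area 0.01 in the right tail
chi2_left = 10.856    # leaves area 0.01 in the left tail

# The larger critical value gives the lower limit, and vice versa.
lower = math.sqrt((n - 1) * s**2 / chi2_right)
upper = math.sqrt((n - 1) * s**2 / chi2_left)
print(f"s={s:.2f} interval=({lower:.2f}, {upper:.2f})")
```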

A hypothesis test is a procedure for testing a claim about a population parameter or population characteristic. To perform the test, several conditions need to be met, just as with interval estimates. To test a claim about a population mean, the sample must be a simple random sample, and either the population is normally distributed or the sample size is greater than 30. To test a claim about a population proportion, the sample again must be a simple random sample, and it must meet the binomial distribution conditions.

A null hypothesis and an alternative hypothesis need to be identified. The null hypothesis states that the population parameter is equal to a claimed value. The alternative hypothesis opposes the null by stating that the parameter differs from that value. The significance level, $\alpha$, is the probability of rejecting the null hypothesis when it is actually true. The claim needs to be tested in order to draw a conclusion that supports or rejects it.

As with confidence intervals, the hypothesis test formulas use symbols that represent what is being tested.

$H_0$: The null hypothesis, a statement that the population parameter is equal to a specific value.
$H_1$: The alternative hypothesis. It opposes the null hypothesis.

The null hypothesis always contains equality (=), while the alternative hypothesis uses not equal to (≠), greater than (>), or less than (<). The original claim can be either the null hypothesis or the alternative hypothesis.

Hypothesis Test

Test I

Use a 0.05 significance level to test the claim that 20% of all Skittles candies are red.

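Null and Alternative hypotheses:

$H_0: p = 0.20$
$H_1: p \neq 0.20$

[Figure: standard normal curve marking the critical regions and the test statistic z = 1.21]

Test Statistic:

$z = \dfrac{\hat{p} - p}{\sqrt{pq/n}} = \dfrac{321/1511 - 0.20}{\sqrt{(0.20)(0.80)/1511}} \approx 1.21$

Critical Values:

$z = \pm 1.96$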

Conclusion about the Null Hypothesis:

Fail to reject the Null.

Conclusion about the claim:

There is not sufficient evidence to warrant rejection of the claim that 20% of all Skittles candies are red.
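Test I can be reproduced as a two-tailed one-proportion z test: the claim $p = 0.20$ contains equality, so it becomes $H_0$, and $\alpha = 0.05$ is split between both tails. This is a sketch under those assumptions using only the standard library; it also prints a P-value, which the report does not use.

```python
import math
from statistics import NormalDist

red, n = 321, 1511    # red candies and all candies in the class sample
p0 = 0.20             # claimed proportion under H0
alpha = 0.05

p_hat = red / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)   # test statistic

# Two-tailed test (H1: p != 0.20), so alpha / 2 goes in each tail.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

decision = "reject H0" if abs(z) > z_crit else "fail to reject H0"
print(f"z={z:.2f} critical=±{z_crit:.2f} P-value={p_value:.3f} -> {decision}")
```

Because the test statistic does not fall in either critical region, the sketch fails to reject $H_0$, which matches the conclusion above.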

Test II

Use a 0.01 significance level to test the claim that the mean number of candies in a bag of Skittles is 55.

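Null and Alternative hypotheses:

$H_0: \mu = 55$
$H_1: \mu \neq 55$

[Figure: t distribution showing the critical regions beyond t = ±2.797]

Test Statistic:

$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{60.44 - 55}{4.36/\sqrt{25}} \approx 6.24$

Critical Values:

$t = \pm 2.797$ (df = 24)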

Conclusion about the Null Hypothesis:

Reject the Null.

Conclusion about the claim:

There is sufficient evidence to warrant rejection of the claim that the mean number of candies in a 2.17-ounce bag of Skittles is 55.
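Test II can be checked the same way with a two-tailed t test, because the population standard deviation is unknown. This sketch assumes the t-table critical value 2.797 for $\alpha = 0.01$ (two tails) and 24 degrees of freedom, hard-coded so only the standard library is needed.

```python
import math
import statistics

# Candies per bag for bags 1-25, from the class table.
bag_totals = [61, 59, 60, 60, 77, 63, 58, 61, 54, 61, 63, 59, 63,
              53, 60, 58, 60, 55, 59, 61, 60, 63, 61, 63, 59]
mu0 = 55   # claimed mean under H0

n = len(bag_totals)
x_bar = statistics.mean(bag_totals)
s = statistics.stdev(bag_totals)
t = (x_bar - mu0) / (s / math.sqrt(n))   # test statistic

# Two-tailed critical value for alpha = 0.01 with df = 24, from a t table.
t_crit = 2.797

decision = "reject H0" if abs(t) > t_crit else "fail to reject H0"
print(f"t={t:.2f} critical=±{t_crit} -> {decision}")
```

The test statistic falls far into the right critical region, so the sketch rejects $H_0$, in line with the conclusion above.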

Reflection

In order to answer the questions posed at the beginning of my project, I solved the statistical problems using the class data. The data met each of the requirements for performing the calculations, and each confidence interval was built from the sample statistic and its margin of error.

For the hypothesis tests, I compared each test statistic to the critical values. A critical value separates the critical region, where the null hypothesis is rejected, from the values that do not lead to rejection. When viewed on a bell curve, a test statistic that falls in the critical region means the null hypothesis is rejected; otherwise, we fail to reject it.

However, errors can be made when using these formulas: for example, using the wrong symbol or value, using a z statistic formula instead of a t statistic formula for the hypothesis test, or interpreting a claim incorrectly.

After solving the problems posed in this research project, I am confident that not every Skittles bag has an equal proportion of colors. Also, bags may or may not contain the same number of candies, even though each bag should weigh the same 2.17 ounces.