THIS IS THE LAST ASSIGNMENT THAT THE CURRENT ASSIGNMENT REFERS TO:

BEGIN ASSIGNMENT / INFORMATION TO BE USED IN THE NEXT PROBLEM:

Correlation:

Of the 50 or more variables provided, I considered only three continuous variables: Age, WealthScore, and MedianSchoolYears. Age could arguably be treated as a discrete variable, but for computational purposes it is treated as continuous here. The correlation matrix below was computed with the Excel Data Analysis tool.

Variable / Age / WealthScore / MedianSchoolYears
Age / 1
WealthScore / 0.268140749 / 1
MedianSchoolYears / 0.120160926 / 0.719236932 / 1
critical t value
0.1
df / 1998
alpha (2 tailed) / 0.05
critical t value (2 tailed) +/- / 1.9612
T test for correlation Age & WealthScore
r / 0.2681
n / 2000
t value / 12.441222
T test for correlation Age & MedianSchoolYears
r / 0.1202
n / 2000
t value / 5.410273
T test for correlation WealthScore & MedianSchoolYears
r / 0.7192
n / 2000
t value / 46.273448

The critical t value at df = 1998 and alpha = 0.05 (2 tailed) is t = +/- 1.9612.

The computed t values for the three pairwise correlations are 12.44, 5.41, and 46.27, respectively, and each one exceeds the critical value. We can therefore conclude that there is sufficient statistical evidence that all three variables are significantly correlated with one another (each correlation is significantly different from 0).
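The t values reported for the three correlations can be reproduced from r and n alone. The following is a minimal sketch, assuming the usual test statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom; the function name `correlation_t` and the constant `T_CRITICAL` are illustrative names, and the critical value is simply the 1.9612 reported above rather than something computed here.

```python
import math


def correlation_t(r: float, n: int) -> float:
    """t statistic for testing H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Two-tailed critical value at alpha = 0.05, df = 1998, as reported in the post.
T_CRITICAL = 1.9612

# Rounded correlations and sample size taken from the tables in the post.
pairs = {
    "Age & WealthScore": 0.2681,
    "Age & MedianSchoolYears": 0.1202,
    "WealthScore & MedianSchoolYears": 0.7192,
}

for name, r in pairs.items():
    t = correlation_t(r, 2000)
    print(f"{name}: t = {t:.2f}, significant = {abs(t) > T_CRITICAL}")
```

Because the rounded r values are used, the results agree with the Excel figures to about two decimal places rather than exactly.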

Regression:

I chose Age, WealthScore, Region, and NumberOfChildren as the independent variables in a multiple regression model to estimate MedianSchoolYears. I chose these four because they are numerical and quantitative and are measured at the interval level or higher. They also plausibly bear on the number of years a person spends in school: a person's age, wealth, number of children, and region of residence can all be expected to affect years of schooling.

The Excel Data Analysis output for the multiple regression is shown below:

SUMMARY OUTPUT
Regression Statistics
Multiple R / 0.731296
R Square / 0.534793
Adjusted R Square / 0.533813
Standard Error / 0.972937
Observations / 1903
ANOVA
Source / df / SS / MS / F / Significance F
Regression / 4 / 2065.406503 / 516.3516 / 545.4767 / 0.0000
Residual / 1898 / 1796.658615 / 0.946606
Total / 1902 / 3862.065118
Term / Coefficients / Standard Error / t Stat / P-value / Lower 95% / Upper 95% / Lower 95.0% / Upper 95.0%
Intercept / 10.04167627 / 0.091213302 / 110.09 / 0.0000 / 9.8628 / 10.2206 / 9.8628 / 10.2206
Age / -0.006687771 / 0.00112403 / -5.94982 / 0.0000 / -0.0089 / -0.0045 / -0.0089 / -0.0045
WealthScore / 0.011514753 / 0.000250916 / 45.8908 / 0.0000 / 0.0110 / 0.0120 / 0.0110 / 0.0120
Region / 0.026775292 / 0.00853239 / 3.138076 / 0.0017 / 0.0100 / 0.0435 / 0.0100 / 0.0435
NumberOfChildren / -0.138960664 / 0.022442496 / -6.19185 / 0.0000 / -0.1830 / -0.0949 / -0.1830 / -0.0949
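
The t Stat and 95% limit columns in the coefficient table follow directly from each coefficient and its standard error. The sketch below recomputes them as a check. It assumes a two-tailed critical value of 1.9612, which is the value reported earlier for df = 1998 and is used here only as a close approximation for the residual df of 1898; the names `t_and_interval` and `T_CRIT` are illustrative.

```python
# (coefficient, standard error) pairs copied from the Excel output above.
coefficients = {
    "Intercept": (10.04167627, 0.091213302),
    "Age": (-0.006687771, 0.00112403),
    "WealthScore": (0.011514753, 0.000250916),
    "Region": (0.026775292, 0.00853239),
    "NumberOfChildren": (-0.138960664, 0.022442496),
}

# Assumption: critical t reused from the correlation section (df = 1998),
# taken as an approximation for the regression's residual df = 1898.
T_CRIT = 1.9612


def t_and_interval(coef: float, se: float, t_crit: float = T_CRIT):
    """Return (t statistic, lower 95% limit, upper 95% limit) for one coefficient."""
    t_stat = coef / se
    margin = t_crit * se
    return t_stat, coef - margin, coef + margin


for term, (coef, se) in coefficients.items():
    t_stat, lower, upper = t_and_interval(coef, se)
    print(f"{term}: t = {t_stat:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

None of the five intervals contains 0, which is consistent with the small p-values in the table.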

The global F test in the ANOVA table gives F = 545.4767 with a reported p-value (Significance F) of 0.0000, that is, less than 0.0001, so the multiple regression model as a whole is highly significant. R Square = 0.534793 means that 53.4793% of the variation in MedianSchoolYears is explained by the four independent variables. The t tests on the individual coefficients show that the intercept and all four slopes have p-values below 0.01 (the largest is 0.0017, for Region), so all four independent variables are statistically significant. Finally, S (the standard error of the regression) is 0.972937, which means that the typical distance of the observed values from the fitted values is about 0.97 school years. This figure is in the units of MedianSchoolYears, not a percentage, and it is fairly small.
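
The summary statistics quoted from the Excel output are all derived from the sums of squares and degrees of freedom in the ANOVA table. As a rough check, the sketch below recomputes R Square, Adjusted R Square, F, and the standard error from those four inputs; the function name `fit_summary` and the dictionary keys are illustrative, not part of the Excel output.

```python
import math


def fit_summary(ss_reg: float, ss_res: float, df_reg: int, df_res: int) -> dict:
    """Derive the headline fit statistics from an ANOVA table."""
    ss_total = ss_reg + ss_res
    df_total = df_reg + df_res
    ms_reg = ss_reg / df_reg          # mean square for regression
    ms_res = ss_res / df_res          # mean square for residual
    r_squared = ss_reg / ss_total
    adjusted = 1 - (1 - r_squared) * df_total / df_res
    return {
        "r_squared": r_squared,
        "adjusted_r_squared": adjusted,
        "f_statistic": ms_reg / ms_res,
        "standard_error": math.sqrt(ms_res),
    }


# SS and df values copied from the ANOVA table in the post.
summary = fit_summary(ss_reg=2065.406503, ss_res=1796.658615, df_reg=4, df_res=1898)
for name, value in summary.items():
    print(f"{name}: {value:.6f}")
```

The standard error is the square root of the residual mean square, which is why it carries the units of the dependent variable.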

The multiple regression equation is shown below:

Y = 10.04167627 - 0.006687771*(Age) + 0.011514753*(WealthScore) + 0.026775292*(Region) - 0.138960664*(NumberOfChildren)

Where Y = predicted MedianSchoolYears
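
To illustrate how the fitted model produces an estimate of MedianSchoolYears, here is a minimal sketch that applies the intercept and the four slopes from the coefficient table. The function name `predict_median_school_years` is illustrative, and the input values in the example call are hypothetical, not taken from the dataset.

```python
# Intercept and slopes copied from the Coefficients column of the Excel output.
INTERCEPT = 10.04167627
SLOPES = {
    "Age": -0.006687771,
    "WealthScore": 0.011514753,
    "Region": 0.026775292,
    "NumberOfChildren": -0.138960664,
}


def predict_median_school_years(age: float, wealth_score: float,
                                region: float, number_of_children: float) -> float:
    """Predicted MedianSchoolYears from the fitted multiple regression."""
    return (INTERCEPT
            + SLOPES["Age"] * age
            + SLOPES["WealthScore"] * wealth_score
            + SLOPES["Region"] * region
            + SLOPES["NumberOfChildren"] * number_of_children)


# Hypothetical respondent: these inputs are made up for illustration only.
estimate = predict_median_school_years(age=40, wealth_score=100, region=3,
                                       number_of_children=2)
print(f"Predicted MedianSchoolYears: {estimate:.2f}")
```

With every predictor set to 0 the function returns the intercept, which is why the intercept belongs in the equation.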

Overall, the model for MedianSchoolYears fits reasonably well: it is statistically significant and explains about 53% of the variation.

END ASSIGNMENT / INFORMATION TO BE USED IN THE FOLLOWING PROBLEM

CURRENT ASSIGNMENT POSTED TO JUST ANSWER /

BEGIN ASSIGNMENT TO BE DONE:

Ongoing Data Exploration
Your final project entails systematic extraction of decision-aiding insights from a dataset (SampleDataSet.xlsx) provided to you in the Doc Sharing area. The goal of this project is to provide you with hands-on experience in conducting and interpreting different types of statistical analysis. The focus of your analysis will be on marketing strategies and analysis-related topics. At times, you will be expected to conduct additional research on topics that are not adequately covered in your text, for example, data due diligence.
In the Week 4 assignment (LAST QUESTION THAT YOU DID FOR ME), you were asked to build a multiple regression model to explain the variability in the median school year, using a minimum of seven independent variables. Using the same model, thoroughly assess your model's diagnostics. Identify all relevant assessment dimensions, briefly outline their purpose and importance, and provide an assessment of your model in terms of the identified diagnostic measures.

END ASSI