Statistical Learning Aids

JP-003

Eric Lieser

Paul Whitford

Joseph Petruccelli

1/28/2001

Abstract:

The purpose of this project is to rewrite and redesign selected lab activities described in Applied Statistics for Engineers and Scientists, by Petruccelli, Nandram and Chen. Through a review of existing applets and educational research, we will gather new ideas and methods to design more effective learning tools than are currently available. This conversion from the current SAS-based labs to Internet-based JAVA applets will culminate in a comprehensive web site that includes all necessary lab instructions and provides a new learning environment. Through student surveys and observation of students working with the programs, we will assess the potential for success when the labs are incorporated in the introductory statistics sequence at WPI.

Table of Contents

Abstract

Introduction

Background

Methodology

Advantages (JAVA vs. SAS)

Bibliography

Introduction:

Our IQP is based on the redevelopment of the lab series used in MA 2611 and MA 2612 at WPI. Our goal is to provide a new, appropriate web-based environment for presenting these labs. This involves reviewing the current labs and redesigning them, when appropriate, to incorporate our research on human-computer interaction. This will be accomplished by rewriting the current SAS labs in the JAVA language. To ensure the success of these labs in statistics courses, we will test them using three methods. First, we will conduct student surveys. Second, we will study how students interact and work with the new materials. Third, we will publish prototype labs on the web for other statistics departments nationwide to inspect and react to. To conclude this project, we will publish our revised labs on the web for use in courses based on Applied Statistics for Engineers and Scientists.

Background:

We begin the background information with an explanation of its organization. Since the purpose of the IQP is to relate technology and society, we felt there were two areas to research in order to be adequately prepared. First, we researched the technological elements of our project: what is the current standard of technology, and how is this technology used? The second component of research concerned the educational qualities of computer programs. This involved finding information on creating educationally effective computer programs. In doing this, we also determined how this project is going to benefit students.

Technological Aspects

There is currently a major trend towards computer-based learning in academia. Thousands of short lessons, as well as entire college courses and programs, are offered through the Internet. Not surprisingly, we found many sites that offer interactive statistics activities. Most of these are based on the JAVA language. We have found many ways of presenting JAVA-based applets. One option, as illustrated by WebStat[1], is to create simple database tools. These tools offer the user an array of options to analyze any data set. While interesting, this interface offers little insight into how to simplify the lab process.

The majority of the sites we found have individual applets that illustrate single concepts, accompanied by a lesson. These have provided a good number of outside ideas for the design of our labs. Many good examples of this type are at Rice University’s[2] web page.

The projected final product of this IQP is a new educational environment built from small, interactive applets, like those found at the Rice site, accompanied by demos, interactive activities, instructions, introductions, summaries, and additional text.

One objective of these online applications is to make our materials accessible to anyone who needs a lab, visual aid, or graphical representation of statistical principles. This type of interactive (emphasis on the active) approach is often used to illustrate concepts in the college classroom. Some instructors announce sites to their students to aid coursework. Individual students also use them for personal ends, often to clear up confusion, or to satisfy curiosities.

Some course laboratory experiences are not based on JAVA, but rather on commercial program packages, such as SAS. SAS programs can be powerful tools for those involved in advanced studies of statistics, but they are not necessarily the best tools for those seeking only a better understanding of introductory statistics.

The applets we have found are generally easy to use, but none have a unifying theme, such as being keyed to a text. While this allows the applets to be used for a variety of needs, it would be advantageous to a course if the applets followed the textbook. Our project will provide this.

Educational Aspects

What aspects make a better computer based learning environment?

Rather than listing hints and tips for making better programs, this section will state an idea and then explain how it may affect the design of our labs.

The first important detail is to give explicit directions.[3] There should be no misunderstandings about interface use. An immediate instance where misunderstandings could arise is Lab 7.2. This lab requires the students to transform a data set to fit a linear regression model. In this applet, we could include a text field that requires the student to input a function for the transformation. Specific syntax is required for the applet to execute properly, and a student unfamiliar with that syntax is unlikely to enter an appropriate function. We could give very explicit directions. Alternatively, to avoid the problem altogether, we could offer a transformation method that doesn’t require specific syntax: a slide bar that adjusts the power (exponent) of the transformation function. This would make the lab instructions simpler to follow.

Having the students produce their own data gives better insight into the significance of the results. This can be a simple task when working with statistics labs. An example can be found on the Rice University web page. To generate data for a significance test, the user clicks on a square and then on the “again” button. The squares alternate in size from small to large. This is repeated about forty times, and the time between successive clicks is recorded. These data are then used in a significance test comparing the times. This concept may also be extended to other applets, such as the current Lab 1.1.

Keeping the student’s interest, without being overwhelming, is important. Interest is easily lost when too little is required of the student. For example, if there is only one possible command, the student might not think about what they are doing. Introducing several buttons for different commands can prevent this. With only one choice, the students may lose valuable intuition.[4] Likewise, it is important to not have too many choices. Too many options confront the student with the task of learning how to operate an interface.

The next aspect of the labs that we researched was how to compose a purposeful introduction and finish. It is important to have an introduction that is stimulating and a finish that provides closure to the lab.[5] In other words, there should be a title page and a closing page. The title page will contain a brief summary of the material to be covered and the objectives of the lab. One suggestion was to include a statistics joke to catch the students’ interest. More effective yet would be to include an opening and a closing activity. The final page will contain a recap of the important topics covered in the lab.

It is important to remember when creating courseware that the student must learn for himself[6]. This is closely related to the previous topic about endings. Being able to do the lab doesn’t mean the student fully understands the material. The student must reflect on what they did. To conclude a lab, it is important to include self-test questions that assess the student’s qualitative understanding. These questions will be more productive if directed towards what is wanted in a lab report.

Finally, presentation order is very important[7]. Good organization will help the student see how the outputs of an analysis are interrelated. For example, in Lab 8.1 (Multicollinearity), it is important to group VIF and tolerance together. Since VIF is equal to 1/tolerance, grouping them would reinforce this relationship.

Why our approach is better.

Currently at WPI, the computer components of the labs are written in the SAS programming language. SAS macros take the burden of programming off the student. SAS/EIS (Executive Information System) serves as an x-window interface between the student and the SAS macro. There are several difficulties associated with using SAS-based macros. First, the student must know how to activate the macro. The next difficulty is viewing the output. Unfortunately, these difficulties impede the students’ ability to learn.

We believe that the proposed web-based interface will be easier to use. With a simpler interface, energy used to learn the interface will be conserved and redirected toward understanding the material[8]. In our applets, the input and output will share a window, the Internet browser. This simpler interface will allow the student to concentrate more on the material presented.

People generally work better in their own personalized environment, and this holds true for computer work as well. Through a more personalized learning environment, students can become more comfortable and learn more easily[9]. The present SAS-based labs can only be run on campus*, which reduces the options available to students. With JAVA, the labs will be operable on any computer running the Internet Explorer or Netscape web browsers. With this feature, students can work virtually anywhere they choose. We expect this added comfort to improve the students’ results.

Timeline and Methodology:

The first step in writing a lab is to produce a storyboard. Using this method, we can assure that the lab will have a good flow throughout instruction. It is nearly impossible to write a program with any direction if the design is not considered in advance.

Feb 19 – March 19 Lab 7.1 How to fit a regression line

There are two base components needed to create a lab that demonstrates the difficulty of fitting a regression line by eye. The first is an interactive JAVA applet that allows the user to set and adjust the linear regression line on the data set. Second, the applet will calculate values for the intercept, slope, MSE, SSE and R^2. In addition, plots of the residuals will be produced.
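The calculations listed above can be sketched as follows. This is only a minimal illustration, not the applet's actual design: the class name LineFit and its methods are hypothetical, and we assume MSE is taken as SSE/(n - 2) and that the value reported as R^2 for a hand-placed line is 1 - SSE/SST, which equals the usual R^2 only at the least-squares line.

```java
// Hypothetical helper (not from the original labs): summary statistics for
// a line the student has placed by hand on a scatterplot.
final class LineFit {
    final double intercept;
    final double slope;
    final double sse;       // sum of squared residuals about the chosen line
    final double mse;       // assumed here to be SSE / (n - 2)
    final double rSquared;  // 1 - SSE/SST; can be negative for a poor line

    LineFit(double[] x, double[] y, double intercept, double slope) {
        if (x.length != y.length || x.length < 3) {
            throw new IllegalArgumentException("need at least 3 paired observations");
        }
        int n = x.length;
        double yMean = mean(y);
        double sumSqResiduals = 0.0;
        double sst = 0.0;
        for (int i = 0; i < n; i++) {
            double residual = y[i] - (intercept + slope * x[i]);
            sumSqResiduals += residual * residual;
            sst += (y[i] - yMean) * (y[i] - yMean);
        }
        this.intercept = intercept;
        this.slope = slope;
        this.sse = sumSqResiduals;
        this.mse = sumSqResiduals / (n - 2);
        this.rSquared = 1.0 - sumSqResiduals / sst;
    }

    /** The least-squares line, for comparison with the student's line. */
    static LineFit leastSquares(double[] x, double[] y) {
        double xMean = mean(x);
        double yMean = mean(y);
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            sxy += (x[i] - xMean) * (y[i] - yMean);
            sxx += (x[i] - xMean) * (x[i] - xMean);
        }
        double slope = sxy / sxx;
        return new LineFit(x, y, yMean - slope * xMean, slope);
    }

    private static double mean(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }
}
```

An applet could construct a new LineFit each time the student moves the line and display its fields next to those of LineFit.leastSquares, so the student sees how far the by-eye fit is from the best one.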

Another feature that may be incorporated in this lab is a rotating regression line. The applet will automatically fix a line through the point (x-mean, y-mean). As the student rotates the line, they will observe the changes in the SSE, intercept and slope.
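A rough sketch of the rotating line follows. The names PivotLine, slopeAt, interceptFor and sseFor are hypothetical, and we assume the rotation is expressed as an angle in radians measured from the horizontal. Because the line is pinned at (x-mean, y-mean), choosing a slope b determines the intercept as y-mean minus b times x-mean.

```java
// Hypothetical sketch: a line that pivots about the point of means.
final class PivotLine {
    private final double[] x;
    private final double[] y;
    final double xMean;
    final double yMean;

    PivotLine(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("x and y must be non-empty and paired");
        }
        this.x = x.clone();
        this.y = y.clone();
        double sumX = 0.0;
        double sumY = 0.0;
        for (int i = 0; i < x.length; i++) {
            sumX += x[i];
            sumY += y[i];
        }
        this.xMean = sumX / x.length;
        this.yMean = sumY / y.length;
    }

    /** Slope of the line after rotating it to the given angle (radians). */
    double slopeAt(double angleRadians) {
        return Math.tan(angleRadians);
    }

    /** Intercept forced by the requirement that the line pass through the means. */
    double interceptFor(double slope) {
        return yMean - slope * xMean;
    }

    /** Sum of squared residuals for the pivoted line with the given slope. */
    double sseFor(double slope) {
        double intercept = interceptFor(slope);
        double sse = 0.0;
        for (int i = 0; i < x.length; i++) {
            double residual = y[i] - (intercept + slope * x[i]);
            sse += residual * residual;
        }
        return sse;
    }
}
```

Since the least-squares line always passes through the point of means, the SSE shown to the student reaches its minimum exactly when the rotating line coincides with the least-squares line.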

March 19 – March 29 Labs 2.1 and 8.2 Outliers; viewing effects

These two labs both require linked graphical representations of a changeable data set. Again there are two necessary elements in creating our versions. The first is a class that creates several graphs from a single data set. The second is an interface that provides a click-and-drag function for changing points in the data set. We will investigate the possibility of 3-D modeling with these activities.
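One possible way to link several displays to a single changeable data set is a listener (observer) arrangement, sketched below. This is an assumption about how the labs might be structured rather than a settled design, and every name in it (LinkedDataSet, Listener, movePoint) is hypothetical. The drag handler would call movePoint, and each registered view would then refresh itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one shared data set that notifies every linked view
// whenever a point is dragged to a new position.
final class LinkedDataSet {
    /** A view (plot, summary table, ...) that refreshes when the data change. */
    interface Listener {
        void dataChanged(LinkedDataSet source);
    }

    private final double[] x;
    private final double[] y;
    private final List<Listener> listeners = new ArrayList<>();

    LinkedDataSet(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("x and y must be non-empty and paired");
        }
        this.x = x.clone();
        this.y = y.clone();
    }

    void addListener(Listener listener) {
        listeners.add(listener);
    }

    /** Called by the drag handler: update one point, then refresh all views. */
    void movePoint(int index, double newX, double newY) {
        x[index] = newX;
        y[index] = newY;
        for (Listener listener : listeners) {
            listener.dataChanged(this);
        }
    }

    double meanY() {
        double sum = 0.0;
        for (double v : y) {
            sum += v;
        }
        return sum / y.length;
    }

    double medianY() {
        double[] sorted = y.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

In an outlier lab, two such views could show the mean and the median side by side, so that dragging a single point far from the rest visibly moves one summary much more than the other.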

March 29 – April 15 Lab 7.2 The effect of Transformations on Data Sets

This lab will require a method to randomly perform an ‘anti-transformation’ on the data being presented to the user. We have two ideas for the student’s side of the activity: allow the students to input functions to transform the data, or include slider bars that allow the graph to be transformed, along either axis, by any power from –5 to 5. This range will include all the transformations covered in the section the lab accompanies. The output will be a plot of the newly transformed data set, along with the values of the intercept, slope, MSE, SSE and R^2. In addition, plots of the new residuals will be displayed.
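The slider idea can be sketched as a small power-transformation routine. The sketch makes several assumptions that are not part of the plan above: the class and method names are hypothetical, the data are required to be positive so that every power is defined, and a power of 0 is treated as the natural logarithm, a common convention for families of power transformations.

```java
// Hypothetical sketch of the slider-driven power transformation.
final class PowerTransform {
    static final double MIN_POWER = -5.0;
    static final double MAX_POWER = 5.0;

    /** Keep a slider value inside the planned range of powers. */
    static double clampPower(double power) {
        return Math.max(MIN_POWER, Math.min(MAX_POWER, power));
    }

    /**
     * Transform one value. Assumption: power 0 means the natural log, and
     * only positive data are accepted so that every power is well defined.
     */
    static double apply(double value, double power) {
        if (value <= 0.0) {
            throw new IllegalArgumentException("power transformations here require positive data");
        }
        return power == 0.0 ? Math.log(value) : Math.pow(value, power);
    }

    /** Transform a whole axis of the data set with the current slider power. */
    static double[] applyAll(double[] values, double sliderPower) {
        double power = clampPower(sliderPower);
        double[] transformed = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            transformed[i] = apply(values[i], power);
        }
        return transformed;
    }
}
```

For example, if the applet had hidden a linear relationship by squaring the responses, moving the slider to a power of 0.5 would undo that anti-transformation and straighten the plot.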

Lab 8.1 Multicollinearity (time permitting)

This lab will introduce three-dimensional graphical representations. We will need to produce a class that plots two regressors on the x-y plane with a response variable in the z direction. The input will be a value from 0 to 1 representing the correlation coefficient between the two regressors. A new set of points will be produced and plotted in three dimensions. Additionally, there will be output for the VIF, tolerance, overall F-statistic, and p-values.
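The data-generation step and the VIF output might look like the sketch below. It rests on assumptions of ours: the names are hypothetical, the regressors are drawn as standard normal values, and the second regressor is built as r times the first plus sqrt(1 - r^2) times independent noise so that the population correlation is r. With only two regressors, the tolerance of each is 1 - r^2 and the VIF is its reciprocal.

```java
import java.util.Random;

// Hypothetical sketch: generate two regressors with a chosen correlation
// and report the matching tolerance and VIF.
final class CollinearityDemo {
    /** Tolerance of either regressor when there are exactly two of them. */
    static double tolerance(double r) {
        return 1.0 - r * r;
    }

    /** VIF = 1 / tolerance; evaluates to Infinity when r is exactly 1. */
    static double vif(double r) {
        return 1.0 / tolerance(r);
    }

    /** Returns {x1, x2}, each of length n, with population correlation r. */
    static double[][] correlatedPair(int n, double r, long seed) {
        if (r < 0.0 || r > 1.0) {
            throw new IllegalArgumentException("r must lie between 0 and 1");
        }
        Random rng = new Random(seed);
        double noiseScale = Math.sqrt(1.0 - r * r);
        double[][] pair = new double[2][n];
        for (int i = 0; i < n; i++) {
            double z1 = rng.nextGaussian();
            double z2 = rng.nextGaussian();
            pair[0][i] = z1;
            pair[1][i] = r * z1 + noiseScale * z2;
        }
        return pair;
    }

    /** Sample correlation coefficient, used to check the generated data. */
    static double correlation(double[] a, double[] b) {
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < a.length; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= a.length;
        meanB /= b.length;
        double sab = 0.0;
        double saa = 0.0;
        double sbb = 0.0;
        for (int i = 0; i < a.length; i++) {
            sab += (a[i] - meanA) * (b[i] - meanB);
            saa += (a[i] - meanA) * (a[i] - meanA);
            sbb += (b[i] - meanB) * (b[i] - meanB);
        }
        return sab / Math.sqrt(saa * sbb);
    }
}
```

The response variable, the F-statistic and the p-values are left out of this sketch; they would require fitting the full multiple regression.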

Newly designed regression effect lab (time permitting)

As of now, there isn’t a lab that helps the students understand the regression effect. Time permitting, we hope to solve this problem. Here is the idea: We would first need to produce a standardized plot. Then we could show the regression line of this plot. The slope of this regression line equals the correlation for the data set. We are not yet sure of all the details of this lab. There will be some sort of interface that helps the students determine the slope of the regression line and
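The relationship behind the standardized plot can be checked with a short sketch. The names below are hypothetical, and we assume standardization uses the sample standard deviation (dividing by n - 1). Once both variables are converted to z-scores, the least-squares slope equals the correlation coefficient, so a predicted z-score is never farther from zero than the z-score it is predicted from.

```java
// Hypothetical sketch: after standardizing both variables, the
// least-squares slope equals the correlation coefficient.
final class RegressionEffect {
    /** Convert values to z-scores using the sample standard deviation. */
    static double[] standardize(double[] values) {
        int n = values.length;
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;
        double sumSq = 0.0;
        for (double v : values) {
            sumSq += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(sumSq / (n - 1));
        double[] z = new double[n];
        for (int i = 0; i < n; i++) {
            z[i] = (values[i] - mean) / sd;
        }
        return z;
    }

    /** Least-squares slope of y on x. */
    static double slope(double[] x, double[] y) {
        return crossSum(x, y) / crossSum(x, x);
    }

    /** Sample correlation coefficient of x and y. */
    static double correlation(double[] x, double[] y) {
        return crossSum(x, y) / Math.sqrt(crossSum(x, x) * crossSum(y, y));
    }

    /** Sum of products of deviations from the means. */
    private static double crossSum(double[] a, double[] b) {
        double meanA = 0.0;
        double meanB = 0.0;
        for (int i = 0; i < a.length; i++) {
            meanA += a[i];
            meanB += b[i];
        }
        meanA /= a.length;
        meanB /= b.length;
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (a[i] - meanA) * (b[i] - meanB);
        }
        return sum;
    }
}
```

Because the correlation never exceeds 1 in absolute value, a point two standard deviations above the mean in x is predicted to be at most two standard deviations above the mean in y, which is the regression effect the lab is meant to show.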

April 1 – April 22 Web Pages

This part consists of compiling all the lab activities and their written explanations into a comprehensive web site that can be used in the classroom setting. This will be mostly HTML, together with the JAVA applets, introductions, instructions, and closing pages for each lab.

April 8 – May 1 Final Report

Here we will discuss each lab individually, explaining how the new JAVA-based format has advantages over the old SAS version.

Advantages (JAVA vs. SAS):

2.1

The real-time ability to drag points and see the relative movement of the plot is more visually stimulating compared to the SAS redraw method.

7.1

The real-time movement of the regression line will allow the students to see what they are doing, rather than attempting to assign values numerically in SAS.

7.2

JAVA will eliminate the user interface difficulties associated with SAS.

8.1

With the three-dimensional representation available through JAVA, the students will be able to visually understand the relationships involved in multiple regression.

8.2

The real-time ability to drag points and see the relative movement of the plot is more visually stimulating compared to the SAS redraw method.

Bibliography:

Books:

Atkinson, Richard C. and H. A. Wilson. Computer-Assisted Instruction. New York: Academic Press, 1969.

Fairweather, Peter G. and Andrew S. Gibbons. Computer-Based Instruction: Design and Development. Englewood Cliffs: Educational Technology Publications, 1998.

Futrell, Mynga K. and Paul G. Geisert. Teachers, Computers, and Curriculum: Microcomputers in the Classroom. Boston: Allyn and Bacon, 1995.

Harasim, Linda. Online Education: Perspectives on a New Environment. Westport: Praeger Publishers, 1990.

Landa, Ruth K. Creating Courseware. New York: Harper & Row Publishers, 1984.

Nievergelt, Jay. Interactive Computer Programs for Education: Philosophy, Techniques, and Examples. Reading: Addison-Wesley Publishing Company, 1986.

Richey, Rita and Barbara Seels. Instructional Technology: The Definition and Domains of the Field. Washington DC: Association for Educational Communications and Technology, 1994.

Tomek, I. Computer Assisted Learning. New York: Springer-Verlag, 1992.

Videos:

Teaching & Learning in the Computer Age. Satellite Videoconference, March 28, 1997.

Web sites:

[1]

[2]

[3] Creating Courseware, pg 57

[4] Computer-Based Instruction, pg 321

[5] Creating Courseware, pg 75, 86, 95

[6] Computer-Assisted Instruction, pg 88

[7] Interactive Computer Programs for Education, pg 67

[8] Online Education, pg 91

[9] Online Education, pg 26

*This is not entirely accurate. SAS is operable, though a large number of features can’t be accessed.