Skills Workshop #4: Statistical & Uncertainty Analysis

Speaker: Dr. Lilit Yeghiazarian, Assistant Professor of Environmental Engineering, University of Cincinnati

Date: June 20, 2014

Time: 10:00 AM - 12:00 PM

Venue: Baldwin 749

Prepared by:

Victoria Sumner - Junior, Chemical Engineering, University of Cincinnati

Stephanie Palmer - Pre-Junior, Chemical Engineering, University of Cincinnati

Dorien Clark - Sophomore, Chemical Engineering, University of Cincinnati

REU Participants for Project 1: Interaction of Nanoparticles with Microbial Biofilm in Water Treatment Facility Processes

Dr. Yeghiazarian presents on Statistical and Uncertainty Analysis to the students participating in the REU Program.

Dr. Lilit Yeghiazarian gave a presentation on Statistical and Uncertainty Analysis to the students participating in the 2014 Summer REU Program. Dr. Yeghiazarian holds degrees in electrical engineering, operations research, and biological engineering. She completed her doctoral research at Cornell University on stochastic modeling of waterborne pathogen transport and risk assessment in complex environmental systems. Later, at UCLA, she conducted epidemiological research, first as a Postdoctoral Fellow in the Department of Biostatistics and then as a Research Assistant in the Department of Epidemiology. Her research there involved the development of multi-scale models of HIV transmission and computationally intensive evolutionary models of influenza based on large international datasets.

Dr. Yeghiazarian is currently an Assistant Professor of Environmental Engineering at the University of Cincinnati. Her research interests range from water quality modeling, environmental surveillance, and situational awareness for biosecurity to rapid decision-making and policy development. They also include the uncertainty and multi-scale nature of complex environmental systems and processes.

The main purpose of the presentation was to teach the REU participants how to perform statistical analysis on experimental data. Collecting accurate and precise data is important for any research project or lab work, including the REU projects. Knowing how to estimate errors and identify their sources helps researchers interpret their data appropriately and draw valid conclusions. Without statistical analysis, the conclusions that can be drawn from the data remain limited and inconclusive. In her presentation, Dr. Yeghiazarian outlined different types of errors, uncertainty in measurements, and statistical methods for curve fitting.

The first thing Dr. Yeghiazarian explained was the different types of errors, which she grouped into two broad categories: general errors and numerical errors. General errors arise from three sources: human errors, which are called blunders; formulation or model errors, which result from the use of incomplete mathematical models; and data uncertainty, which stems from the limited number of significant figures recorded for physical measurements. Numerical errors arise from two sources: round-off errors, which are due to the limited ability of computers to represent numbers, and truncation errors, which are due to mathematical approximations.

Dr. Yeghiazarian then explained round-off errors in more detail as they apply to computer systems. Such errors occur in digital computers because of their limited ability to represent numbers. Computers store numbers as binary digits called bits: each number is stored as a word, and a word consists of a fixed number of binary digits. This fixed word size limits the range of numbers that can be represented, so values that are too large or too small in magnitude cause overflow or underflow. For example, in the double-precision format that MATLAB uses by default, normalized real numbers lie between $-1.797693134862316 \times 10^{308}$ and $-2.225073858507201 \times 10^{-308}$, or between $2.225073858507201 \times 10^{-308}$ and $1.797693134862316 \times 10^{308}$. Numbers whose magnitude exceeds the upper limit cause an overflow error, while nonzero numbers that fall in the gap between the two ranges, closer to zero than the lower limit, cause an underflow error: they lose precision or are computed as zero. To check these limits on a computer, Dr. Yeghiazarian suggested trying the MATLAB commands format long, realmax, and realmin.
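As a rough Python analogue of the MATLAB realmax and realmin check, the sketch below prints the same limits and shows what happens just outside them. It assumes CPython, whose float is an IEEE 754 double (the same format MATLAB uses by default); the names REALMAX and REALMIN are illustrative labels, not Python built-ins.

```python
import math
import sys

# Largest and smallest positive *normalized* doubles, analogous to MATLAB's
# realmax and realmin (REALMAX/REALMIN are illustrative names only).
REALMAX = sys.float_info.max   # about 1.797693134862316e308
REALMIN = sys.float_info.min   # about 2.225073858507201e-308

print(f"realmax = {REALMAX!r}")
print(f"realmin = {REALMIN!r}")

# Overflow: multiplying past realmax does not raise an exception in Python;
# the result silently becomes infinity.
overflowed = REALMAX * 2.0
print(f"realmax * 2 = {overflowed}, infinite: {math.isinf(overflowed)}")

# Gradual underflow: just below realmin the value is still nonzero, but it is
# stored as a subnormal number with fewer significant bits.
subnormal = REALMIN / 2.0
print(f"realmin / 2 = {subnormal!r}")

# Far below realmin the result rounds to exactly zero.
flushed = REALMIN / 1e20
print(f"realmin / 1e20 = {flushed!r}")
```

Note that Python returns inf here rather than raising an error, so overflow has to be checked for explicitly, much as in MATLAB.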

An important point Dr. Yeghiazarian mentioned is that computers also have limited precision: numbers with infinite digit sequences, such as e and other irrational numbers, cannot be represented exactly with a finite number of significant digits. Round-off errors can also arise during arithmetic when significant figures are not taken into account. Adding a very small number to a very large one can lose the small number entirely, and subtracting two numbers that are close to each other in value can cancel most of the significant digits.
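The three round-off effects described above can be reproduced in a few lines of Python. This is a minimal sketch assuming IEEE 754 double-precision floats (standard in CPython); the specific values are chosen only for illustration.

```python
# 1. Many decimal fractions have no exact binary representation, so the
#    stored values of 0.1, 0.2 and 0.3 are all slightly off.
s = 0.1 + 0.2
print(s == 0.3)            # False
print(repr(s))             # 0.30000000000000004

# 2. Adding a small number to a large one: 1e-17 is smaller than half the
#    spacing between doubles near 1.0 (about 1.1e-16), so it is lost entirely.
big_plus_small = 1.0 + 1e-17
print(big_plus_small == 1.0)   # True

# 3. Subtracting nearly equal numbers: the true difference is 1e-15, but the
#    computed difference carries the round-off made when storing 1.0 + 1e-15,
#    which is large relative to such a small result.
diff = (1.0 + 1e-15) - 1.0
rel_err = abs(diff - 1e-15) / 1e-15
print(f"computed {diff!r}, relative error {rel_err:.1%}")
```

The third case is the most dangerous in practice: the subtraction itself is exact, but it exposes an earlier rounding error as a large relative error.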

Dr. Yeghiazarian next talked about truncation errors. These errors occur when exact mathematical operations are replaced by approximations. For example, when a function is approximated by a truncated Taylor series, including more terms generally makes the result more accurate. Beyond a certain number of terms, however, the additional terms are smaller than the available precision, so they no longer change the significant digits of the result.
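To see both effects, the sketch below approximates $e = e^1$ with partial sums of the Maclaurin series $e^x = \sum_k x^k / k!$, assuming double-precision floats. The function name taylor_exp is hypothetical. The truncation error shrinks as terms are added until, beyond roughly 18 terms, each new term is smaller than the spacing between doubles near 2.718 and the sum stops changing.

```python
import math

def taylor_exp(x: float, n_terms: int) -> float:
    """Sum the first n_terms of the Maclaurin series e^x = sum x^k / k!."""
    total = 0.0
    term = 1.0                    # x^0 / 0!
    for k in range(n_terms):
        total += term
        term *= x / (k + 1)       # next term: x^(k+1) / (k+1)!
    return total

exact = math.exp(1.0)
errors = {}
for n in (1, 2, 4, 8, 12, 16, 20, 25):
    approx = taylor_exp(1.0, n)
    errors[n] = abs(approx - exact)
    print(f"{n:2d} terms: {approx:.16f}   |error| = {errors[n]:.1e}")
```

Comparing the rows for 20 and 25 terms shows the point made in the talk: the extra terms are real mathematically but invisible at double precision.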

Dr. Yeghiazarian next described errors that result from uncertainties in measurements, which are bound to occur. She described them as interchangeable with numerical errors. An error in scientific measurement means the inevitable uncertainty that accompanies all measurements, and it must always be taken into account. These errors are therefore not mistakes: they cannot be eliminated simply by being careful. Some rules for reporting and using uncertainties are as follows. Always report a measured quantity as the best estimate plus or minus its uncertainty, $x_{\text{best}} \pm \delta x$, where $\delta x$ represents the uncertainty (the error, or margin of error). Note that $\delta x$ is always positive. Also, $\delta x$ itself cannot be known with much precision, and the last significant figure in any stated answer should usually be of the same order of magnitude (in the same decimal position) as the uncertainty.
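The reporting rule above can be sketched as a small formatting helper. It assumes one common convention that was not stated in the talk: the uncertainty is rounded to one significant figure, and the best estimate is then rounded to the same decimal position. The function name format_measurement is hypothetical.

```python
import math

def format_measurement(best: float, uncertainty: float) -> str:
    """Return 'x_best ± dx' with dx rounded to one significant figure
    (an assumed convention) and x_best rounded to the same decimal place."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Decimal exponent of the leading digit of the uncertainty,
    # e.g. 0.0238 -> -2 (hundredths), 32.1 -> 1 (tens).
    exponent = math.floor(math.log10(uncertainty))
    decimals = max(0, -exponent)
    dx = round(uncertainty, -exponent)
    x = round(best, -exponent)
    return f"{x:.{decimals}f} ± {dx:.{decimals}f}"

print(format_measurement(9.8214, 0.0238))   # 9.82 ± 0.02
print(format_measurement(6051.78, 32.1))    # 6050 ± 30
```

This simple sketch does not handle the edge case where rounding carries the uncertainty into a new digit (for example 0.096 becoming 0.10); a fuller version would recompute the exponent after rounding.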

Dr. Yeghiazarian next spoke about modeling experimental data using regression analysis, or curve fitting. She mentioned three curve-fitting techniques: least-squares regression for data that show significant scatter, linear interpolation for precise data, and curvilinear interpolation for precise data. Trend analysis, which compares existing mathematical models with measured data, is used to determine which technique best fits the experimental data. Before discussing regression techniques, Dr. Yeghiazarian went over basic terminology and descriptive statistics, including the arithmetic mean, standard deviation, variance, coefficient of variation, histograms of data, and confidence intervals. For some students this was review; for others, it was new information.
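The descriptive statistics listed above can be computed with Python's standard statistics module. This is a minimal sketch on hypothetical replicate measurements (illustrative values, not REU data). The confidence interval uses Student's t, and the critical value 2.776 for 95% confidence with 4 degrees of freedom is hard-coded from a t table, so it applies only to a sample of five points.

```python
import math
import statistics

# Hypothetical replicate measurements, for illustration only.
data = [2.1, 2.4, 2.2, 2.6, 2.3]
n = len(data)

mean = statistics.mean(data)
variance = statistics.variance(data)   # sample variance (divides by n - 1)
std_dev = statistics.stdev(data)       # sample standard deviation
cv = std_dev / mean                    # coefficient of variation

# Two-sided 95% confidence interval for the mean. The critical value
# t(0.975, df = 4) = 2.776 is from a t table and is valid only for n = 5.
T_CRIT_DF4 = 2.776
half_width = T_CRIT_DF4 * std_dev / math.sqrt(n)

print(f"mean     = {mean:.3f}")
print(f"variance = {variance:.4f}")
print(f"std dev  = {std_dev:.4f}")
print(f"CV       = {cv:.1%}")
print(f"95% CI   = {mean - half_width:.3f} to {mean + half_width:.3f}")
```

For other sample sizes the t value must be looked up again; a statistics library such as SciPy can compute it directly.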

Linear least-squares regression is the simplest example of a least-squares approximation because it fits the data with a straight line, which has a very simple equation. Dr. Yeghiazarian stated that the method assumes that each x has a fixed value and that the y values are independent random variables with the same variance, normally distributed for a given x. Minimizing the sum of the squares of the residuals (the vertical distances between the data points and the regression line) gives the best-fit line for a set of data. The coefficients can be found by deriving and solving the normal equations, and some nonlinear relationships can be fitted as straight lines after a linearizing transformation of the data. Dr. Yeghiazarian also emphasized the importance of using computer programs such as MATLAB, a useful tool that performs a wide variety of tasks with simple commands. She gave examples showing how to use MATLAB's built-in functions and mentioned that MATLAB offers linear, polynomial, and higher-order regression analysis features.
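The least-squares procedure described above can be sketched directly from the normal equations for $y = a_0 + a_1 x$:

$$n a_0 + \left(\sum x_i\right) a_1 = \sum y_i, \qquad \left(\sum x_i\right) a_0 + \left(\sum x_i^2\right) a_1 = \sum x_i y_i.$$

The Python below solves them in closed form, computes the coefficient of determination $r^2$, and shows one linearizing transformation (an exponential model fitted through $\ln y$). The function names and the data are hypothetical and for illustration only; in MATLAB the built-in regression features mentioned in the talk would normally be used instead.

```python
import math

def linear_least_squares(x, y):
    """Fit y = a0 + a1*x by solving the two normal equations in closed form."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a1 = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    a0 = sy / n - a1 * sx / n          # line passes through (mean x, mean y)
    return a0, a1

def r_squared(x, y, a0, a1):
    """1 - S_r / S_t: residual sum of squares vs. spread about the mean."""
    y_mean = sum(y) / len(y)
    s_r = sum((yi - a0 - a1 * xi) ** 2 for xi, yi in zip(x, y))
    s_t = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - s_r / s_t

def exponential_fit(x, y):
    """Fit y = alpha * exp(beta * x) via the linearization ln y = ln alpha + beta*x.
    Requires y > 0; it minimizes squared residuals of ln y, not of y itself."""
    b0, b1 = linear_least_squares(x, [math.log(yi) for yi in y])
    return math.exp(b0), b1

# Hypothetical data with some scatter.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.9, 5.1, 7.0, 9.2, 10.8]
a0, a1 = linear_least_squares(x, y)
print(f"y = {a0:.3f} + {a1:.3f} x,  r^2 = {r_squared(x, y, a0, a1):.4f}")
```

The caveat in exponential_fit matters in practice: fitting the transformed data weights the points differently from fitting the original data, so the two approaches can give slightly different curves.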

In addition to descriptive statistics and linear regression, Dr. Yeghiazarian discussed comparing the means of two data sets, which can be done using a t-test. When more than two data sets are to be compared, she described a tool called ANOVA (analysis of variance), which computes an F-ratio statistic and is used in a way similar to the t-test.
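The two test statistics mentioned above can be sketched with the Python standard library. This assumes the common pooled-variance (Student's) two-sample t-test, which assumes equal variances; the talk did not specify which variant was meant. The function names and sample groups are hypothetical. Turning t or F into a p-value requires the t or F distribution, for example from tables or from scipy.stats.ttest_ind and scipy.stats.f_oneway.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Student's two-sample t statistic with pooled (equal) variances.
    Returns (t, degrees_of_freedom)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * statistics.variance(a)
           + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    se = math.sqrt(sp2 * (1.0 / na + 1.0 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, na + nb - 2

def one_way_anova_f(*groups):
    """One-way ANOVA F ratio = between-group mean square / within-group mean square.
    Returns (F, df_between, df_within)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical groups, for illustration only.
g1, g2, g3 = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]
t, df = pooled_t_statistic(g1, g2)
f2, _, _ = one_way_anova_f(g1, g2)
f3, dfb, dfw = one_way_anova_f(g1, g2, g3)
print(f"t = {t:.4f} (df = {df}),  t^2 = {t * t:.4f},  F(2 groups) = {f2:.4f}")
print(f"F(3 groups) = {f3:.4f} with ({dfb}, {dfw}) degrees of freedom")
```

With exactly two groups, the one-way ANOVA F ratio equals the square of the pooled t statistic, which is why ANOVA can be viewed as extending the t-test to more than two groups.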

Dr. Yeghiazarian covered a great deal of material on topics that are usually taught over a semester-long course. The REU participants can now collect, analyze, and present data using the guidelines and information she gave in this presentation. They have multiple ways to analyze their data and can apply this knowledge when presenting their REU projects and in future endeavors, such as pursuing graduate school, working in co-op jobs as students, or working for an engineering company after graduation.