In both examples, it would be possible to analyse the intermediate response variable as a response, assessing, for example, the effect of treatments on it. If, however, we use the intermediate response as an explanatory variable we are addressing the more subtle question as to the extent to which the effect of treatment on the main response variable is accounted for by the action of the intermediate variable.
A more elaborate example concerns growth curves, say in an animal feeding experiment. Here possible response variables are: (i) live weight after some fixed time, measured directly; (ii) an estimate of asymptotic weight; (iii) some estimate of rate of approach to that asymptote; (iv) some measure of the efficiency of conversion of food into body weight.
The last two are not measured directly but are calculated from the set of weighings. Where necessary such new variables will be called derived variables. Examples C, E and M all illustrate simple instances of the formation of a derived variable, in the first and last by combination of a set of frequencies into a single summarizing quantity.
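The derived variables (ii) and (iii) above can be sketched in code. The weekly weights, the assumed growth curve w(t) = A(1 − exp(−kt)) and the trial asymptote A below are all illustrative assumptions, not data from the text; with A fixed, log(A − w) is linear in t with slope −k, so the rate of approach can be estimated by an ordinary least-squares line.

```python
import math

# Hypothetical weekly live weights (kg) for one animal in a feeding trial.
weeks = [1, 2, 3, 4, 5, 6, 7, 8]
weight = [12.0, 18.5, 23.4, 27.1, 29.9, 32.0, 33.6, 34.8]

# Assumed asymptotic weight, itself a derived variable; taken somewhat
# above the last observation for this illustration.
A = 40.0

# Under w(t) = A * (1 - exp(-k t)), log(A - w) is linear in t with slope -k.
y = [math.log(A - w) for w in weight]
n = len(weeks)
tbar = sum(weeks) / n
ybar = sum(y) / n
slope = sum((t - tbar) * (v - ybar) for t, v in zip(weeks, y)) / \
        sum((t - tbar) ** 2 for t in weeks)
k = -slope  # derived variable: rate of approach to the asymptote

print(round(k, 3))
```

Both A and k would then enter subsequent analysis as responses in their own right.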
While the same technique of analysis, e.g. regression analysis, may be applicable to data from quite different kinds of investigation, the interpretation depends on how the data were obtained. It is helpful to distinguish between the following: (i) Experiments, in which the system under study is set up and controlled by the investigator. Typically, one of a number of alternative treatments is applied to each individual, or experimental unit, and responses measured. If the allocation of treatments to experimental units is organized by the investigator, and especially if an element of objective randomization is involved, it will be possible to conclude that any clear-cut difference in response between two treatments is a consequence of the treatments.
(ii) Pure observational studies, in which data are collected on a system not under the investigator's control. While it may be possible to detect from such data clear effects, such as differences between different groups of individuals, interpretation of such differences will nearly always call for much caution. Explanatory variables that would provide the 'real' explanation of the differences may not have been measured, and may even be unknown to the investigator. (iii) Sample surveys, in which a sample is drawn from a well-defined population by a clearly specified probabilistic method. Conclusions can be drawn with confidence about the descriptive properties of the population in question, but the interpretation of, for example, relationships between variables raises problems similar to those of (ii). Control of data quality may be stronger than in a pure observational study. (iv) Prospective observational studies, in which a group of individuals is followed forward in time. To the extent that all important explanatory variables can be measured, and of course this is never totally possible, these studies have some of the virtues of an experiment.
Experiments are strongly interventionist, the investigator having in principle total control over the system under study, and lead to the clearest interpretation. While very often experiments can be arranged to lead to simple 'balanced' sets of data, this is not the crucial point.
In principle virtually any method of statistical analysis might be relevant for any style of investigation; it is the interpretation that differs. An outline example will clarify the distinction between an experiment and a pure observational study. Consider the comparison of two alternative medical treatments A and B. In an experiment each eligible patient is assigned a treatment by an objective randomization procedure, each patient having equal chance of receiving each treatment.
Suppose that subsequent patient care is closely standardized, identically for the two treatments, and that a clearly defined response variable is measured for each patient. Suppose that, as judged by an appropriate statistical procedure, the two groups of response variables differ by more than can reasonably be ascribed to chance.
We can then conclude that, provided the experiment has been correctly administered and reported, the difference between the groups is a consequence of the difference between A and B; for the two groups of patients differ only by the accidents of random assignment and in virtue of the difference between A and B.
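The logic of the randomization argument can be sketched directly: under the null hypothesis of no treatment effect, the allocation of responses to groups is an arbitrary labelling, so the chance distribution of the difference between group means can be generated by re-shuffling the pooled responses. The recovery times below are invented for illustration; the re-randomization test is one 'appropriate statistical procedure' of the kind mentioned.

```python
import random
import statistics

random.seed(1)

# Hypothetical response variable (recovery time, days) under the
# randomized allocation described above.
a = [21, 18, 25, 22, 19, 24, 20, 23]   # treatment A
b = [27, 25, 30, 26, 29, 24, 28, 31]   # treatment B
observed = statistics.mean(b) - statistics.mean(a)

# Re-randomization: shuffle the pooled responses many times and count how
# often a difference at least as large as the observed one arises by
# the accidents of random assignment alone.
pooled = a + b
count = 0
n_trials = 5000
for _ in range(n_trials):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[8:]) - statistics.mean(pooled[:8])
    if diff >= observed:
        count += 1
p_value = count / n_trials

print(observed, p_value)
```

A small p-value is then evidence that the difference is a consequence of the treatments, the two groups otherwise differing only by random assignment.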
Contrast this with a pure observational study in which, from hospital records, information is assembled on the same response variable for two groups of patients, one group having received treatment A and the other treatment B.
Note first that the structure of the data might be identical for the experiment and for the observational study. Suppose that again there is a clear difference between the two groups. What can we conclude? The statistical analysis shows that the difference is unlikely to be a pure chance one. There are, however, initially many possible explanations of the difference in addition to a possible treatment effect.
Thus the groups may differ substantially in age distribution, sex, severity of initial symptoms, etc.
Specific explanations of this kind can be examined by suitable statistical analysis, although there always remains the possibility that some unmeasured explanatory variable differs very substantially between the two groups.
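The kind of analysis referred to can be sketched as a stratified comparison: within each level of a measured explanatory variable such as age group, the treatments are compared, and the within-stratum differences averaged. The data, strata and group sizes below are hypothetical; the point is only that the crude and adjusted comparisons can differ substantially.

```python
# Hypothetical observational data: (treatment, age_group, response).
# Treatment B patients happen to be older, and older patients have
# higher responses, so a crude comparison confounds treatment with age.
data = [
    ("A", "young", 10), ("A", "young", 12), ("A", "old", 20),
    ("B", "young", 11), ("B", "old", 21), ("B", "old", 23),
]

def stratum_mean(trt, age):
    vals = [r for t, a, r in data if t == trt and a == age]
    return sum(vals) / len(vals)

# Crude difference of group means (3 patients per treatment here):
# mixes the age effect into the treatment comparison.
crude = (sum(r for t, _, r in data if t == "B") / 3
         - sum(r for t, _, r in data if t == "A") / 3)

# Age-adjusted difference: average of the within-stratum differences.
adjusted = sum(stratum_mean("B", g) - stratum_mean("A", g)
               for g in ("young", "old")) / 2

print(crude, adjusted)
```

Adjustment of this kind removes the part of the difference attributable to the measured variable, but of course cannot speak to unmeasured ones.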
Further, we rarely know why each patient was assigned to his or her treatment group: the possibility that conscious or unconscious assessment of the patient's prognosis influenced treatment choice can rarely be excluded.
Thus the interpretation of the difference is more hazardous in the observational study than in the experiment. It is often useful to divide the explanatory variables into two types. First there are those which represent, or which could conceivably have represented, treatments. Then there are those that give intrinsic properties of the individuals; we call the latter type intrinsic variables.
For example, suppose that in a medical investigation the explanatory variables are dose of a drug, patient sex and patient initial body weight, response being some measure of success of treatment.
Now analysis would usually be directed towards relating response to dose. The role of body weight might be to indicate an appropriate scale for dose, e.g. dose per unit of body weight. Similarly, it may be necessary to estimate different response-dose relations for men and for women.
It would usually be meaningful to think of dose as causing response, because it is possible to contemplate an individual receiving a different dose from the one he or she in fact did receive.
But it would not normally be meaningful to speak of sex as causing the response. This is because it is not usually meaningful to contemplate what response would have been observed on an individual had that individual been a woman rather than a man. The point is related to the physicists' well-known dictum that passage of time cannot be regarded as a cause of change. A rather different division of investigations can be made on the basis of their broad purpose.
In one sense, of course, it is trite to remark that the purpose of the investigation is to be borne in mind, particularly in determining the primary aspects of the model.
Indeed, in some applications the objectives may be very specific and of such a kind that the quantitative techniques of decision analysis may be applicable. Nevertheless, it is useful to draw a broad distinction between investigations aimed at understanding the system under study and those aimed directly at practical action. The terms 'scientific' and 'technological' might be used. We shall prefer 'explanatory' and 'pragmatic', partly to avoid misunderstanding.
Thus an investigation in nuclear physics to calibrate a technique or to compare alternative experimental procedures might be pragmatic in aim. The distinction has bearing on the kinds of conclusion to be sought and on the presentation of the conclusions. For example, a pragmatic application of multiple regression might aim to predict one or more response variables from suitable explanatory variables.
Then if there were a number of alternative predicting equations giving about equally good results, the choice between them could be made on grounds of convenience or even essentially arbitrarily. If, however, the object is understanding of the relation between response and explanatory variables, interest lies in which explanatory variables genuinely contribute. It is then dangerous to choose essentially arbitrarily one among a number of different but equally well-fitting relations. The question of balance between explanatory and pragmatic approaches, i.e. between understanding and immediate practical use, therefore has to be considered in each application.
Even in the much narrower context of multiple regression as outlined in the previous paragraph, the distinction between the two approaches is important but not to be taken too rigidly. There is some hope that a prediction equation based on an understanding of the system under study may continue to perform well if the system changes somewhat in the future; any prediction technique chosen on totally empirical grounds is at risk if, say, the interrelationships between the explanatory variables change.
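The risk described can be illustrated directly. In the sketch below, two invented explanatory variables are highly correlated, so the two one-variable least-squares equations predict about equally well; nothing in the goodness of fit itself says which to prefer, even though only one variable actually drives the response.

```python
import random
import statistics

random.seed(2)

# Two highly correlated explanatory variables; the response is driven
# by x1 alone (a simulation assumption for illustration).
n = 200
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [v + random.gauss(0, 0.1) for v in x1]   # x2 is x1 plus small noise
resp = [v + random.gauss(0, 0.5) for v in x1]

def rss_of_fit(x, y):
    """Residual sum of squares for the least-squares line of y on x."""
    xbar, ybar = statistics.mean(x), statistics.mean(y)
    b = sum((a - xbar) * (c - ybar) for a, c in zip(x, y)) / \
        sum((a - xbar) ** 2 for a in x)
    a0 = ybar - b * xbar
    return sum((c - (a0 + b * a)) ** 2 for a, c in zip(x, y))

r1 = rss_of_fit(x1, resp)
r2 = rss_of_fit(x2, resp)
print(round(r1, 1), round(r2, 1))  # nearly equal residual sums of squares
```

A purely pragmatic analysis could use either equation; an explanatory analysis would want to know that x1, not x2, is the operative variable, and is at risk if the x1–x2 relation later changes.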
Questions of the specific purpose of the investigation have always to be considered and may indicate that the analysis should be sharply focused on a particular aspect of the system under study.

Chapter 2. Some general concepts

Types of observation

We now discuss briefly some of the types of observation that can be made, by far the most important distinction, however, being that made in Section 1.
The first distinction depends on the physical character of the measurements and is between extensive and nonextensive variables. An extensive variable is one which is physically additive in a useful sense: yield of product, count of organisms and length of interval between successive occurrences of some repetitive event are all examples.
In all these, regardless of distributional shape, the mean value has a physical interpretation in terms of, for example, the total yield of product from a large number of runs; see Example M connected with the yield of cauliflowers. Thus for extensive response variables, however the analysis is done, the mean value of the variable is among the quantities of interest.
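The additivity that makes a variable extensive can be checked numerically. In the sketch below the yields are hypothetical: the mean of yield reproduces the physically meaningful total, while the mean of log yield corresponds to no additive total and differs from the log of the mean.

```python
import math
import statistics

# Hypothetical yields of product from five runs of a process.
yields = [4.0, 5.5, 3.8, 6.2, 5.0]

# Yield is extensive: mean times number of runs recovers the total.
total = sum(yields)
mean_yield = statistics.mean(yields)
assert math.isclose(mean_yield * len(yields), total)

# Log yield is not extensive: the mean of the logs is not the log of
# the mean (Jensen's inequality), and sums of logs have no direct
# physical interpretation as a total yield.
mean_log = statistics.mean(math.log(v) for v in yields)
log_mean = math.log(mean_yield)
print(round(mean_log, 4), round(log_mean, 4))
```
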
Note especially that yield has this property, whereas log yield, or more generally any nonlinear function of yield, does not. An example of a nonextensive variable is blood pressure: the sum of the blood pressures of two individuals has no direct physical interpretation. The next distinctions depend rather more on the mathematical character of the variable, and in particular on the set of values which it may in principle take.
The main possibilities are:
(i) an effectively continuous measurement on a reasonably well-defined scale, i.e. one on which equal differences mean broadly the same thing in different parts of the range. Analysis will normally be done in terms of the variable itself or some simple function of it;
(ii) an effectively continuous measurement on a relatively ill-defined scale; for example, 'merit' may be scored subjectively on a scale 0 to 100, there being no guarantee that the difference, say 5 to 10, is meaningfully comparable with the difference 80 to 85;
(iii) an integer-valued variable, usually in effect counting numbers of occurrences in some form;
(iv) a discrete variable, often in effect integer-valued, scoring something on an ordered but relatively ill-defined scale. This is broadly equivalent to (ii). Sometimes the quantitative values, which are essentially conventional, are omitted. Examples N and W illustrate this kind of variable;
(v) a qualitative variable in which the possible values are not ordered;
(vi) a binary variable, in which there are only two possible values.
Possibilities (i), (iii) and (vi) are the easiest to handle and probably the most widely occurring. In the social sciences, (iv)-(vi) are relatively common.
Any kind of measurement can be reduced to binary form by merging categories, although serious loss of information may be incurred by doing this injudiciously. Under descriptive statistics we include the tabulation of data for inspection and the use of graphical techniques.
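A small sketch of reduction to binary form, using a hypothetical ordered score: a cut near the middle of the scale retains much of the variation in the data, while an extreme cut discards most of it.

```python
# Hypothetical ordered scores of type (iv), coded 1..5.
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

# Merge categories to binary form with two different cut points.
binary_mid = [1 if s >= 3 else 0 for s in scores]   # cut near the median
binary_top = [1 if s >= 5 else 0 for s in scores]   # extreme, wasteful cut

# The mid cut splits the sample roughly in half; the extreme cut lumps
# almost everyone together, losing most of the information.
print(sum(binary_mid), sum(binary_top))
```
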
The latter are particularly important, both in the preliminary inspection of data and in the final presentation of conclusions. Current developments in computer graphics may lead to improved ways of dealing with complex relations, especially in several dimensions. The distinction between descriptive and probabilistically based methods is not a rigid one.
Often a probabilistic argument will suggest the calculation of certain quantities which can then be plotted or summarized in a table and regarded as meaningful independently of the original argument which led to their calculation.
The method of least squares, which is central to a large part of statistical methodology, has various sophisticated probabilistic justifications. It can also very often be regarded as a qualitatively plausible method of fitting.
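The 'qualitatively plausible' view can be made concrete without any probability model: among candidate lines through the data, the least-squares slope simply minimizes the sum of squared residuals. The data below are invented for illustration.

```python
# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

xbar = sum(x) / len(x)
ybar = sum(y) / len(y)

def ss(b):
    """Sum of squared residuals for the line through the means with slope b."""
    return sum((yi - (ybar + b * (xi - xbar))) ** 2 for xi, yi in zip(x, y))

# Closed-form least-squares slope.
b_ls = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
       sum((xi - xbar) ** 2 for xi in x)

# Any other slope gives a larger sum of squares: the fit is justified
# purely as a minimization, with no appeal to a probability model.
assert ss(b_ls) <= ss(b_ls + 0.1) and ss(b_ls) <= ss(b_ls - 0.1)
print(round(b_ls, 3))
```

The same numerical fit can then be given a probabilistic justification when a model is adopted, but its descriptive value does not depend on one.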
While sometimes uncertainty can be assessed informally, probabilistic arguments normally play a central role in measuring it, especially via the calculation of limits of error for unknown parameters.
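As an elementary sketch of limits of error: approximate 95 per cent confidence limits for an unknown mean, computed from hypothetical repeated measurements. The normal multiplier 1.96 is used for simplicity in place of the exact Student t value for this sample size.

```python
import math
import statistics

# Hypothetical repeated measurements of a single quantity.
obs = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.0, 10.3]

n = len(obs)
m = statistics.mean(obs)
se = statistics.stdev(obs) / math.sqrt(n)   # estimated standard error

# Approximate 95% limits of error for the unknown mean.
lower, upper = m - 1.96 * se, m + 1.96 * se
print(round(lower, 3), round(upper, 3))
```
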
Most of Part II illustrates methods which have a quite direct probabilistic justification, but it is always important to consider the extent to which the quantities calculated are directly useful as reasonable summaries of the data regardless of the probability model. The typical form which a probabilistically based analysis takes is as follows. We have observations on one or more response variables, and we represent these collectively by y.
Next we consider a family of probability distributions for the observations, i.e. we regard the data as observed values of random variables whose distribution belongs to the family. We call this family of distributions a model. In simple cases the model involves the standard distributions normal, exponential, Poisson, binomial, etc.
In the general discussion of this part it aids clarity to distinguish between the observations y and the random variable Y.
Some particular models arise so often in applications that the methods associated with them have been extensively developed.
It is, however, very important that the formulation of a model and of scientifically relevant questions about that model are made properly, bearing in mind the unique features of each application; standard models and questions may not be appropriate.
It is useful to have simple examples in mind. In the simplest case the observations are represented by independent random variables all having the same distribution. Further, depending on the context, the distribution may be taken as of simple functional form, e.g. normal. An important special case is the simple linear regression model, in which the expected response is a linear function of an explanatory variable x, E(Y_i) = β0 + β1 x_i; an illustration is Example U.
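The simple linear regression model can be sketched by simulation, keeping the distinction between observations y and random variables Y in view: data are generated from assumed parameter values (an illustrative assumption, not values from the text) and the parameters recovered by least squares.

```python
import random
import statistics

random.seed(3)

# Model: Y_i = alpha + beta * x_i + e_i with independent normal errors.
# The parameter values are simulation assumptions for illustration.
alpha, beta, sigma = 2.0, 0.5, 0.3
x = [i / 10 for i in range(1, 51)]
y = [alpha + beta * xi + random.gauss(0, sigma) for xi in x]

# Least-squares estimates of the slope and intercept.
xbar, ybar = statistics.mean(x), statistics.mean(y)
beta_hat = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
           sum((xi - xbar) ** 2 for xi in x)
alpha_hat = ybar - beta_hat * xbar

print(round(alpha_hat, 2), round(beta_hat, 2))
```

With a model adopted, the same least-squares quantities acquire a probabilistic interpretation, and limits of error for the parameters follow.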