DATA ANALYSIS FOR CHEMISTRY
An Introductory Guide for Students and Laboratory Scientists

D. Brynn Hibbert
J. Justin Gooding

2006

Oxford University Press, Inc., publishes works that further Oxford University's objective of excellence in research, scholarship, and education.

Oxford, New York, Auckland, Cape Town, Dar es Salaam, Hong Kong, Karachi, Kuala Lumpur, Madrid, Melbourne, Mexico City, Nairobi, New Delhi, Shanghai, Taipei, Toronto

With offices in Argentina, Austria, Brazil, Chile, Czech Republic, France, Greece, Guatemala, Hungary, Italy, Japan, Poland, Portugal, Singapore, South Korea, Switzerland, Thailand, Turkey, Ukraine, Vietnam

Copyright © 2006 by Oxford University Press, Inc.

Published by Oxford University Press, Inc.
198 Madison Avenue, New York, New York 10016

Oxford is a registered trademark of Oxford University Press

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior permission of Oxford University Press.

Library of Congress Cataloging-in-Publication Data
Hibbert, D. B. (D. Brynn), 1951–
Data analysis for chemistry: an introductory guide for students and laboratory scientists / D. Brynn Hibbert and J. Justin Gooding.
p. cm.
ISBN-13: 978-0-19-516210-3; 978-0-19-516211-0 (pbk.); 0-19-516210-2; 0-19-516211-0 (pbk.)
1. Chemistry--Statistical methods. 2. Analysis of variance. I. Gooding, J. Justin II. Title.
QD39.3.S7 H53 2005
540'.72--dc22 2004031124

9 8 7 6 5 4 3 2 1

Printed in the United States of America on acid-free paper

Preface

The motivation for writing this book came from a number of sources. Clearly, one was the undergraduate students to whom we teach analytical chemistry, and who continually struggle with data analysis. Like scientists across the globe, we stress to our students the importance of including uncertainties with any measurement result, but at least one of us (JJG) stressed this point without clearly articulating how. Conversations with many other teachers of science suggested that JJG was not the exception but more likely the rule. The majority of lecturers understood the importance of data analysis, but not always how best to teach it. In our school, as in many others it seems, the local measurement guru has a good grasp of the subject, but the rest, who teach other aspects of chemistry and really only use data analysis as a tool in the laboratory class, understand it poorly in comparison. We felt this needed to be rectified, which was a second motivation.

In conversation between the pair of us we came to the conclusion that the problem was partly one of language. In writing this book we also concluded that another aspect of the problem was the uncertainty that arises in any discipline that is still evolving. Chemical data analysis, with its aspects of metrology in chemistry and chemometrics, is certainly an evolving discipline in which new and better ways of doing things are being developed. So this book tries to make data analysis simple, a sort of "idiot's guide", by (1) demystifying the language and (2) wherever possible giving unambiguous ways of doing things (recipes). To do this we took one expert (DBH) and one idiot (JJG), and whenever DBH stated what should be done, JJG badgered him with questions such as "What do you mean by that?", "How exactly does one do that?", "Can't you be more definite?", and "What is a rule of thumb we can give the reader?" The end result is a compromise between one who essentially wants recipes for performing the different aspects of data analysis and one who feels the need to give, at the very least, some basic information on the principles behind the recipes. In the end we both agree that, like any science, data analysis cannot be treated as a black box if it is to be performed properly, but that for the novice the instructions for performing a specific test must be unambiguous.

So who should use this book? Anybody who thinks they don't really understand data analysis and
how to apply it in chemistry. If you really do understand data analysis, then you may find the explanations in the book too simple and the scope too limited. We see this as very much an entry-level book, targeted at learning and teaching undergraduate data analysis.

We have tried to make it easy for readers to find the information they are seeking to perform the data analysis they think they need. To do this we have put the glossary at the beginning of the book, with directions to where in the book each concept is located. In this initial Reader's Guide we also include frequently asked questions (FAQs), with brief answers and directions to where more detailed answers are located, and a list of useful Microsoft Excel functions. We hope that together these three sections will help you find out how to do things such as measuring a calibration curve when your lecturer tells you to, and then determining the uncertainty in your measurement of the unknown. If, after looking through this book and then sitting down to work through the examples, you are still saying "How?", then we haven't quite achieved our objective.

Contents

Reader's Guide: Definitions, Questions, and Useful Functions: Where to Find Things and What to Do
1. Introduction
1.1. What This Chapter Should Teach You
1.2. Measurement
1.3. Why Measure?
1.4. Definitions
1.5. Calibration and Traceability
1.6. So Why Do We Need to Do Data Analysis at All?
1.7. Three Types of Error
1.8. Accuracy and Precision
1.9. Significant Figures
1.10. Fit for Purpose
2. Describing Data: Means and Confidence Intervals
2.1. What This Chapter Should Teach You
2.2. The Analytical Result
2.3. Population and Sample
2.4. Mean, Variance, and Standard Deviation
2.5. So How Do I Quote My Uncertainty?
2.6. Robust Estimators
2.7. Repeatability and Reproducibility of Measurements
3. Hypothesis Testing
3.1. What This Chapter Should Teach You
3.2. Why Perform Hypothesis Tests?
3.3. Levels of Confidence and Significance
3.4. How to Test If Your Data Are Normally Distributed
3.5. Test for an Outlier
3.6. Determining Significant Systematic Error
3.7. Testing Variances: Are Two Variances Equivalent?
3.8. Testing Two Means (Means t-Test)
3.9. Paired t-Test
3.10. Hypothesis Testing in Excel
4. Analysis of Variance
4.1. What This Chapter Should Teach You
4.2. What Is Analysis of Variance (ANOVA)?
4.3. Jargon
4.4. One-Way ANOVA
4.5. Least Significant Difference
4.6. ANOVA in Excel
4.7. Sampling
4.8. Multiway ANOVA
4.9. Two-Way ANOVA in Excel
4.10. Calculations of Multiway ANOVA
4.11. Variances in Multiway ANOVA
5. Calibration
5.1. What This Chapter Should Teach You
5.2. Introduction
5.3. Linear Calibration Models
5.4. Calibration in Excel
5.5. r²: A Much Abused Statistic
5.6. The Well-Tempered Calibration
5.7. Standard Addition
5.8. Limits of Detection and Determination
Appendix
Bibliography
Index

Reader's Guide: Definitions, Questions, and Useful Functions

Where to Find Things and What to Do

This chapter is called "Reader's Guide" because chapter 1 is clearly the proper start of the book, with introductions and discussions of what measurement really is and so on. This chapter was compiled last, and attempts to be the first stop for a reader who does not want the edifying discourse on measurement, but is desperate to find out how to do a t-test.
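For the reader who has arrived here for exactly that reason, the following is a minimal sketch of the arithmetic behind a two-sample (means) t-test, written in Python rather than Excel. It assumes the two samples have equal variances, so that a pooled standard deviation can be used; checking that assumption is the subject of section 3.7. The helper name `pooled_t_test` and the replicate data are hypothetical, and the recipe in section 3.8 may differ in detail.

```python
import math
from statistics import mean, stdev


def pooled_t_test(a, b):
    """Hypothetical helper: two-sample Student t statistic assuming equal variances.

    Returns (t, df), where df = n1 + n2 - 2.
    """
    n1, n2 = len(a), len(b)
    s1, s2 = stdev(a), stdev(b)  # sample standard deviations (n - 1 denominator)
    # Pooled variance: each sample variance weighted by its degrees of freedom.
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    t = (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


# Made-up replicate results from two methods.
t, df = pooled_t_test([10.1, 10.3, 10.2], [10.6, 10.5, 10.7])
print(f"t = {t:.3f}, df = {df}")  # t = -4.899, df = 4
```

To reach a decision, compare |t| with the two-tailed critical value for df degrees of freedom, taken from Student t tables or from Excel's TINV(0.05, df). For df = 4 at 95% this is about 2.776, so for these made-up data the difference between the means would be judged significant.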
In the glossary, we define terms and concepts used in the book, with a section reference to where each term or concept is explained in detail. If you half know what you are after, the memory jog from seeing the definition may suffice, but at some point do return to the text and reacquaint yourself
with the theory.

Next come frequently asked questions that represent just that: questions we are often asked by our students (and colleagues). The order roughly follows that of the book, but you may have to do some scanning before your particular question springs out of the page.

Finally, we have included a number of Excel spreadsheet functions that are most useful to a chemist faced with data to subdue. The list brings together functions that are not obviously dealt with elsewhere and does not claim to be complete, but have a look there if you cannot find a function elsewhere.

Glossary

The definitions given below are not always the official statistical or metrological definitions. They are given in the context of chemical analysis and are the authors' best attempt at understandable descriptions of the terms.

α: The fraction of a distribution outside a chosen value. (Section 2.5.2)

Accuracy: Formerly, the closeness of a measurement result to the true value; now, the quality of the result in terms of trueness and precision in relation to the requirements of its use. (Section 1.8; figure 1.6)

Analytical sensitivity: The linear coefficient representing the slope of the relationship between the instrument response and the concentration of standards; in other words, the slope of the calibration plot. (Section 5.3)

ANOVA (analysis of variance): A statistical method for comparing means of data under the influence of one or more factors. The variance of the data may be apportioned among the different factors.
(Chapter 4)

Arithmetic mean, $\bar{x}$: The average of the data: the result of summing the data and dividing by the number of data (n). (Section 2.4.1)

Bias: A systematic error in a measurement system. (Section 1.7)

Calibration: The process of establishing the relation between the response of an instrument and the value of the measurand. (Section 5.2)

Calibration curve: A graph of the calibration. (Section 5.2)

Central limit theorem: The distribution of the means of n data approaches the normal distribution as n increases, whatever the distribution of the original data. (Section 2.4.6)

Certified reference material (CRM): A standard with a quantity value established to a high metrological degree, accompanied by a certificate detailing the establishment of the value and its traceability. Used for calibration to ensure traceability, and for estimating systematic effects. (Section 3.3)

Confidence interval: A range of values about a sample mean that is believed to contain the population mean with a stated probability, such as 95% or 99%. The 95% confidence interval about the mean $\bar{x}$ of n samples with standard deviation s is $\bar{x} \pm t_{0.05'',\,n-1}\, s/\sqrt{n}$, where $t_{0.05'',\,n-1}$ is the 95%, two-tailed Student t-value for n − 1 degrees of freedom. (Section 2.5.1)

Confidence limit: The extreme values defining a confidence interval. (Section 2.5.1)

Correction for the mean: Subtraction of the grand mean from each measurement result in ANOVA. The resulting quantity is also known as the mean-corrected value. (Section 4.4)

Corrected sum of squares: See total sum of squares. (Section 4.4)

Cross-classified system: In a multiway ANOVA, a design in which measurements are made at every combination of the instances of each factor. (Section 4.8)

Degrees of freedom: The number of data minus the number of parameters calculated from them. The degrees of freedom for a sample standard deviation of n data is n − 1. For a calibration in which an intercept and slope are calculated, df = n − 2. (Sections 2.4.5, 5.3.1)

Dependent variable: The instrument response, which depends on the value of the independent variable (the concentration of the analyte). (Section 5.2)

Detection limit: See limit of detection. (Section 5.8)

Effect of a factor: How much the measurand changes as a factor is varied. (Section 4.3)

Error: The result of a measurement minus the true value of the measurand. (Section 1.7)

Factor: In ANOVA, a quantity that is being investigated. (Sections 4.2, 4.3)

Fisher F-test: A statistical
significance test that decides whether there is a significant difference between two variances (and therefore between two sample standard deviations). This test is used in ANOVA. For two standard deviations $s_1$ and $s_2$, $F = s_1^2/s_2^2$, where $s_1 > s_2$. (Sections 3.7, 4.4)

Fit for purpose: The principle that a measurement result should have sufficient accuracy and precision for the user of the result to make appropriate decisions. (Section 1.10)

Grand mean: The mean of all the data (used in ANOVA). (Section 4.2)

Gross error: A result so far removed from the true value that it cannot be accounted for in terms of measurement uncertainty and known systematic errors; in other words, a blunder. (Section 1.7)

Grubbs's test: A statistical test to determine whether a datum is an outlier. The G value for a suspected outlier is calculated as $G = |x_{\text{suspect}} - \bar{x}|/s$. If G is greater than the critical G value for a stated probability ($G_{0.05'',\,n}$), the null hypothesis, that the datum is not an outlier and belongs to the same population as the other data, is rejected at that probability. (Section 3.5)

Heteroscedastic data: Calibration data whose variance depends on their magnitude. Usually this is seen as an increase in variance with increasing concentration (e.g., when the relative standard deviation is constant across a calibration). (Section 5.3.1)

Homoscedastic data: Calibration data whose variance is independent of their magnitude (i.e., the standard deviation is constant). (Section 5.3.1)

Hypothesis test: A procedure in which a question about data is decided on the basis of the probability of the data given a stated hypothesis. (Section 3.1)

Independent measurements: Measurements made on a number of individually prepared samples. (Section 2.7)

Independent variable: A quantity that is under the control of the analyst. In calibration, it is the quantity varied to ascertain the relationship between this quantity and the instrumental response. Typically in a calibration model the independent variable is concentration. (Section 5.2)

Indication of a measuring instrument: The instrumental response or output. (Section 5.3)

Indication of the blank: The instrumental response to a test solution containing everything except the analyte. If this cannot be measured, it may be taken as the intercept of the calibration curve. (Section 5.3)

Influence factor (quantity): Something that may affect a measurement result, for example
temperature, pressure, solvent, or analyst. In calibration, influence quantities are quantities that are not the independent variable but that may affect the measurement. (Sections 4.2, 4.3, 5.3)

Instance of factor: A particular example of a factor in an ANOVA. For example, in an experiment performed at 20, 30, and 40 °C, the three temperatures are instances of the factor temperature. (Section 4.2)

Interaction: In a multiway ANOVA, an effect of one factor on the effect of another factor on the response. For example, if a reaction rate is increased more by an increase in temperature at short reaction times than at longer reaction times, there is said to be a temperature-by-time interaction. (Section 4.8)

Intercept: The constant term in a calibration model. See indication of the blank. (Section 5.3)

Interquartile range: The range spanned by the middle 50% of a set of data arranged in ascending order. The normalized interquartile range serves as a robust estimator of the standard deviation. (Section 2.6.2)

Intralaboratory standard deviation: The standard deviation of measurement results obtained within the same laboratory but not under repeatability conditions, for example by different analysts using d