Part-III Treatment of Data
OVERVIEW
(1) Units of measurement
(a) must be indicated in tables/graphs.
(b) use scientific notation
Examples:
$2.2 \times 10^{-6}$ V = 2.2 µV
$7.1 \times 10^{-2}$ m = 71 mm
$8.34 \times 10^{4}$ J = 83.4 kJ
$2.1 \times 10^{5}$ W = 0.21 MW
$7.3 \times 10^{-11}$
Table : Dependence of electrical resistance on temperature of a Copper wire (Kirkup : Experimental Methods, Wiley 1994).
Reduce to 4 SF Reduce to 3 SF Reduce to 2 SF
CAVEAT Volume
3 significant figures 8 significant figures 36.5
x      y
9.3    19.0
7.3    15.0
9.8    20
1.8    4
5.3    11
P Q
S1 S2 S3 298 315 325
T (K)    L (m)
273      1.1155
298      1.1164
323      1.1170
348      1.1172
373      1.1180
398      1.1190
423      1.1199
448      1.1210
473      1.1213
498      1.1223
523      1.1223
Cooling of an object
Time (s) ± 5 s    Temperature (°C) ± 4 °C
10                125
70                116
125               104
190               94
260               87
320               76
370               72
[Figure: cooling curve with error bars showing the uncertainty in the x-value and the uncertainty in the y-value]
NOTE 1: If the uncertainties are not constant, the size of the error bars must also vary.
NOTE 2: The % error is not the same for each data point.
Concentration (kg/m3) Relative density (-) 1.005 50 1.034 100 1.066 150 1.095 200 1.122 250 1.150
(1) Do not ignore outliers - investigate them further.
(2) Interpolation of results is justified when there is a sufficient number of data points.
(3) Extrapolation should always be avoided.
For two data points, $(x_1, y_1)$ and $(x_2, y_2)$, one can draw only one line, and this is also the best line:
$y = mx + C$, with slope $m = \dfrac{y_2 - y_1}{x_2 - x_1}$ and intercept C = value of y at x = 0.
But when we have a large number of data points $(x_i, y_i)$ (say, i = 1, ..., 20), which pair of data points should be used to calculate the values of m and C?
BEST FIT
What are the uncertainties in the best-fit values of m and C?
T (h)    Size (mm)
0.5      1.5 ± 0.3
1        2.3 ± 0.3
1.5      3.3 ± 0.3
2        4.3 ± 0.3
2.5      5.4 ± 0.3
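One common way to answer the "best fit" question is an unweighted least-squares straight line. The sketch below applies it to the size-versus-time table above. It is an illustration under stated assumptions, not necessarily the method the slides go on to use: the function name `least_squares_line` is hypothetical, and the ± 0.3 mm error bars are not used as weights because they are the same for every point.

```python
import math

# Data from the table above: time T (h) and size (mm); each size is ± 0.3 mm.
t_hours = [0.5, 1.0, 1.5, 2.0, 2.5]
size_mm = [1.5, 2.3, 3.3, 4.3, 5.4]


def least_squares_line(x, y):
    """Return slope m, intercept C and their standard uncertainties for y = m*x + C."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    m = sxy / sxx
    c = y_mean - m * x_mean
    # Scatter of the points about the fitted line (n - 2 degrees of freedom).
    residual_ss = sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))
    s_y = math.sqrt(residual_ss / (n - 2))
    sigma_m = s_y / math.sqrt(sxx)
    sigma_c = s_y * math.sqrt(sum(xi ** 2 for xi in x) / (n * sxx))
    return m, c, sigma_m, sigma_c


m, c, sigma_m, sigma_c = least_squares_line(t_hours, size_mm)
print(f"m = {m:.2f} ± {sigma_m:.2f} mm/h, C = {c:.2f} ± {sigma_c:.2f} mm")
```

Every data point contributes to m and C here, so no single pair of points has to be singled out.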
Prior knowledge about the expected form of
Original data might exhibit a non-linear relationship. Transform one of the variables in such a fashion that
Period of oscillation (T) ~ mass of the body (M)
M (kg)    T (s)
0.02      0.7
0.05      1.11
0.10      1.6
0.20      2.25
0.30      2.76
0.40      3.18
0.50      3.58
0.60      3.97
0.70      4.16
0.80      4.60
Dependence is not linear.
Let us recall our high school physics:
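$T = 2\pi \sqrt{\dfrac{M}{k}}$ (k : spring constant)
$\therefore\; T \propto \sqrt{M}$
$T = \dfrac{2\pi}{\sqrt{k}}\, \sqrt{M}$
Plotting T against $\sqrt{M}$ should give a straight line of slope $2\pi/\sqrt{k}$. We expect C = 0.
Linear graphs: a great advantage.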
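The linearisation can be checked numerically with the mass and period table above. This is a minimal sketch, assuming an unweighted least-squares line through the transformed points; `fit_line` and `k_estimate` are illustrative names, and the estimate of k follows from reading the slope as $2\pi/\sqrt{k}$.

```python
import math

# Data from the table above.
mass_kg = [0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80]
period_s = [0.7, 1.11, 1.6, 2.25, 2.76, 3.18, 3.58, 3.97, 4.16, 4.60]


def fit_line(x, y):
    """Unweighted least-squares slope and intercept for y = m*x + C."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


# Transform the x variable: plot T against sqrt(M) instead of M.
sqrt_mass = [math.sqrt(m) for m in mass_kg]
slope, intercept = fit_line(sqrt_mass, period_s)

# slope = 2*pi/sqrt(k)  =>  k = 4*pi^2 / slope^2
k_estimate = 4 * math.pi ** 2 / slope ** 2
print(f"slope = {slope:.2f} s/kg^0.5, intercept = {intercept:.3f} s, k = {k_estimate:.2f} N/m")
```

An intercept close to zero is the quick check that the assumed form $T \propto \sqrt{M}$ describes the data.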
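Constant acceleration (a); initial velocity @ t = 0 is u; s : distance travelled in time t.
a = 9.81 m/s², u = 1 m/s
$s = ut + \tfrac{1}{2} a t^2$
The relationship between “s” and “t” is quadratic. Dividing both sides by t gives a linear form:
$\dfrac{s}{t} = u + \dfrac{a}{2}\, t$, i.e., $y = C + mx$ with y = s/t, x = t, C = u and m = a/2.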
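As a quick check of this linearisation, the sketch below generates distances from $s = ut + \tfrac{1}{2}at^2$ with the quoted values a = 9.81 m/s² and u = 1 m/s, then recovers u and a from a straight-line fit of s/t against t. The time values 1 to 5 s are an assumption made for illustration, and the distances are computed from the formula, not measured.

```python
# Values quoted above.
a_true = 9.81   # m/s^2
u_true = 1.0    # m/s

# Assumed sample times (s); distances are generated from the formula.
times = [1.0, 2.0, 3.0, 4.0, 5.0]
distances = [u_true * t + 0.5 * a_true * t ** 2 for t in times]

# Linearised variables: y = s/t, x = t.
x = times
y = [s / t for s, t in zip(distances, times)]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n
slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sum(
    (xi - x_mean) ** 2 for xi in x
)
intercept = y_mean - slope * x_mean

u_fit = intercept        # C = u
a_fit = 2.0 * slope      # m = a/2
print(f"u = {u_fit:.3f} m/s, a = {a_fit:.3f} m/s^2")
```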
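For a radioactive material, $N = N_0 \exp(-\lambda t)$
N : undecayed nuclei @ time t
$N_0$ : initial value of N @ t = 0
λ : characteristic constant of the material
$\ln N = \ln N_0 + (-\lambda t) \ln e$, and $\ln e = 1$, so
$\ln N = (-\lambda)\, t + \ln N_0$, i.e., $y = mx + C$ (“semi-log” plot)
Plotting ln N against t gives a straight line of slope $m = -\lambda$ and intercept $C = \ln N_0$.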
Now, how must we estimate the uncertainty in ln N?
Finally, let us come to log-log graphs.
Motivation: Sometimes, no matter what we do, it is not possible to choose suitable scales for linear graphs.
Table : Current-Voltage relationship for a silicon diode
Voltage (V)    I (Amperes)
0.35           $9 \times 10^{-7}$
0.40           $3 \times 10^{-6}$
0.45           $5 \times 10^{-5}$
0.50           $2 \times 10^{-4}$
0.55           $1.7 \times 10^{-3}$
0.60           $1.5 \times 10^{-2}$
0.65           $7.5 \times 10^{-2}$
0.70           0.55
0.75           3.5
[Figure: diode current I versus voltage V, plotted on linear axes and on semi-log axes]
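The diode table above spans almost seven decades of current, which is why a linear scale is of little use. The sketch below takes the natural log of I and fits a straight line against V, which is what a semi-log plot does graphically. It is an illustration only; the variable names are assumptions, and no diode model is implied beyond "ln I is roughly linear in V".

```python
import math

# Data from the diode table above.
voltage = [0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
current = [9e-7, 3e-6, 5e-5, 2e-4, 1.7e-3, 1.5e-2, 7.5e-2, 0.55, 3.5]

# Semi-log transformation: y = ln I, x = V.
ln_current = [math.log(i) for i in current]

n = len(voltage)
v_mean = sum(voltage) / n
y_mean = sum(ln_current) / n
slope = sum((v - v_mean) * (y - y_mean) for v, y in zip(voltage, ln_current)) / sum(
    (v - v_mean) ** 2 for v in voltage
)
intercept = y_mean - slope * v_mean

decades = math.log10(max(current) / min(current))
print(f"current spans {decades:.1f} decades")
print(f"ln I = {slope:.1f} * V + ({intercept:.1f})")
```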
When both variables entail several
Use double log coordinates
$\ln y = \ln a + b \ln x$
i.e., $y_{new} = C + m\, x_{new}$, with $y_{new} = \ln y$, $C = \ln a$, $m = b$ and $x_{new} = \ln x$ (log-log scale).
Uncertainty is an inevitable evil, both in experimental and numerical studies. Let us look at a simple test: Same object, constant value of S, same operator/equipment.
Time(s) 0.74 0.71 0.73 0.63 0.69 0.75 0.70 0.71 0.74 0.81
What do you make of these measurements?
(i) Single measurement:
activity, CERN, on the surface of the moon, etc.
(a) Resolution of instruments: What is the minimum value the instrument can measure?
Length: 1 mm graduations → 0.5 mm
For better resolution, one can use a micrometer or Vernier callipers, but these also have their least count.
$375 \pm 1.2\ \text{mm} \;\rightarrow\; 373.8\ \text{mm} \le \text{true value} \le 376.2\ \text{mm}$
This is what is used for a single test, i.e., the uncertainty introduced by the instrument.
N = 0 T Corresponds to the value at a fixed point. N ≠ 0 Thermometer shows wild fluctuations.
[Figure: thermometer reading T in water that is being heated]
So, if we make a single measurement, we cannot evaluate the uncertainty arising from the heating process.
All instruments require benchmarking or calibration which can change over a period of time !!
The mean or average comes in handy. Returning to our earlier example:
Time(s) 0.74 0.71 0.73 0.63 0.69 0.75 0.70 0.71 0.74 0.81
$t_{min} = 0.63$ s, $t_{max} = 0.81$ s
On average, this is the result we can expect:
$\bar{t} = \dfrac{1}{n} \sum_{i=1}^{n} t_i = \dfrac{0.74 + 0.71 + 0.73 + \dots}{10} = 0.721$ s
What is the uncertainty in the mean value?
range (spread) $= x_{max} - x_{min}$
Uncertainty $= \dfrac{x_{max} - x_{min}}{n}$
For our example:
Uncertainty $= \dfrac{0.81 - 0.63}{10} = 0.018$ s
Now we should round off the mean value sensibly to $x_{mean} = 0.72$ s.
∴ Probable value of x (or t in our case) = 0.72 ± 0.018 s
% uncertainty $= \dfrac{\text{uncertainty}}{\text{mean value}} \times 100 = \dfrac{0.018}{0.72} \times 100 = 2.5\%$
Therefore, $t_{mean} = \bar{x} = 0.72$ s with ±2.5% uncertainty.
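The arithmetic above can be reproduced in a few lines. This is a minimal sketch of the range-based estimate used on these slides (uncertainty = range / n); the variable names are illustrative.

```python
# Repeat timings (s) from the example above.
times = [0.74, 0.71, 0.73, 0.63, 0.69, 0.75, 0.70, 0.71, 0.74, 0.81]

n = len(times)
mean_t = sum(times) / n                      # 0.721 s before rounding
spread = max(times) - min(times)             # range = t_max - t_min
uncertainty = spread / n                     # range-based uncertainty in the mean

# The slides round the mean to 2 decimal places before quoting the % uncertainty.
mean_rounded = round(mean_t, 2)
pct = 100 * uncertainty / mean_rounded

print(f"mean = {mean_t:.3f} s, uncertainty = {uncertainty:.3f} s, % uncertainty = {pct:.1f} %")
```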
Aim of an experiment: to determine the true value, $x_{true}$. But this is impossible to do.
On the other hand, we are trying to approximate the true value by an average or mean value, i.e.,
$\bar{x} = \dfrac{1}{n} \sum x_i$
How many times must we repeat the measurements? Recall: the larger the value of n, the closer $\bar{x}$ will be to its true value.
If $\bar{x} \approx x_{true}$, our measurements are accurate.
For example, the charge of an electron is known to be … $\times 10^{-19}$ C, i.e., with an uncertainty of $3 \times 10^{-5}$ %.
What is precision?
If the uncertainty is small, the range is small, but it does not mean that the results are accurate! How?
Let us look at an example:
Boiling point of water @ 1 standard atmosphere
T (°C): 102.4 102.6 102.3 102.6 102.4 102.7 102.4 102.4 102.5 102.6
$\bar{x} = 102.49$ °C
Uncertainty $= \dfrac{102.7 - 102.3}{10} = 0.04$ °C
∴ Boiling point of water = 102.49 ± 0.04 °C
This looks very impressive in terms of precision, except that it is not very accurate!
Let us use a different thermometer (+0.5 °C)
T (°C): 101.0 101.0 100.5 100.5 99.0 99.5 99.0 100.5 101.0
$\bar{x} = 100.2$ °C
Uncertainty $= \dfrac{101 - 99}{10} = 0.2$ °C
∴ Boiling point of water = 100.2 ± 0.2 °C
Less precise, but more accurate experiments.
Accurate: close to the true value.
Precise: low uncertainty, but not necessarily close to the true value.
Accurate & Precise: close to the true value, with a small uncertainty.
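The difference between precision and accuracy can be made concrete with the first thermometer's readings. The sketch below computes the range-based uncertainty (precision) and the offset from a reference value (accuracy). Taking 100 °C as the reference boiling point at 1 standard atmosphere is the assumption here; the variable names are illustrative.

```python
# Readings (°C) from the first thermometer in the example above.
readings = [102.4, 102.6, 102.3, 102.6, 102.4, 102.7, 102.4, 102.4, 102.5, 102.6]

# Assumed reference value: boiling point of water at 1 standard atmosphere.
reference = 100.0

n = len(readings)
mean_T = sum(readings) / n
uncertainty = (max(readings) - min(readings)) / n   # precision: small spread
offset = mean_T - reference                         # accuracy: distance from reference

print(f"mean = {mean_T:.2f} °C ± {uncertainty:.2f} °C, offset from reference = {offset:+.2f} °C")

# Precise but inaccurate: the offset is many times larger than the quoted uncertainty.
print(f"offset / uncertainty = {offset / uncertainty:.0f}")
```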
Difficult to detect and deal with.
Offset uncertainty: melting point of ice
[Figure: thermocouple immersed in an ice + water mixture]
$\bar{x} = 7.43$ °C, uncertainty = 0.08 °C
Very precise but inaccurate measurements!
The true value is expected to be close to zero! There is a big offset error here. Check your calibration, electronic gadgets, warm up period, insensitive thermocouple, etc.
On the other hand, for a plasma furnace (~ 1500 oC), 7.5 oC is not a significant
Try to develop a feel for the answer you are looking for!
Gain uncertainty: This varies with the magnitude of quantity itself.
Example: Calibration masses and electronic balance
Standard mass (g)    Electronic balance value (g)
0.00                 0.00
20.00                20.18
40.00                40.70
60.00                61.00
80.00                81.12
100.00               101.68
The difference between the two values increases as the mass increases.
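A gain error shows up as a calibration slope different from 1. The sketch below lists the differences from the calibration table above and fits a straight line of balance reading against standard mass. It is an illustrative check, assuming an unweighted least-squares line; the names `gain` and `zero_offset` are not from the slides.

```python
# Calibration data from the table above (grams).
standard = [0.00, 20.00, 40.00, 60.00, 80.00, 100.00]
balance = [0.00, 20.18, 40.70, 61.00, 81.12, 101.68]

# The difference grows with the mass, the signature of a gain uncertainty.
differences = [b - s for s, b in zip(standard, balance)]

n = len(standard)
x_mean = sum(standard) / n
y_mean = sum(balance) / n
gain = sum((x - x_mean) * (y - y_mean) for x, y in zip(standard, balance)) / sum(
    (x - x_mean) ** 2 for x in standard
)
zero_offset = y_mean - gain * x_mean

print("differences (g):", [round(d, 2) for d in differences])
print(f"gain = {gain:.4f} (ideal: 1), zero offset = {zero_offset:.3f} g")
```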
So far we have talked about uncertainties when we are interested in the measurement directly. In engineering experiments, we often need to combine several measurements to calculate the quantity of interest. Let us say we are given a cylindrical bar of an unknown metal and we want to calculate its density.
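[Figure: cylindrical bar of diameter D and length L]
$\rho = \dfrac{\text{mass}}{\text{volume}}$
Measured quantities: m, D, L
$\rho = \dfrac{4m}{\pi D^2 L}$
The uncertainty in the value of ρ depends upon the uncertainties in the measured values of m, D and L:
$D \equiv D \pm \Delta D, \quad L \equiv L \pm \Delta L, \quad m \equiv m \pm \Delta m \;\Rightarrow\; \rho \equiv \rho \pm \Delta\rho$
To find Δρ directly, ρ has to be evaluated for every combination of $D \pm \Delta D$, $L \pm \Delta L$ and $m \pm \Delta m$ (eight combinations in all).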
This looks like a lot of hard work!!
We can be a little smarter than this:
$\dfrac{\Delta\rho}{\rho} = \dfrac{\Delta m}{m} + 2\, \dfrac{\Delta D}{D} + \dfrac{\Delta L}{L}$
Multiply this equation by 100 on both sides:
% uncertainty in ρ = % uncertainty in m + 2 × % uncertainty in D + % uncertainty in L
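The shortcut can be compared with the "hard work" route of trying every combination of extreme values. The numbers for m, D and L below are hypothetical, chosen only for illustration; they do not come from the slides. The sketch evaluates $\rho = 4m/(\pi D^2 L)$, adds the percentage uncertainties, and then checks the result against a brute-force evaluation of all eight combinations.

```python
import itertools
import math

# Hypothetical measurements, for illustration only (not from the slides).
m, dm = 0.2500, 0.0005   # mass (kg) and its uncertainty
D, dD = 0.0200, 0.0001   # diameter (m) and its uncertainty
L, dL = 0.1000, 0.0005   # length (m) and its uncertainty


def density(mass, diameter, length):
    """Density of a solid cylinder: rho = 4 m / (pi D^2 L)."""
    return 4.0 * mass / (math.pi * diameter ** 2 * length)


rho = density(m, D, L)

# Shortcut: add the % uncertainties, counting D twice because it appears as D^2.
pct_rho = 100 * dm / m + 2 * (100 * dD / D) + 100 * dL / L

# Brute force: evaluate rho for all 8 combinations of the extreme values.
candidates = [
    density(mv, Dv, Lv)
    for mv, Dv, Lv in itertools.product((m - dm, m + dm), (D - dD, D + dD), (L - dL, L + dL))
]
pct_brute = 100 * (max(candidates) - min(candidates)) / (2 * rho)

print(f"rho = {rho:.0f} kg/m^3")
print(f"shortcut: ±{pct_rho:.2f} %, brute force: ±{pct_brute:.2f} %")
```

For small uncertainties the two routes agree closely, which is why the percentage rule is usually sufficient.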
NOTE:
(attributed to Benjamin Disraeli & Mark Twain).
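Repeat measurements (s): 0.74, 0.74, 0.69, 0.68, 0.80, 0.71, 0.78, 0.65, 0.67, 0.73
$\bar{x} = \dfrac{1}{10} \sum x_i = 0.719$ s

x_i (s)    d_i = x_i − x̄ (s)    d_i² = (x_i − x̄)² (s²)
0.74       0.021                0.000441
0.74       0.021                0.000441
0.69       −0.029               0.000841
0.68       −0.039               0.001521
0.80       0.081                0.006561
0.71       −0.009               0.000081
0.78       0.061                0.003721
0.65       −0.069               0.004761
0.67       −0.049               0.002401
0.73       0.011                0.000121
x̄ = 0.719  ∑d_i = 0             ∑d_i² = 0.02089

Variance: $\sigma^2 = \dfrac{\sum d_i^2}{n} = \dfrac{\sum (x_i - \bar{x})^2}{n}$
Another related parameter is the standard deviation: $\sigma = \left[ \dfrac{\sum (x_i - \bar{x})^2}{n} \right]^{1/2}$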
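The deviation table can be regenerated directly from the ten measurements. The sketch below computes the deviations, their squares, and the standard deviation with both the n and the n − 1 denominators, since both forms appear in this part of the course. Variable names are illustrative.

```python
import math

# Repeat measurements (s) from the example above.
x = [0.74, 0.74, 0.69, 0.68, 0.80, 0.71, 0.78, 0.65, 0.67, 0.73]

n = len(x)
x_mean = sum(x) / n
deviations = [xi - x_mean for xi in x]       # d_i = x_i - mean
squared = [d ** 2 for d in deviations]       # d_i^2

sum_dev = sum(deviations)                    # should vanish (up to rounding)
sum_sq_dev = sum(squared)

sigma_n = math.sqrt(sum_sq_dev / n)          # denominator n
sigma_n_minus_1 = math.sqrt(sum_sq_dev / (n - 1))   # denominator n - 1

for xi, d, d2 in zip(x, deviations, squared):
    print(f"{xi:.2f}  {d:+.3f}  {d2:.6f}")
print(f"mean = {x_mean:.3f} s, sum d = {sum_dev:.1e}, sum d^2 = {sum_sq_dev:.5f} s^2")
print(f"sigma (n) = {sigma_n:.4f} s, s (n-1) = {sigma_n_minus_1:.4f} s")
```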
Let us say we have repeat data sets (Series I, Series II, ...):
Series    I       II      III     IV      V       VI      VII     VIII
x̄         51      51.7    50.4    51.5    51.7    50.4    52.5    49.5
σ         3.13    3.29    3.07    3.11    3.20    2.94    2.73    3.20
Therefore, the best estimate of x is 51.1 ± 0.893, i.e., the mean of the series means ± the standard deviation of the series means about that value.
This, however, only eliminates the role of random uncertainty & NOT
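The quoted 51.1 ± 0.893 can be reproduced from the eight series means. The sketch below uses `statistics.pstdev` (denominator n), which is the form that matches the quoted value; reading the table as eight series, I to VIII, is an assumption about the original slide layout.

```python
import statistics

# Mean of each repeat series, as listed above (assumed to be Series I to VIII).
series_means = [51, 51.7, 50.4, 51.5, 51.7, 50.4, 52.5, 49.5]

grand_mean = statistics.mean(series_means)
# Spread of the series means about the grand mean (denominator n).
spread_of_means = statistics.pstdev(series_means)

print(f"best estimate of x = {grand_mean:.1f} ± {spread_of_means:.3f}")
```

The spread of the means (about 0.9) is much smaller than the σ of any single series (about 3), which is the benefit of repeating whole data sets.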
If there is a sufficient number of data points without any systematic uncertainties, the data follow a more or less universal curve, which is encountered in literally every application relying on numerous data points. This is called the Normal distribution or bell-shaped curve.
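[Figure: normal distribution; frequency (or distribution) versus x, with the line of symmetry at $\bar{x}$ and the points $\bar{x} - \sigma$ and $\bar{x} + \sigma$ marked]
Two metrics are needed to describe this population?
$\bar{x} - \sigma \le x \le \bar{x} + \sigma$ : Area under the curve between these limits ∝ number of data points lying in this range.
$\bar{x} \pm \sigma$ : ∼ 70% of the total area
$\bar{x} \pm 2\sigma$ : ∼ 95% of the total area
$\bar{x} \pm 6\sigma$ : ∼ 99.9999% of the total area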
On the one hand, we wish to have data which are reliable, reproducible and with as small an uncertainty as possible; on the other hand, one cannot go on making ∞ repeat measurements.
Let us say that a population of measurements has mean µ and standard deviation $\sigma_{pop}$:
$\mu = \dfrac{\sum x_i}{n}, \qquad \sigma_{pop} = \left[ \dfrac{\sum (x_i - \mu)^2}{n} \right]^{1/2}$
Evidently, µ = true value.
We would like our “sample” (a small sub-set of the population) to be such that sample mean ≈ µ and
$s = \left[ \dfrac{\sum (x_i - \bar{x})^2}{n - 1} \right]^{1/2} \approx \sigma_{pop}$
If 70% of the data lie within ±σ, we can say that there is a 70% probability that the expected outcome lies within $\bar{x} \pm \sigma$. Similarly, if 95% of the data lie within ±2σ, there is a 95% probability that the expected outcome lies within $\bar{x} \pm 2\sigma$, etc.
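For a Normal distribution these percentages follow from the error function: the fraction of the area within ±kσ of the mean is erf(k/√2). The sketch below evaluates it for a few values of k; the function name `fraction_within` is illustrative.

```python
import math


def fraction_within(k):
    """Fraction of a Normal population lying within ±k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3, 6):
    inside = fraction_within(k)
    print(f"±{k}σ: {100 * inside:.7f} % inside, {100 * (1 - inside):.7f} % outside")
```

The rounded figures quoted above (about 70% and 95%) correspond to 68.27% and 95.45%.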
Some will argue that “all data are equal”: it is not correct to throw away any data point. The other extreme is that “one data set looks spurious or suspect” and is therefore less reliable than the other sets. There are statistical tests to deal with this issue. Therefore, any automatic filtering by a computer program or another device should be assessed properly.
The question is: “truly spurious” vs. “new phenomenon”? Therefore, data, observations, unusual features, frequent voltage fluctuations, exceptional temperatures, etc. must all be recorded meticulously and documented in detail in lab notebooks.
One data point strikingly disagrees with all the others.
Fall time (seconds) of an object in a liquid : 3.8 , 3.5 , 3.9 , 3.9 , 3.4 , 1.8 (the last value is very different from all the others!)
Recall that individual data can differ from each other within a band. However, a legitimate discrepancy of this size is highly improbable.
Data rejection: controversial, but important.
$\bar{t} = \dfrac{3.8 + 3.5 + 3.9 + 3.9 + 3.4 + 1.8}{6} \simeq 3.38$
$\bar{t} = 3.4$ s, $\sigma = 0.8$ s
Our suspect measurement of 1.8 s deviates from the mean by 3.4 − 1.8 = 1.6 s, i.e., by 2σ.
Assuming a Normal or Gaussian distribution, we can calculate the probability of a measurement lying outside ±2σ :
∴ Probability (outside 2σ) = 1 – Probability (within 2σ)
[Figure: normal distribution centred on 3.4, with 95.45 % of the area lying within ±2σ]
∴ Probability (outside 2σ) = 1 – 0.9545 = 0.0455 < 0.05
Thus, there is only about a 5% chance of a measurement lying outside ±2σ, i.e., 1 in 20 measurements could be beyond ±2σ. Out of 6 measurements, only 6 × 0.05 = 0.3 is likely to be beyond ±2σ.
Chauvenet’s criterion : If this number is < 0.5, the data point can be rejected.
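For N measurements $x_1, x_2, \dots, x_N$, let $x_{sus}$ be the value in doubt.
$t_{sus} = \dfrac{|x_{sus} - \bar{x}|}{\sigma}$ (# of standard deviations)
Find the probability of a measurement lying outside $t_{sus}\sigma$, and from it the expected number of such measurements, n = N × Probability (outside $t_{sus}\sigma$).
If n < 0.5, reject the data point in question and re-calculate $\bar{x}$, σ, etc.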
EXAMPLE
A student makes 10 measurements of mass (g) as follows : 46 , 48 , 44 , 38 , 45 , 47 , 58 , 44 , 45 , 43.
Our suspect is : 58
$\bar{x} = 45.8$, $\sigma = 5.1$
∴ $t_{sus} = \dfrac{58 - 45.8}{5.1} = 2.4$
i.e., our suspect deviates by 2.4σ.
Prob (outside ± 2.4σ) = 1 – Prob (inside ± 2.4σ) = 1 – 0.9836 = 0.016
∴ In a set of 10 measurements, only 10 × 0.016 = 0.16 of a data point can be expected to lie beyond ± 2.4σ.
Since 0.16 < 0.5, we can safely reject this data point. New results are :
$\bar{x} = 44.4$, $\sigma = 2.9$
Not much change in $\bar{x}$, but σ has dropped significantly.
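Both worked examples can be checked with a short script. The sketch below is one possible reading of Chauvenet's criterion as described above: it uses the sample standard deviation (denominator n − 1), which reproduces the quoted σ = 0.8 and σ = 5.1, and it obtains the Gaussian tail probability from `math.erfc`. The function name `chauvenet` and its return values are illustrative choices.

```python
import math
import statistics


def chauvenet(data, suspect):
    """Return (t_sus, expected number beyond t_sus*sigma, reject?) for one suspect value."""
    mean = statistics.mean(data)
    sigma = statistics.stdev(data)                 # n - 1 denominator
    t_sus = abs(suspect - mean) / sigma            # deviation in units of sigma
    prob_outside = math.erfc(t_sus / math.sqrt(2)) # two-sided Gaussian tail
    expected = len(data) * prob_outside
    return t_sus, expected, expected < 0.5


# Fall-time example (s): suspect value 1.8
fall_times = [3.8, 3.5, 3.9, 3.9, 3.4, 1.8]
t_fall, n_fall, reject_fall = chauvenet(fall_times, 1.8)
print(f"fall time: t_sus = {t_fall:.2f}, expected = {n_fall:.2f}, reject = {reject_fall}")

# Mass example (g): suspect value 58
masses = [46, 48, 44, 38, 45, 47, 58, 44, 45, 43]
t_mass, n_mass, reject_mass = chauvenet(masses, 58)
print(f"mass: t_sus = {t_mass:.2f}, expected = {n_mass:.2f}, reject = {reject_mass}")

# Re-calculate the statistics without the rejected point.
if reject_mass:
    kept = [v for v in masses if v != 58]
    new_mean = statistics.mean(kept)
    new_sigma = statistics.stdev(kept)
    print(f"after rejection: mean = {new_mean:.1f} g, sigma = {new_sigma:.1f} g")
```

The criterion should be applied once, to a single clearly suspect point; applying it repeatedly to the trimmed data set would keep shrinking σ and rejecting further points.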