Power and Sample Size
Low probability events; Comparative study of percentages; Power determination: percentages; Sample size for a prevalence survey, with finite population correction. If you enter a precision of 5%, Sampsize will return the sample size needed for 95% ... Example: A study on the hepatitis C virus aims at describing some ... These results inform the appropriate sample sizes for animal transmission ... Transmission experiments of influenza A viruses have helped to ... Number of pairs, Final size of comparison group, p-value, power, p-value, power. The statistical power of a test is defined as 1 – β or the probability of rejecting ... where ρ is the intraclass correlation, and m is the sample size within each cluster. ... new assay to screen healthy cattle for Foot-and-mouth disease virus (FMDV; ...
Fisher, a giant in the field of statistics, chose this value as being meaningful for the agricultural experiments with which he worked in the 1920s.
Comparing two means
Introduction
Many studies in our field boil down to generating means and comparing them to each other. This is true even if the data are acquired from a single population; the sample means will always be different from each other, even if only slightly.
The pertinent question that statistics can address is whether or not the differences we inevitably observe reflect a real difference in the populations from which the samples were acquired. Put another way, are the differences detected by our experiments, which are necessarily based on a limited sample size, likely or not to result from chance effects of sampling?
If chance sampling can account for the observed differences, then our results will not be deemed statistically significant. In contrast, if the observed differences are unlikely to have occurred by chance, then our results may be considered significant insofar as statistics are concerned.
Whether or not such differences are biologically significant is a separate question reserved for the judgment of biologists. Most biologists, even those leery of statistics, are generally aware that the venerable t-test a.
Several factors influence the power of the t-test to detect significant differences. These include the size of the sample and the amount of variation present within the sample. If these sound familiar, they should. They were both factors that influence the size of the SEM, discussed in the preceding section.
This is not a coincidence, as the heart of a t-test resides in estimating the standard error of the difference between two means (SEDM). Greater variance in the sample data increases the size of the SEDM, whereas larger sample sizes reduce it.
Thus, lower variance and larger samples make it easier to detect differences. If the size of the SEDM is small relative to the absolute difference in means, then the finding will likely hold up as being statistically significant. In fact, it is not necessary to deal directly with the SEDM to be perfectly proficient at interpreting results from a t-test. We will therefore focus primarily on aspects of the t-test that are most relevant to experimentalists.
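As a rough illustration of how the SEDM relates to the difference in means, the sketch below computes both for two small samples. The measurements are invented for illustration, the function names are hypothetical, and the unequal-variance form of the SEDM is an assumption, since the text does not say which form it has in mind.

```python
import math
import statistics

def sedm(sample_a, sample_b):
    """Standard error of the difference between two means (unequal-variance form)."""
    var_a = statistics.variance(sample_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(sample_b)
    return math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))

def t_statistic(sample_a, sample_b):
    """Difference in means expressed in units of its standard error."""
    diff = statistics.mean(sample_a) - statistics.mean(sample_b)
    return diff / sedm(sample_a, sample_b)

# Hypothetical measurements, invented purely for illustration.
control = [102.0, 98.5, 110.2, 95.1, 105.7, 99.9]
test = [90.3, 94.8, 88.1, 97.2, 91.5, 89.6]

print("SEDM:", round(sedm(control, test), 3))
print("t:", round(t_statistic(control, test), 3))
```

A larger absolute t value means the difference in means is large relative to the SEDM, which is the situation described above as likely to hold up as statistically significant.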
These include choices of carrying out tests that are either one- or two-tailed and are either paired or unpaired, assumptions of equal variance or not, and issues related to sample sizes and normality. We would also note, in passing, that alternatives to the t-test do exist. These tests include the computationally intensive bootstrap (see Section 6). For reasonably large sample sizes, a t-test will provide virtually the same answer and is currently more straightforward to carry out using available software and websites.
It is also the method most familiar to reviewers, who may be skeptical of approaches that are less commonly used. We will do this through an example. Imagine that we are interested in knowing whether or not the expression of gene a is altered in comma-stage embryos when gene b has been inactivated by a mutation.
To look for an effect, we take total fluorescence intensity measurements of an integrated a:: ... For each condition, we analyze 55 embryos. Expression of gene a appears to be greater in the control setting; the difference between the two sample means is ...
Figure 5. Summary of GFP-reporter expression data for a control and a test group.
Along with the familiar mean and SD, Figure 5 shows some additional information about the two data sets. Recall that in Section 1. ... What we didn't mention is that the distribution of the data can have a strong impact, at least indirectly, on whether or not a given statistical test will be valid. Such is the case for the t-test. Looking at Figure 5, we can see that the datasets are in fact a bit lopsided, having somewhat longer tails on the right. In technical terms, these distributions would be categorized as skewed right.
Although not critical to our present discussion, several parameters are typically used to quantify the shape of the data, including the extent to which the data deviate from normality. In any case, an obvious question now becomes, how can you know whether your data are distributed normally (or at least normally enough) to run a t-test?
Before addressing this question, we must first grapple with a bit of statistical theory. The Gaussian curve shown in Figure 6A represents a theoretical distribution of differences between sample means for our experiment. Put another way, this is the distribution of differences that we would expect to obtain if we were to repeat our experiment an infinite number of times. Thus, if we carried out such sampling repetitions with our two populations ad infinitum, the bell-shaped distribution of differences between the two means would be generated (Figure 6A).
Note that this theoretical distribution of differences is based on our actual sample means and SDs, as well as on the assumption that our original data sets were derived from populations that are normal, which is something we already know isn't true. So what to do?
Figure 6. Theoretical and simulated sampling distribution of differences between two means. The distributions are from the gene expression example. The black vertical line in each panel is centered on the mean of the differences.
As it happens, this lack of normality in the distribution of the populations from which we derive our samples does not often pose a problem. The reason is that the distribution of sample means, as well as the distribution of differences between two independent sample means (along with many other conventionally used statistics), is often normal enough for the statistics to still be valid, provided that the sample size is large enough. This is a consequence of the Central Limit Theorem.
How large is large enough? That depends on the distribution of the data values in the population from which the sample came. The more non-normal it is (usually, that means the more skewed), the larger the sample size requirement. Assessing this is a matter of judgment. Figure 7 was derived using a computational sampling approach to illustrate the effect of sample size on the distribution of the sample mean.
In this case, the sample was derived from a population that is sharply skewed right, a common feature of many biological systems where negative values are not encountered (Figure 7A). As can be seen, with a sample size of only 15 (Figure 7B), the distribution of the mean is still skewed right, although much less so than the original population.
Sample Size Considerations for One-to-One Animal Transmission Studies of the Influenza A Viruses
By the time we have sample sizes of 30 or 60 (Figure 7C, D), however, the distribution of the mean is indeed very close to being symmetrical.
Figure 7. Illustration of Central Limit Theorem for a skewed population of values. Panel A shows the population (highly skewed right and truncated at zero); Panels B, C, and D show distributions of the mean for sample sizes of 15, 30, and 60, respectively, as obtained through a computational sampling approach. As indicated by the x axes, the sample means are approximately 3. The y axes indicate the number of computational samples obtained for a given mean value.
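The computational sampling approach behind Figure 7 is not described in detail, but a comparable exercise can be sketched as follows. The exponential population with a mean of 3.0, the 10,000 repeats, and all function names are assumptions chosen for illustration; they are not taken from the original analysis.

```python
import random
import statistics

def skewness(values):
    """Moment-based skewness (0 for a perfectly symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * sd ** 3)

def simulate_sample_means(sample_size, repeats, rng, population_mean=3.0):
    """Draw `repeats` samples from a right-skewed, non-negative population
    (exponential, an assumed stand-in) and return the mean of each sample."""
    rate = 1.0 / population_mean
    return [
        statistics.fmean([rng.expovariate(rate) for _ in range(sample_size)])
        for _ in range(repeats)
    ]

rng = random.Random(42)  # fixed seed so the run is repeatable
results = {}
for n in (15, 30, 60):
    means = simulate_sample_means(n, 10_000, rng)
    results[n] = {
        "mean": statistics.fmean(means),
        "sd": statistics.stdev(means),
        "skewness": skewness(means),
    }
    print(n, {key: round(value, 3) for key, value in results[n].items()})
```

With larger sample sizes, the spread of the simulated means narrows and their skewness moves toward zero.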
As would be expected, larger-sized samples give distributions that are closer to normal and have a narrower range of values. The Central Limit Theorem having come to our rescue, we can now set aside the caveat that the populations shown in Figure 5 are non-normal and proceed with our analysis. From Figure 6 we can see that the theoretical distribution is centered on the mean of the differences (black line). Furthermore, we can see that on either side of this center point, there is a decreasing likelihood that substantially higher or lower values will be observed.
The vertical blue lines show the positions of one and two SDs from the apex of the curve, which in this case could also be referred to as SEDMs. Thus, for the t-test to be valid, the shape of the actual differences in sample means must come reasonably close to approximating a normal curve. But how can we know what this distribution would look like without repeating our experiment hundreds or thousands of times?
To address this question, we have generated a complementary distribution shown in Figure 6B. In contrast to Figure 6A, Figure 6B was generated using a computational re-sampling method known as bootstrapping (discussed in Section 6). It shows a histogram of the differences in means obtained by carrying out 1, in silico repeats of our experiment. Importantly, because this histogram was generated using our actual sample data, it automatically takes skewing effects into account.
Notice that the data from this histogram closely approximate a normal curve and that the values obtained for the mean and SDs are virtually identical to those obtained using the theoretical distribution in Figure 6A. What this tells us is that even though the sample data were indeed somewhat skewed, a t-test will still give a legitimate result. Moreover, from this exercise we can see that with a sufficient sample size, the t-test is quite robust to some degree of non-normality in the underlying population distributions.
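To make the comparison between the bootstrap histogram and the theoretical distribution concrete, here is a minimal sketch of the re-sampling idea. The two data sets are synthetic and right-skewed (log-normal draws standing in for 55 embryos per group). The number of repeats (2,000) and the function name are likewise assumptions, not details of the original analysis.

```python
import math
import random
import statistics

def bootstrap_mean_differences(sample_a, sample_b, repeats, rng):
    """Resample each group with replacement and record the difference in means."""
    diffs = []
    for _ in range(repeats):
        resampled_a = rng.choices(sample_a, k=len(sample_a))
        resampled_b = rng.choices(sample_b, k=len(sample_b))
        diffs.append(statistics.fmean(resampled_a) - statistics.fmean(resampled_b))
    return diffs

rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic right-skewed data, invented for illustration only.
control = [rng.lognormvariate(4.6, 0.3) for _ in range(55)]
test = [rng.lognormvariate(4.5, 0.3) for _ in range(55)]

diffs = bootstrap_mean_differences(control, test, 2000, rng)

observed_diff = statistics.fmean(control) - statistics.fmean(test)
theoretical_sedm = math.sqrt(
    statistics.variance(control) / len(control)
    + statistics.variance(test) / len(test)
)

print("observed difference:", round(observed_diff, 2))
print("bootstrap mean of differences:", round(statistics.fmean(diffs), 2))
print("theoretical SEDM:", round(theoretical_sedm, 2))
print("bootstrap SD of differences:", round(statistics.stdev(diffs), 2))
```

If the bootstrap mean and SD land close to the observed difference and the theoretical SEDM, the normal approximation behind the t-test is doing a reasonable job for data of this shape and size.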
Issues related to normality are also discussed further below.
One- versus two-sample tests
Although t-tests always evaluate differences between two means, in some cases only one of the two mean values may be derived from an experimental sample. For example, we may wish to compare the number of vulval cell fates induced in wild-type hermaphrodites versus mutant m.
Because it is broadly accepted that wild type induces on average three progenitor vulval cells, we could theoretically dispense with re-measuring this established value and instead measure it only in the mutant m background (Sulston and Horvitz). In such cases, we would be obliged to run a one-sample t-test to determine if the mean value of mutant m is different from that of wild type.
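A one-sample t statistic for this kind of comparison can be sketched as below. The reference value of three comes from the text; the counts for mutant m are invented for illustration and the function name is hypothetical. Converting the statistic to a P-value requires the t distribution with the returned degrees of freedom, which is left to standard statistical software.

```python
import math
import statistics

def one_sample_t(sample, reference_mean):
    """Return the t statistic and degrees of freedom for a one-sample t-test."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - reference_mean) / standard_error
    return t, n - 1

# Hypothetical counts of induced vulval cells in mutant m animals (invented).
mutant_counts = [3, 2, 3, 1, 2, 3, 2, 2, 3, 1, 2, 3]

t, df = one_sample_t(mutant_counts, reference_mean=3.0)
print("t =", round(t, 3), "with", df, "degrees of freedom")
```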
With regard to the sample size i. There were two studies that used 4 pairs. Two studies used different pair numbers between two groups, i. The primary objective of the original studies was either to identify molecular mechanisms e.
Stochastic general epidemic model
To allow comparison of the transmissibility, here we express the result from one-to-one transmission experiment i.
First of all, we adopt an assumption that each pair is independent of other pairs, including no air-exchange between pairs. Let $p_{s,i}(t)$ be the conditional probability of observing $s$ susceptible and $i$ infected ferrets at time $t$ given the initial condition of susceptible and infected ferrets $(s_0, i_0)$ at time 0. Since the one-to-one transmission experiment handles the binary outcome i. In addition to the final size discussed above, statistical consideration of transient state has been given elsewhere.
Hypothesis testing
We subsequently consider the difference in the transmissibility in published experimental studies in two different ways, because the null hypothesis has not necessarily been mentioned in the original articles in Table 1.
Let $R_{0,\mathrm{ref}}$ be a specified reference value of the basic reproduction number. The first possible way to compare the transmissibility is to regard the result from each virus as one-sample comparison, which may be the case when $R_{0,\mathrm{ref}}$ of control virus can be assumed known e. In this scenario, we compare $R_0$ against $R_{0,\mathrm{ref}}$, i. It should be noted that $R_0$ depends on experimental design (air change rate per hour, air flow direction, etc.) and is not comparable between differently designed experiments.
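Because each pair yields a binary outcome, the power of a one-sample comparison can be explored with an exact binomial calculation. The sketch below works directly with the per-pair probability of transmission rather than with $R_0$, because the mapping between the two depends on model details that are not reproduced here. The probabilities 0.3 and 0.9, the one-sided test, the 5% significance level, and the function names are all illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k transmissions among n independent pairs."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def critical_count(n, p_ref, alpha=0.05):
    """Smallest k with P(X >= k | p_ref) <= alpha (one-sided, upper tail).
    A return value of n + 1 means no outcome can reach significance."""
    tail = 0.0
    for k in range(n, -1, -1):
        tail += binomial_pmf(k, n, p_ref)
        if tail > alpha:
            return k + 1
    return 0

def power(n, p_ref, p_true, alpha=0.05):
    """Probability of rejecting the reference value when p_true is the truth."""
    k_crit = critical_count(n, p_ref, alpha)
    return sum(binomial_pmf(k, n, p_true) for k in range(k_crit, n + 1))

def smallest_n(p_ref, p_true, target_power=0.8, alpha=0.05, max_n=200):
    """First n reaching the target power. Exact binomial power is not
    monotonic in n, so larger n can occasionally dip below the target."""
    for n in range(1, max_n + 1):
        if power(n, p_ref, p_true, alpha) >= target_power:
            return n
    return None

# Hypothetical per-pair transmission probabilities, for illustration only.
for pairs in (4, 8, 12):
    print(pairs, "pairs: power =", round(power(pairs, 0.3, 0.9), 3))
print("smallest n for 80% power:", smallest_n(0.3, 0.9))
```

Under these assumptions, very small numbers of pairs may leave no outcome that reaches significance at all, which is one way to see why the number of pairs matters for this kind of experiment.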