USMLE forum
Step 3 * 2006 Archives

* Biostatistics - hope this is useful for someone
  sunny111 - 10/29/06 16:55
  1. Cohort studies:
The observational study design most similar to a clinical trial

Takes a population, divides it into groups on the basis of exposure status to the risk factor being studied, and observes the groups over time
Yields relative risk when prospective
Not good for rare conditions

Matching and restriction are used in cohort studies

Relative risk: the measure of outcome in follow-up studies [cohort / prospective studies]
It is the risk ratio, which compares the risk among exposed vs nonexposed.
Risk in exposed / risk in nonexposed
The main endpoint of interest in cohort studies is the ratio of disease incidence in people who are exposed divided by disease incidence in people who are free of exposure. This ratio is called the relative risk or the risk ratio associated with a given risk factor. The terms are synonymous. A relative risk greater than 1 indicates that the factor in question is a risk factor for disease; a relative risk less than 1 suggests that the exposure was protective against disease. Relative risks are reported with confidence intervals (CIs). If the interval includes 1, the finding is not statistically significant.
Null value for relative risk is 1
So a confidence interval that includes 1 is not statistically significant
Ex: a CI of 0.7-1.2 includes 1, so it is not statistically significant.
Confidence interval
Expresses how certain we are that the observation is real rather than the product of random chance
Used with OR and RR; the 95% CI is the range within which we are 95% confident the true risk ratio or odds ratio lies
Ex: the relative risk of cancer with smoking is 2.0 with a 95% CI of 1.3-3.5
The observed RR of cancer was 2.0, and there is 95% certainty that the actual RR of cancer from smoking falls somewhere between 1.3 and 3.5
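
The rule above (an interval that contains the null value is not significant) can be sketched in a few lines of Python. This is only an illustration: the function name `ci_excludes_null` is made up here, and the two intervals are the ones quoted in the notes.

```python
def ci_excludes_null(lower, upper, null_value=1.0):
    """Return True when the interval [lower, upper] does not contain the null value.

    For RR and OR the null value is 1; for ARR and RRR it is 0.
    """
    return not (lower <= null_value <= upper)


# Interval 0.7-1.2 contains 1, so the finding is not statistically significant.
print(ci_excludes_null(0.7, 1.2))   # False
# Interval 1.3-3.5 excludes 1, so the finding is statistically significant.
print(ci_excludes_null(1.3, 3.5))   # True
```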

2. Case-control studies:
More efficient than cohort studies
Patients are enrolled based on the presence or absence of disease
Assessed retrospectively for the presence of a risk factor
To decrease the effect of confounding variables, use matching techniques for age, sex, etc., while selecting the control group from the same population as the diseased

Measures the odds ratio [also called by some the estimated relative risk]
The OR compares odds of exposure in individuals with disease with odds of exposure in individuals who are free of disease. The OR is roughly equivalent to the risk ratio used in cohort studies.
OR approximates relative risk as disease incidence approaches zero [in rare diseases]

Cheaper, as the sample size needed is smaller, and also faster than a cohort study
Used to investigate rare or chronic diseases
More prone to selection bias, and does not provide direct information on disease risk
                 Disease +ve   Disease -ve
Exposure +ve          a             b
Exposure -ve          c             d

Relative risk = risk in exposed / risk in nonexposed
= incidence in exposed / incidence in nonexposed
= proportion of exposed with disease / proportion of nonexposed with disease
= [a/(a+b)] / [c/(c+d)]
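
As a rough sketch, the formula above can be written in Python. The cell labels follow the notes (a = exposed with disease, b = exposed without disease, c = nonexposed with disease, d = nonexposed without disease); the cohort counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2x2 table: [a/(a+b)] / [c/(c+d)]."""
    risk_exposed = a / (a + b)      # incidence in the exposed
    risk_unexposed = c / (c + d)    # incidence in the nonexposed
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 nonexposed develop disease.
rr = relative_risk(a=30, b=70, c=10, d=90)
print(round(rr, 2))  # 3.0, i.e. the exposed have three times the risk
```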

Odds ratio = compares odds of exposure in individuals with disease with odds of exposure in individuals who are free of disease

= (a/c) / (b/d) = ad/bc
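
To illustrate the point that the OR approximates the RR only when the disease is rare, here is a small Python sketch using the same a, b, c, d cell labels. Both sets of counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table: (a/c) / (b/d) = ad/bc."""
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Relative risk from the same table, for comparison."""
    return (a / (a + b)) / (c / (c + d))


# Common disease (hypothetical counts): OR overstates RR.
print(round(odds_ratio(30, 70, 10, 90), 2), round(relative_risk(30, 70, 10, 90), 2))
# 3.86 3.0

# Rare disease (hypothetical counts): OR is close to RR.
print(round(odds_ratio(3, 997, 1, 999), 2), round(relative_risk(3, 997, 1, 999), 2))
# 3.01 3.0
```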

3. Randomized controlled trial
People are randomly assigned to a treatment group or a placebo group and followed over time to see if the treatment made a difference


Relative risk reduction (RRR) is also used in RCTs
RRR = [event rate in untreated - event rate in treated] / event rate in untreated

One problem with the relative risk measure is that without knowing the level of risk in the control group, one cannot assess the effect size in the treatment group. Treatments with very large relative risk reductions may have a small effect in conditions where the control group has a very low bad outcome rate. On the other hand, modest relative risk reductions can assume major clinical importance if the baseline (control) rate of bad outcomes is large
On the whole clinical importance of relative risk depends on the rate of bad outcome in the control group.

Note that a large difference can occur between absolute and relative risk reduction. RRR has the potential to look deceptively large, so watch out for drug advertising that touts RRR rather than ARR.
Null value for relative risk reduction is 0%
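
A short Python sketch of why the same RRR can mean very different things. The two trials and their event rates are hypothetical, invented only to show one RRR sitting on top of two very different baseline risks.

```python
def relative_risk_reduction(control_rate, treated_rate):
    """RRR = (event rate in untreated - event rate in treated) / event rate in untreated."""
    return (control_rate - treated_rate) / control_rate


# Hypothetical trials: (event rate in untreated, event rate in treated)
trials = {
    "high baseline risk": (0.20, 0.10),
    "low baseline risk": (0.002, 0.001),
}

for label, (control, treated) in trials.items():
    rrr = relative_risk_reduction(control, treated)
    difference = control - treated  # absolute difference in event rates
    print(f"{label}: RRR = {rrr:.0%}, absolute difference = {difference:.1%}")
```

Both trials report an RRR of 50%, but the absolute difference in event rates is 10 percentage points in one and 0.1 percentage points in the other.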

Absolute risk reduction (ARR) is used in RCTs
ARR = absolute adverse event rate in placebo - absolute adverse event rate in treated
Takes into account the background rate of the disease.

The absolute risk reduction is a simple difference between the two event rates rather than a fraction of the control group's rate, as in the relative risk reduction, and thus does not confound the effect size with the baseline risk. However, it is a less intuitive measure to interpret.

Null value for absolute risk reduction is 0, so a confidence interval that does not include the null value is statistically significant.

Number needed to treat
Number needed to treat (NNT) = number of patients who need to be treated to prevent one adverse outcome
Inverse of ARR
Especially useful in comparing the results of multiple clinical trials, because it makes the relative effectiveness of the treatments readily apparent
• A negative NNT is also known as the number needed to harm (NNH).
Null value for NNT is infinity [1/ARR = 1/0 = infinity]
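
The ARR and NNT definitions can be sketched in Python as follows. The event rates are hypothetical; returning infinity when ARR is 0 simply mirrors the null value stated above.

```python
import math


def absolute_risk_reduction(control_rate, treated_rate):
    """ARR = event rate in placebo - event rate in treated."""
    return control_rate - treated_rate


def number_needed_to_treat(control_rate, treated_rate):
    """NNT = 1 / ARR; infinite when the treatment makes no difference."""
    arr = absolute_risk_reduction(control_rate, treated_rate)
    if arr == 0:
        return math.inf
    return 1.0 / arr


# Hypothetical trials with the same relative reduction but different baselines.
print(round(number_needed_to_treat(0.20, 0.10)))    # 10
print(round(number_needed_to_treat(0.002, 0.001)))  # 1000
# A treatment that raises the event rate gives a negative value (read as NNH).
print(round(number_needed_to_treat(0.10, 0.15)))    # -20
```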

When addressing therapy, harm, or aetiology questions, a systematic review of 2 or more double-blind RCTs typically provides more convincing evidence than an individual RCT, which in turn provides more convincing evidence than an individual cohort or case-control study.

The larger the sample, the less the uncertainty, the narrower the CI, and hence the smaller the observed effect that can be declared statistically significant (p<0.05). Thus, if a sample is very large, even a very small difference (which may be of no clinical relevance) may be statistically significant . The width of a CI is affected by both the sample size (n) and the sample SD. The larger the sample (and the smaller its variability), the greater the accuracy of the sample estimate and thus the narrower the CI. A wide CI can thus reflect either a small sample or one with large variability .
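
A small Python sketch of how sample size and variability drive CI width. It assumes the usual standard error of a mean, SE = SD/√n, and a 95% interval (z = 1.96); the SD of 10 and the sample sizes are hypothetical.

```python
import math


def ci_half_width(sd, n, z=1.96):
    """Half-width of the CI for a mean: z * SE, with SE = SD / sqrt(n)."""
    return z * sd / math.sqrt(n)


# Same variability (SD = 10), growing sample: the interval narrows.
for n in (25, 100, 400):
    print(n, round(ci_half_width(sd=10.0, n=n), 2))
# 25 3.92
# 100 1.96
# 400 0.98
```

Quadrupling the sample size halves the width of the interval, while doubling the SD at a fixed n doubles it.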

Statistical uncertainty is the uncertainty (present even in a representative sample) associated with the use of sample data to make statements about the wider population.
Why do we need measures of uncertainty? It usually is not feasible to include all individuals from a target population in a single study. For example, in a randomised controlled trial (RCT) of a new treatment for hypertension, it would not be possible to include all individuals with hypertension. Instead, a sample (a small subset of this population) is allocated to receive either the new or the standard treatment.
What are the measures of uncertainty? Either hypothesis tests (with the calculation of p values) or confidence intervals (CIs) can quantify the amount of statistical uncertainty present in a study, though CIs are usually preferred.

Comparison of the use of p values and confidence intervals in statistical inference
P values and hypothesis tests Confidence intervals

What are they used for?

P values and hypothesis tests: p values are used to assess whether a sample estimate is significantly different from a hypothesised value (such as zero, ie, no treatment effect). Hypothesis tests assess the likelihood of the estimate under the null hypothesis of no difference between 2 population values (or no treatment effect). Conventionally, if p<0.05, the null hypothesis is rejected.

Confidence intervals: Confidence intervals (CIs) present a range of values around the sample estimate within which there is reasonable confidence that the true, but unknown, population value lies. The 95% CI (the range of values within which there is 95% probability that the true value lies) is most commonly used. It corresponds with the typical 5% significance level used in hypothesis tests.

What do they tell us?

P values and hypothesis tests: The p value is the probability that the observed effect (or more extreme ones) would have occurred by chance if in truth there is no effect. However, it doesn't tell us anything about the size of the true effect and, moreover, since hypothesis tests are 2 tailed (we are interested in differences in either direction) it doesn't even tell us the direction of this effect. Thus, in the above example, the p value of 0.006 indicates that an effect of 19.6% or more, in favour of either streptomycin or bed rest, would occur in only 6 in 1000 trials if in truth there is no effect.

Confidence intervals: The CI provides a range of values whose limits are, with specified probability (typically 95%), the smallest and the largest true population values consistent with the sample data. A CI can thus effectively function as a hypothesis test for an infinite number of values: if the CI includes any 1 of these values then the sample estimate is not statistically significantly different from it. The 95% CI is of particular relevance to evidence-based practice (EBP), providing valuable information such as whether the interval includes or excludes clinically significant values.

When can they be used?

P values and hypothesis tests: There are many different types of hypothesis test, each suitable for a particular type of data. For example, parametric tests (such as t-tests) are only suitable for large samples and for data that are drawn from a population which is approximately normally distributed.

Confidence intervals: A CI can, and should, be calculated for most measures of effect, such as differences between means (such as scores or weights), and differences in proportions, EER and CER, ARR, NNT, risk ratios (RR), and odds ratios (OR).

How are they calculated?

P values and hypothesis tests: The observed effect together with a measure of its variability (such as the standard error, SE) is used to calculate a "test statistic" (eg, t, z, χ²). For example, a t statistic is calculated by dividing the observed effect by its SE. The value of the test statistic is used (from statistical tables) to determine the p value. Thus, in the example above (where SE(ARR) = 7.1%), the z statistic (assuming that the ARR is approximately normally distributed) for the test of whether the risk of death differs between those allocated to streptomycin and those allocated to bed rest is calculated as z = 19.6/7.1 = 2.76. This has an associated p value of 0.006.

Confidence intervals: To calculate a CI around a sample estimate, only 3 pieces of information are needed: the sample size (n), the sample standard deviation (SD), and the "z score," which varies depending on the degree of confidence wanted (95%, 99% etc). For a 95% CI, z = 1.96, and for a 99% CI, z = 2.58. A 95% CI is calculated as: Sample estimate ± 1.96 standard errors (SE) of the measure (note: SE = SD/√n). Thus, in the above example (where SE(ARR) = 7.1%), the 95% CI for the true ARR is calculated as: 19.6% ± 1.96 × 7.1% = 5.7% to 33.5%.
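
The calculation in this section can be reproduced with a short Python sketch. It uses only the two rounded figures given in the text (ARR = 19.6%, SE = 7.1%) and gets the p value from the normal distribution through `math.erf` instead of statistical tables; the function name is made up for illustration. Because the inputs are rounded, the last digit of the limits may differ slightly from figures computed on unrounded trial data.

```python
import math


def two_sided_p_from_z(z):
    """Two-tailed p value for a z statistic under the standard normal distribution."""
    cdf = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))  # normal CDF at |z|
    return 2.0 * (1.0 - cdf)


arr = 19.6  # observed absolute risk reduction, in percent (from the text)
se = 7.1    # standard error of the ARR, in percent (from the text)

z = arr / se
p = two_sided_p_from_z(z)
lower = arr - 1.96 * se
upper = arr + 1.96 * se

print(round(z, 2))                       # 2.76
print(round(p, 3))                       # 0.006
print(round(lower, 1), round(upper, 1))  # 5.7 33.5
```

The interval excludes 0 (the null value for ARR), which agrees with p < 0.05.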

Type I and type II errors

Type I: rejecting the null hypothesis when it is true
Type II: failing to reject (accepting) the null hypothesis when it is false

Attributable risk: a measure of excess risk.
Also called the etiologic fraction
It estimates the proportion of disease in exposed subjects that is attributable to the exposure
[Risk in exposed - risk in unexposed] / risk in exposed
From relative risk: [RR - 1] / RR

Population attributable risk (population AR) is [risk in population - risk in unexposed] / risk in population.

Risk in population = [risk in exposed x prevalence of exposure in population]
+ [risk in unexposed x (1 - prevalence of exposure in population)]
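
A hedged Python sketch of these attributable-risk calculations. It assumes the standard definitions: the fraction among the exposed is (risk in exposed - risk in unexposed) / risk in exposed, the population risk is an exposure-weighted average of the two risks, and the population fraction uses the unexposed risk as the reference. The function names and the numbers (risks of 30% and 10%, exposure prevalence of 25%) are hypothetical.

```python
def attributable_fraction_exposed(risk_exposed, risk_unexposed):
    """Proportion of disease in the exposed attributable to exposure: (Re - Ru) / Re."""
    return (risk_exposed - risk_unexposed) / risk_exposed


def population_risk(risk_exposed, risk_unexposed, exposure_prevalence):
    """Overall risk: weighted average of exposed and unexposed risks."""
    return (risk_exposed * exposure_prevalence
            + risk_unexposed * (1.0 - exposure_prevalence))


def population_attributable_fraction(risk_exposed, risk_unexposed, exposure_prevalence):
    """Proportion of disease in the whole population attributable to exposure."""
    rp = population_risk(risk_exposed, risk_unexposed, exposure_prevalence)
    return (rp - risk_unexposed) / rp


re_, ru, pe = 0.30, 0.10, 0.25  # hypothetical values
rr = re_ / ru

print(round(attributable_fraction_exposed(re_, ru), 3))         # 0.667
print(round((rr - 1.0) / rr, 3))                                # 0.667, same via (RR-1)/RR
print(round(population_risk(re_, ru, pe), 3))                   # 0.15
print(round(population_attributable_fraction(re_, ru, pe), 3))  # 0.333
```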

Observer bias is reduced by blinding

Randomization helps balance confounding factors

Lead-time bias occurs if the screening test detects the disease at an earlier stage but the prognosis is not affected

Sensitivity and specificity of a test

Pretest probability is equal to the prevalence of the disease in the population
Sensitivity = proportion of true positive results among the diseased
Specificity = proportion of true negative results among the nondiseased
Positive predictive value = true positives among the total positive results
Negative predictive value = true negatives among the total negative results

Post-test probability after a positive result = positive predictive value of the test

              Disease +ve   Disease -ve
Test +ve           a             b
Test -ve           c             d

Sensitivity = a/(a+c)

Specificity = d/(b+d)



Probability of having disease in a population [prevalence] = total no. of diseased / total
= (a+c)/(a+b+c+d)
If the prevalence in a population is given, the number of patients with disease can be calculated from the above formula if the total population is known.

The higher the prevalence, the higher the positive predictive value
The lower the prevalence, the higher the negative predictive value
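
The test measures above, and the effect of prevalence on the predictive values, can be sketched in Python. Cell labels follow the notes (a = disease+ test+, b = disease- test+, c = disease+ test-, d = disease- test-). The function name and the counts are hypothetical: a test with 90% sensitivity and 90% specificity applied to 1000 people at two different prevalences.

```python
def diagnostic_metrics(a, b, c, d):
    """Sensitivity, specificity, predictive values and prevalence from a 2x2 table."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),   # true positives among all positive results
        "npv": d / (c + d),   # true negatives among all negative results
        "prevalence": (a + c) / (a + b + c + d),
    }


# Hypothetical: prevalence 10% (100 diseased, 900 nondiseased)
low = diagnostic_metrics(a=90, b=90, c=10, d=810)
# Hypothetical: prevalence 50% (500 diseased, 500 nondiseased)
high = diagnostic_metrics(a=450, b=50, c=50, d=450)

print(round(low["ppv"], 3), round(low["npv"], 3))    # 0.5 0.988
print(round(high["ppv"], 3), round(high["npv"], 3))  # 0.9 0.9
```

Sensitivity and specificity are the same in both tables; only the prevalence changes, and with it the PPV rises while the NPV falls.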

likelihood ratios

Same assessment as sensitivity and specificity,

but simpler, and usable when the test has more than two possible results

Takes into account test results at multiple levels of severity

Likelihood ratio = probability of a test result in diseased patients / probability of the same test result in patients without disease

For dichotomous tests, the likelihood ratio is calculated and used as follows

Post-test odds = pretest odds x likelihood ratio

Odds = probability / (1 - probability) [[pretest odds = prevalence / (1 - prevalence)]]

Probability = odds / (1 + odds)

              Disease +ve   Disease -ve
Test +ve           a             b
Test -ve           c             d

Likelihood ratio = probability of a test result in diseased patients / probability of the same test result in patients without disease

Probability of +ve test result in diseased patients = a/(a+c) = sensitivity

Probability of +ve test result in nondiseased patients = b/(b+d) = 1 - specificity

Likelihood ratio for +ve test result = sensitivity / (1 - specificity)

Probability of -ve test result in diseased patients = c/(a+c) = 1 - sensitivity

Probability of -ve test result in nondiseased patients = d/(b+d) = specificity

Likelihood ratio for -ve test result = (1 - sensitivity) / specificity
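
A minimal Python sketch tying the likelihood ratios to the odds formulas given earlier (post-test odds = pretest odds x likelihood ratio). The function names and the inputs are hypothetical: sensitivity and specificity of 90%, and a pretest probability (prevalence) of 10%.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR for a +ve result, LR for a -ve result) for a dichotomous test."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


lr_pos, lr_neg = likelihood_ratios(sensitivity=0.9, specificity=0.9)
print(round(lr_pos, 2), round(lr_neg, 2))              # 9.0 0.11

# Pretest probability 10%: probability of disease after each result.
print(round(post_test_probability(0.10, lr_pos), 3))   # 0.5
print(round(post_test_probability(0.10, lr_neg), 3))   # 0.012
```

With these inputs the post-test probability after a positive result (0.5) is the positive predictive value of the same test at 10% prevalence.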


* Re: Biostatistics - hope this is useful for someone
  sunny111 - 10/29/06 17:01
  I am not able to post the tables
1st table

disease + and exposure + a
disease - and exposure + b
disease + and exposure - c
disease - and exposure - d

2nd table
disease + and test + a
disease - and test + b
disease + and test - c
disease - and test - d


* Thanks
  auerrod - 10/29/06 18:39
  Thanks Sunny! You're the man.....I think!!  
