How to Calculate Family-Wise Error Rate

Family-Wise Error Rate (FWER) Calculator

Calculate the probability of making at least one Type I error when performing multiple hypothesis tests. This tool helps researchers control the overall error rate across a family of comparisons.


Comprehensive Guide: How to Calculate Family-Wise Error Rate (FWER)

The Family-Wise Error Rate (FWER) is a fundamental concept in statistical hypothesis testing that becomes particularly important when conducting multiple comparisons. This guide will explain what FWER is, why it matters, how to calculate it, and the various methods available to control it.

What is Family-Wise Error Rate?

FWER is defined as the probability of making at least one Type I error (false positive) when performing multiple hypothesis tests. If all m null hypotheses are true and the tests are independent, this probability is:

$$\mathrm{FWER} = 1 - (1 - \alpha)^{m}$$

Where:

  • α is the significance level for each individual test (typically 0.05)
  • m is the number of comparisons or hypothesis tests being performed

For example, if you perform 20 independent tests each at α = 0.05, the probability of at least one false positive is:

$$\mathrm{FWER} = 1 - (1 - 0.05)^{20} \approx 0.6415 \;(64.15\%)$$
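The formula translates directly into a few lines of Python. This is a minimal sketch: the function name `fwer` is our own, and it assumes every null hypothesis is true and the tests are independent, the same assumptions behind the numbers in this guide.

```python
def fwer(alpha: float, m: int) -> float:
    """Probability of at least one Type I error among m independent tests,
    each run at level alpha, when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m


for m in (1, 5, 10, 20, 50, 100):
    print(f"m = {m:>3}: FWER = {fwer(0.05, m):.4f}")
```

For m = 20 this prints 0.6415, matching the worked example.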

Why FWER Control is Important

When conducting multiple hypothesis tests, the probability of making at least one Type I error increases dramatically with the number of tests. This is known as the multiple comparisons problem. Without proper control:

| Number of Tests (m) | Individual α | FWER (Uncontrolled) |
|---|---|---|
| 1 | 0.05 | 0.0500 (5.00%) |
| 5 | 0.05 | 0.2262 (22.62%) |
| 10 | 0.05 | 0.4013 (40.13%) |
| 20 | 0.05 | 0.6415 (64.15%) |
| 50 | 0.05 | 0.9231 (92.31%) |
| 100 | 0.05 | 0.9941 (99.41%) |

As the table shows, even at the conventional individual α of 0.05, performing just 20 independent tests gives a 64% chance of at least one false positive. This is why FWER control is essential in research with multiple comparisons.
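The 64% figure can also be checked by simulation. The sketch below rests on one assumption: under a true null hypothesis, a continuous test's p-value is uniformly distributed on (0, 1), so each simulated "family" is just m independent uniform draws. The function name `simulated_fwer` and the simulation settings are illustrative choices, not part of any library.

```python
import random


def simulated_fwer(alpha: float, m: int, n_sims: int = 50_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the FWER for m independent tests with all nulls true."""
    rng = random.Random(seed)
    families_with_false_positive = 0
    for _ in range(n_sims):
        # Each family: m null p-values drawn from Uniform(0, 1); any p <= alpha is a false positive.
        if any(rng.random() <= alpha for _ in range(m)):
            families_with_false_positive += 1
    return families_with_false_positive / n_sims


estimate = simulated_fwer(alpha=0.05, m=20)
print(f"Simulated FWER for 20 tests: {estimate:.3f}")  # should land close to 0.64
```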

Methods for Controlling FWER

Several methods exist to control FWER. The calculator above implements three of the most common approaches:

1. Bonferroni Correction

The simplest and most conservative method. The Bonferroni correction divides the desired family-wise alpha level by the number of comparisons:

$$\alpha_{\text{adj}} = \frac{\alpha_{FW}}{m}$$

Where $\alpha_{FW}$ is the desired family-wise error rate (typically 0.05).

Advantages:

  • Simple to calculate and understand
  • Guarantees FWER ≤ $\alpha_{FW}$ in all cases, whatever the dependence between tests
  • Works for any number of tests

Disadvantages:

  • Very conservative – can lead to low statistical power
  • May fail to detect true effects (increased Type II errors)
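A minimal Bonferroni sketch in Python; the helper name `bonferroni` and the p-values are hypothetical. It returns both the adjusted per-test threshold and the resulting decisions:

```python
def bonferroni(p_values, alpha_fw=0.05):
    """Return (adjusted per-test alpha, list of reject decisions)."""
    m = len(p_values)
    adjusted_alpha = alpha_fw / m
    return adjusted_alpha, [p <= adjusted_alpha for p in p_values]


# Hypothetical p-values from four tests
alpha_adj, reject = bonferroni([0.003, 0.012, 0.02, 0.4])
print(alpha_adj)  # 0.0125
print(reject)     # [True, True, False, False]
```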

2. Šidák Correction

A slightly less conservative alternative to Bonferroni that assumes independence of tests. The adjusted alpha is calculated as:

$$\alpha_{\text{adj}} = 1 - (1 - \alpha_{FW})^{1/m}$$

Advantages:

  • Less conservative than Bonferroni when tests are independent
  • Guarantees FWER ≤ $\alpha_{FW}$ when the tests are independent

Disadvantages:

  • Assumes test independence (may not hold in practice)
  • Still relatively conservative
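To see how much less conservative Šidák is, the sketch below (the function name `sidak_alpha` is our own) compares the two per-test thresholds for three tests and checks that, under independence, the Šidák threshold yields an FWER of exactly $\alpha_{FW}$:

```python
def sidak_alpha(alpha_fw: float, m: int) -> float:
    """Per-test alpha that gives FWER = alpha_fw for m independent tests."""
    return 1 - (1 - alpha_fw) ** (1 / m)


m = 3
sidak = sidak_alpha(0.05, m)
bonf = 0.05 / m
print(f"Sidak: {sidak:.5f}, Bonferroni: {bonf:.5f}")  # Sidak: 0.01695, Bonferroni: 0.01667

# Plugging the Sidak threshold back into 1 - (1 - alpha)^m recovers alpha_fw.
print(f"FWER with Sidak threshold: {1 - (1 - sidak) ** m:.4f}")  # 0.0500
```

The gap between the two thresholds is small for a handful of tests, which is why Bonferroni is often used in practice despite being slightly more conservative.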

3. Holm-Bonferroni Method

A step-down procedure that is less conservative than Bonferroni while still controlling FWER. The method:

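  1. Sorts all p-values from smallest to largest: $p_{(1)} \le p_{(2)} \le \dots \le p_{(m)}$
  2. Compares each $p_{(i)}$ to $\alpha_{FW}/(m - i + 1)$, where i is its rank
  3. Starting from $i = 1$, rejects $H_{(i)}$ as long as $p_{(i)} \le \alpha_{FW}/(m - i + 1)$; at the first p-value that exceeds its threshold, stops and retains that hypothesis and all hypotheses with larger p-values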

Advantages:

  • Uniformly at least as powerful as Bonferroni (it rejects every hypothesis Bonferroni rejects)
  • Still controls FWER at $\alpha_{FW}$
  • Doesn’t assume independence of tests

Disadvantages:

  • More complex to compute manually
  • Still less powerful than methods that control the false discovery rate (FDR)
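The Holm procedure can be sketched in Python as follows. The function name `holm` and the example p-values are hypothetical; the important detail is the early stop, which is what makes Holm's method a step-down procedure rather than a set of independent comparisons.

```python
def holm(p_values, alpha_fw=0.05):
    """Holm step-down procedure; returns reject decisions in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= alpha_fw / (m - rank + 1):
            reject[idx] = True
        else:
            break  # first non-rejection: this and all larger p-values are retained
    return reject


# Bonferroni at 0.05 / 4 = 0.0125 would reject only 0.01 and 0.005 here.
print(holm([0.01, 0.04, 0.02, 0.005]))  # [True, True, True, True]
```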

When to Use FWER Control

FWER control is particularly important in several research scenarios:

  1. Clinical Trials: When testing multiple endpoints (e.g., blood pressure, cholesterol, heart rate), controlling FWER prevents false claims about drug efficacy.
  2. Genomics: In genome-wide association studies (GWAS) with millions of tests, FWER control is essential to avoid false positive gene associations.
  3. Psychology Experiments: When testing multiple psychological measures or brain regions, FWER prevents inflated Type I error rates.
  4. Econometrics: When testing multiple economic hypotheses simultaneously, FWER control maintains research integrity.

FWER vs. False Discovery Rate (FDR)

While FWER focuses on controlling the probability of any Type I errors, the False Discovery Rate (FDR) controls the expected proportion of false positives among all significant results. The choice between them depends on the research context:

| Characteristic | FWER Control | FDR Control |
|---|---|---|
| Error Control Focus | Probability of at least one Type I error | Expected proportion of Type I errors among significant results |
| Conservatism | Very conservative | Less conservative |
| Statistical Power | Lower (more Type II errors) | Higher (fewer Type II errors) |
| Best For | When even one false positive is unacceptable (e.g., clinical trials) | When some false positives are tolerable (e.g., exploratory research) |
| Multiple Testing Scenarios | Typically fewer tests (e.g., < 100) | Large-scale testing (e.g., genomics with thousands of tests) |

In practice, FWER control is preferred when the cost of a false positive is very high (e.g., approving an ineffective drug), while FDR control is often used in exploratory research where some false positives are acceptable in exchange for higher power to detect true effects.
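The difference between the two error rates shows up clearly on a single set of p-values. The sketch below contrasts Holm's FWER procedure with the Benjamini–Hochberg step-up procedure, a standard FDR method; the function names and p-values are hypothetical and chosen for illustration.

```python
def holm(p_values, alpha_fw=0.05):
    """Holm step-down: controls the FWER."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] > alpha_fw / (m - rank + 1):
            break
        reject[idx] = True
    return reject


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up: controls the FDR (independent or positively dependent tests)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k with p_(k) <= k * q / m; the k smallest p-values are rejected.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


p = [0.005, 0.015, 0.025, 0.035, 0.2]  # hypothetical p-values
print("Holm:", holm(p))                # [True, False, False, False, False]
print("BH:  ", benjamini_hochberg(p))  # [True, True, True, True, False]
```

Holm stops as soon as 0.015 exceeds 0.05/4, while Benjamini–Hochberg accepts the risk that a small fraction of its four discoveries are false.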

Practical Example: FWER in Clinical Trials

Consider a clinical trial testing a new drug with three primary endpoints:

  1. Reduction in blood pressure
  2. Improvement in cholesterol levels
  3. Reduction in heart rate

Using α = 0.05 for each test without correction:

$$\mathrm{FWER} = 1 - (1 - 0.05)^{3} \approx 0.1426 \;(14.26\%)$$

Assuming the three tests are independent, this means there is a 14.26% chance of incorrectly concluding that the drug is effective on at least one endpoint when it is not effective on any.

Applying the Bonferroni correction:

$$\alpha_{\text{adj}} = 0.05 / 3 \approx 0.0167$$

Each test must now achieve p ≤ 0.0167 to be considered significant, which keeps the FWER at or below 5%.
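The arithmetic in this example can be reproduced in a few lines. The last calculation assumes the three endpoint tests are independent, which real endpoints such as blood pressure and heart rate often are not; under dependence, Bonferroni still guarantees an FWER of at most 5%, but the exact value differs.

```python
alpha, m = 0.05, 3

uncorrected = 1 - (1 - alpha) ** m        # FWER with no correction
bonferroni_alpha = alpha / m              # per-test threshold after Bonferroni
corrected = 1 - (1 - bonferroni_alpha) ** m  # FWER if every test uses that threshold

print(f"Uncorrected FWER: {uncorrected:.4f}")                 # 0.1426
print(f"Bonferroni per-test alpha: {bonferroni_alpha:.4f}")   # 0.0167
print(f"FWER after correction: {corrected:.4f}")              # 0.0492
```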

Common Misconceptions About FWER

Several misunderstandings about FWER persist in research communities:

  1. “FWER is only important in large studies” – Even with few comparisons, FWER can be substantial. With just 5 independent tests at α = 0.05, FWER is about 22.6%.
  2. “Bonferroni is always too conservative” – While true for many tests, with few comparisons (<10), Bonferroni’s power loss is often acceptable.
  3. “FWER control guarantees all significant results are true positives” – It only controls the probability of any false positives, not the proportion.
  4. “FDR is always better than FWER” – In confirmatory research where false positives are costly, FWER may be more appropriate.

Advanced Topics in FWER Control

For researchers dealing with complex multiple testing scenarios, several advanced FWER control methods exist:

1. Hochberg’s Procedure

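A step-up procedure that is uniformly at least as powerful as Holm’s method (it rejects every hypothesis Holm rejects). Unlike Holm’s method, it controls FWER only when the tests are independent or positively dependent. It works by:

  1. Sorting p-values from smallest to largest, $p_{(1)} \le \dots \le p_{(m)}$, then examining them starting from the largest, $p_{(m)}$
  2. Comparing each $p_{(i)}$ to $\alpha/(m - i + 1)$, moving toward smaller p-values
  3. Stopping at the first $p_{(k)} \le \alpha/(m - k + 1)$ and rejecting $H_{(k)}$ together with all hypotheses that have smaller p-values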
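A minimal Python sketch of Hochberg's step-up search; the function name `hochberg` and the p-values are hypothetical. Hochberg's FWER guarantee relies on the tests being independent or positively dependent, and the code itself does not check this.

```python
def hochberg(p_values, alpha_fw=0.05):
    """Hochberg step-up procedure; returns reject decisions in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    reject = [False] * m
    # Step up: start at the largest p-value (rank m) and move toward the smallest.
    for rank in range(m, 0, -1):
        if p_values[order[rank - 1]] <= alpha_fw / (m - rank + 1):
            for idx in order[:rank]:  # reject this hypothesis and all with smaller p
                reject[idx] = True
            break
    return reject


# Holm would reject none of these: the smallest p-value, 0.03, exceeds 0.05 / 3.
print(hochberg([0.04, 0.045, 0.03]))  # [True, True, True]
```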

2. Hommel’s Procedure

A more complex but more powerful method that:

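  1. Finds the largest $k \in \{1, \dots, m\}$ such that $p_{(m-k+j)} > j\alpha/k$ for all $j = 1, \dots, k$
  2. Rejects all hypotheses with $p_{(i)} \le \alpha/k$; if no such k exists, rejects all hypotheses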
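The search for k can be written directly from the definition. This is an unoptimized sketch that performs O(m²) comparisons; the function name `hommel` and the example p-values are our own, and it is meant for clarity rather than for large m.

```python
def hommel(p_values, alpha=0.05):
    """Hommel's procedure written from its definition; returns reject decisions."""
    m = len(p_values)
    p_sorted = sorted(p_values)  # p_sorted[i - 1] is p_(i)
    k_max = None
    for k in range(1, m + 1):
        # Does p_(m-k+j) > j * alpha / k hold for every j = 1..k?
        if all(p_sorted[m - k + j - 1] > j * alpha / k for j in range(1, k + 1)):
            k_max = k
    if k_max is None:
        return [True] * m  # no such k exists: reject every hypothesis
    return [p <= alpha / k_max for p in p_values]


print(hommel([0.02, 0.03, 0.2]))   # [True, False, False]
print(hommel([0.01, 0.02, 0.04]))  # [True, True, True]
```

On the first set of p-values, Hochberg's procedure rejects nothing (0.2 > 0.05, 0.03 > 0.025, 0.02 > 0.05/3), whereas Hommel's procedure finds k = 2 and rejects the hypothesis with p = 0.02.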

3. Resampling-Based Methods

For dependent test statistics, resampling methods are available, such as:

  • Permutation tests – Create a null distribution by permuting data
  • Bootstrap methods – Resample with replacement to estimate FWER

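These methods use the observed dependence between tests instead of assuming independence and need few distributional assumptions. Permutation-based procedures can give exact FWER control when the data are exchangeable under the null hypothesis; bootstrap-based procedures give approximate control.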
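A permutation approach can be sketched in the spirit of the Westfall–Young single-step max-T method: shuffle the group labels, record the largest test statistic across all outcomes for each shuffle, and compare every observed statistic to that distribution of maxima. Shuffling whole observations keeps each subject's outcomes together, which preserves the correlation between tests. The function name, the choice of absolute mean difference as the statistic (standardized t-statistics are more usual when outcomes are on different scales), and the data are all assumptions made for illustration.

```python
import random


def max_t_adjusted_pvalues(group_a, group_b, n_perm=2000, seed=0):
    """Single-step max-statistic permutation adjustment (illustrative sketch).

    group_a, group_b: lists of observations; each observation is a list of m outcomes.
    """
    m = len(group_a[0])
    n_a = len(group_a)
    pooled = group_a + group_b

    def stats(rows_a, rows_b):
        # Absolute difference in group means for each outcome.
        return [abs(sum(r[j] for r in rows_a) / len(rows_a)
                    - sum(r[j] for r in rows_b) / len(rows_b))
                for j in range(m)]

    observed = stats(group_a, group_b)
    rng = random.Random(seed)
    exceed = [0] * m
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)  # relabel groups, keeping each observation's outcomes together
        perm_max = max(stats(shuffled[:n_a], shuffled[n_a:]))
        for j in range(m):
            if perm_max >= observed[j]:
                exceed[j] += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return [(1 + c) / (n_perm + 1) for c in exceed]


# Hypothetical data: outcome 0 differs strongly between groups, outcomes 1 and 2 do not.
group_a = [[10 + i, i % 3, (i * 7) % 5] for i in range(8)]
group_b = [[i, (i + 1) % 3, (i * 3) % 5] for i in range(8)]
adjusted = max_t_adjusted_pvalues(group_a, group_b)
print([round(p, 3) for p in adjusted])
```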

Software Implementation

Most statistical software packages include FWER control methods:

R Implementation

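    # p.values: numeric vector of raw p-values (example values are hypothetical)
    p.values <- c(0.001, 0.012, 0.021, 0.04, 0.3)

    # Bonferroni (p.adjust returns adjusted p-values; compare them to 0.05)
    p.adjust(p.values, method = "bonferroni")

    # Holm
    p.adjust(p.values, method = "holm")

    # Hochberg
    p.adjust(p.values, method = "hochberg")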

Python Implementation (using statsmodels)

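    from statsmodels.stats.multitest import multipletests

    # p_values: list of raw p-values (example values are hypothetical)
    p_values = [0.001, 0.012, 0.021, 0.04, 0.3]

    # Bonferroni: reject is a boolean array; pvals_corrected holds adjusted p-values
    reject, pvals_corrected, _, _ = multipletests(p_values, alpha=0.05, method='bonferroni')

    # Holm
    reject, pvals_corrected, _, _ = multipletests(p_values, alpha=0.05, method='holm')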

Regulatory Guidelines on FWER

Regulatory agencies provide specific guidance on multiple testing corrections:

  • FDA: In clinical trials, the FDA typically expects FWER control for primary endpoints. Their guidance on multiple endpoints emphasizes the importance of controlling the overall Type I error rate.
  • EMA: The European Medicines Agency’s guideline on multiplicity discusses hierarchical testing procedures and gatekeeping strategies to control FWER in confirmatory trials.
  • ICH E9: The International Council for Harmonisation’s Statistical Principles for Clinical Trials (Section 5.6) addresses multiplicity issues and the need for appropriate adjustment methods.

Case Study: FWER in Genome-Wide Association Studies

GWAS typically test millions of genetic variants for association with diseases. With m ≈ 1,000,000 and α = 0.05:

$$\mathrm{FWER}_{\text{uncontrolled}} \approx 1 - (1 - 0.05)^{1{,}000{,}000} \approx 1 \;(100\%)$$

The Bonferroni correction would require:

$$\alpha_{\text{adj}} = 0.05 / 1{,}000{,}000 = 5 \times 10^{-8}$$

This is why GWAS use such stringent significance thresholds (typically $5 \times 10^{-8}$). However, this comes at the cost of power: many true associations may be missed. Some GWAS now use:

  • Two-stage designs – a first stage screens variants at a lenient threshold, and a second stage confirms the hits at a strict threshold
  • Bayesian approaches – Incorporate prior probabilities of associations
  • Polygenic risk scores – Combine effects of many variants
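The genome-wide threshold and its consequences can be checked numerically. The sketch assumes independent tests (real GWAS variants are correlated through linkage disequilibrium, which makes Bonferroni even more conservative) and uses `math.log1p` and `math.expm1` to evaluate $1 - (1 - x)^m$ accurately when x is tiny and m is huge.

```python
import math

alpha, m = 0.05, 1_000_000

threshold = alpha / m
print(f"Bonferroni threshold: {threshold:.1e}")  # 5.0e-08

# -expm1(m * log1p(-x)) equals 1 - (1 - x)**m but avoids rounding error for tiny x.
fwer_uncorrected = -math.expm1(m * math.log1p(-alpha))
fwer_bonferroni = -math.expm1(m * math.log1p(-threshold))
print(f"Uncorrected FWER: {fwer_uncorrected:.4f}")      # 1.0000
print(f"FWER at 5e-8 per test: {fwer_bonferroni:.4f}")  # 0.0488
```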

Future Directions in Multiple Testing

Several emerging approaches aim to improve upon traditional FWER control:

  1. Adaptive Procedures: Estimate the proportion of true null hypotheses (for example, from the distribution of the observed p-values) and adapt the correction accordingly.
  2. Weighted Methods: Assign different weights to different hypotheses based on their importance or prior evidence.
  3. Structured Hypotheses: Exploit known relationships between tests (e.g., hierarchical or graphical structures) to improve power.
  4. Machine Learning Integration: Use predictive models to prioritize hypotheses for testing, reducing the effective number of comparisons.
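Weighted methods can take many forms; one of the simplest is the weighted Bonferroni test, which splits the family-wise α across hypotheses according to pre-specified weights. By the same union-bound argument as ordinary Bonferroni, the FWER stays at or below α as long as the weights are non-negative and sum to 1. The function name and weights below are hypothetical.

```python
def weighted_bonferroni(p_values, weights, alpha_fw=0.05):
    """Reject H_i when p_i <= w_i * alpha_fw; weights must be non-negative and sum to 1."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return [p <= w * alpha_fw for p, w in zip(p_values, weights)]


# Hypothetical: the first hypothesis receives half of the error budget.
print(weighted_bonferroni([0.02, 0.02, 0.005], [0.5, 0.25, 0.25]))  # [True, False, True]
```

Equal weights of 1/m reduce this to the ordinary Bonferroni correction.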

Conclusion

The Family-Wise Error Rate is a critical concept in statistical inference that becomes increasingly important as the number of hypothesis tests grows. Proper FWER control ensures the integrity of research findings by limiting the probability of false positive results.

Key takeaways:

  • FWER increases rapidly with the number of tests – even 20 independent tests at α = 0.05 give FWER ≈ 64%
  • Several methods exist to control FWER, with Bonferroni being the simplest but most conservative
  • The choice between FWER and FDR control depends on the research context and tolerance for false positives
  • Regulatory agencies often require FWER control in confirmatory clinical trials
  • Emerging methods offer more powerful alternatives while still controlling error rates

Researchers should carefully consider their multiple testing strategy during study design, choosing an approach that balances Type I error control with statistical power to detect true effects.
