Estimate cross-trait genetic correlations

ldsc_rg() uses LD score regression (LDSC) to estimate pairwise genetic correlations between traits. Trait-level inputs (munged_sumstats and, optionally, sample_prev and population_prev) are named lists keyed by trait, and each trait's name must be identical across all of these arguments.
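As a sketch of the naming requirement, the hypothetical helper check_trait_names() below (not part of the package) checks that every optional named-list argument uses exactly the trait names given in munged_sumstats. It assumes that a scalar NA means the argument was left at its default; the trait names, file names, and prevalences in the usage call are illustrative only.

```r
# Hypothetical helper, not part of the package: checks that each optional
# named-list argument uses exactly the trait names of munged_sumstats.
check_trait_names <- function(munged_sumstats, ...) {
  traits <- names(munged_sumstats)
  if (is.null(traits) || any(traits == "")) {
    stop("munged_sumstats must be a fully named list")
  }
  others <- list(...)
  for (arg in names(others)) {
    x <- others[[arg]]
    # A scalar NA is the documented default (observed scale): nothing to check.
    if (length(x) == 1 && is.na(x)) next
    if (!setequal(names(x), traits)) {
      stop(sprintf("names of '%s' do not match names of munged_sumstats", arg))
    }
  }
  invisible(TRUE)
}

# Usage with illustrative names; only the list names are checked, so the
# file paths do not need to exist.
check_trait_names(
  list(trait_a = "trait_a.sumstats.gz", trait_b = "trait_b.sumstats.gz"),
  sample_prev = list(trait_a = 0.5, trait_b = 0.3),
  population_prev = list(trait_a = 0.1, trait_b = 0.05)
)
```

Order does not matter (setequal() is used), but a missing, extra, or misspelled trait name stops with an error.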

Usage

ldsc_rg(
  munged_sumstats,
  ancestry,
  sample_prev = NA,
  population_prev = NA,
  ld,
  wld,
  n_blocks = 200,
  chisq_max = NA,
  chr_filter = seq(1, 22, 1)
)

Arguments

munged_sumstats: (list) A named list of data frames, or of paths to files, containing munged summary statistics. Each set of munged summary statistics must contain at least the columns SNP (rsID), A1 (effect allele), A2 (non-effect allele), N (total sample size), and Z (Z-score).
ancestry: (character) One of "AFR", "AMR", "CSA", "EAS", "EUR", or "MID", which selects the corresponding built-in LD score (ld) and weight (wld) files from Pan-UK Biobank. If empty or NULL, the user must specify paths to the ld and wld files.
sample_prev: (list) A named list giving, for each trait, the proportion of cases in the sample, used to convert observed-scale heritability to liability-scale heritability. The default, NA, is appropriate for quantitative traits or for estimating heritability on the observed scale.
population_prev: (list) A named list giving, for each trait, its prevalence in the population, used to convert observed-scale heritability to liability-scale heritability. The default, NA, is appropriate for quantitative traits or for estimating heritability on the observed scale.
ld: (character) Path to a directory containing LD score files ending in .l2.ldscore.gz. The default, NA, uses the built-in Pan-UK Biobank LD score files for the population given in ancestry.
wld: (character) Path to a directory containing regression weight files. The default, NA, uses the built-in Pan-UK Biobank weight files for the population given in ancestry.
n_blocks: (numeric) Number of blocks used to compute block jackknife standard errors. Default is 200.
chisq_max: (numeric) Maximum value of Z^2 for a SNP to be included in LD score regression. The default, NA, sets chisq_max to the larger of 80 and 0.001 * N.
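chr_filter: (numeric vector) Chromosomes to include in the analysis. The default, seq(1, 22, 1), includes all 22 autosomes. Analyzing odd and even chromosomes separately may be useful, for example, to split the data between exploratory and confirmatory factor analysis.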
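For binary traits, observed-scale heritability is conventionally converted to the liability scale with the transformation of Lee et al. (2011), which the original LDSC software uses. The sketch below implements that formula. The helper name h2_obs_to_liab() is hypothetical, and we assume, without having checked the package source, that ldsc_rg() applies an equivalent conversion with sample_prev as P and population_prev as K.

```r
# Hypothetical helper: observed-scale to liability-scale heritability
# (Lee et al. 2011). P = proportion of cases in the sample (sample_prev),
# K = population prevalence (population_prev).
h2_obs_to_liab <- function(h2_obs, P, K) {
  thr <- qnorm(1 - K)  # liability threshold above which individuals are cases
  z <- dnorm(thr)      # standard normal density at the threshold
  h2_obs * K^2 * (1 - K)^2 / (P * (1 - P) * z^2)
}

# Illustrative values only: observed-scale h2 of 0.2 estimated in a sample
# that is 50% cases, for a trait with 1% population prevalence.
h2_obs_to_liab(0.2, P = 0.5, K = 0.01)
```

When P and K are both 0.5 the scaling factor reduces to pi / 2, which is a convenient sanity check.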
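n_blocks controls a delete-one-block jackknife. The generic sketch below uses a hypothetical helper, block_jackknife_se(), which is not the package's internal code. It splits rows into contiguous blocks, re-estimates with each block left out, and forms the standard error from the pseudovalues. In LD score regression the rows would be SNPs in genomic order and the estimator a regression coefficient; here any function of a data frame that returns one number works.

```r
# Hypothetical helper (generic sketch, not the package's internal code):
# delete-one-block jackknife standard error of stat_fn(data).
block_jackknife_se <- function(data, stat_fn, n_blocks = 200) {
  n <- nrow(data)
  stopifnot(n_blocks >= 2, n_blocks <= n)
  # Assign rows (e.g. SNPs in genomic order) to contiguous, near-equal blocks.
  block <- ceiling(as.numeric(seq_len(n)) * n_blocks / n)
  theta_full <- stat_fn(data)
  # Re-estimate with each block left out in turn.
  theta_minus <- vapply(seq_len(n_blocks), function(b) {
    stat_fn(data[block != b, , drop = FALSE])
  }, numeric(1))
  # Pseudovalues; their variance divided by n_blocks is the jackknife variance.
  pseudo <- n_blocks * theta_full - (n_blocks - 1) * theta_minus
  sqrt(var(pseudo) / n_blocks)
}

# Usage: SE of a mean over 1,000 simulated values, using 200 blocks.
set.seed(1)
sim <- data.frame(x = rnorm(1000))
block_jackknife_se(sim, function(d) mean(d$x), n_blocks = 200)
```

With one row per block and the mean as the estimator, the pseudovalues equal the original values, so the result reduces to the familiar sd(x) / sqrt(n).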
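The chisq_max default can be illustrated with a short sketch. The helpers default_chisq_max() and filter_chisq() are hypothetical. We assume that N is summarized by its largest value across SNPs, as in the original LDSC software, and, following the wording "maximum value of Z^2 ... to be included", that a SNP whose Z^2 equals the cap is kept.

```r
# Hypothetical helpers illustrating the documented chisq_max default.
# Assumption: N is summarized by its largest value across SNPs.
default_chisq_max <- function(N) {
  max(80, 0.001 * max(N))
}

# Keep SNPs whose Z^2 does not exceed chisq_max (NA means use the default).
filter_chisq <- function(sumstats, chisq_max = NA) {
  if (is.na(chisq_max)) chisq_max <- default_chisq_max(sumstats$N)
  sumstats[sumstats$Z^2 <= chisq_max, , drop = FALSE]
}

# Illustrative data: with N = 50,000 the default cap is max(80, 50) = 80,
# so only rs1 (Z^2 = 1) is kept; rs2 (81) and rs3 (100) are dropped.
ss <- data.frame(SNP = c("rs1", "rs2", "rs3"), Z = c(1, 9, 10), N = 50000)
filter_chisq(ss)
```

The cap only grows above 80 once N exceeds 80,000, so it mainly matters for very large GWAS.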
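For the odd/even split described under chr_filter, the chromosome vectors can be built with seq(). The ldsc_rg() calls reuse the example data from the Examples section and are wrapped in if (FALSE), like those examples, because they need the package and its data; the variable names are illustrative.

```r
# Odd and even autosomes for a split-half analysis.
odd_chr <- seq(1, 22, 2)
even_chr <- seq(2, 22, 2)

if (FALSE) {
  # Not run: requires the package and its example data.
  sumstats <- list(
    "APOB" = sumstats_munged_example(example = "APOB"),
    "LDL" = sumstats_munged_example(example = "LDL")
  )
  res_odd <- ldsc_rg(munged_sumstats = sumstats, ancestry = "EUR", chr_filter = odd_chr)
  res_even <- ldsc_rg(munged_sumstats = sumstats, ancestry = "EUR", chr_filter = even_chr)
}
```

Together the two vectors cover every autosome exactly once, so the two analyses use disjoint sets of SNPs.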

Value

A list of class ldscr_list containing heritability and genetic correlation results:

h2 = a tibble of heritability estimates for each trait. If sample_prev and population_prev were provided, liability-scale estimates are also returned.
rg = a tibble of pairwise genetic correlation estimates.
raw = a list of the underlying covariance and correlation matrices.
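A genetic correlation is a genetic covariance rescaled by the two heritabilities: rg = cov_g / sqrt(h2_a * h2_b). The sketch below assumes a genetic covariance matrix with heritabilities on the diagonal and genetic covariances off the diagonal. The element names inside raw are not documented here, so the matrix is built by hand with illustrative values; stats::cov2cor() then performs the rescaling for every pair at once.

```r
traits <- c("trait_a", "trait_b")
# Illustrative genetic covariance matrix (hypothetical values): heritabilities
# on the diagonal, genetic covariance off the diagonal.
S <- matrix(c(0.20, 0.09,
              0.09, 0.25),
            nrow = 2, byrow = TRUE,
            dimnames = list(traits, traits))

# rg = cov_g / sqrt(h2_a * h2_b); cov2cor() rescales every pair at once
# and sets the diagonal to exactly 1.
R <- cov2cor(S)
R["trait_a", "trait_b"]
```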

Details

This function estimates pairwise genetic correlations among an arbitrary number of traits, along with the heritability of each individual trait. A ggplot2::autoplot() method is available to visualize the results as a heatmap.

Examples

if (FALSE) {
# Estimate genetic correlations between "APOB" and "LDL"
ldsc_res <- ldsc_rg(
  munged_sumstats = list(
    "APOB" = sumstats_munged_example(example = "APOB"),
    "LDL" = sumstats_munged_example(example = "LDL")
  ),
  ancestry = "EUR"
)

# Plot heatmap of results
ggplot2::autoplot(ldsc_res)
}