What is Combined Annotation Dependent Depletion for Structural Variants (CADD-SV)?

CADD-SV is a tool that scores the deleteriousness of large (>50 bp) deletions, insertions and duplications in the human genome.

We use structural variants (SVs) derived on the human and chimpanzee evolutionary lineages as proxy-neutral variants and contrast them with matched simulated variants as proxy-pathogenic variants in a machine learning framework. This approach has proven powerful in the interpretation of SNVs and short InDels (Kircher, Witten et al., 2014). For each variant, our tool computes summary statistics of overlapping and adjacent genomic annotations. Random forest models trained on these features then differentiate deleterious from neutral structural variants.
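As a rough illustration of what summary statistics of overlapping and adjacent annotations can look like, the Python sketch below summarizes one annotation track over an SV and its two flanking regions. It is not the CADD-SV implementation: the track values, the 100 bp flank, the function names and the choice of count, mean and maximum as statistics are all assumptions made for this example.

```python
from statistics import mean

# Hypothetical annotation track on one chromosome: (start, end, value),
# half-open coordinates. The values are invented for illustration only.
conservation = [
    (100, 200, 0.2),
    (250, 400, 0.9),
    (450, 600, 0.5),
    (900, 1000, 0.1),
]


def overlapping_values(track, start, end):
    """Return the values of all intervals that overlap [start, end)."""
    return [v for (s, e, v) in track if s < end and e > start]


def summarize(values):
    """Summarize annotation values; regions without annotation get zeros."""
    if not values:
        return {"n": 0, "mean": 0.0, "max": 0.0}
    return {"n": len(values), "mean": mean(values), "max": max(values)}


def sv_features(track, sv_start, sv_end, flank=100):
    """Summaries for the SV span and its upstream and downstream flanks.

    The flank size is an assumed parameter, not a CADD-SV setting.
    """
    upstream_start = max(0, sv_start - flank)
    return {
        "span": summarize(overlapping_values(track, sv_start, sv_end)),
        "upstream": summarize(overlapping_values(track, upstream_start, sv_start)),
        "downstream": summarize(overlapping_values(track, sv_end, sv_end + flank)),
    }


# Example: a hypothetical deletion covering positions 300 to 500.
features = sv_features(conservation, 300, 500, flank=100)
for region, stats in features.items():
    print(region, stats)
```

In a setup like this, the per-region summaries from many annotation tracks would be concatenated into one feature vector per variant, which is the kind of input a random forest classifier expects.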

In a proof-of-principle study, we show that CADD-SV scores correlate with known pathogenic variants in individual genomes as well as allelic diversity observed across the population. CADD-SV is able to prioritize somatic variants observed in cancer patients as well as non-coding structural variants known to affect gene expression.

A pre-print describing CADD-SV has been posted on bioRxiv (doi:10.1101/2021.07.10.451798), and the manuscript has been submitted for peer review.

How can I obtain CADD-SV scores?

We have pre-computed CADD-SV scores for known sets of SVs and provide an easy-to-use lookup for these structural variants. New variants can be scored online through an upload form. In addition, all pre-scored files, annotations and scripts are available for download, so the CADD-SV workflow can also be run locally.