A major use of the 1000 Genomes Project (1000GP) data is genotype imputation in genome-wide association studies (GWAS). Here we develop a method to estimate haplotypes from low-coverage sequencing data that can take advantage of single-nucleotide polymorphism (SNP) microarray genotypes on the same samples. First, the SNP array data are phased to build a backbone (or 'scaffold') of haplotypes across each chromosome. We then phase the sequence data 'onto' this haplotype scaffold. This approach can take advantage of relatedness between sequenced and non-sequenced samples to improve accuracy. We use this method to create a new 1000GP haplotype reference set for use by the human genetics community. Using a set of validation genotypes at SNPs and bi-allelic indels, we show that these haplotypes have lower genotype discordance and improved imputation performance into downstream GWAS samples, especially at low-frequency variants. © 2014 Macmillan Publishers Limited. All rights reserved.

Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel / Delaneau, O., Marchini, J., McVean, G.A., Donnelly, P., Lunter, G., Marchini, J.L., Myers, S., Gupta Hinch, A., Iqbal, Z., Mathieson, I., Rimmer, A., Xifara, D.K., Kerasidou, A., Churchhouse, C., Altshuler, D.M., Gabriel, S.B., Lander, E.S., Gupta, N., Daly, M.J., DePristo, M.A., et al. - In: NATURE COMMUNICATIONS. - ISSN 2041-1723. - 5:(2014), p. 3934. [10.1038/ncomms4934]

Integrating sequence and array data to create an improved 1000 Genomes Project haplotype reference panel

CUCCA, Francesco;
2014-01-01


Use this identifier to cite or link to this document: https://hdl.handle.net/11388/178426