The Sri Lankan Personal Genome Project: an overview

The first Sri Lankan Personal Genome has been sequenced, marking Sri Lanka's entry into the era of whole genome sequencing. This paper explains the background and rationale for the project, gives a brief overview of the findings from the Sri Lankan Personal Genome, and discusses the future directions of the project.


Background
With the completion of the Human Genome Project (HGP) (1,2) and a decade of human genomics, we are at an interesting juncture in the history of mankind. New technologies have made it possible to sequence complete human genomes at a fraction of the cost of the HGP. These technologies have also greatly increased the scale and ease of sequencing. As a result, it is now possible to sequence entire human genomes and understand the genomic make-up of an individual, with widespread potential applications in healthcare (3). Major international projects that followed the HGP, including the HapMap Project (4) and the 1000 Genomes Project (5), have been cataloguing human genetic variation rapidly. They have also fuelled the development of technologies and analytical methods that scan entire genomes for informative genetic markers, with the aim of understanding the differences and similarities between individuals. This is a first step towards understanding genotype-phenotype correlations. Many groups around the world have undertaken such large-scale studies (6), and these studies have identified a large number of markers associated with complex disorders and drug response (7). This is just the tip of the iceberg: new associations continue to be reported in the scientific literature almost daily.
Beyond genetic variation and its contribution to disease, a large body of work has addressed other important aspects such as epigenetic mechanisms and genomic regulation. Two developments made this work possible. First, new genomic tools let researchers address questions at a genome-wide level. Second, advances in bioinformatics emerged: cheaper and faster computers made large-scale data analysis feasible, and robust algorithms were developed to mine data and model biological phenomena on a genomic scale.

The Sri Lankan personal genome
Today, any country aspiring to give its people access to state-of-the-art healthcare cannot ignore the rapid advances in human genomics. It is imperative at this juncture for every country to acquire the necessary tools and know-how. Each country also needs baseline data describing the genetic diversity of its population.
The Sri Lankan Personal Genome Project is the first step in this direction in Sri Lanka, and marks the country's entry into the exciting field of whole genome sequencing. Sri Lanka is home to over 20 million people with rich racial, cultural and linguistic diversity (8). The earliest evidence (34,000 BP) of anatomically modern humans in South Asia is found in Sri Lanka (9), and migration from mainland India has shaped the diversity of human populations on the island. Sri Lanka also has a rich heritage of organised medical care: the hospital at Mihintale (437-367 BC) is the oldest hospital yet discovered in the world (10). The Sri Lankan population presently comprises six major groups: the Sinhalese, Sri Lankan Tamils, Indian Tamils, Moors, Burghers and Malays (8). The island is also home to smaller, distinct populations. These include the Vaddhas, descendants of the island's original inhabitants who were geographically isolated from other populations, and the Kaffirs, descendants of African slaves brought to the island over 500 years ago.
To understand the genetic diversity of Sri Lankan populations, and to create baseline data for disease association studies, we previously created the Sri Lankan Genome Variation Database (11). This database contains information on single nucleotide polymorphisms (SNPs) found in the Sinhalese, Sri Lankan Tamils and Moors, the three major ethnic groups in the Sri Lankan population. It presently holds genotype frequencies for 34 genomic variations in 14 medically important genes. The database was designed in line with international standards for describing and annotating variations, including those of the Human Genome Variation Society (HGVS) (12). In the spirit of collaboration and open access to data, the database also accepts submissions from the research community. It thus offers researchers and clinicians a standard access point to the spectrum of genetic variation in the population. The resource is accessible online (URL: http://www.hgucolombo.net/slgv/home.htm).
As a proof of concept towards the goal of interpreting and analysing complete genome data, we sequenced the complete genome of an anonymous Sinhalese male of Sri Lankan origin with both up-country and low-country ancestry. Sequencing was performed using next-generation sequencing technology at over 20x coverage of the genome. Analysis of the genome identified 2,811,918 SNPs, of which 222,739 were novel compared with dbSNP (13) build 131. These novel SNPs account for almost 7.9% of all SNPs identified in the genome. This underscores the need to sequence more complete genomes to obtain a comprehensive picture of the spectrum of human genetic variation. Analysis also identified 489,921 insertion-deletion (InDel) events in the genome.
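The novelty figure above comes from checking each called SNP against a reference catalogue and taking the proportion that is not found. The Python sketch below illustrates that logic under simplifying assumptions. Variants are keyed as hypothetical (chromosome, position, ref, alt) tuples, and the toy calls and catalogue are invented for illustration. A real comparison against dbSNP also involves allele normalisation and identifier matching, which are not shown. The final line recomputes the percentage from the counts reported above.

```python
from typing import Iterable, Set, Tuple

# Assumption: a variant is identified by (chromosome, position, ref allele, alt allele).
# Real pipelines also normalise alleles and match dbSNP identifiers; omitted here.
Variant = Tuple[str, int, str, str]

def novel_variants(calls: Iterable[Variant], catalogue: Set[Variant]) -> Set[Variant]:
    """Return the called variants that are absent from the reference catalogue."""
    return set(calls) - catalogue

def percent(part: int, whole: int) -> float:
    """Express part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole

# Hypothetical toy data (not taken from the Sri Lankan genome).
calls = [("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")]
catalogue = {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A")}
print(novel_variants(calls, catalogue))  # {('chr1', 2000, 'C', 'T')}

# Counts reported for the Sri Lankan Personal Genome versus dbSNP build 131.
print(f"{percent(222_739, 2_811_918):.1f}%")  # 7.9%
```

A set difference keeps the comparison linear in the number of calls. It also makes explicit that "novel" is always relative to a particular catalogue release, here build 131.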

Future directions
The immediate strategic goals of the Sri Lankan Personal Genome Project for 2011-2012 are to understand in depth the genetic variations and their potential phenotypic consequences. Accordingly, we organise our research for this period into three main streams: (i) annotating the genetic variations unique to Sri Lankans, (ii) studying the interactions between genes in relation to disease phenotypes, and (iii) visualising the annotated Sri Lankan genome through the Sri Lankan Genome Browser and the Sri Lankan Genome Variation Database.
The first Sri Lankan Personal Genome revealed over 2.8 million single nucleotide variations. Over 200,000 of these had not previously been reported, as judged by comparison with dbSNP build 131. We hope that in-depth annotation of these variations will provide crucial insights into phenotypes that may be specific to the Sri Lankan population. Coupling this knowledge with associated clinical phenotypes and traits could enable scientists to generate new hypotheses about such associations. These hypotheses can then be tested by genotyping the novel variations in the Sri Lankan population. Recent advances in bioinformatics and data mining offer the tools required for the annotation and functional interpretation of SNP data (see, for example, the article by Harendra et al. in this issue of the Journal (14)). Comparison with data from other projects, including the 1000 Genomes Project and other population-specific personal genome projects, can further enhance the value of this information.
Recent advances in genomic technologies have enabled researchers to unravel many biological pathways and processes in molecular detail. It is essential to exploit these data through integrative analysis in order to understand the biological context and functional consequences of genomic variations. This would include three steps. First, assembling biological interaction networks, including genetic and protein interaction networks, from public databases such as OMIM and HuGE Navigator. Second, curating and integrating these networks, together with association data from public repositories and resources, to understand the molecular processes underlying diseases and drug metabolic pathways. Third, integrating the variation data with the gene interaction network to identify potential consequences of genomic variations, which could then be modelled and validated in model systems.
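The third step, integrating variation data with the gene interaction network, amounts to overlaying variant-carrying genes on the network and inspecting their interaction partners. The following minimal Python sketch makes two assumptions: the network is available as a list of undirected gene-gene edges, and variants have already been assigned to genes. The gene names, variant IDs and helper functions (`build_network`, `variant_neighbourhoods`) are hypothetical and do not correspond to any resource named above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

def build_network(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build an undirected gene-gene interaction network as an adjacency map."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def variant_neighbourhoods(graph: Dict[str, Set[str]],
                           variant_to_gene: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each gene carrying a variant to its direct interaction partners.

    Genes absent from the network get an empty set, flagging variants
    that the current network cannot place in a biological context.
    """
    genes = set(variant_to_gene.values())
    return {gene: set(graph.get(gene, set())) for gene in genes}

# Hypothetical toy network and variant-to-gene assignments (illustration only).
edges = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_D", "GENE_E")]
variant_to_gene = {"var1": "GENE_B", "var2": "GENE_E", "var3": "GENE_X"}

hits = variant_neighbourhoods(build_network(edges), variant_to_gene)
for gene in sorted(hits):
    print(gene, sorted(hits[gene]))
# GENE_B ['GENE_A', 'GENE_C']
# GENE_E ['GENE_D']
# GENE_X []
```

The neighbour sets are only a starting point for deciding which variant-gene pairs to model experimentally. Edge weighting, pathway membership and ranking are deliberately left out of this sketch.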
To ensure wide use and ease of interpretation, we have made the genomic variations and annotations of the Sri Lankan Personal Genome available on the Sri Lankan Genome Browser. This online genome browser is built on GBrowse from the Generic Model Organism Database (GMOD) project (15). It is intended to serve as the central hub for exchanging data and visualising genomic variations and their annotations, including data that will come out of the Sri Lankan Personal Genome Project in the future (Figure 1). The resource is freely accessible online at www.srilankangenome.net. In the future, we hope to unravel the genetic diversity of Sri Lankan populations by sequencing more individuals from different racial groups. We also hope to collaborate nationally and internationally to pool knowledge and expertise and, where possible, co-create resources that enable the interpretation of data and its application in healthcare. This includes helping to build open resources such as OpenPGx (www.openpgx.org) for interpreting genomic variations, and taking part in collaborative initiatives aimed at understanding the diversity of Asian populations. Finally, we recognise that genomics cannot be applied in healthcare unless medical professionals are educated in genomics and involved in genomics research. This includes training them to analyse and interpret genomic information and to use it in their clinical practice.

Figure 1. The Sri Lankan Genome Browser provides quick visualisation of genomic variations, easing their annotation and interpretation.