Animal (Human) Genetics Research at Macrogen

Comprehensive genomic analyses associate UGT8 variants with musical ability in a Mongolian population
Park H et al. J Med Genet. 2012 Dec;49(12):747-52.
  • To identify genetic loci and variants that contribute to musical ability, we conducted family-based linkage and association analyses, and incorporated the results with data from exome sequencing and array comparative genomic hybridisation analyses.
  • This study provides new insight into the genetics of musical ability, exemplifying a methodology to assign functional significance to synonymous and non-coding alleles by integrating multiple experimental methods.

The transcriptional landscape and mutational profile of lung adenocarcinoma
Seo JS et al. Genome Res. 2012 Nov;22(11):2109-19.
  • Here we present the first large scale RNA sequencing study of lung adenocarcinoma, demonstrating its power to identify somatic point mutations as well as transcriptional variants such as gene fusions, alternative splicing events and expression outliers.
  • Our results reveal the genetic basis of 200 lung adenocarcinomas in Koreans including deep characterization of 87 surgical specimens by transcriptome sequencing.

A transforming KIF5B and RET gene fusion in lung adenocarcinoma revealed from whole-genome and transcriptome sequencing
Ju YS et al. Genome Res. 2012 Mar;22(3):436-45.
  • Here we report a novel fusion gene between KIF5B and the RET proto-oncogene caused by a pericentric inversion of 10p11.22-q11.21. This fusion gene overexpresses chimeric RET receptor tyrosine kinase, which could spontaneously induce cellular transformation.
  • Our data demonstrate that a subset of NSCLCs could be caused by a fusion of KIF5B and RET, and suggest the chimeric oncogene as a promising molecular target for the personalized diagnosis and treatment of lung cancer.

Extensive genomic and transcriptional diversity identified through massively parallel DNA and RNA sequencing of eighteen Korean individuals
Ju YS et al. Nat Genet. 2011 Jul 3;43(8):745-52.
  • Here we deep sequenced and correlated 18 genomes and 17 transcriptomes of unrelated Korean individuals. This has allowed us to construct a genome-wide map of common and rare variants and also identify variants formed during DNA-RNA transcription.
  • Our findings suggest that a considerable number of unexplored genomic variants still remain to be identified in the human genome, and that the integrated analysis of genome and transcriptome sequencing is powerful for understanding the diversity and functional aspects of human genomic variants.

TIARA: a database for accurate analysis of multiple personal genomes based on cross-technology
Hong D et al. Nucleic Acids Res. 2011 Jan;39(Database issue):D883-8.
  • We describe here the Total Integrated Archive of Short-Read and Array (TIARA) database, which contains personal genomic information obtained from next-generation sequencing (NGS) techniques and ultra-high-resolution comparative genomic hybridization (CGH) arrays.
  • This database improves the accuracy of detecting personal genomic variations, such as SNPs, short indels and structural variants (SVs). At present, 36 individual genomes have been archived and may be displayed in the database. TIARA provides a new approach to the accurate interpretation of personal genomes for genome research.

Reference-unbiased copy number variant analysis using CGH microarrays
Ju YS et al. Nucleic Acids Res. 2010 Nov;38(20):e190.
  • Whole genome analysis using massively parallel sequencing with multiple ultra-high resolution CGH arrays provides an opportunity to catalog highly accurate genomic variants of the reference DNA (NA10851). Using information on variants, we developed a new method, the CGH array reference-free algorithm (CARA), which can determine reference-unbiased absolute CNVs from any CGH array platform.
  • The algorithm enables the removal and rescue of false positive and false negative CNVs, respectively, which appear due to the effects of genomic variants of the reference sample in raw CGH array experiments. We found that the CARA remarkably enhanced the accuracy of CGH array in determining absolute CNVs.
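
The effect of reference-sample variants described above can be illustrated with simple arithmetic. An array CGH experiment reports a test/reference log2 ratio, so the absolute copy number of the test sample can only be recovered if the copy number of the reference at that locus is known. The sketch below is not the CARA algorithm itself; the function name and the example values are assumptions chosen for illustration.

```python
def absolute_copy_number(log2_ratio, reference_copy_number):
    """Convert a test/reference log2 ratio into an absolute test copy number.

    Illustrative only: assumes the reference copy number at the locus is
    known (for example, from sequencing the reference DNA) and that the
    measured ratio is noise-free.
    """
    return round(reference_copy_number * 2 ** log2_ratio)


# Reference carries a one-copy deletion (copy number 1) at this locus.
# A raw log2 ratio of 1.0 looks like a gain, but the test sample is a
# normal two-copy locus: an apparent false positive is removed.
apparent_gain = absolute_copy_number(1.0, 1)

# A raw log2 ratio of 0.0 looks like "no change", but the test sample
# shares the one-copy deletion: a false negative is rescued.
hidden_loss = absolute_copy_number(0.0, 1)

print(apparent_gain, hidden_loss)
```

In both cases the raw ratio alone would give the wrong call, which is why variants in the reference sample matter for absolute CNV determination.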

Discovery of common Asian copy number variants using integrated high-resolution array CGH and massively parallel DNA sequencing
Park H et al. Nat Genet. 2010 May;42(5):400-5.
  • Here, we have developed and applied a new method to combine high-resolution array comparative genomic hybridization (CGH) data with whole-genome DNA sequencing data to obtain a comprehensive catalog of common CNVs in Asian individuals.
  • We discovered 5,177 CNVs, of which 3,547 were putative Asian-specific CNVs. These common CNVs in Asian populations will be a useful resource for subsequent genetic studies in these populations.

A highly annotated whole-genome sequence of a Korean individual
Kim JI et al. Nature. 2009 Aug 20;460(7258):1011-5.
  • Here we provide a highly annotated, whole-genome sequence for a Korean individual, known as AK1.
  • Alignment to the NCBI reference, a composite of several ethnic clades, disclosed nearly 3.45 million single nucleotide polymorphisms (SNPs), including 10,162 nonsynonymous SNPs, and 170,202 deletion or insertion polymorphisms (indels).

Plant Genetics & Various Genetics Research at Macrogen 

The genome of the mesopolyploid crop species Brassica rapa
Wang X et al. Nat Genet. 2011 Aug 28;43(10):1035-9.
  • We reported the annotation and analysis of the draft genome sequence of Brassica rapa, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome and used Arabidopsis thaliana as an outgroup for investigating the consequences of genome triplication, such as structural and functional evolution.
  • The extent of gene loss (fractionation) among triplicated genome segments varies, with one of the three copies consistently retaining a disproportionately large fraction of the genes expected to have been present in its ancestor. Variation in the number of members of gene families present in the genome may contribute to the remarkable morphological plasticity of Brassica species.

Genome sequence and analysis of the tuber crop potato
Huang S et al. Nature. 2011 Jul 10;475(7355):189-95.
  • Here we used a homozygous doubled-monoploid potato clone to sequence and assemble 86% of the 844-megabase genome. We predicted 39,031 protein-coding genes and present evidence for at least two genome duplication events indicative of a palaeopolyploid origin.
  • We also sequenced a heterozygous diploid clone and showed that gene presence / absence variants and other potentially deleterious mutations occur frequently and are a likely cause of inbreeding depression.

The B73 Maize Genome: Complexity, Diversity, and Dynamics
Schnable PS et al. Science. 2009 Nov 20;326(5956):1112-5.
  • We reported an improved draft nucleotide sequence of the 2.3-gigabase genome of maize. Over 32,000 genes were predicted, of which 99.8% were placed on reference chromosomes. Nearly 85% of the genome is composed of hundreds of families of transposable elements, dispersed nonuniformly across the genome.
  • We also reported on the correlation of methylation-poor regions with Mu transposon insertions and recombination, and copy number variants with insertions and / or deletions, as well as how uneven gene losses between duplicated regions were involved in returning an ancient allotetraploid to a genetically diploid state.

A comprehensive resource of drought- and salinity-responsive ESTs for gene discovery and marker development in chickpea
Varshney RK et al. BMC Genomics. 2009 Nov 15;10:523.
  • Chickpea (Cicer arietinum L.), an important grain legume crop of the world, is seriously challenged by terminal drought and salinity stresses. A total of 20,162 (18,435 high quality) drought- and salinity-responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes.
  • We identified a total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) and developed 177 new EST-SSR markers. Besides SSR markers, 21,405 high confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs.
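
As a rough illustration of how simple sequence repeats can be located in EST sequences, the sketch below scans a sequence for perfect tandem repeats with a regular expression. It is not the pipeline used in the study; the function name, the motif length range (2 to 6 bases) and the minimum of 5 repeats are assumptions chosen for the example.

```python
import re


def find_ssrs(seq, min_repeats=5, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Thresholds are illustrative assumptions, not the study's criteria.
    Homopolymer runs (e.g. poly-A) are skipped.
    """
    # A motif of 2..max_motif bases, followed by at least
    # (min_repeats - 1) further tandem copies of the same motif.
    pattern = re.compile(
        r"([ACGT]{2,%d}?)\1{%d,}" % (max_motif, min_repeats - 1)
    )
    ssrs = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # ignore single-base runs reported as "AA", "TT", ...
        repeat_count = len(match.group(0)) // len(motif)
        ssrs.append((match.start(), motif, repeat_count))
    return ssrs


# Toy sequence: six tandem copies of "AG" flanked by unrelated bases.
example = "TTT" + "AG" * 6 + "CCC"
print(find_ssrs(example))  # [(3, 'AG', 6)]
```

Flanking sequence around each hit would then be used for primer design when turning a repeat into an EST-SSR marker.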

The oral metagenome in health and disease
Mira A et al. ISME J. 2012 Jan;6(1):46-56.
  • The oral cavity of humans is inhabited by hundreds of bacterial species, some of which have a key role in the development of oral diseases, mainly dental caries and periodontitis. The authors describe for the first time the metagenome of the human oral cavity under healthy and diseased conditions, with a focus on supragingival dental plaque and cavities.
  • Direct pyrosequencing of eight samples with different oral-health status produced 1 Gbp of sequence without the biases imposed by PCR or cloning. The sequencing was performed at Macrogen using the GS-FLX sequencer with Titanium chemistry. After quality checking, average read length was 425 ± 117 bp. Sequences were deposited in, and are publicly available from, the MG-RAST server.

Evaluating short-read sequence data from the highly redundant, novel transcriptome of Polarella glacialis
Gibbons TR et al. Genome Biol. 2011;12(Suppl 1):P5.
  • Dinoflagellates have exceptionally large genomes (10^9 to 10^11 bases) and highly duplicated genes (which can occur thousands of times within a single genome). These and other unusual characteristics have made dinoflagellates difficult to study using traditional molecular biology techniques.
  • Macrogen has sequenced the transcriptome of Polarella glacialis. One library was sequenced on one-eighth of a Roche/454 GS FLX picotiter plate using Titanium chemistry. A second library was sequenced using one lane on an Illumina GAIIx sequencer for 78 cycles in both directions (paired end). The sequences were assembled using Newbler, MIRA, Oases and Trinity, and they were analyzed using various custom scripts.

Metagenomic Analysis of Kimchi, a Traditional Korean Fermented Food
Jeon CO et al. Appl Environ Microbiol. 2011 Apr;77(7):2264-74.
  • Kimchi, a traditional food in the Korean culture, is made from vegetables by fermentation. In this study, metagenomic approaches were used to monitor changes in bacterial populations, metabolic potential, and overall genetic features of the microbial community during the 29-day fermentation process.
  • The fragmented genomic DNA was individually tagged using multiplex identifier (MID) adaptors, which allowed for the automatic sorting of the pyrosequencing-derived sequencing reads based on MID adaptors. Pyrosequencing of the 10 tagged genomic DNA samples was performed by Macrogen using a 454 GS FLX Titanium system.
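
The MID-based sorting described above can be sketched as a simple prefix match: each read begins with the tag of the sample it came from, so reads are binned by tag and the tag is trimmed off. This is a minimal illustration rather than the software used in the study; the tag-to-sample mapping, sample names and reads below are hypothetical, and real tools also tolerate sequencing errors in the tag.

```python
from collections import defaultdict

# Hypothetical tag-to-sample mapping for illustration only; the actual
# MID sequences and sample names are not given in the text.
MID_TAGS = {
    "ACGAGTGCGT": "sample_01",
    "ACGCTCGACA": "sample_02",
}


def demultiplex(reads, mid_tags):
    """Group reads by their leading MID tag and trim the tag off.

    Reads whose prefix matches no known tag go to "unassigned".
    Exact matching only; no error tolerance in the tag.
    """
    binned = defaultdict(list)
    for read in reads:
        for tag, sample in mid_tags.items():
            if read.startswith(tag):
                binned[sample].append(read[len(tag):])
                break
        else:
            binned["unassigned"].append(read)
    return dict(binned)


reads = [
    "ACGAGTGCGT" + "TTGACCA",   # tagged for sample_01
    "ACGCTCGACA" + "GGCATTA",   # tagged for sample_02
    "TTTTTTTTTT" + "GGCATTA",   # no known tag
]
binned = demultiplex(reads, MID_TAGS)
print(binned)
```

Tagging in this way is what allows the 10 samples to be pooled in one sequencing run and separated again afterwards.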

Deletion Hotspots in AMACR Promoter CpG Island Are cis-Regulatory Elements Controlling the Gene Expression in the Colon
Zhang X et al. PLoS Genet. 2009 Jan;5(1):e1000334.
  • AMACR was first found to be overexpressed in prostate cancer but not in benign glands, and is now an established diagnostic marker for prostate cancer. The authors' findings identified novel transcription factors responsible for AMACR regulation.
  • Sequencing service was provided by Macrogen (Seoul, Korea), using BigDye terminator chemistry on a 96-capillary 3730xl DNA analyzer. Bisulfite-sequencing PCR targeting the AMACR promoter CGI was conducted by nested PCR.

