A new method for EST clustering

Li da Zhang, De jun Yuan, Jian Wei Zhang, Shi Ping Wang, Qi Fa Zhang

Research output: Contribution to journal › Article › peer-review

8 Scopus citations


We developed an EST (expressed sequence tag) clustering method, ESTClustering, to generate high-quality unique expressed sequences from large-scale EST sequencing data. During clustering, the method uses consensus sequences for sequence comparison with megablast and assembles each cluster with phrap. This clustering strategy can efficiently identify gene families and alternative splicing forms of expressed sequences, and it can also reduce the adverse effects of sequence errors. Compared with the UniGene clustering method of the National Center for Biotechnology Information, ESTClustering tends to provide more expressed gene forms. Analysis of 112 256 Arabidopsis ESTs with ESTClustering produced 23 581 EST clusters. Of these Arabidopsis EST clusters, 13 597 have corresponding genome coding sequences, a number close to the number of genes predicted with Arabidopsis ESTs. Using the same method, a total of 147 191 rice ESTs were clustered into 33 896 groups.
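
As a rough illustration of the grouping step in similarity-based EST clustering, the sketch below merges ESTs into clusters from pairwise hits such as those a megablast search might report. It is not the ESTClustering implementation: it uses plain single-linkage grouping with a union-find structure, it omits the consensus-sequence comparison and phrap assembly described above, and the function name `cluster_ests`, the tuple layout of `hits`, and both thresholds are assumptions made for this example.

```python
from collections import defaultdict


def cluster_ests(est_ids, hits, min_identity=95.0, min_length=100):
    """Group ESTs by single linkage over pairwise similarity hits.

    est_ids: all EST identifiers; every id used in `hits` must appear here.
    hits: iterable of (query_id, subject_id, percent_identity, alignment_length).
    min_identity, min_length: illustrative cutoffs, not values from the paper.
    """
    parent = {est: est for est in est_ids}

    def find(x):
        # Walk up to the cluster root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, identity, length in hits:
        if query == subject:
            continue  # self-hits carry no clustering information
        if identity >= min_identity and length >= min_length:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q  # merge the two clusters

    clusters = defaultdict(list)
    for est in est_ids:
        clusters[find(est)].append(est)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical hits: (query, subject, % identity, alignment length)
example_hits = [
    ("est1", "est2", 98.5, 320),
    ("est2", "est3", 97.0, 150),
    ("est4", "est5", 99.0, 80),   # rejected: alignment too short
    ("est4", "est1", 88.0, 400),  # rejected: identity too low
]
example_clusters = cluster_ests(
    ["est1", "est2", "est3", "est4", "est5"], example_hits
)
print(example_clusters)
```

Single linkage is the simplest choice here and tends to over-merge when chimeric or low-quality reads bridge unrelated transcripts, which is one reason a method might compare against per-cluster consensus sequences instead.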

Original language: English (US)
Pages (from-to): 147-153
Number of pages: 7
Journal: Acta Genetica Sinica
Issue number: 2
State: Published - Feb 1 2003
Externally published: Yes


Keywords

  • Consensus sequence
  • EST clustering
  • Non-redundant cDNA library

ASJC Scopus subject areas

  • Molecular Biology
  • Genetics

