Information
Genome sequencing of Populus
The US Department of Energy has provided resources to obtain the full
genome sequence of Populus trichocarpa. The tree used is named
Nisqually-1, after a river near Seattle where it was originally
collected. The Joint Genome Institute (JGI) has performed
whole-genome shotgun sequencing to over 7x coverage. The
assembly, annotation and analysis of the genome sequence are carried out
by a group of institutions: JGI, the Oak Ridge National
Laboratory, UPSC, Genome Canada and a few others. The release of the
annotated sequence during 2004 will present a tremendous resource for
tree researchers worldwide.
The Populus array
Our first microarray (the wood array) with Populus cDNA clones consisted of
2995 clones from a library created from RNA prepared from the
wood-forming zone. The second array contained 2667 clones
from the leaf library.
The first generation of the global Populus array, POP1, also contains
clones from other tissues, 13 488 clones in total. It is based on the
Unigene set we obtained after sequencing about 35 000 clones. POP1
was ready in 2001. Over 2000 slides were produced and used in over 25
different biological experiments.
The second generation of the global Populus array, POP2,
is now in production. It is based on the Unigene set extracted from the
analysis of over 100 000 ESTs. To our knowledge, no single academic
project has independently produced a more comprehensive
microarray. We have ourselves sequenced all the ESTs, performed all the
bioinformatics and carried out every step in the production of the
Populus arrays, from which the researchers in the project now benefit
fully.
The Database
This database is built from 121 495 Populus EST sequences from 19 cDNA
libraries.
Similar sequences are grouped together into clusters (11891).
A cluster should represent sequences coming from a specific gene. Each
cluster has been given an annotation based on the best Arabidopsis hit and
sometimes the best Swiss-Prot hit.
Functional classes are assigned from automatically derived Arabidopsis
classifications. Within each cluster, sequences may be further grouped
into contigs that show very high similarity.
There is a consensus sequence for each contig. Contigs may represent
species variants, splice variants etc. Clustered sequences that do not
belong to a contig are referred to as singlets. Sequences that do not
belong to a cluster are referred to as singletons (12767), and their
annotation is based only on their own sequence. Clusters + singletons
should represent a Unigene set. Some clones have a PU number. This
refers to a DNA preparation spotted on a microarray. The
"Re-sequenced" sequences come from this DNA preparation.
Sometimes a clone may not have a unique annotation. There is always a
main annotation for each PU number. The details of how an annotation
was chosen can be viewed by clicking "show details".
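The grouping described above (clusters, contigs, singlets and singletons) can be illustrated with a minimal Python sketch. It is not the code behind this database: the function names, the input layout (sequence id mapped to a cluster id and a contig id) and the example ids are all assumptions made for illustration. It only shows how the four terms relate and how a Unigene count follows from clusters + singletons.

```python
def group_sequences(assignments):
    """Split sequences into clusters (with contigs and singlets) and singletons.

    `assignments` maps a sequence id to a (cluster_id, contig_id) pair.
    This input layout is hypothetical; None means "not assigned".
    """
    clusters = {}
    singletons = []
    for seq_id, (cluster_id, contig_id) in sorted(assignments.items()):
        if cluster_id is None:
            # Not part of any cluster: a singleton.
            singletons.append(seq_id)
            continue
        cluster = clusters.setdefault(cluster_id, {"contigs": {}, "singlets": []})
        if contig_id is None:
            # In a cluster but not in any contig: a singlet.
            cluster["singlets"].append(seq_id)
        else:
            cluster["contigs"].setdefault(contig_id, []).append(seq_id)
    return clusters, singletons


def unigene_size(n_clusters, n_singletons):
    """Unigene set size, taken as clusters + singletons."""
    return n_clusters + n_singletons


# Hypothetical example: four ESTs, one cluster, one singleton.
example = {
    "est1": ("cluster1", "contig1"),
    "est2": ("cluster1", "contig1"),
    "est3": ("cluster1", None),
    "est4": (None, None),
}
clusters, singletons = group_sequences(example)
example_unigenes = unigene_size(len(clusters), len(singletons))

# Applied to the counts quoted on this page (11891 clusters, 12767 singletons).
database_unigenes = unigene_size(11891, 12767)
```

In the example, est1 and est2 form a contig, est3 is a singlet within the same cluster, and est4 is a singleton, so the example Unigene count is 2. With the counts quoted above, the same sum gives 24 658.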
The dataset is described in the publication:
Sterky F, Bhalerao RR, Unneberg P, Segerman B, Nilsson P, Brunner AM,
Campaa L, Jonsson Lindvall J, Tandre K, Strauss SH, Sundberg B,
Gustafsson P, Uhlen M, Bhalerao RP, Nilsson O, Sandberg G, Karlsson J,
Lundeberg J, Jansson S (2004)
A Populus EST resource for plant functional genomics.
Proc Natl Acad Sci U S A. 2004 Sep 21;101(38):13951-6