The Human Genome Project promised us much: a future return on investment in the form of the resolution of disease and the careful planning of future generations. The enormous financial and scientific endeavour started out with powerful suggestions that humans had millions, then hundreds of thousands, of genes. The thought that we as a species might not be the greatest source of genes on the living planet was too incompatible with our natural predilection for greatness for anyone to contemplate. Time marches on, and as larger data sets are collected we are faced with the somewhat challenging news that chickens are hard on our heels and that some plants are way ahead of us!
Although the near-finished human genome sequence now covers 99% of the euchromatic (or gene-containing) genome at 99.999% accuracy, the exact number of human genes is still unknown.
The reality has been a little more sobering. Remarkably, the figure continues to be revised, partly because analysis techniques are growing more sophisticated, partly because different gene repositories apply different standards, and partly because different humans turn out to have different gene sets.[1]
In order to count genes, we need to define what we mean by a ‘gene’, a term whose meaning has changed dramatically over the past century. For our discussion, we will restrict the definition of a gene to a region of the genome that is transcribed into messenger RNA and translated into one or more proteins.
A review paper out in Genome Biology this year has selected a figure of 22,333, based on the current gene total held at NCBI, a conservative gene repository,[2] but suggests that the figure could fall further, by 1,000 or more.
All of which suggests a rather strange outcome in terms of gene numbers: after collating the information, the authors appear to say, we humans achieve the somewhat derisory position of having more genes than a chicken but fewer than a grape!
References
[1] Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci F, Hormozdiari F, Kitzman JO, Baker C, Malig M, Mutlu O, Sahinalp SC, Gibbs RA, Eichler EE: Personalized copy number and segmental duplication maps using next-generation sequencing. Nat Genet 2009, 41:1061-1067.
[2] Pruitt KD, Tatusova T, Klimke W, Maglott DR: NCBI Reference Sequences: current status, policy and new initiatives. Nucleic Acids Res 2009, 37(Database issue):D32-D36.