International Immunology Advance Access published online on October 27, 2007

International Immunology, doi:10.1093/intimm/dxm109

© The Japanese Society for Immunology. 2007. All rights reserved. For permissions, please e-mail: journals.permissions@oxfordjournals.org

Immunity genes and their orthologs: a multi-species database

Kathryn Rannikko1, Csaba Ortutay1 and Mauno Vihinen1,2

1 Bioinformatics Research Group, Institute of Medical Technology, FI-33014 University of Tampere, Finland
2 Research Unit, Tampere University Hospital, FI-33520 Tampere, Finland

Correspondence to: M. Vihinen; E-mail: mauno.vihinen{at}uta.fi


    Abstract
 Top
 Abstract
 Introduction
 Materials and methods
 Results
 Discussion
 Supplementary data
 Funding
 References
 
Metazoan species, from sponges to insects and mammals, possess successful defence systems against their pathogens and parasites. The evolutionary origins of these diverse systems are beginning to be more comprehensively investigated and mapped out. We have collected 1811 metazoan immunity genes from literature and gene ontology annotations. Tentative orthologs of these genes were identified using reciprocal protein–protein Blast searches against proteins from the GenBank and RefSeq databases. We have defined different levels or classes of ortholog group according to the order of reciprocal ortholog pairs among the seed immunity genes. The genes were clustered into these different ortholog groups. Initial phylogenetic analysis of these ortholog groups suggests that by this approach, we can collect a spectrum of immunity genes representing well the taxa in which they appear. All the immunity genes and their evidence of immune function, orthologs and ortholog groups have been combined into an open access database—ImmunomeBase, which is publicly available from http://bioinf.uta.fi/ImmunomeBase.

Keywords: bioinformatics, human immune system, immunome, ImmunomeBase


    Introduction
Species across all taxa are exposed to a variety of pathogenic bacteria, viruses, fungi and parasites, as well as self-cells that have escaped regulation. The evolutionary survival of these organisms suggests that they have highly successful immune and defence systems that have evolved at least as fast as the pathogens themselves. The traditional dogma of immunology is that vertebrate species possess both adaptive and innate immunity, whereas invertebrate species possess only innate immunity. Recently, however, a paradigm shift has been proposed towards a more complex picture of immunity, in which different methods of adaptive immunity may have evolved simultaneously, of which the vertebrate adaptive immune system, mediated by recombination–activation genes (RAG) 1 and 2, is just one (1). This implies that much about immune systems remains unknown or unclear, especially outside Mammalia.

Traditional, RAG-mediated adaptive immunity, in which genes are recombined to create an almost limitless number of Ig-containing recognition receptors, is limited to the jawed vertebrates (Gnathostomata) (2). In jawless fish such as lampreys (Agnatha), variable lymphocyte receptors, created by the recombination of a variety of leucine-rich repeat regions, play a role in acquired immunity (3).

Invertebrates, while lacking traditional adaptive immunity, possess diverse immunity-type genes, which may ‘blur the distinction between purely innate and purely adaptive receptors’ (4). A family of IgV region-containing chitin-binding proteins has been identified both in the lancelet Branchiostoma floridae (5) and in the tunicate Ciona intestinalis (6). These are highly polymorphic (6) and are secreted into the intestine, where they have anti-microbial effects. The freshwater snail Biomphalaria glabrata has a diverse family of Ig-containing fibrinogen-related haemolymph proteins, of which some have been shown to be up-regulated in response to a pathogen (7). Down syndrome adhesion molecules (Dscams), first discovered in Drosophila melanogaster but now identified throughout insects, are a diverse family of Ig-containing proteins created by alternative splicing (8). The mosquito Anopheles gambiae has recently been shown to be able to produce >31 000 potential alternatively spliced forms of Dscam, which enables specificity of recognition and protection against bacteria and Plasmodium parasites (1, 9).

Innate immunity acts more rapidly than adaptive immunity and is older and more evolutionarily conserved; indeed, it provides the backbone on which adaptive immunity was able to evolve (10). There are broad similarities in innate immunity across metazoan phyla, as well as striking differences, leading to a somewhat complicated picture of its evolution (11). Pattern recognition receptors recognize and bind to different microbial markers; examples include peptidoglycan recognition protein (12), Gram-negative-binding protein (13) and β-1,3-glucan binding protein (14). Binding to these receptors instigates a reaction, for example the Toll-like receptor pathway, which results in an anti-pathogen response, e.g. the production of anti-microbial peptides or the formation of melanin. Some innate immunity pathways are limited to one branch of Metazoa, e.g. the pro-phenoloxidase (pro-PO)-activating system, the lectin complement system and the haemolymph coagulation system in invertebrates (15). Other pathways are more ubiquitous, e.g. the MyD88-mediated response against Gram-negative bacteria (16), which is even found in Porifera (sponges), one of the earliest metazoan phyla. Indeed, Porifera have a surprisingly complex innate immune system (17).

Previously, we collected the genes and the proteins of the human immune system, which we called the essential human immunome (http://bioinf.uta.fi/Immunome) (18, 19). This included 847 human immunity genes and their orthologs. Non-mammalian metazoans have complex and different immune systems that warrant further investigation and systematic analysis. In order to achieve a more complete overview of metazoan immunity and to map the evolution of immunity in different phylogenetic branches, we have extended the approach used in the human immunome database to cover all metazoan organisms for which sequence information is available.

We have created a list of immunity genes from literature (i.e. textbooks and articles), gene ontologies (GOs) and PubMed, and used these as seed genes to search for orthologs. There already exist several ortholog databases that are publicly available, for example, Inparanoid (20), EGO (21), OrthoMCL (22, 23) and HomoloGene (24). However, none of these was deemed appropriate for our requirements, mainly because of limited species coverage. This is because they mostly use as their basis sequences from organisms with a complete genome sequence. Instead, we have used a reciprocal best-hit-based search approach to identify orthologs. We have defined different levels of ortholog groups and have clustered the genes accordingly into ortholog groups. ImmunomeBase, which is freely available at http://bioinf.uta.fi/ImmunomeBase, contains information about immunity genes, their orthologs and the ortholog groups.


    Materials and methods
An outline of the methods used for the ortholog data collection is summarized in Fig. 1.


Fig. 1. Flow diagram summarizing methods used in creating the database of metazoan immunity.

 
Preparing a list of immunity proteins
Immunity proteins were defined as proteins or peptides whose predominant role is in the host immune system or defence response against a pathogen. Thus, as in the human immunome study (18), general maintenance genes were not included. Genes with recorded evidence of an immunity function were obtained from model species and from species of interest in the evolution of immunity. These immunity genes are subsequently referred to as seed genes, as they were used as seeds in the identification of orthologs. A list of core human immunity genes was obtained from the immunome database (18). Genes from other metazoan species were identified from literature (25–28), GO terms (29) and keyword searches in PubMed (see Supplementary Table S1, available at International Immunology Online). Non-human mammalian genes were limited to genes from Mus musculus. A list of D. melanogaster genes was obtained from FlyBase (30). Vertebrate genes with hyper-variable regions and fragments of genes, such as Igs, TCRs and the MHC, were excluded because they do not have complete genomic genes and because they are already well covered, for example within the international ImMunoGeneTics information system from the European Bioinformatics Institute (31) and Centre National de la Recherche Scientifique (32).

Where available, NCBI Entrez Gene ids were used as gene identifiers; otherwise, Entrez Protein accessions were used. Amino acid sequences were obtained from the Entrez Protein database.

Identifying orthologs
Reciprocal protein Blast (33) searches were performed to identify orthologs of the seed genes. Using modules from BioPerl (34) and proprietary Perl scripts, each immunity seed gene was used to perform Blast searches against the NCBI's pre-formatted downloadable non-redundant Entrez Protein (released August 29, 2006) and RefSeq (released August 31, 2006) (35) databases (maximum E-value = 10⁻⁵). The top hit from each of the species in the top 150 hits was used in a reciprocal species-specific Blast search against the seed gene species. An orthologous pair of genes was identified if the original seed gene was the top hit of the reciprocal species-specific Blast. Pairwise Blast (bl2seq) (36) scores were calculated for each ortholog pair. This took several weeks of CPU time to run on our cluster of 40 nodes. Where no ortholog was identified for a human gene in Pan troglodytes (chimpanzee), the gene was used as the query in a chimpanzee-specific Blastp search for the missing ortholog. Additional information about genes and taxa was acquired using NCBI's Entrez Programming Utilities.
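The reciprocal best-hit test described above can be sketched in a few lines of Python. This is a minimal illustration and not the BioPerl pipeline used for the database: the `hits` table stands in for pre-computed Blast results, and the gene names and the functions `best_hit` and `reciprocal_orthologs` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for Blast output:
# (query gene, target species) -> hits in that species, best first.
HitTable = Dict[Tuple[str, str], List[str]]


def best_hit(hits: HitTable, query: str, species: str) -> Optional[str]:
    """Return the top-ranked hit for `query` in `species`, or None."""
    ranked = hits.get((query, species), [])
    return ranked[0] if ranked else None


def reciprocal_orthologs(seed: str, seed_species: str,
                         other_species: List[str],
                         hits: HitTable) -> List[Tuple[str, str]]:
    """Keep (seed, candidate) pairs where each gene is the other's top hit."""
    pairs = []
    for species in other_species:
        candidate = best_hit(hits, seed, species)
        if candidate is None:
            continue
        # Reciprocal, species-specific search back against the seed species.
        if best_hit(hits, candidate, seed_species) == seed:
            pairs.append((seed, candidate))
    return pairs


# Invented toy data: a human seed gene searched against mouse and zebrafish.
toy_hits: HitTable = {
    ("geneA", "mouse"): ["geneA_mm", "geneB_mm"],
    ("geneA_mm", "human"): ["geneA", "geneC"],
    ("geneA", "zebrafish"): ["geneX_dr"],
    ("geneX_dr", "human"): ["geneC", "geneA"],  # top hit is not the seed
}

orthologs = reciprocal_orthologs(
    "geneA", "human", ["mouse", "zebrafish", "chicken"], toy_hits
)
print(orthologs)
```

In this toy table only the mouse gene passes the test: the zebrafish candidate has a different human gene as its best reciprocal hit, and no chicken hit exists at all.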

Clustering orthologs
We defined two distinct levels of ortholog groups depending on the number of reciprocal relationships required (Fig. 2). For the loosest definition (level 1), if two seed genes form an orthologous pair, they were clustered together into an ortholog group, along with their non-seed orthologs. Universal reciprocality between all seed genes is not required and non-seed orthologs may be present in more than one ortholog group.
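One way to read the level-1 definition is as single-linkage clustering: seed genes that are connected through any chain of orthologous pairs end up in the same group, and each group then collects the non-seed orthologs of its seeds. The sketch below follows that reading, which is an assumption made for illustration; the function `level1_groups` and all gene names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def level1_groups(seed_pairs: Iterable[Tuple[str, str]],
                  seeds: Iterable[str],
                  nonseed_orthologs: Dict[str, Set[str]]
                  ) -> List[Dict[str, Set[str]]]:
    """Group seeds linked by orthologous pairs and attach non-seed orthologs."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in seed_pairs:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups: List[Dict[str, Set[str]]] = []
    seen: Set[str] = set()
    for seed in sorted(seeds):
        if seed in seen:
            continue
        # Depth-first walk collects every seed reachable through pair links.
        component: Set[str] = set()
        stack = [seed]
        while stack:
            gene = stack.pop()
            if gene in component:
                continue
            component.add(gene)
            stack.extend(neighbours[gene] - component)
        seen |= component
        # A non-seed ortholog is added to every group whose seeds it matches.
        members: Set[str] = set()
        for gene in component:
            members |= nonseed_orthologs.get(gene, set())
        groups.append({"seeds": component, "nonseeds": members})
    return groups


# Invented example: s1-s2 and s2-s3 are orthologous pairs, s1-s3 is not.
groups = level1_groups(
    seed_pairs=[("s1", "s2"), ("s2", "s3")],
    seeds=["s1", "s2", "s3", "s4"],
    nonseed_orthologs={"s1": {"n1"}, "s3": {"n2"}, "s4": {"n1"}},
)
print(groups)
```

Here `s1` and `s3` share a group without being direct orthologs of each other, and the non-seed gene `n1` appears in two groups, which are the two properties that the stricter level-2 definition removes.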


Fig. 2. Example of resolving an ortholog group from level 1 to level 2. Seed genes from the same level-1 ortholog group are represented by white circles and are connected by lines representing the reciprocal orthologous relationships between genes. Solid lines are confirmed relationships within the level-2 ortholog group; dashed lines are potential relationships that need resolving. The value on the line is the score from the bl2seq sequence alignment of the two genes. Non-seed genes are excluded from the ortholog group in (A), (B) and (C). In (D), the non-seed gene is represented by a solid black circle. The initials represent different genes; all genes are IL-1β unless otherwise stated. CCA = Cyprinus carpio, IL-1β 2-1, CAC19887; CCB = Cyprinus carpio, BAA24538; DR = Danio rerio, NP_998009; HS = Homo sapiens, NP_000567; MM = Mus musculus, NP_032387; OM = Oncorhynchus mykiss, CAC83518; SS = Salmo salar, AAT36642. (A) Reciprocal orthologous relationships between seed genes in group 22 (level 1). This group is unresolved to level 2 as not all the seed genes form reciprocal orthologous relationships with each other. There are two possible ways to resolve this group. The first (B) splits the group into two: DR-OM-CCB and HS-MM-CCA. The second possibility (C) is to split into three groups: HS-MM-DR, OM-CCB and CCA. The groups are resolved by comparing the pairwise Blast scores for DR. The highest score is for DR-CCB (265), so DR belongs to the OM-CCB level-2 group (B). (D) Non-seed genes that have reciprocal relationships with immunity genes in more than one group are resolved by comparing bl2seq scores. In this case, SS-CCB has the highest score (129).

 
Level-1 ortholog groups were then split by the stricter level-2 definition, whereby all the seed genes need to form reciprocal relationships with each other. If any pairs of seed genes did not have an orthologous reciprocal relationship, the missing relationships were checked using a reciprocal species-specific remote Blastp. If the level-1 group needed to be split to achieve the level-2 definition and if there was more than one way of doing this, the ambiguity was resolved according to the bl2seq pairwise alignment scores and by manual checking of the groups. Non-seed orthologs that were duplicated between more than one ortholog group were again resolved by comparison of the pairwise Blast bl2seq scores.
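The two checks in this step can be illustrated as follows: testing whether a set of seed genes satisfies the level-2 requirement (every pair is reciprocal), and placing an ambiguous gene with the partner that gives the highest pairwise score. This is a sketch under stated assumptions and does not reproduce the manual curation. The gene labels follow Fig. 2, but the set of links is an illustrative guess rather than a transcription of the figure, and only the DR-CCB score (265) comes from the caption; the DR-HS score is an invented placeholder. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def link(a: str, b: str) -> FrozenSet[str]:
    """Unordered pair of gene labels."""
    return frozenset((a, b))


def is_level2(seeds: Set[str], pairs: Set[FrozenSet[str]]) -> bool:
    """True if every pair of seed genes has a reciprocal relationship."""
    return all(link(a, b) in pairs for a, b in combinations(sorted(seeds), 2))


def best_partner(gene: str, candidates: List[str],
                 scores: Dict[FrozenSet[str], float]) -> str:
    """Pick the candidate with the highest pairwise (bl2seq-style) score."""
    return max(candidates, key=lambda c: scores.get(link(gene, c), 0.0))


# Illustrative links between seed genes (assumed, not read from Fig. 2A).
pairs = {
    link("DR", "OM"), link("DR", "CCB"), link("OM", "CCB"),
    link("DR", "HS"), link("DR", "MM"), link("HS", "MM"),
}

unresolved = is_level2({"DR", "OM", "CCB", "HS", "MM"}, pairs)
resolved = is_level2({"DR", "OM", "CCB"}, pairs)

scores = {
    link("DR", "CCB"): 265.0,  # value given in the Fig. 2 caption
    link("DR", "HS"): 100.0,   # placeholder, not a measured value
}
partner = best_partner("DR", ["CCB", "HS"], scores)
print(unresolved, resolved, partner)
```

With these inputs the five-gene set fails the level-2 test, the DR-OM-CCB subset passes it, and DR is placed with CCB because that pair has the higher score. The same comparison applies to a non-seed gene that has reciprocal relationships with seeds in more than one group.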

Database of immunity
Seed genes, their orthologs and lists of ortholog groups were deposited in a relational MySQL database. A web interface was designed for the database to provide a useful tool for the scientific community to obtain data about metazoan immunity proteins. This database was named ImmunomeBase and is available from http://bioinf.uta.fi/ImmunomeBase.
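The schema of the database is not given here, so the following is only a guess at how seed genes, ortholog pairs and group membership could be laid out relationally. It uses Python's built-in `sqlite3` module in place of MySQL so that it runs without a server, and every table and column name is hypothetical. The accessions and the group numbers (level-1 group 22, level-2 group 142) are those mentioned for human and mouse IL-1β in the text and in Fig. 2; the pairwise score is left empty because its value is not stated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    accession TEXT PRIMARY KEY,
    species   TEXT NOT NULL,
    is_seed   INTEGER NOT NULL          -- 1 for seed genes, 0 otherwise
);
CREATE TABLE ortholog_pair (
    seed_accession     TEXT NOT NULL REFERENCES gene(accession),
    ortholog_accession TEXT NOT NULL REFERENCES gene(accession),
    bl2seq_score       REAL,            -- NULL when not known
    PRIMARY KEY (seed_accession, ortholog_accession)
);
CREATE TABLE group_member (
    group_id  INTEGER NOT NULL,
    level     INTEGER NOT NULL,         -- 1 or 2
    accession TEXT NOT NULL REFERENCES gene(accession),
    PRIMARY KEY (group_id, level, accession)
);
""")

conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [("NP_000567", "Homo sapiens", 1), ("NP_032387", "Mus musculus", 1)],
)
conn.execute(
    "INSERT INTO ortholog_pair VALUES (?, ?, ?)",
    ("NP_000567", "NP_032387", None),
)
conn.executemany(
    "INSERT INTO group_member VALUES (?, ?, ?)",
    [(22, 1, "NP_000567"), (22, 1, "NP_032387"),
     (142, 2, "NP_000567"), (142, 2, "NP_032387")],
)

# List the members of level-2 group 142.
rows = [
    r[0] for r in conn.execute(
        "SELECT accession FROM group_member "
        "WHERE group_id = ? AND level = ? ORDER BY accession",
        (142, 2),
    )
]
print(rows)
```

Storing group membership in its own table allows a non-seed gene to belong to several level-1 groups while belonging to only one level-2 group.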

Phylogenetic analysis
Protein multiple sequence alignments were produced with ClustalW (37) for those level-1 and level-2 ortholog groups where at least three different sequences were present in the group. Using these alignments, maximum likelihood trees were calculated for 715 level-1 groups and 798 level-2 groups using the RAxML software (38) and are available online from the database.

Protein multiple sequence alignments were created using T-Coffee (39) for selected groups for case studies. Alignments were converted into Nexus format using a BioPerl (34) script, and phylogenetic trees for the case studies were constructed in PAUP* (40) using the Neighbour-Joining method and 1000 bootstrap replicates.


    Results
We aimed to investigate the evolution of immunological systems and genes by constructing a comprehensive database of essential immunity genes and their orthologs from a wide range of metazoan organisms. Seed genes were identified from a variety of sources where there is evidence of an immunity function. These were used as the starting point for identifying orthologs from other species.

The existing ortholog databases use a limited number of completed genomes as their sequence source. Many species of interest in the evolution of immunity are yet to have their genomes sequenced or fully annotated, and thus there is a bias in the genomes that are available. The main differences between the databases occur when the orthologs are clustered into groups and when paralogs are defined. For example, HomoloGene has a stricter rule defining ortholog groups than EGO; therefore, EGO groups contain more distinct species than HomoloGene groups. Groups in OrthoMCL contain many paralogs; some groups can contain tens of sequences from the same organism (Table 3).



 
Table 3. Ortholog groups containing human protein α-2-macroglobulin (GeneID: 2; accession: NP_000005) from HomoloGene (group 37248), OrthoMCL (group OG1_329), EGO (group 997720), Inparanoid (clusters containing protein ENSP00000323929) and our database of immunity (level-2 group 47)

 
The ortholog databases use a similar definition of a reciprocal orthologous pair of genes as their starting point: ‘a pair of [sequences] in separate species in which the first member of the pair has as its best hit in the second species, the second member and the second member has as its best hit in the first species the first member’, as given by Lee et al. (21). We used this definition of an orthologous pair to search for orthologs of the immunity seed genes. The seed genes and their orthologs were then clustered into two levels of ortholog group (Fig. 2). At the first level, level 1, orthologous pairs of seed genes and their non-seed orthologs are grouped together. Seed genes within the same group do not need to form orthologous relationships with each other and non-seed orthologs may be present in more than one ortholog group (i.e. they are orthologs of more than one seed gene). The level-2 groups contain seed genes that are all orthologs of each other and non-seed orthologs that can only belong to one ortholog group (the one with the seed gene with the highest pairwise match).

Database of immunity
ImmunomeBase is a multi-species database of immunity that contains metazoan seed genes, their orthologs and the ortholog groups, and it is publicly available and searchable at http://bioinf.uta.fi/ImmunomeBase. The database contains information about the evidence of an immunity function for the seed genes in the form of journal citations and GO terms, as well as links to the entries in the primary databases (EntrezGene, GenBank, PubMed, etc.). Users can search for a gene by name, species, Entrez Protein accession or Entrez Gene id. The search results page contains a list of genes, both seed genes and non-seed genes, which match the search query. By clicking on one of the genes, the user is directed to the main page for that gene. For seed genes, the main page shows journal citations and GO terms for the gene, followed by a list of orthologs of the gene. For each ortholog, the bl2seq pairwise score of the seed gene and the ortholog is given, as well as the ortholog groups to which it belongs. There are links to both the level-1 and level-2 groups that the seed gene belongs to. The main page for a non-seed gene presents a list of seed genes that the non-seed gene forms an orthologous relationship with and links to the main pages of these genes. Each ortholog group has a page that displays a list of seed and non-seed genes belonging to that group, with links to the main pages for each gene.

The database contains 1811 metazoan seed genes and 10 333 non-seed orthologs (Table 1). These are clustered into the ortholog groups we defined. The level-1 groups are the loosest and may contain paralogs of seed genes. Non-seed orthologs may belong to more than one level-1 group. Level-1 groups are resolved to level 2 so that all the seed genes within a group form reciprocal orthologous relationships with each other and non-seed genes can only belong to one group. The database contains 1285 level-1 ortholog groups, of which 1046 contain more than one protein accession. These have been resolved to 1395 level-2 ortholog groups, of which 1134 contain more than one protein accession (Table 2).



 
Table 1. Coverage of a selection of organisms and phyla in ImmunomeBase

 


 
Table 2. Number of ortholog groups in the immunome database

 
Ortholog groups
Fifty-five of the level-1 ortholog groups contain three or more seed genes and thus had to be resolved to level 2. Some of these had to be split into smaller groups to obey the level-2 definition due to non-universal reciprocality between the seed genes. This was done manually by comparison of the bit scores from the pairwise sequence alignments. For example, the level-1 group containing IL-1β from fish and mammals (group 22) had to be resolved to level 2 (Fig. 2). The reason for this is that at level 2, all the seed genes need to form reciprocal orthologous relationships with each other, and this is not the case with group 22 (Fig. 2A). This definition also requires that only one seed gene from each species be in each group, and there are two genes for IL-1β from Cyprinus carpio (carp) (41). There are two possible solutions for resolving this group. The first (Fig. 2B) splits the level-1 group into two, with the gene from zebrafish clustering with the trout and one of the carp genes and the second carp gene clustering with the genes from human and mouse. The second possible grouping (Fig. 2C) places the zebrafish gene with the human and mouse genes, the first carp gene again with the trout gene and the second carp gene in a group on its own. The groupings were resolved by comparing the bl2seq pairwise Blast scores from the zebrafish. The highest score was with the first carp gene, so the first grouping was accepted (Fig. 2B). The pattern of reciprocal relationships would suggest that an ortholog from Oncorhynchus mykiss (rainbow trout) is missing, and indeed, when the non-seed orthologs are added to the group, an additional rainbow trout IL-1β gene joins the human–mouse group (group 142).

There are 1395 level-2 groups, of which 1147 contain more than one gene; 133 groups contain more than one gene and no mammalian genes (Table 2). As a case study, we constructed Neighbour-Joining phylogenetic trees for level-2 ortholog groups 131 and 1063 (Fig. 3). These contain the orthologs of pro-PO seed genes from Bombyx mori (silk worm) and A. gambiae (mosquito) (Fig. 3A) and the orthologs of invertebrate lysozyme (lysozyme-i) from Asterias rubens (starfish) (42) and Suberites domuncula (sponge) (43) (Fig. 3B). The pro-POI orthologs include haemocyanin genes from spiders, horseshoe crab and centipede. All the orthologs are restricted to Arthropoda, which reflects the gene's role in the melanization pathway of innate immunity (44). This is an integral defence mechanism whereby melanin is deposited at the site of an invasion and the organism is shielded from the invading pathogen. The orthologs of lysozyme-i are spread throughout the invertebrate phyla in nematode worms, molluscs, segmented worms, insects, echinoderms and crustaceans. The seed genes from the two groups were used to search against Vertebrata genes using Blastp and no significant hits were found, thus confirming that these are true invertebrate ortholog groups.


Fig. 3. Case study: phylogenetic analysis of two invertebrate ortholog groups. (A) Level-2 group 131 (arthropod pro-POI) contains the seed genes NP_001037335 Bombyx mori 1, NP_001037336 Bombyx mori 2 and XP_312089 Anopheles gambiae, and their non-seed orthologs. Colour coding: red, Chilopoda (centipedes; outgroup); pink, Crustacea (crustaceans); grey, Xiphosura (horseshoe crabs); blue, Arachnida (spiders); orange, Hymenoptera (bees and wasps); brown, Lepidoptera (moths); green, Diptera (flies); yellow, Coleoptera (beetles). Asterisks indicate orthologs that are haemocyanin genes. (B) Level-2 group 1063 (invertebrate lysozyme) contains the seed genes AAR29291 Asterias rubens and CAG27844 Suberites domuncula, and their non-seed orthologs. Colour coding: blue, Porifera (sponges; outgroup); green, Echinodermata; orange, Crustacea; grey, Insecta; brown, Annelida (segmented worms); yellow, Mollusca; pink, Nematoda. Both trees were constructed using the Neighbour-Joining method and 1000 bootstrap replicates. Bootstrap values >50% are indicated below the nodes. Seed genes are underlined.

 
A comparison of the Neighbour-Joining trees prepared for the case study with the maximum likelihood trees (available from the ImmunomeBase Web site) reveals that the overall structures of the trees are very similar; however, the branch lengths of the maximum likelihood trees are useful in indicating the distances between the sequences.


    Discussion
We have created the first dedicated and comprehensive database for metazoan immunity and evolution, which we have named ImmunomeBase. A total of 1811 immunity genes have been collected from a wide range of metazoan species, and systematic analysis has been performed by means of reciprocal protein–protein Blast searches to identify potential orthologs. These orthologs have then been arranged into groups according to our defined criteria or levels, which vary according to the extent of orthologous relationships between the seed genes within the group.

All the seed genes from mammals are from mouse and human. These two species were chosen because their gene annotations are the most complete among mammals. This explains the anomaly in Table 1, where P. troglodytes (chimpanzee) does not have any seed genes and yet has >800 non-seed genes. Incidentally, the majority of these genes are orthologs of the human seed genes. We hypothesize that, given the near-identical nature of the human and chimpanzee genomes, the missing orthologs are a result of the chimpanzee proteins in the NCBI database being incomplete.

The coverage of anti-microbial peptides is incomplete in our database and these proteins could be considered a special case. There is already a database dedicated to this class of protein (45) and even one just for the diverse family of penaeidins that come from the shrimp Litopenaeus vannamei (46). Difficulties occur in determining their phylogeny because often the peptides arise from the post-translational cleavage of a longer pre-peptide into multiple shorter peptides (47). Also, they can be very short in length, making the Blast algorithm difficult to use.

Our invertebrate case studies (Fig. 3) show that by taking a few non-mammalian seed genes and using our basic approach to identifying orthologs, it is possible to construct ortholog groups that show good coverage of the phylogeny of that gene. While the phylogeny is necessarily incomplete for any given gene, the ortholog group in the database still presents a good starting point for more comprehensive phylogenetic study. For example, if a gene in ImmunomeBase does not have any mammalian orthologs present, this can be confirmed by performing a manual reciprocal protein Blast search against mammalian genes from the NCBI's online Blastp interface.

A comparison of ortholog groups from different databases containing the human protein α-2-macroglobulin (Table 3) shows that while the group from OrthoMCL contains the most protein sequences, 78, ImmunomeBase contains the most species, 22. The HomoloGene group contains the fewest sequences, 6, and only has proteins from mammals and birds. In contrast, ImmunomeBase has species from mammals, birds, amphibians and arthropods. ImmunomeBase does not contain all the species found in the other databases. This is a result of our methodology; reciprocal Blast searches are limited to the top 150 hits from the protein–protein Blast against the whole database. Therefore, if the sequence from an organism does not occur in the top 150 hits, it will not be included in the database even if it does form a close orthologous relationship. In this way, the database does not provide an exhaustive list of orthologs; however, this approach does result in orthologs from a range of species being identified. The reason for our cap on the number of reciprocal Blast searches carried out is the processing time required.
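The effect of the cap can be shown with a small sketch. It assumes a ranked list of hits, best first, each tagged with its species, and keeps the first hit per species among the first `limit` entries. The function name `top_hit_per_species`, the accessions and the species labels are invented for illustration; only the default cap of 150 comes from the text.

```python
from typing import Dict, List, Tuple


def top_hit_per_species(ranked_hits: List[Tuple[str, str]],
                        limit: int = 150) -> Dict[str, str]:
    """Return the first hit per species among the `limit` best hits.

    `ranked_hits` is a list of (accession, species) tuples, best hit first.
    """
    chosen: Dict[str, str] = {}
    for accession, species in ranked_hits[:limit]:
        # setdefault keeps the earliest (best-ranked) accession per species.
        chosen.setdefault(species, accession)
    return chosen


# Invented ranking: 150 hits from species_A, then one hit from species_B.
hits = [(f"A{i}", "species_A") for i in range(150)] + [("B0", "species_B")]

capped = top_hit_per_species(hits)                     # species_B is cut off
uncapped = top_hit_per_species(hits, limit=len(hits))  # species_B is kept
print(sorted(capped), sorted(uncapped))
```

When many close homologs from a few well-sampled species fill the top of the list, a candidate from another species at rank 151 is never tested reciprocally, which is the limitation described above.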

Unlike most other ortholog databases, ImmunomeBase does not merely use sequences from species with completed genomes. While this means that sequences from a wider range of species may be included, this also causes limitations, in that as more sequences become available from different species, the ortholog pairs will need to be updated as new, better matches are made between species. For this reason, it might be prudent to call our orthologs ‘tentative orthologs’ as in the EGO database (21). In any case, where there are paralogs it might be considered impossible to identify the true orthologous pair. The database will be regularly updated and the ortholog groups will be re-evaluated when new sequence information is available.

We have not attempted to define or identify paralogs, although in some instances the presence of these in both species may be the reason for there not being an orthologous pair identified from those species. The Inparanoid database (20) makes an attempt to identify and classify paralogs as in- and out-paralogs depending on whether the duplication event occurred before or after speciation.

We have defined two levels of ortholog group (Fig. 2). A further level, level 3, represents our idealized definition of an ortholog group—where every group member, both seeds and their non-seed orthologs, forms a reciprocal orthologous relationship with every other member. However, this exhaustive search would be very difficult, if not impossible, to implement because of the incomplete presence of annotated genomes and the presence of paralogs in the data set.


    Supplementary data
Supplementary data are available at International Immunology Online.


    Funding
Medical Research fund of Tampere University Hospital.


    Abbreviations
 
GO, gene ontology
pro-PO, pro-phenoloxidase
RAG, recombination–activation genes

    Notes
 
Transmitting editor: A. Falus

Received 19 June 2007, accepted 29 September 2007.


    References

  1. Kurtz J, Armitage SA. Alternative adaptive immunity in invertebrates. Trends Immunol. (2006) 27:493.
  2. Agrawal A, Eastman QM, Schatz DG. Transposition mediated by RAG1 and RAG2 and its implications for the evolution of the immune system. Nature (1998) 394:744.
  3. Pancer Z, Cooper MD. The evolution of adaptive immunity. Annu. Rev. Immunol. (2006) 24:497.
  4. Litman GW, Cannon JP, Dishaw LJ. Reconstructing immune phylogeny: new perspectives. Nat. Rev. Immunol. (2005) 5:866.
  5. Cannon JP, Haire RN, Litman GW. Identification of diversified genes that contain immunoglobulin-like variable regions in a protochordate. Nat. Immunol. (2002) 3:1200.
  6. Cannon JP, Haire RN, Schnitker N, Mueller MG, Litman GW. Individual protochordates have unique immune-type receptor repertoires. Curr. Biol. (2004) 14:R465.
  7. Adema CM, Hertel LA, Miller RD, Loker ES. A family of fibrinogen-related proteins that precipitates parasite-derived molecules is produced by an invertebrate after infection. Proc. Natl Acad. Sci. USA (1997) 94:8691.
  8. Watson FL, Püttmann-Holgado R, Thomas F, et al. Extensive diversity of Ig-superfamily proteins in the immune system of insects. Science (2005) 309:1874.
  9. Dong Y, Taylor HE, Dimopoulos G. AgDscam, a hypervariable immunoglobulin domain-containing receptor of the Anopheles gambiae innate immune system. PLoS Biol. (2006) 4:e229.
  10. Kimbrell DA, Beutler B. The evolution and genetics of innate immunity. Nat. Rev. Genet. (2001) 2:256.
  11. Martinelli C, Reichhart JM. Evolution and integration of innate immune systems from fruit flies to man: lessons and questions. J. Endotoxin Res. (2005) 11:243.
  12. Steiner H. Peptidoglycan recognition proteins: on and off switches for innate immunity. Immunol. Rev. (2004) 198:83.
  13. Lee WJ, Lee JD, Kravchenko VV, Ulevitch RJ, Brey PT. Purification and molecular cloning of an inducible Gram-negative bacteria-binding protein from the silkworm, Bombyx mori. Proc. Natl Acad. Sci. USA (1996) 93:7888.
  14. Johansson MW, Söderhäll K. The prophenoloxidase activating system and associated proteins in invertebrates. Prog. Mol. Subcell Biol. (1996) 15:46.
  15. Iwanaga S, Lee BL. Recent advances in the innate immunity of invertebrate animals. J. Biochem. Mol. Biol. (2005) 38:128.
  16. Wiens M, Korzhev M, Krasko A, et al. Innate immune defense of the sponge Suberites domuncula against bacteria involves a MyD88-dependent signaling pathway. Induction of a perforin-like molecule. J. Biol. Chem. (2005) 280:27949.
  17. Müller WEG, Müller IM. Origin of the metazoan immune system: identification of the molecules and their functions in sponges. Integr. Comp. Biol. (2003) 43:281.
  18. Ortutay C, Siermala M, Vihinen M. Molecular characterization of the immune system: emergence of proteins, processes, and domains. Immunogenetics (2007) 59:333.
  19. Ortutay C, Vihinen M. Immunome: a reference set of genes and proteins for systems biology of the human immune system. Cell Immunol. (2007) 244:87.
  20. O'Brien KP, Remm M, Sonnhammer EL. Inparanoid: a comprehensive database of eukaryotic orthologs. Nucleic Acids Res. (2005) 33:D476.
  21. Lee Y, Sultana R, Pertea G, et al. Cross-referencing eukaryotic genomes: TIGR Orthologous Gene Alignments (TOGA). Genome Res. (2002) 12:493.
  22. Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. (2003) 13:2178.
  23. Chen F, Mackey AJ, Stoeckert CJ Jr, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. (2006) 34:D363.
  24. Wheeler DL, Barrett T, Benson DA, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. (2006) 34:D173.
  25. Vetvicka V, Sima P. Evolutionary Mechanisms of Defense Reactions (1998) Basel: Birkhauser Verlag.
  26. Beck G, Sugumaran M, Cooper EL. Phylogenetic Perspectives on the Vertebrate Immune System (2001) New York: Kluwer Academic/Plenum Publishers.
  27. Warr GW, Cohen N. Phylogenesis of Immune Functions (1991) Boston: CRC Press.
  28. Beck G, Cooper EL, Habicht GS, Marchalonis JJ. Primordial immunity: foundations for the vertebrate immune system. Ann. NY Acad. Sci. (1994) 712.
  29. Ashburner M, Ball CA, Blake JA, et al. Gene ontology: tool for the unification of biology. Nat. Genet. (2000) 25:25.
  30. Grumbling G, Strelets V. FlyBase: anatomical data, images and queries. Nucleic Acids Res. (2006) 34:D484.
  31. Robinson J, Waller MJ, Fail SC, Marsh SG. The IMGT/HLA and IPD databases. Hum. Mutat. (2006) 27:1192.
  32. Lefranc MP, Giudicelli V, Kaas Q, et al. IMGT, the international ImMunoGeneTics information system. Nucleic Acids Res. (2005) 33:D593.
  33. Altschul SF, Madden TL, Schäffer AA, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389.
  34. Stajich JE, Block D, Boulez K, et al. The Bioperl toolkit: Perl modules for the life sciences. Genome Res. (2002) 12:1611.
  35. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. (2007) 35:D61.
  36. Tatusova TA, Madden TL. BLAST 2 sequences, a new tool for comparing protein and nucleotide sequences. FEMS Microbiol. Lett. (1999) 174:247.
  37. Higgins D, Thompson J, Gibson T. CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673.
  38. Stamatakis A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics (2006) 22:2688.
  39. Notredame C, Higgins DG, Heringa J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. (2000) 302:205.
  40. Swofford D. PAUP*: phylogenetic analysis using parsimony (*and other methods) 4.0 beta. (2003) Sunderland, MA: Sinauer Associates.
  41. Engelsma MY, Stet RJ, Saeij JP, Verburg-van Kemenade BM. Differential expression and haplotypic variation of two interleukin-1β genes in the common carp (Cyprinus carpio L.). Cytokine (2003) 22:21.
  42. Bachali S, Bailly X, Jollès J, Jollès P, Deutsch JS. The lysozyme of the starfish Asterias rubens. A paradygmatic type i lysozyme. Eur. J. Biochem. (2004) 271:237.
  43. Thakur NL, Perovic-Ottstadt S, Batel R, et al. Innate immune defense of the sponge Suberites domuncula against Gram-positive bacteria: induction of the lysozyme and AdaPTin. Mar. Biol. (2005) 146:271.
  44. Cerenius L, Söderhäll K. The prophenoloxidase-activating system in invertebrates. Immunol. Rev. (2004) 198:116.
  45. Wang Z, Wang G. APD: the Antimicrobial Peptide Database. Nucleic Acids Res. (2004) 32:D590.
  46. Gueguen Y, Garnier J, Robert L, et al. PenBase, the shrimp antimicrobial peptide penaeidin database: sequence-based classification and recommended nomenclature. Dev. Comp. Immunol. (2006) 30:283.
  47. Boman HG. Antibacterial peptides: basic facts and emerging concepts. J. Intern. Med. (2003) 254:197.
