Introduction
Algae bioinformatics, as the name suggests, is the application of information technology to the study of algae with the aid of computational tools and software. Like the other branches of bioinformatics, this emerging field employs computational (dry-lab) techniques in areas such as gene prediction, comparative genomics, genome analysis and functional genomics.
Algae bioinformatics draws on computer science, biology and genetics, with a good-sized dollop of mathematics, statistics and related disciplines thrown into the mix. Many tools are available for algae bioinformatics research; they are briefly discussed at the end of this article.
Scope of algae bioinformatics
Algae bioinformatics is very useful for geneticists, phycologists and other researchers who sequence algal genomes as part of their wet-lab research. As these sequence data accumulate, there is a growing need for sophisticated computational approaches to analyze them, and hence for algae bioinformaticians.
Algae bioinformatics will therefore mainly involve:
- The development of new algorithms and statistics with which relationships among the different algal species can be assessed.
- The analysis and interpretation of various types of data including DNA, RNA and protein sequences and structures.
- The advancement of various tools that enable efficient access and management of different types of information with regard to algae.
Role of algae bioinformatics
As discussed earlier, algae bioinformatics involves only dry-lab research and requires DNA, RNA and protein sequence data. A series of wet-lab steps is carried out to obtain the sequencing data, which are then subjected to bioinformatics analysis.
Steps involved in obtaining the data for analysis using bioinformatics
- Design of primers
- Extraction of DNA, RNA
- PCR amplification
- Denaturing gradient gel electrophoresis (DGGE)
- Sequencing
First, DNA and RNA are extracted from algal samples, for example samples taken from evaporation ponds, and primers are designed for the target region. PCR amplification is then performed. Because the PCR products are similar in size, denaturing gradient gel electrophoresis (DGGE) is used to separate them according to differences in their sequence. The separated products are sequenced, and it is after this step that the algae bioinformatics work begins. This may include BLAST searches for similar sequences, annotation of the new sequences and construction of phylogenetic trees.
Applications of algae bioinformatics
- Assembly with known algal genes
- Phylogenetic analysis
- BLAST search for homologues
Assembly with known algal genes:
Once the gene sequence has been obtained, the first step is to align it against known algal genes and to score the alignment so that inconsistencies stand out, i.e., positions where there are mutations such as substitutions, insertions or deletions. This makes it possible to compare two algal genomes. The gene annotations on the known sequences are used during the assembly process to determine where the genes start and end. The scores can also be displayed graphically alongside the sequence information. Inconsistencies are highlighted and inspected manually to see whether they are caused by sequencing errors or by real differences in the sequence.
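The scoring idea above can be sketched with a small global alignment. This is only an illustration: it uses the textbook Needleman-Wunsch algorithm with made-up scores (match +1, mismatch -1, gap -2), not the scoring scheme of any particular assembly program, and the two sequences are invented.

```python
def global_align(ref, query, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment.

    Returns (score, aligned_ref, aligned_query); gaps are shown as '-'.
    The score values are illustrative assumptions.
    """
    n, m = len(ref), len(query)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        return match if ref[i - 1] == query[j - 1] else mismatch

    # Fill the dynamic programming table.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one best alignment.
    out_ref, out_query = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_ref.append(ref[i - 1])
            out_query.append(query[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_ref.append(ref[i - 1])
            out_query.append('-')
            i -= 1
        else:
            out_ref.append('-')
            out_query.append(query[j - 1])
            j -= 1
    return score[n][m], ''.join(reversed(out_ref)), ''.join(reversed(out_query))


def list_differences(aligned_ref, aligned_query):
    """Report each inconsistent column as (position, kind, ref_base, query_base)."""
    diffs = []
    for pos, (r, q) in enumerate(zip(aligned_ref, aligned_query)):
        if r == q:
            continue
        if r == '-':
            kind = 'insertion'      # base present only in the query
        elif q == '-':
            kind = 'deletion'       # base missing from the query
        else:
            kind = 'substitution'
        diffs.append((pos, kind, r, q))
    return diffs


# Invented example: the query lacks one base of the reference.
total, a_ref, a_query = global_align("ACGTACGT", "ACGTCGT")
print(total, a_ref, a_query)
print(list_differences(a_ref, a_query))
```

Each reported difference still has to be checked by eye, as described above, to decide whether it is a sequencing error or a real mutation.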
Phylogenetic analysis:
Phylogenetic analysis is the use of bioinformatics tools to infer an organism's evolutionary relationships. The result of a molecular phylogenetic analysis is expressed as a phylogenetic tree. Phylogenetic methods can be applied to morphological data and to several kinds of molecular data; we concentrate here on the analysis of DNA and protein sequences. This mainly includes:
- Comparisons of more than two sequences
- Analysis of gene families, including functional predictions
- Estimation of evolutionary relationships among different genera of the algae
Functional and phylogenetic characterization of algae is usually carried out to determine taxonomic patterns among algal species. For instance, a phylogenetic analysis can be carried out to determine whether there are taxonomic patterns of H2 production among green microalgae. A phylogenetic tree can be constructed from all available algal sequence data spanning ribosomal regions for unicellular Chlorophytes. Many species capable of hydrogen production have been observed in such work, but bioinformatics techniques have not yet been applied to them because sequence information is currently lacking.
The basic concepts of phylogenetic analysis with respect to algae bioinformatics are quite easy to understand, but understanding what the results of the analysis mean, and avoiding errors of analysis can be quite difficult.
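To make the idea concrete, here is a minimal sketch of a distance-based tree. It assumes the sequences are already aligned, uses the Jukes-Cantor correction and UPGMA clustering (a much simpler method than the likelihood, parsimony or Bayesian approaches used in real studies), and prints only the tree topology. The four sequences and their names are invented.

```python
import math
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != '-' and y != '-']
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; only defined for p < 0.75."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def upgma(names, dist):
    """UPGMA clustering.

    dist maps frozenset({a, b}) to the distance between a and b.
    Returns the tree topology as a Newick string (no branch lengths).
    """
    clusters = {name: 1 for name in names}   # cluster label -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        merged = f"({a},{b})"
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        # Distance to the new cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters)) + ";"


# Invented, pre-aligned toy sequences.
seqs = {
    "algaA": "ACGTACGTAC",
    "algaB": "ACGTACGTAT",
    "algaC": "ACGAACGGAT",
    "algaD": "TCGAACGGTT",
}
dist = {frozenset((a, b)): jukes_cantor(p_distance(seqs[a], seqs[b]))
        for a, b in combinations(seqs, 2)}
print(upgma(list(seqs), dist))   # ((algaA,algaB),(algaC,algaD));
```

Even in this toy case the tree depends on the alignment, the distance correction and the clustering method, which is one reason the results of a real analysis need careful interpretation.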
BLAST search for homologues:
A major part of the work of analyzing assembled algal sequences is running BLAST searches against the NCBI databases.
A BLAST search is very useful because it helps to determine whether your unknown algal sequence corresponds to a known (identified) sequence or is new data that may code for new proteins. For example, the Chlamydomonas reinhardtii genome is freely available from NCBI and can be compared with an unknown sequence from any algal species. Different BLAST programs are used depending on the type of sequence being compared: blastn compares a nucleotide query with a nucleotide database, blastp compares a protein query with a protein database, blastx translates a nucleotide query (such as a transcript) and compares it with a protein database, and tblastn compares a protein query with a translated nucleotide database.
In algae bioinformatics, BLAST searches are generally used for functional and phylogenetic comparison of the sequenced organisms with known sequences from the database. The researcher examines the similarities between the sequences, and some of the database sequences are downloaded and aligned with the new data. For the conserved regions, annotations are transferred to the sequenced data, which provides valuable information for the next step in the analysis: the functional and phylogenetic characterization of the organism.
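Choosing the right BLAST program can be sketched as a small lookup. The program names (blastn, blastp, blastx, tblastn) are the standard NCBI ones; the helper functions and the 90% threshold used to guess the sequence type are assumptions made for this illustration, not part of BLAST itself.

```python
# Standard BLAST programs, keyed by (query type, database type).
BLAST_PROGRAMS = {
    ("nucleotide", "nucleotide"): "blastn",
    ("protein", "protein"): "blastp",
    ("nucleotide", "protein"): "blastx",    # query is translated
    ("protein", "nucleotide"): "tblastn",   # database is translated
}


def guess_sequence_type(seq, threshold=0.9):
    """Crude guess: mostly A/C/G/T/U/N letters means nucleotide, otherwise protein.

    The threshold is an arbitrary illustrative value.
    """
    letters = [c for c in seq.upper() if c.isalpha()]
    if not letters:
        raise ValueError("empty sequence")
    nucleotide_like = sum(c in "ACGTUN" for c in letters)
    return "nucleotide" if nucleotide_like / len(letters) >= threshold else "protein"


def choose_blast_program(query, database_type):
    """Pick a BLAST program for a query against a 'nucleotide' or 'protein' database."""
    return BLAST_PROGRAMS[(guess_sequence_type(query), database_type)]


print(choose_blast_program("ATGGCGTTAACC", "nucleotide"))   # blastn
print(choose_blast_program("MKVLAAGIVGLLLAQ", "protein"))   # blastp
print(choose_blast_program("ATGGCGTTAACC", "protein"))      # blastx
```

In practice you normally know what kind of sequence you have, so the guessing step is only there to keep the example self-contained.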
The three main steps in handling a newly sequenced fragment are therefore:
- Reducing the background noise.
- Removing the plasmid (vector) sequences, i.e., the pieces of sequence that are not part of the amplified DNA fragment of interest.
- Comparing the cleaned sequence with others collected in worldwide databases.
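The first two steps can be sketched as follows. This is a simplified illustration that assumes the read comes with per-base quality scores and that the vector sequence flanking the insert is known exactly. The flank sequences, the quality cut-off of 20 and the read itself are all hypothetical, and real vector screening tools tolerate mismatches rather than requiring exact matches.

```python
def trim_low_quality(seq, quals, min_q=20):
    """Trim low-quality bases (the 'background noise') from both ends of a read."""
    start, end = 0, len(seq)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]


def remove_vector(seq, left_flank, right_flank):
    """Keep only the insert between two known vector flanks (exact matches only)."""
    start, end = 0, len(seq)
    i = seq.find(left_flank)
    if i != -1:
        start = i + len(left_flank)
    j = seq.find(right_flank, start)
    if j != -1:
        end = j
    return seq[start:end]


# Hypothetical read: 2 noisy bases, left flank, insert, right flank, 2 noisy bases.
read = "TTGGATCCATGCATGCGAATTCAA"
quals = [5, 5] + [30] * 20 + [5, 5]

clean, _ = trim_low_quality(read, quals)
insert = remove_vector(clean, left_flank="GGATCC", right_flank="GAATTC")
print(insert)   # ATGCATGC
```

The cleaned insert is what would then be submitted to a database search in the third step.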
Applications of algae bioinformatics
Algae, as you all know, are photosynthetic eukaryotic organisms found in very different habitats (sea and freshwater, ice and snow, and desert soils), and they show an enormous variety of cell morphologies and life cycles. The applications of algae bioinformatics include sequence analysis, homology modelling, structural analysis and so on. A wet-lab component can then be used to verify experimentally the computational predictions obtained from the bioinformatics analysis.
Algae bioinformatics can be applied in the following fields:
- Homology and Similarity
- Protein Function Analysis
- Structural Analysis
- Sequence Analysis
- Functional characterization of organism.
Homology and Similarity - The term homology implies a common evolutionary origin; it can apply, for example, to DNA sequences or genes from two different algal species, such as high-oil-producing algae. Homologous sequences are sequences that are related by divergence from a common ancestor. The degree of similarity between two sequences can be measured, whereas homology is either true or false. Similarity search tools can be used to identify similarities between novel query sequences of unknown structure and function and database sequences whose structure and function have been elucidated.
Protein Function Analysis - This application of algae bioinformatics is used to determine the function or role of a protein, starting from DNA sequence data. More broadly, function analysis is the identification and mapping of all functional elements (both coding and non-coding) in a genome. This group of programs allows a researcher to compare a protein sequence with the secondary (or derived) protein databases, which contain information on motifs, signatures and protein domains. Highly significant hits against these pattern databases allow you to approximate the biochemical function of your query algal protein.
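A motif search of this kind can be sketched by turning a PROSITE-style pattern into a regular expression. The converter below handles only the basic syntax (single residues, [..] alternatives, {..} exclusions, x wildcards and (n) or (n,m) repeats); it is an illustration, not a replacement for the real scanning tools. The pattern N-{P}-[ST]-{P} is the well-known N-glycosylation site pattern, and the protein sequence is invented.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern such as 'N-{P}-[ST]-{P}' to a regex."""
    parts = []
    for element in pattern.rstrip('.').split('-'):
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = m.groups()
        if core == 'x':
            core_re = '.'                          # any residue
        elif core.startswith('{'):
            core_re = '[^' + core[1:-1] + ']'      # any residue except these
        else:
            core_re = core                         # one residue or an [..] set
        if low:
            core_re += '{' + low + (',' + high if high else '') + '}'
        parts.append(core_re)
    return ''.join(parts)


def find_motifs(protein, pattern):
    """Return (1-based position, matched text) for every hit, including overlapping ones."""
    regex = re.compile('(?=(' + prosite_to_regex(pattern) + '))')
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(protein)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))                  # N[^P][ST][^P]
print(find_motifs("MKNGSAANPTQNVTL", "N-{P}-[ST]-{P}"))    # [(3, 'NGSA'), (12, 'NVTL')]
```

A short pattern like this one matches by chance quite often, so a hit is a hint about function rather than proof.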
Structural Analysis - This application allows researchers to compare structures against databases of known structures. The function of a protein is more directly a consequence of its structure than of its sequence, and structural homologs tend to share functions. Determining a protein's 2D/3D structure is therefore crucial in the study of its function. Structural analysis has been applied extensively to proteins from hydrogen-producing algae in order to decipher their function.
Sequence Analysis - This set of tools allows you to carry out further, more detailed analysis of your query sequence, including evolutionary analysis and the identification of mutations, CpG islands and compositional biases. These and other biological properties are all clues that aid the search to elucidate the specific function of your sequence.
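Two of these properties, compositional bias and CpG islands, are easy to sketch. The code below computes GC content and the observed/expected CpG ratio, then flags windows that pass commonly quoted thresholds (GC content of at least 50% and a ratio of at least 0.6 over 200 bases). Treat the thresholds, window size and step as assumptions for illustration; the test sequence is artificial.

```python
def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count('G') + seq.count('C')) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (number of CG * length) / (number of C * number of G)."""
    seq = seq.upper()
    c, g = seq.count('C'), seq.count('G')
    if c == 0 or g == 0:
        return 0.0
    return seq.count('CG') * len(seq) / (c * g)


def cpg_candidate_windows(seq, window=200, step=50, min_gc=0.5, min_ratio=0.6):
    """Return (start, end) of sliding windows that look like CpG island candidates."""
    hits = []
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        if chunk and gc_content(chunk) >= min_gc and cpg_obs_exp(chunk) >= min_ratio:
            hits.append((start, start + len(chunk)))
    return hits


# Artificial sequence: an AT-rich stretch, a CG-rich stretch, another AT-rich stretch.
seq = "AT" * 100 + "CG" * 100 + "AT" * 100
print(gc_content("CGCGCGAT"))                  # 0.75
print(cpg_candidate_windows(seq, step=100))    # [(100, 300), (200, 400), (300, 500)]
```

Overlapping windows that pass the test would normally be merged into a single candidate region before further analysis.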
Evolutionary Analysis - A bioinformatics analysis method that helps researchers to:
- Estimate a chromosomal evolutionary tree.
- Hypothesize the gene order and content of an ancestral species.
- Find the sequence of mutations that could have changed the gene order of one genome to that of another.
- Find the most likely sorts of mutation events that gave rise to a certain set of species.
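As a small illustration of comparing gene orders, the sketch below counts breakpoints, i.e., places where two genes that are neighbours in one genome are not neighbours in the other. It assumes both genomes contain exactly the same genes, each once, and it ignores gene orientation. The gene order used here is invented for the example.

```python
def count_breakpoints(reference_order, other_order):
    """Count adjacencies in other_order that are not adjacencies in reference_order."""
    if sorted(reference_order) != sorted(other_order):
        raise ValueError("both genomes must contain the same genes")
    # Number the genes 1..n by their position in the reference genome.
    rank = {gene: i + 1 for i, gene in enumerate(reference_order)}
    # Frame the permutation with 0 and n+1 so changes at the ends are counted too.
    perm = [0] + [rank[g] for g in other_order] + [len(reference_order) + 1]
    return sum(abs(x - y) != 1 for x, y in zip(perm, perm[1:]))


# Invented gene orders: the middle three genes are inverted in the second genome.
genome_1 = ["psbA", "rbcL", "atpB", "petA", "psaA"]
genome_2 = ["psbA", "petA", "atpB", "rbcL", "psaA"]

print(count_breakpoints(genome_1, genome_1))   # 0
print(count_breakpoints(genome_1, genome_2))   # 2
```

Because a single inversion changes at most two adjacencies, half the number of breakpoints gives a lower bound on how many inversions separate the two gene orders.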
Functional characterization of organism - This is an advanced application of algae bioinformatics, carried out to learn more about the biology of the alga. It is generally done using programming languages such as R.
Phylogenetic and functional characterization of algae can be conducted using maximum likelihood (ML), maximum parsimony (MP) and Bayesian approaches. This is the final step in algae bioinformatics; it helps in deciding which group the organism belongs to and can also help in identifying its function.
Tools and software used in algae bioinformatics
Algae bioinformatics tools are software programs for storing, retrieving and analyzing biological data in order to extract information from them. Such tools should be user friendly, so that even a biologist with little computing knowledge can use them easily, and they should be available on the internet for people carrying out scientific research. A number of existing tools and open-source software packages are available to researchers.
Provided below is a list of software that can be used for algae bioinformatics research, together with its applications.
BLAST - The Basic Local Alignment Search Tool (BLAST) for comparing gene and protein sequences against others in public databases now comes in several types including PSI-BLAST, PHI-BLAST, and BLAST 2 sequences. Specialized BLASTs are also available for human, microbial, algae and other genomes, as well as for vector contamination, immunoglobulin, and tentative human consensus sequences.
FASTA - A database search tool used to compare a nucleotide or peptide sequence to a sequence database. The program is based on the rapid sequence algorithm described by Lipman and Pearson. It was the first widely used algorithm for database similarity searching. The program looks for optimal local alignments by scanning the sequence for small matches called "words". Initially, the scores of segments in which there are multiple word hits are calculated ("init1"). Later the scores of several segments may be summed to generate an "initn" score. An optimized alignment that includes gaps is shown in the output as "opt". The sensitivity and speed of the search are inversely related and controlled by the "k-tup" variable which specifies the size of a "word".
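The word-matching stage can be sketched in a few lines. This is a simplified illustration of the ktup idea only: it indexes every word of length ktup in the database sequence and counts how many query words hit each diagonal (database position minus query position). It does not compute the init1, initn or opt scores, and the two sequences are invented.

```python
from collections import Counter, defaultdict


def index_words(seq, ktup):
    """Map every word of length ktup to the positions where it starts in seq."""
    index = defaultdict(list)
    for i in range(len(seq) - ktup + 1):
        index[seq[i:i + ktup]].append(i)
    return index


def diagonal_hits(query, subject, ktup=2):
    """Count word hits per diagonal, where diagonal = subject position - query position."""
    index = index_words(subject, ktup)
    diagonals = Counter()
    for q in range(len(query) - ktup + 1):
        for s in index.get(query[q:q + ktup], ()):
            diagonals[s - q] += 1
    return diagonals


hits = diagonal_hits("ACGTAC", "TTACGTACGG", ktup=3)
print(hits.most_common(1))   # [(2, 4)]
```

A diagonal with many word hits marks a region worth scoring in detail. A larger ktup gives fewer, more specific hits and a faster but less sensitive search.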
EMBOSS - EMBOSS (The European Molecular Biology Open Software Suite) is a free, open-source software analysis package developed specially for the needs of the molecular biology user community. Within EMBOSS you will find more than 100 programs (applications) for sequence alignment, database searching with sequence patterns, protein motif identification and domain analysis, nucleotide sequence pattern analysis, codon usage analysis for small genomes, and much more.
A list of the applications included in the EMBOSS package can be found at http://www.hgmp.mrc.ac.uk/Software/EMBOSS/Apps/
ClustalW - ClustalW is a general-purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences: it calculates the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen.
RasMol - RasMol is a molecular graphics program intended and used primarily for the depiction and exploration of biological macromolecule structures, such as those found in the Protein Data Bank. It is a powerful research tool for displaying the structure of DNA, proteins and smaller molecules. Protein Explorer, a derivative of RasMol, is easier to use. For example, given the PDB code of an algal protein, the corresponding structure file can be opened in RasMol to view the 3D structure.
Specific algae bioinformatics software
Gene prediction software
Genes in eukaryotic genomic sequences can be predicted using software that runs on a web server or that can be downloaded and run locally. Many open-source programs can be compiled for the computing platform of choice. Some programs let researchers submit large sequence files and use protein homology information in the prediction. The results can be displayed automatically in a genome browser, which shows the input sequence and lets researchers display their own annotation alongside it. Some programs even enable us to upload cDNA sequences together with the genomic DNA.
A good example of gene prediction software for algae is AUGUSTUS. AUGUSTUS is usually among the most accurate programs for the species it is trained for, and it is often the most accurate ab initio program.
For example, in the independent gene finder assessment (EGASP) on the human ENCODE regions, AUGUSTUS was the most accurate gene finder among the tested ab initio programs. AUGUSTUS is retrainable: it comes with a training program that estimates the parameters from a training set of known genes, and with an optimization script that tries to find values for the meta parameters, such as splice window sizes, that optimize prediction accuracy. AUGUSTUS predictions shown in the UCSC Genome Browser, for example, incorporate mRNA alignments, EST alignments, conservation and other sources of information. AUGUSTUS can also predict alternative splicing and alternative transcripts.
Sequence alignment software
SAM - a collection of flexible software tools for creating, refining, and using linear Hidden Markov Models for biological sequence analysis.
SeaView - a graphical multiple sequence alignment editor.
ShadyBox - the first GUI-based multiple sequence alignment drawing program for major Unix platforms
JAligner - a Java implementation of biological sequence alignment algorithms.
Comparative genomics software
VISTA - a comprehensive suite of programs and databases for comparative analysis of genomic sequences
phyloXML - an XML language for the analysis, exchange and storage of phylogenetic trees (or networks) and associated data
Phylogenetics and evolution software
PhyloDraw - a drawing tool for creating phylogenetic trees
PHYLIP - a free package of programs for inferring phylogenies
Other algae bioinformatics tools
A few databases are important for researchers working on algae bioinformatics. Some of them are highlighted below:
a. PLMItRNA - a database of mitochondrial tRNA molecules and genes in Viridiplantae (green plants) that has been enlarged to include algae. The database now contains 436 genes and 16 tRNA entries from 25 higher plants, eight green algae, four red algae (Rhodophytae) and two Stramenopiles.
b. AlgaeBase - a database of information on algae that includes terrestrial, marine and freshwater organisms.
c. UBC databases - information from the UBC databases is used by researchers around the world to study DNA, species variation, plant chemistry, bioinformatics and other related fields.
Programming aspects of algae bioinformatics
i BioPerl
The BioPerl project provides many modules for biological data processing. Algae bioinformatics makes use of languages and formats such as Perl, Java, R and XML.
Perl is a powerful and flexible language with a quick development cycle, which makes it well suited to a fast-paced and fluid problem domain. Perl has long been widely used in bioinformatics for processing biological data.
Perl allows the rapid collection and analysis of data to answer directed questions such as how many genes exist in a specific chromosome. This is primarily due to the fact that, with little effort, Perl developers can quickly leverage the power of regular expressions and the large collection of bioinformatics-based modules.
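The "how many genes on a chromosome" question can be sketched in a few lines. To keep the examples in this article in one language, the sketch is written in Python rather than Perl, and it follows the same regular-expression approach. It assumes the annotation is in the tab-separated GFF3 format, where the first column is the sequence name, the third is the feature type and the ninth holds attributes such as ID. The annotation lines themselves are invented.

```python
import re
from collections import Counter

ID_PATTERN = re.compile(r"ID=([^;]+)")


def count_genes(gff_lines):
    """Count 'gene' features per chromosome and collect their IDs."""
    counts = Counter()
    gene_ids = []
    for line in gff_lines:
        if not line.strip() or line.startswith('#'):
            continue                      # skip blank lines and comments
        fields = line.rstrip('\n').split('\t')
        if len(fields) >= 9 and fields[2] == 'gene':
            counts[fields[0]] += 1
            match = ID_PATTERN.search(fields[8])
            if match:
                gene_ids.append(match.group(1))
    return counts, gene_ids


# Invented GFF3-style annotation lines.
example = [
    "##gff-version 3",
    "chromosome_1\texample\tgene\t100\t900\t.\t+\t.\tID=gene0001;Name=hypA",
    "chromosome_1\texample\tmRNA\t100\t900\t.\t+\t.\tID=mrna0001;Parent=gene0001",
    "chromosome_1\texample\tgene\t1500\t2400\t.\t-\t.\tID=gene0002",
    "chromosome_2\texample\tgene\t50\t700\t.\t+\t.\tID=gene0003",
]

counts, gene_ids = count_genes(example)
print(counts["chromosome_1"], counts["chromosome_2"])   # 2 1
print(gene_ids)                                         # ['gene0001', 'gene0002', 'gene0003']
```

The same function works on a real file by passing an open file object instead of the list of lines.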
ii BioJava
BioJava is an open-source project that provides a Java framework for processing biological data. It provides analytical and statistical routines and parsers for common file formats, and allows the manipulation of sequences and 3D structures. The goal of the BioJava project is to facilitate rapid application development for bioinformatics, and it can be widely applied in algae bioinformatics.
iii BioXML
A part of the BioPerl project, this is a resource to gather XML documentation, DTDs and XML aware tools for biology in one location.
iv Bioconductor packages
Other bioinformatics packages, such as those from the Bioconductor project, can be used from R, for example to identify transcription sites.