Perl is widely used in bioinformatics. Having already tried Python and Biopython for a few simple bioinformatics tasks, I will now try Perl and Bioperl.
Install on Ubuntu 11.10 (oneiric)
Perl already comes with Ubuntu. Bioperl can be installed (without CPAN):
$ sudo apt-get install bioperl
After the installation you have several tools in your PATH:
bp_aacomp, bp_biblio, bp_biofetch_genbank_proxy, bp_bioflat_index, bp_biogetseq, bp_blast2tree, bp_bulk_load_gff, bp_chaos_plot, bp_classify_hits_kingdom, bp_composite_LD, bp_das_server, bp_dbsplit, bp_download_query_genbank, bp_einfo, bp_extract_feature_seq, bp_fast_load_gff, bp_fastam9_to_table, bp_fetch, bp_filter_search, bp_flanks, bp_gccalc, bp_genbank2gff, bp_genbank2gff3, bp_generate_histogram, bp_heterogeneity_test, bp_hivq, bp_hmmer_to_table, bp_index, bp_load_gff, bp_local_taxonomydb_query, bp_make_mrna_protein, bp_mask_by_search, bp_meta_gff, bp_mrtrans, bp_mutate, bp_netinstall, bp_nexus2nh, bp_nrdb, bp_oligo_count, bp_pairwise_kaks, bp_parse_hmmsearch, bp_process_gadfly, bp_process_sgd, bp_process_wormbase, bp_query_entrez_taxa, bp_remote_blast, bp_revtrans-motif, bp_search2BSML, bp_search2alnblocks, bp_search2gff, bp_search2table, bp_search2tribe, bp_seq_length, bp_seqconvert, bp_seqfeature_delete, bp_seqfeature_gff3, bp_seqfeature_load, bp_seqret, bp_seqretsplit, bp_split_seq, bp_sreformat, bp_taxid4species, bp_taxonomy2tree, bp_translate_seq, bp_tree2pag, bp_unflatten_seq
You can try to import a Bioperl module to check if everything is working properly.
#!/usr/bin/perl -w
use Bio::Seq;
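If Bioperl is missing, the two-line script above simply aborts with a "Can't locate Bio/Seq.pm" error. A slightly more talkative check could look like the sketch below. The helper name bioperl_status is my own invention for illustration and is not part of Bioperl; it only reports whether Bio::Seq can be loaded and from which path.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# bioperl_status is a hypothetical helper name, not a Bioperl function.
# It loads Bio::Seq at run time, so a missing installation produces a
# readable message instead of aborting at compile time.
sub bioperl_status {
    if (eval { require Bio::Seq; 1 }) {
        # %INC records the file each loaded module came from
        return "Bio::Seq loaded from $INC{'Bio/Seq.pm'}";
    }
    return "Bio::Seq could not be loaded: $@";
}

print bioperl_status(), "\n";
```

Because the module is loaded with require inside eval, the script runs to completion in both cases, which makes it easier to see whether the apt-get installation worked.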
Writing a nucleotide sequence to a FASTA file
#!/usr/bin/perl -w
use Bio::Seq;
use Bio::SeqIO;
# Build a sequence object in memory
my $seq_obj = Bio::Seq->new(-seq        => "gattaca",
                            -display_id => "#10191997",
                            -desc       => "Example",
                            -alphabet   => "dna" );
# The leading '>' in the file name opens sequence.fasta for writing
my $seqio_obj = Bio::SeqIO->new(-file => '>sequence.fasta', -format => 'fasta' );
$seqio_obj->write_seq($seq_obj);
The resulting sequence.fasta file will contain:
>#10191997 Example
gattaca
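To confirm that the file was written as expected, it can be read back with the same Bio::SeqIO class. This is a minimal sketch: the helper name fasta_summary is hypothetical (not a Bioperl function), and it assumes the sequence.fasta file created by the script above is in the current directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# fasta_summary is a hypothetical helper name, not a Bioperl function.
# It returns one "id<TAB>description<TAB>sequence" line per FASTA record.
sub fasta_summary {
    my ($path) = @_;
    # No '>' prefix on the file name, so the file is opened for reading
    my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');
    my @lines;
    while (my $seq = $in->next_seq) {
        push @lines, join("\t", $seq->display_id, $seq->desc // '', $seq->seq);
    }
    return @lines;
}

my $file = 'sequence.fasta';   # file written by the previous script
if (-e $file) {
    print "$_\n" for fasta_summary($file);
} else {
    warn "$file not found; run the writing example first\n";
}
```

The only difference from the writing script is the file name without the leading '>', which tells Bio::SeqIO to open the file for input.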
Reading a GenBank file
This opens the same example I used last time (Hippopotamus amphibius mitochondrion, complete genome) and prints its sequence.
#!/usr/bin/perl -w
use Bio::Seq;
use Bio::SeqIO;
my $seqio_obj = Bio::SeqIO->new(-file => "sequence.gb", -format => "genbank" );
# Print the nucleotide sequence of every record in the file
while (my $seq_obj = $seqio_obj->next_seq) {
    print $seq_obj->seq, "\n";
}
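Printing a whole mitochondrial genome fills the terminal, so a short summary per record is often more useful. The sketch below uses the same reading loop but prints only the identifier, length and description. The helper name genbank_summary is hypothetical (not a Bioperl function), and the script assumes the same sequence.gb file as above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# genbank_summary is a hypothetical helper name, not a Bioperl function.
# It returns one "id<TAB>length<TAB>description" line per GenBank record.
sub genbank_summary {
    my ($path) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'genbank');
    my @lines;
    while (my $seq = $in->next_seq) {
        push @lines, join("\t", $seq->display_id, $seq->length, $seq->desc // '');
    }
    return @lines;
}

my $file = 'sequence.gb';   # same file name as in the script above
if (-e $file) {
    print "$_\n" for genbank_summary($file);
} else {
    warn "$file not found; save a GenBank record there first\n";
}
```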
Querying GenBank online
With Bioperl it is possible to programmatically query and retrieve data directly from GenBank. For example, the script below retrieves the same Hippopotamus mitochondrial genome used in the example above.
#!/usr/bin/perl -w
use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;
# Entrez query restricted to one organism and one record
my $query = "Hippopotamus amphibius[ORGN] AND NC_000889[LOCUS]";
my $query_obj = Bio::DB::Query::GenBank->new(-db => 'nucleotide', -query => $query );
my $gb_obj = Bio::DB::GenBank->new;
# Fetch the matching records as a stream of sequence objects
my $stream_obj = $gb_obj->get_Stream_by_query($query_obj);
while (my $seq_obj = $stream_obj->next_seq) {
    print $seq_obj->display_id, "\t", $seq_obj->length, "\n";
}