
Tag: programming

GenBank renaming
DNA inspired sculpture by Charles Jencks. Creative Commons photo by Maria Keays.

What is GenBank?

The GenBank sequence database is a widely used collection of nucleotide sequences and their protein translations. A GenBank sequence record file typically has a .gbk or .gb extension and contains plain text. An example of a GenBank file can be found here.

Filename problem

Although several metadata fields are available inside a GenBank record, the name of the file does not always match its content. This can cause confusion when organizing files, and renaming them according to their content takes additional effort.

Approach using Biopython

The Biopython project is a mature, open source, international collaboration of volunteer developers that provides Python libraries for a wide range of bioinformatics problems. Among other tools, Biopython includes modules for reading and writing different sequence file formats, including GenBank record files.

Although it is possible to write a parser for GenBank files by hand, developing and maintaining such a tool would be a redundant effort. Parsing can be delegated to Biopython, so the programming can focus on the renaming mechanism.

Biopython installation on Linux (Ubuntu 11.10) or Apple OS X (Lion)

For both Ubuntu 11.10 and OS X Lion, a modern version of Python already comes out of the box.

For Linux you just need to install the Biopython package. One way to install Biopython on an APT-based distribution such as Ubuntu 11.10 (Oneiric Ocelot) is:

# apt-get install python-biopython

On Apple OS X (Lion) you can install Biopython using easy_install, a popular package manager for Python. easy_install is bundled with Setuptools, a set of packaging tools for Python.

To install Setuptools, download the .egg file for your Python version (probably setuptools-0.6c11-py2.7.egg) and execute it as a shell script:

sudo sh setuptools-0.6c11-py2.7.egg

After this, easy_install is in place and you can use it to install the Biopython library:

sudo easy_install biopython

On both operating systems you can check whether Biopython is installed by using the Python interactive interpreter:

$ python
Python 2.7.2+ (default, Oct 4 2011, 20:03:08)
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Bio
>>> Bio.__version__

Automatic rename example through scripting

Below is the Python source code for a simple use of Biopython: renaming a GenBank file to its description after removing commas and replacing spaces with underscores.

Using the previous example GenBank file, suppose you have it saved under a name unrelated to its content. To rename this file to the GenBank description metadata inside it, you can use the script.


After running it, the file will be called Hippopotamus_amphibius_mitochondrial_DNA_complete_genome.gbk.
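The renaming step described above can be sketched roughly as follows. This is a minimal sketch, not necessarily the original script: the function names description_to_filename and rename_genbank are hypothetical, and it assumes the file holds a single record and that a trailing period in the description should be dropped. It relies on Biopython's SeqIO.read with the "genbank" format to obtain the record description.

```python
import os
import re


def description_to_filename(description, extension=".gbk"):
    """Turn a GenBank description into a file name (hypothetical helper)."""
    # Assumption: drop the trailing period that some DEFINITION lines carry.
    name = description.strip().rstrip(".")
    # Remove commas, then turn each run of whitespace into one underscore.
    name = name.replace(",", "")
    name = re.sub(r"\s+", "_", name)
    return name + extension


def rename_genbank(path):
    """Rename the file at `path` after the description of its single record."""
    # Imported here so the helper above also works without Biopython.
    from Bio import SeqIO

    # SeqIO.read expects exactly one record and raises ValueError otherwise.
    record = SeqIO.read(path, "genbank")
    new_name = description_to_filename(record.description)
    new_path = os.path.join(os.path.dirname(path), new_name)
    os.rename(path, new_path)
    return new_path
```

Calling rename_genbank with the path of the downloaded record should rename the file in place, in the same directory, and return the new path. Keeping the name cleanup in its own function makes it easy to adjust which characters are removed.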


There is plenty of room for improvement, such as:

  • Better command line parsing with optparse and parameterization of all configuration options.
  • A graphical interface.
  • Handling special cases such as multiple sequences in a single GenBank file.

Python, flatten a list

Surprisingly, Python doesn’t have a built-in shortcut for flattening a list (more generally, a list of lists of lists of…).

I wrote a simple implementation that doesn’t use recursion and tries to be clear.

I take an element from a “notflat” list (a list that can contain other lists). If the element is not a list, it is stored in the flat list. If the element is still a list, its contents are put back to be dealt with later. The flat list therefore only ever holds elements that are not lists.
To preserve the original order, the flat list is reversed at the end.
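The procedure described above can be sketched as follows. This is one possible reading of the description, not necessarily the original code: it assumes elements are taken from the end of the “notflat” list, which is why the result has to be reversed, and the function name flatten is chosen here for illustration.

```python
def flatten(nested):
    """Flatten arbitrarily nested lists without recursion."""
    notflat = list(nested)  # shallow copy, so the caller's list is untouched
    flat = []
    while notflat:
        element = notflat.pop()  # take the last pending element
        if isinstance(element, list):
            # Still a list: put its items back to be dealt with later.
            notflat.extend(element)
        else:
            # Not a list: it belongs in the flat result.
            flat.append(element)
    # Elements were collected from the end, so restore the original order.
    flat.reverse()
    return flat


print(flatten([1, [2, [3, 4]], 5]))
```

Only list instances are expanded, so strings and tuples are kept as single elements.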

OpenPixels: simple sprite sheet with Processing

/**
 * OpenPixels example in Processing.
 * A simple example of how to get a sprite
 * from a sprite sheet.
 */
PImage bg;
PImage sprite_sheet;
PImage player;

void setup() {
  // load images
  bg = loadImage("kitchen.png");
  sprite_sheet = loadImage("guy.png");
  /* The sprite size is 32x49.
     In guy.png, the "stand position" is at (36,102). */
  player = createImage(32, 49, ARGB);
  // copy the 32x49 region at (36,102) of the sheet into the player image
  player.copy(sprite_sheet, 36, 102, 32, 49, 0, 0, 32, 49);
  // set screen size and background
  size(bg.width, bg.height);
  background(bg);
}

void draw() {
  image(player, 100, 50);
}

See more at OpenPixels.

Atiaia early releases

This was a project that Marco Diego and I created for the Computer Graphics course during our undergraduate studies. It is a ray tracing engine built from scratch in C. It was a great exercise in experimenting with how to implement object-oriented design patterns in ANSI C. Later, Marco continued it in his master’s thesis, implementing more features.

Parts of the sources were lost during a disk failure on the forge where we hosted the project. I found some early releases and packed them here for future use. They may be useful for someone studying C or how to implement a ray tracer.


Enjoy it.

PS: with this project we won first place in the class and got the maximum grade. ;)

Android screen height and width

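// Obtain the default Display through the WindowManager system service.
Context ctx = getContext();
Display display = ((WindowManager) ctx.getSystemService(Context.WINDOW_SERVICE)).getDefaultDisplay();
int width = display.getWidth();   // screen width in pixels
int height = display.getHeight(); // screen height in pixels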

Yes, there are easier ways to retrieve the screen width on Android, but there are cases where this long code is the only solution. You may already have the Context, the WindowManager, or the Display, in which case the code would be shorter, but this version is more general.

PHP: array, all elements but first

$bric = array("Brazil", "Russia", "India", "China"); 
$ric = $bric; // array copy
$br = array_shift($ric); // left shift at $ric. $br stores "Brazil" 
print_r($bric); // $bric remains the same
print_r($ric); // $ric lost "Brazil"


Output:

Array
(
    [0] => Brazil
    [1] => Russia
    [2] => India
    [3] => China
)
Array
(
    [0] => Russia
    [1] => India
    [2] => China
)

Reference: PHP array_shift at