Tag: Python

Data Extraction and Banco do Brasil Investment Funds

I couldn't find anywhere to collect the daily return data for Banco do Brasil's investment funds in a well-structured format.

In an ideal world, things would work like this: you would make a request to a URL such as:

http://bb.com.br/apps/rentabilidade?fundo=Siderurgia&saida=xml

And it would spit out XML with that fund's daily return data, unless I used other parameters to ask for a specific date or date range, or for a different output format such as YAML or JSON. But for now we don't have that, nor unicorns, so we have to do things the hard way: pull the data formatted for humans and write a program that forcibly extracts what we want, perhaps to put it to some data-mining use.

The first approach I tried was to build one of those small XML parsers I have shown before, but the page's source turned out to be far from the XML the parser is willing to work with. The alternative was to process the document line by line.

import urllib

# open the document referenced by the url
url = 'http://www21.bb.com.br/portalbb/rentabilidade/index.jsp?tipo=01'
documento = urllib.urlopen(url)

# the investment fund I'm interested in
fundo = 'small caps'

# states
INICIO = 0
ACHOU_FUNDO = 1
FIM = 2

# initial state
estado = INICIO

# analyze the document stream line by line
for linha in documento:
	# simplify: everything to lowercase
	linha = linha.lower()

	# at the start, look for a line that mentions the fund
	if estado == INICIO and linha.find(fundo) != -1:
		estado = ACHOU_FUNDO

	# then look for the next html table cell
	# (assumed here to be a line containing '<td').
	# from that line, take what comes after the first >
	# and then what comes before the first <
	# and replace the comma with a dot.
	elif estado == ACHOU_FUNDO and linha.find('<td') != -1:
		print linha.split('>')[1].split('<')[0].replace(',','.')
		estado = FIM

And to use it:

$ python rendimento_small_caps.py
0.881

Usually we are more interested in the fund's share price (the "cota"), since from it we can compute the total return if we know the share price at which we originally bought in. In this case the value is in the 11th column.

import urllib
 
# open the document referenced by the url
url = 'http://www21.bb.com.br/portalbb/rentabilidade/index.jsp?tipo=01'
documento = urllib.urlopen(url)

# the investment fund I'm interested in
fundo = 'small caps'

# states
INICIO = 0
ACHOU_FUNDO = 1
FIM = 2

# initial state
estado = INICIO
coluna = 0

# analyze the document stream line by line
for linha in documento:
	# simplify: everything to lowercase
	linha = linha.lower()

	# at the start, look for a line that mentions the fund
	if estado == INICIO and linha.find(fundo) != -1:
		estado = ACHOU_FUNDO

	# for each column, count it but do nothing else
	# (a column is assumed here to be a line containing '<td')
	elif estado == ACHOU_FUNDO and linha.find('<td') != -1:
		coluna += 1

	# on reaching column eleven, take the content between > and <,
	# replace the comma with a dot, convert to float and print it
	if estado==ACHOU_FUNDO and coluna == 11:
		print float(linha.split('>')[1].split('<')[0].replace(',','.'))
		estado = FIM

$ python cota_small_caps.py
6.156906634
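The state machine above can be tried without the live page. What follows is only a Python 3 sketch: SAMPLE_HTML is a made-up fragment (the real page layout is not reproduced here), and extract_cell is a hypothetical helper that assumes each table cell sits on its own line and starts with '<td'.

```python
# Python 3 sketch of the same state machine, run on an in-memory sample.
# SAMPLE_HTML is hypothetical; the real page layout may differ.
SAMPLE_HTML = """\
<tr>
<td>BB Acoes Small Caps</td>
<td>0,881</td>
<td>6,156906634</td>
</tr>
"""

START, FOUND_FUND = 0, 1


def extract_cell(lines, fund, column):
    """Return the float in the given 1-based cell after the fund's name cell."""
    state = START
    seen = 0
    for line in lines:
        line = line.lower()
        if state == START:
            # first phase: wait for the line that names the fund
            if fund in line:
                state = FOUND_FUND
        elif state == FOUND_FUND and '<td' in line:
            # second phase: count cells until the requested one
            seen += 1
            if seen == column:
                text = line.split('>')[1].split('<')[0]
                return float(text.replace(',', '.'))
    return None


print(extract_cell(SAMPLE_HTML.splitlines(), 'small caps', 1))  # 0.881
print(extract_cell(SAMPLE_HTML.splitlines(), 'small caps', 2))  # 6.156906634
```

Returning None when the fund or the column is missing makes a layout change easier to notice than silently printing nothing.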

This is an approach I neither like nor recommend, because it is tightly coupled to data formatted for humans. That formatting is concerned with the visual output the user gets, not with making (non-human) extraction of the data easy. This makes the solution very fragile:

  • If the internal names of the elements change, the solution may fail.
  • If the formatting of the table changes, the solution may fail.
  • If the internal arrangement of the html elements changes, the solution may fail.
  • If the url of the document changes, the solution will fail.
  • If the document can no longer be processed line by line, the solution will fail badly.

It is likely that by the time you read this it no longer works as described here.

On the other hand, the solution works, and in this case that is what matters to me. When it breaks, if I still care, I can quickly fix it, and the data already collected in the past remains valid.

Combined with a program like Cron, this can become a really powerful tool.
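For a scheduled run to be useful, each reading has to be stored somewhere. One possible way, sketched here in Python 3, is to append a dated row to a CSV file on every run. The function name append_reading, the file name quotes.csv and the sample dates and values are all illustrative assumptions, not part of the scripts above.

```python
import csv
import datetime
import io


def append_reading(out, day, value):
    """Append one (ISO date, value) row to an open CSV file-like object."""
    csv.writer(out, lineterminator='\n').writerow([day.isoformat(), value])


# In a real scheduled job `out` could be open('quotes.csv', 'a', newline='')
# (hypothetical file name); an in-memory buffer keeps this sketch
# self-contained. Dates and values below are made-up samples.
buffer = io.StringIO()
append_reading(buffer, datetime.date(2009, 8, 12), 6.156906634)
append_reading(buffer, datetime.date(2009, 8, 13), 6.2)
print(buffer.getvalue(), end='')
```

Because rows are only ever appended, readings collected before the scraper breaks stay valid after it is fixed.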

Python Fast XML Parsing

Here is a useful tip on Python XML decoding.

I was extending the xml.sax.ContentHandler class in an example to decode maps for a Pygame application when my connection went down, and I noticed that the program stopped working, raising an exception related to a call to urllib (a module for retrieving resources by URL). It turned out that the module was fetching the remote DTD to validate the XML.


This is not a requirement for my application, and it is a huge performance overhead when it works (almost 1 second for each map loaded). When the application runs in an environment without Internet access, it waits for almost a minute and then fails without decoding the rest. A dirty workaround is to open the XML file and remove the line containing the DTD reference.

But the correct way to decode XML when we are not concerned with validating it against a schema is to use xml.parsers.expat directly. Instead of implementing an interface, you just set some callback functions with the behavior you want. This is an example from the documentation:

import xml.parsers.expat

# 3 handler functions
def start_element(name, attrs):
    print 'Start element:', name, attrs
def end_element(name):
    print 'End element:', name
def char_data(data):
    print 'Character data:', repr(data)

p = xml.parsers.expat.ParserCreate()

p.StartElementHandler = start_element
p.EndElementHandler = end_element
p.CharacterDataHandler = char_data

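p.Parse("""<?xml version="1.0"?>
<parent id="top"><child1 name="paul">Text goes here</child1>
<child2 name="fred">More text</child2>
</parent>""", 1)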

The output:

Start element: parent {'id': 'top'}
Start element: child1 {'name': 'paul'}
Character data: 'Text goes here'
End element: child1
Character data: '\n'
Start element: child2 {'name': 'fred'}
Character data: 'More text'
End element: child2
Character data: '\n'
End element: parent
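If you would rather keep a SAX ContentHandler, another option that may work is to ask the parser not to resolve external entities. The following is a Python 3 sketch, not the code from this post: the NameCollector class, the document and the host name are made up for illustration, and as far as I know recent Python 3 releases already ship with this feature turned off.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class NameCollector(ContentHandler):
    """Record the name of every element the parser reports."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# Hypothetical document whose DTD lives on an unreachable host.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE map SYSTEM "http://example.invalid/map.dtd">
<map><layer/></map>"""

parser = xml.sax.make_parser()
# Ask the parser not to fetch external entities (here, the remote DTD).
parser.setFeature(feature_external_ges, False)
handler = NameCollector()
parser.setContentHandler(handler)
parser.parse(io.BytesIO(DOC))
print(handler.names)  # ['map', 'layer']
```

The DOCTYPE line stays in the file, so the map remains valid for tools that do want to validate it.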

Tiled TMX Map Loader for Pygame

I've been using the Tiled Map Editor for a while; I even wrote a tutorial about it. It's a general-purpose tile map editor, written in Java but now migrating to C++ with Qt, that can easily be used with my set of free pixel art tiles.

map editor tiles tileset game development

A map made with Tiled is stored in a file with the TMX extension. It's just an XML file, easy to understand.

As I'm creating a map loader for my own purposes, the procedure here needs some simplifications. I'm handling orthogonal maps only, and I'm not supporting tile properties. I also don't want to handle base64 and zlib encoding in this version, so in the Tiled editor go to the menu Edit → Preferences and, in the Saving tab, uncheck the options “Use binary encoding” and “Compress Layer Data (gzip)”, like this:

Tiled Preferences Window

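When saving a map it will produce a TMX file like this:

<?xml version="1.0" ?>
<map orientation="orthogonal" width="..." height="..." tilewidth="..." tileheight="...">
 <properties>
  <property name="..." value="..."/>
  <property name="..." value="..."/>
 </properties>
 <tileset ...>
  <image source="..."/>
 </tileset>
 <layer ...>
  <data>
   <tile gid="..."/>
   <tile gid="..."/>
    ...
   <tile gid="..."/>
   <tile gid="..."/>
  </data>
 </layer>
</map>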

To process it in Python I'm using the event-oriented SAX approach to XML, so I create a ContentHandler that handles the events for the start and end of XML elements. In the first element, map, I learn enough to create a Pygame surface of the correct size. I'm also storing the map properties so I can use them later to add some logic or effects to the map. After that we create an instance of the Tileset class, from which we get each tile by its gid number. Each layer holds a sequence of gids in the correct order. That's enough information to assemble and draw a map.
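The event flow can be seen without Pygame. This Python 3 sketch is not the loader itself: GridHandler is a hypothetical handler that only collects the gids of each layer into rows, and the tiny 3x2 map is invented for the example.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical 3x2 map; real files written by Tiled carry more attributes.
TMX = b"""<map width="3" height="2" tilewidth="32" tileheight="32">
 <layer name="ground" width="3" height="2">
  <data>
   <tile gid="1"/><tile gid="2"/><tile gid="3"/>
   <tile gid="4"/><tile gid="0"/><tile gid="6"/>
  </data>
 </layer>
</map>"""


class GridHandler(ContentHandler):
    """Collect each layer's gids as rows of `columns` entries."""

    def __init__(self):
        super().__init__()
        self.columns = 0
        self.layers = []

    def startElement(self, name, attrs):
        if name == 'map':
            # the map element says how many tiles fit in one row
            self.columns = int(attrs['width'])
        elif name == 'layer':
            # every layer starts with a single empty row
            self.layers.append([[]])
        elif name == 'tile':
            rows = self.layers[-1]
            if len(rows[-1]) == self.columns:
                rows.append([])
            rows[-1].append(int(attrs['gid']))


handler = GridHandler()
xml.sax.parseString(TMX, handler)
print(handler.layers)  # [[[1, 2, 3], [4, 0, 6]]]
```

A gid of 0 marks an empty cell in the TMX format, which is why a loader has to treat it specially before indexing into the tileset.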

# Author: Silveira Neto
# License: GPLv3
import sys, pygame
from pygame.locals import *
from pygame import Rect
from xml import sax

class Tileset:
    def __init__(self, file, tile_width, tile_height):
        image = pygame.image.load(file).convert_alpha()
        if not image:
            print "Error creating new Tileset: file %s not found" % file
        self.tile_width = tile_width
        self.tile_height = tile_height
        self.tiles = []
        for line in xrange(image.get_height()/self.tile_height):
            for column in xrange(image.get_width()/self.tile_width):
                pos = Rect(
                        column*self.tile_width,
                        line*self.tile_height,
                        self.tile_width,
                        self.tile_height )
                self.tiles.append(image.subsurface(pos))

    def get_tile(self, gid):
        return self.tiles[gid]

class TMXHandler(sax.ContentHandler):
    def __init__(self):
        self.width = 0
        self.height = 0
        self.tile_width = 0
        self.tile_height = 0
        self.columns = 0
        self.lines  = 0
        self.properties = {}
        self.image = None
        self.tileset = None

    def startElement(self, name, attrs):
        # get most general map informations and create a surface
        if name == 'map':
            self.columns = int(attrs.get('width', None))
            self.lines  = int(attrs.get('height', None))
            self.tile_width = int(attrs.get('tilewidth', None))
            self.tile_height = int(attrs.get('tileheight', None))
            self.width = self.columns * self.tile_width
            self.height = self.lines * self.tile_height
            self.image = pygame.Surface([self.width, self.height]).convert()
        # create a tileset
        elif name=="image":
            source = attrs.get('source', None)
            self.tileset = Tileset(source, self.tile_width, self.tile_height)
        # store additional properties.
        elif name == 'property':
            self.properties[attrs.get('name', None)] = attrs.get('value', None)
        # starting counting
        elif name == 'layer':
            self.line = 0
            self.column = 0
        # get information of each tile and put on the surface using the tileset
        elif name == 'tile':
            gid = int(attrs.get('gid', None)) - 1
            if gid <0: gid = 0
            tile = self.tileset.get_tile(gid)
            pos = (self.column*self.tile_width, self.line*self.tile_height)
            self.image.blit(tile, pos)

            self.column += 1
            if(self.column>=self.columns):
                self.column = 0
                self.line += 1

    # just for debugging
    def endDocument(self):
        print self.width, self.height, self.tile_width, self.tile_height
        print self.properties
        print self.image

def main():
    if(len(sys.argv)!=2):
        print 'Usage:\n\t{0} filename'.format(sys.argv[0])
        sys.exit(2)
    pygame.init()
    screen = pygame.display.set_mode((800, 480))
    parser = sax.make_parser()
    tmxhandler = TMXHandler()
    parser.setContentHandler(tmxhandler)
    parser.parse(sys.argv[1])
    while 1:
        for event in pygame.event.get():
            if event.type == QUIT:
                return
            elif event.type == KEYDOWN and event.key == K_ESCAPE:
                return
        screen.fill((255,255,255))
        screen.blit(tmxhandler.image, (0,0))
        pygame.display.flip()
        pygame.time.delay(1000/60)

if __name__ == "__main__": main()

Here is the result of opening a four-layer map file:

netbeans python opening map

That's it. You can take this code and adapt it for your game, because the next versions will be much more coupled to my own purposes and not so general.

Download: packagemaploader.tar.bz2. It's a Netbeans 6.7 (Python EA 2) project, but it can be opened with another IDE or used without one. It also contains the village.tmx map and the tileset.
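The arithmetic the Tileset class uses to cut the image into tiles can be written as a small pure function. This is a Python 3 sketch with a hypothetical name, tile_rect, and it assumes the tileset's first gid is 1 and that tiles are numbered left to right, top to bottom.

```python
def tile_rect(gid, image_width, tile_width, tile_height):
    """Return (x, y, w, h) of a tile inside the tileset image.

    gid is 1-based, as stored in the TMX file (assuming firstgid is 1);
    tiles are numbered left to right, top to bottom.
    """
    per_row = image_width // tile_width
    index = gid - 1
    column = index % per_row
    line = index // per_row
    return (column * tile_width, line * tile_height, tile_width, tile_height)


# Hypothetical 128-pixel-wide tileset of 32x32 tiles: four tiles per row.
print(tile_rect(1, 128, 32, 32))  # (0, 0, 32, 32)
print(tile_rect(6, 128, 32, 32))  # (32, 32, 32, 32)
```

Computing the rectangle on demand is an alternative to slicing every tile up front; slicing once, as the loader does, trades memory for a faster lookup.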

Pygame: Running Orcs

Here is a Pygame Sprite animation using the approach presented by Joe Wreschnig and Nicolas Crovatti. It’s not yet exactly what I need but is very suitable.

import pygame, random
from pygame.locals import *

class Char(pygame.sprite.Sprite):
	x,y = (100,0)
	def __init__(self, img, frames=1, modes=1, w=32, h=32, fps=3):
		pygame.sprite.Sprite.__init__(self)
		original_width, original_height = img.get_size()
		self._w = w
		self._h = h
		self._framelist = []
		for i in xrange(int(original_width/w)):
			self._framelist.append(img.subsurface((i*w,0,w,h)))
		self.image = self._framelist[0]		
		self._start = pygame.time.get_ticks()
		self._delay = 1000 / fps
		self._last_update = 0
		self._frame = 0
		self.update(pygame.time.get_ticks(), 100, 100)	

	def set_pos(self, x, y):
		self.x = x
		self.y = y

	def get_pos(self):
		return (self.x,self.y)

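	def update(self, t, width, height):
		# position: fall one pixel per update; past the bottom edge,
		# restart above the top at a random horizontal position
		self.y+=1
		if(self.y>height):
			self.x = random.randint(0,width-self._w)
			self.y = -self._h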

		# animation
		if t - self._last_update > self._delay:
			self._frame += 1
			if self._frame >= len(self._framelist):
				self._frame = 0
			self.image = self._framelist[self._frame]
			self._last_update = t

SCREEN_W, SCREEN_H = (320, 320)

def main():
	pygame.init()
	screen = pygame.display.set_mode((SCREEN_W, SCREEN_H))
	background = pygame.image.load("field.png")
	img_orc = pygame.image.load("orc.png")
	orc = Char(img_orc, 4, 1, 32, 48)
	while pygame.event.poll().type != KEYDOWN:
		screen.blit(background, (0,0))
		screen.blit(orc.image,  orc.get_pos())
		orc.update(pygame.time.get_ticks(), SCREEN_W, SCREEN_H)
		pygame.display.update()
		pygame.time.delay(10)

if __name__ == '__main__': main()

Here it is working:

Update: I put this source and the images in the OpenPixel project on GitHub.
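The animation part of Char.update depends only on elapsed time, so it can be tested apart from Pygame. This Python 3 sketch uses a hypothetical FrameClock class and feeds it made-up timestamps instead of pygame.time.get_ticks().

```python
class FrameClock:
    """Advance a frame index once more than `delay` ms have elapsed."""

    def __init__(self, frame_count, fps):
        self.frame_count = frame_count
        self.delay = 1000 / fps
        self.last_update = 0
        self.frame = 0

    def update(self, t):
        # t is the current time in milliseconds
        if t - self.last_update > self.delay:
            self.frame = (self.frame + 1) % self.frame_count
            self.last_update = t
        return self.frame


clock = FrameClock(frame_count=4, fps=4)  # at most one new frame per 250 ms
print([clock.update(t) for t in range(0, 1201, 100)])
# [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 0]
```

Since the check only runs when update is called, the real frame rate is bounded by how often the main loop calls it: here every 100 ms, so frames change every 300 ms rather than every 250 ms.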

Pygame Simple Key Handling

Here's a simple key handler in Pygame where you move a circle using the keyboard.

import pygame
from pygame.locals import *

def main():
	x,y = (100,100)
	pygame.init()
	screen = pygame.display.set_mode((400, 400))
	while 1:
		pygame.time.delay(1000/60)
		# exit handle
		for event in pygame.event.get():
			if event.type == QUIT:
				return
			elif event.type == KEYDOWN and event.key == K_ESCAPE:
				return

		# keys handle
		key=pygame.key.get_pressed()
		if key[K_LEFT]:
			x-=1
		if key[K_RIGHT]:
			x+=1
		if key[K_UP]:
			y-=1
		if key[K_DOWN]:
			y+=1

		# fill the background with white and draw a black circle
		screen.fill((255,255,255))
		pygame.draw.circle(screen, (0,0,0), [x,y], 30)
		pygame.display.flip()

if __name__ == '__main__': main()

Here’s a video of it working:

The function pygame.key.get_pressed returns a sequence of boolean values representing the state of every key on the keyboard. It's very useful because on other game platforms I usually have to build this myself.

This approach allows me to handle more than one key at a time. For example, the left and up keys can be pressed together and each one is handled separately, creating a diagonal movement.
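The diagonal movement falls out of treating each key independently. Here is a Python 3 sketch of that idea without Pygame: the string constants stand in for K_LEFT, K_RIGHT, K_UP and K_DOWN, a plain dict stands in for the sequence returned by get_pressed, and move is a hypothetical helper.

```python
# Stand-ins for pygame's K_LEFT, K_RIGHT, K_UP, K_DOWN constants.
LEFT, RIGHT, UP, DOWN = 'left', 'right', 'up', 'down'

# Each key contributes its own (dx, dy); screen y grows downwards.
DELTAS = {LEFT: (-1, 0), RIGHT: (1, 0), UP: (0, -1), DOWN: (0, 1)}


def move(x, y, pressed):
    """Apply the offset of every key that is currently held down."""
    for key, (dx, dy) in DELTAS.items():
        if pressed.get(key):
            x += dx
            y += dy
    return x, y


print(move(100, 100, {LEFT: True, UP: True}))     # (99, 99)
print(move(100, 100, {LEFT: True, RIGHT: True}))  # (100, 100)
```

Opposite keys cancel out, which matches what the chain of independent if statements does in the Pygame version.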

Pygame, Simple Space Effect

This is a simple space effect of sliding stars using Pygame.

[youtube]TXGV6guTOno[/youtube]

Direct link to video: simple_space_effect_01.ogv

We set some constants, like the screen size and the number N of stars we want.

N = 200
SCREEN_W, SCREEN_H = (640, 480)

Using a list comprehension we create a list of random points on the screen, which will be our stars. The size of this list is N.

stars = [
  [random.randint(0, SCREEN_W),random.randint(0, SCREEN_H)]
  for x in range(N)
]

Each star is represented by a two-element list in the stars list. The first star is stars[0], a list holding its [x, y] position. It is a list rather than a tuple because the position is updated in place.

At each step of the game loop we draw each star and update its position. A star is drawn as a white line one pixel long. See the pygame.draw.line documentation.

for star in stars:
  pygame.draw.line(background,
    (255, 255, 255), (star[0], star[1]), (star[0], star[1]))
  star[0] = star[0] - 1
  if star[0] < 0:
      star[0] = SCREEN_W
      star[1] = random.randint(0, SCREEN_H)

In this example we update the position of a star by decreasing its horizontal position. When the horizontal position is less than zero, the star is no longer displayed on the screen, so we replace its horizontal position (star[0]) with the screen width (SCREEN_W) and its vertical position (star[1]) with a new random value. This is like creating a new star, and it guarantees an ever-changing pattern of sliding stars.
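The update rule can be exercised on its own, without drawing anything. This Python 3 sketch wraps it in a hypothetical step function; the two sample stars and the seeded random generator are there only to make the run repeatable.

```python
import random


def step(stars, width, height, rng=random):
    """Move every star one pixel left, recycling those that leave the screen."""
    for star in stars:
        star[0] -= 1
        if star[0] < 0:
            # re-enter from the right edge at a new random height
            star[0] = width
            star[1] = rng.randint(0, height)


rng = random.Random(0)  # seeded only to make the run repeatable
stars = [[0, 5], [10, 7]]
step(stars, 640, 480, rng)
print(stars[0][0], stars[1])  # 640 [9, 7]
```

The number of stars never changes; only their coordinates do, which is why mutable lists are convenient here.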

The complete code:

#!/usr/bin/env python

# A simple effect of sliding stars to create a deep space sensation.
# by Silveira Neto 
# Free under the terms of GPLv3 license
# See http://silveiraneto.net/2009/08/12/pygame-simple-space-effect/

import os,sys,random
import pygame
from pygame.locals import *

# Constants 
N = 200
SCREEN_W, SCREEN_H = (640, 480)

def main():
	# basic start
	pygame.init()
	screen = pygame.display.set_mode((SCREEN_W,SCREEN_H))
	pygame.display.set_caption('Simple Space Effect by Silveira Neto')

	# create background
	background = pygame.Surface(screen.get_size())
	background = background.convert()

	# generate N stars
	stars = [
		[random.randint(0, SCREEN_W),random.randint(0, SCREEN_H)]
		for x in range(N)
	]

	# main loop
	clock = pygame.time.Clock()
	while 1:
		clock.tick(22)
		for event in pygame.event.get():
			if event.type == QUIT:
				return
			elif event.type == KEYDOWN and event.key == K_ESCAPE:
				return
		background.fill((0,0,0))
		for star in stars:
			pygame.draw.line(background,
				(255, 255, 255), (star[0], star[1]), (star[0], star[1]))
			star[0] = star[0] - 1
			if star[0] < 0:
				star[0] = SCREEN_W
				star[1] = random.randint(0, SCREEN_H)
		screen.blit(background, (0,0))
		pygame.display.flip()

if __name__ == '__main__': main()

JavaFX, Retrieving non XML/JSON data from clouds

tango weather overcast

Usually in JavaFX we grab data from external resources using HttpRequest, in formats like JSON or XML. I showed how to fetch it in the post Reading Twitter with JavaFX and how to parse it using PullParser in the post Parsing a XML sandwich with JavaFX.

The other day I needed to grab and interpret some plain results, neither XML nor JSON, while consuming a REST service. In this case we don't have well-structured data, so the PullParser won't help us.

Example 1: Reading Raw Data

In this example we'll load a plain text file served from a remote location.

var planetsRequest = HttpRequest {
    location: "http://silveiraneto.net/downloads/planets";
    onInput: function(stream: InputStream) {
        var buff = new BufferedReader(new InputStreamReader(stream));
        var line = "";
        while((line = buff.readLine())!=null){
            println(line);
        }
    }
}
planetsRequest.enqueue();

This will produce the output:

Mercury
Venus
Earth
Mars
Jupiter
Saturn
Uranus
Neptune

Example 2: Discovering your IP Address

In this example we'll examine how to integrate a request for remote data into a running graphical program.

The best way to know your real IP address is to ask a remote server which IP made the request. It's like calling a friend and asking which number showed up on their phone. =) This server-side Python script prints the IP address of whoever requested the page.

#!/usr/bin/env python
import os

print "Content-type: text/html"
print
print os.environ['REMOTE_ADDR']

On the client side, with JavaFX, we'll load the remote value into a local variable. The ip variable starts with the value "..." and later the ipRequest replaces it with a String holding the IP. The bind feature automatically updates the text shown in the GUI.

The user will see the ellipsis for a few seconds and then their IP.

import javafx.stage.Stage;
import javafx.scene.Scene;
import javafx.scene.text.Text;
import javafx.io.http.HttpRequest;
import java.io.*;

var ip = "...";

Stage {
    title: "What is my IP?" width: 250 height: 80
    scene: Scene {
        content: Text {
            x: 10, y: 30
            content: bind "My IP is {ip}"
        }
    }
}

var ipRequest = HttpRequest {
    location: "http://silveiraneto.net/scripts/myip.py";
    onInput: function(stream: InputStream) {
        var buff = new BufferedReader(new InputStreamReader(stream));
        ip = buff.readLine();
    }
}
ipRequest.enqueue();

You can try this JavaFX applet here.

Example 3: Reading Integer values

Until now we have handled just plain Strings. But in some cases you want to get a number from unstructured data. In that case you need to know beforehand which type the data is. For a web service, this will probably be described in a WSDL file.

Here I'm writing a very simple service script at Zembly, a great platform for cloud computing. It's called aplusb; it just adds the first parameter, a, to the second, b.

if ((Parameters.a != null) && (Parameters.b != null)) {
    return Parameters.a + Parameters.b;
}

The service is published at Zembly here where you can see more details on how to invoke it.

A simple way to invoke it from JavaFX and then get the value as an Integer:

import java.io.*;
import javafx.io.http.HttpRequest;

var a = 100;
var b = 200;
var result = 0 on replace {
    println(result);
}

var zemblyRequest = HttpRequest {
    location: "http://zembly.net/things/1827f696529d4e6f940c36e8e79bea1c;exec?a={a}&b={b}";
    onInput: function(stream: InputStream) {
        var buff = new BufferedReader(new InputStreamReader(stream));
        result = Integer.valueOf(buff.readLine());
    }
}
zemblyRequest.enqueue();

The output will be:

0
300

The first 0 comes from the initial assignment to the var result. The 300 comes from the web service itself.

The same approach can be used to convert the ASCII/Unicode result from the stream into a suitable type for a variable.
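The idea is not specific to JavaFX. As a rough illustration in Python 3, the hypothetical helper read_value below reads the first line of a text stream and hands it to whichever converter the caller knows to be right; io.StringIO stands in for the body of an HTTP response.

```python
import io


def read_value(stream, convert=str):
    """Read the first line of a text stream and convert it to a type."""
    line = stream.readline().strip()
    return convert(line)


# io.StringIO stands in for the body of an HTTP response.
print(read_value(io.StringIO("300\n"), int) + 1)    # 301
print(read_value(io.StringIO("3.5\n"), float))      # 3.5
print(read_value(io.StringIO("Mercury\nVenus\n")))  # Mercury
```

As in the JavaFX version, the caller has to know the type in advance; a response that does not match makes the converter raise an error.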

Ilex Paraguariensis

Chimarrão Gaúcho
Creative Commons image from Flickr.

From April 15 to 20 I'll be in Porto Alegre. I'll take part in FISL (an old dream) with the presentation “Netbeans: beyond Java”. I'd like to talk about how you can use Netbeans as a great IDE for languages other than Java, like Ruby, PHP, JavaFX, Javascript, Python, etc.

I'll probably also be able to take part in two events before FISL (about OpenSolaris and Java ME). 🙂

So… what does chimarrão taste like?

Pythons at Sun

Pythons at Sun
Source: python.at.sun.svg

Good news. Two important Pythonistas, Ted Leung (Apache Foundation member) and Frank Wierzbicki (Jython lead) joined Sun Microsystems.

It's an important step in the recognition of the Python language and will certainly bring benefits to the Python community. Some things I'd like to see:

  • Python support in Netbeans. Bringing Python in as a first-class citizen in Netbeans: syntax highlighting, code completion, debugging tools, unit testing, Jython and a lot more.
  • Django support in Netbeans. Just as we have Rails and Ruby support in Netbeans, we could also have Django and Python support. Django is driven by command-line tools, so the work is just to plug them into Netbeans, and its architecture makes this not hard. We could have a lot of wizards for creating new models and views. I'd love that.
  • More support for Python and dynamic languages on the JVM. There's already the Da Vinci Machine Project within the OpenJDK Project. I hope one day we can call the Java Virtual Machine the Universal Virtual Machine or the Multi Language Virtual Machine.
  • More Python on OpenSolaris. Some projects at OpenSolaris are already using Python; see the Image Packaging System Project. Python is a really good language for common scripting tasks and I use it for that purpose very often. We could see a lot of wizards and configuration panels in OpenSolaris using Python and PyGTK or PyQt.

It's really a great moment for Sun and for Open and Free Software. Am I happy with all that? You can bet on it. 😀

Sources: Tim Bray Blog and Cnet news.

Generating permutations

Often, to solve a single instance of a problem, it is faster to attack it with brute force than to find a general algorithm with good complexity. Permutations are very useful in this kind of approach.

Permutations in Prolog:

This is Prolog code that Wladimir Araujo handed out in the AI course.

select(X, [X|Xs], Xs).
select(X, [Y|Ys], [Y|Zs]) :- select(X, Ys, Zs).

permutar([], []).
permutar(Xs, [Z|Zs]) :-
    select(Z, Xs, Ys),
    permutar(Ys, Zs).

Permutations in Python:
This is code by one Michael Davies, which I got from here. It generates all the permutations of a list, one at a time. Very cute. 🙂

def all_perms(str):
    if len(str) <=1:
        yield str
    else:
        for perm in all_perms(str[1:]):
            for i in range(len(perm)+1):
                yield perm[:i] + str[0:1] + perm[i:]

An example of use:

>>> for p in all_perms(['a','b','c']):
	print p
['a', 'b', 'c']
['b', 'a', 'c']
['b', 'c', 'a']
['a', 'c', 'b']
['c', 'a', 'b']
['c', 'b', 'a']
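The Prolog version can also be mirrored in Python almost clause for clause. This Python 3 sketch is my own transliteration, not code from either source: select yields every way of removing one element, and permute (a stand-in name for permutar) recurses on what is left. The result is cross-checked against the standard library's itertools.permutations.

```python
from itertools import permutations


def select(xs):
    """Yield (x, rest) for every way of removing one element from xs."""
    for i, x in enumerate(xs):
        yield x, xs[:i] + xs[i + 1:]


def permute(xs):
    """Yield every permutation of the list xs, in the style of permutar/2."""
    if not xs:
        # permutar([], []).
        yield []
        return
    # permutar(Xs, [Z|Zs]) :- select(Z, Xs, Ys), permutar(Ys, Zs).
    for z, ys in select(xs):
        for zs in permute(ys):
            yield [z] + zs


result = list(permute(['a', 'b', 'c']))
print(result[0], len(result))  # ['a', 'b', 'c'] 6
print(sorted(map(tuple, result)) == sorted(permutations('abc')))  # True
```

The order of the results differs from all_perms above, because here the first element is chosen first rather than inserted last.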

Other implementations:
In other languages the code to generate permutations is usually much longer, so I preferred to leave some links.