Python: Parse HTML using Beautiful Soup

Categories: Python; @ April 6th, 2014 17:24

Requirement:

Get the FX rate from: http://fx.cmbchina.com/hq

Solution:

Use urllib2 to fetch the HTML, then use BeautifulSoup to parse it.

Details:

1. Get the HTML with urllib2:

import urllib2
html = urllib2.urlopen("http://fx.cmbchina.com/hq").read()

2. Parse the HTML using BeautifulSoup:

Install it:

$ apt-get install python-bs4  # see http://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-beautiful-soup

Python Code:

from bs4 import BeautifulSoup
soup = BeautifulSoup(html)   # you may check the HTML by: print(soup.prettify()) 
sgRateString = soup.find(id="realRateInfo").find_all('tr')[8].find_all('td')[5].get_text()
sgRate = float(sgRateString)
print 'SG Rate:', sgRate
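
The fixed indexes [8] and [5] stop working as soon as the page adds or reorders a row. A minimal sketch of a less fragile lookup is below: it picks the row by the text in its first cell instead of by position. This is only an illustration under assumptions: it is written for Python 3, the function name rate_for is hypothetical, and SAMPLE_HTML with its labels and numbers is made up for the example (the real page may label its rows differently, so check with print(soup.prettify()) first).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page's layout, labels and values may differ.
SAMPLE_HTML = """
<div id="realRateInfo">
  <table>
    <tr><td>Currency</td><td>Unit</td><td>Rate</td></tr>
    <tr><td>USD</td><td>100</td><td>621.50</td></tr>
    <tr><td>SGD</td><td>100</td><td>492.30</td></tr>
  </table>
</div>
"""


def rate_for(html, label, col):
    """Return the float in column `col` of the first row whose first cell equals `label`."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="realRateInfo")
    if container is None:
        raise ValueError("element with id 'realRateInfo' not found")
    for row in container.find_all("tr"):
        cells = row.find_all("td")
        # Skip rows without <td> cells (for example header rows built from <th>).
        if cells and cells[0].get_text(strip=True) == label:
            return float(cells[col].get_text(strip=True))
    raise LookupError("no row labelled %r" % label)


print("SG Rate:", rate_for(SAMPLE_HTML, "SGD", 2))
```

With the real page you would pass the downloaded html instead of SAMPLE_HTML, and use whatever label and column the page actually shows for the Singapore dollar.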

All Code:
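
Putting the two steps together, the whole script might look roughly like the sketch below. Note the assumptions: it is a Python 3 port (urllib.request replaces urllib2, and print is a function), the names FX_URL, fetch_html, parse_rate and main are hypothetical, and the default row and column (8 and 5) are simply the indexes used in step 2, so they depend on the page layout at the time of writing.

```python
from urllib.request import urlopen

from bs4 import BeautifulSoup

FX_URL = "http://fx.cmbchina.com/hq"


def fetch_html(url, timeout=10):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the response does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def parse_rate(html, row=8, col=5):
    """Return the number found at the given row/column inside #realRateInfo."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find(id="realRateInfo")
    if container is None:
        raise ValueError("element with id 'realRateInfo' not found")
    cells = container.find_all("tr")[row].find_all("td")
    return float(cells[col].get_text(strip=True))


def main():
    # Needs network access; the result depends on the live page.
    print("SG Rate:", parse_rate(fetch_html(FX_URL)))
```

Calling main() fetches the page and prints the rate. Keeping the download and the parsing in separate functions makes it possible to try parse_rate on a saved copy of the page without hitting the site.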

Links

urllib2 examples:  https://docs.python.org/2/library/urllib2.html#examples

BeautifulSoup Quick Start: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#quick-start


