Tech Chorus

Personal Wiki

written by Sudheer Satyanarayana on 2011-08-26

When someone asks how to become a programmer or a good programmer, the usual answer is "hack something". To clarify the jargon, hacking is not breaking into computer systems. When you start learning programming, it is good to write programs. You could hack on existing open source software projects or your own cool new project.

I have been blogging ever since I started programming, and I have urged a lot of developers to start blogging. In addition to working on a personal project, blogging about hacking is an awesome experience. Unfortunately, blogging doesn't click for a lot of people. The usual reasons are lack of time and lack of writing skills.

I have been a wiki user for a long time and I enjoy browsing wikis. I use Wikipedia every day. Wikipedia search is the second search engine in my Firefox's list of search engines. At work, I use the wiki in Redmine. A wiki is a good tool for collaborative content creation and editing. It also works great for single users. Apart from using wikis at work and other places, I have been using a personal wiki. I use it to store

Takeaway: if you are not using a personal wiki, start today.

Web Scraping With lxml

written by Sudheer Satyanarayana on 2010-08-22

More and more websites are offering APIs nowadays. Previously, we've talked about XML-RPC and REST. Even though web services are growing exponentially, there are a lot of websites out there that offer information in an unstructured format, especially government websites. If you want to consume information from those websites, web scraping is your only choice.

What is web scraping? Web scraping is a technique in which a program mimics a human browsing a website. In order to scrape a website in your programs you need tools to:

Make HTTP requests to websites
Parse the HTTP response and extract content

Making HTTP requests is a snap with urllib, a Python standard library module. Once you have the raw HTML returned by the website, you need an efficient technique to extract content.

Many programmers immediately think of regular expressions when talking about extracting information from text documents. But there are better tools at your disposal. Enter lxml. Using tools like lxml you can transform an HTML document into an XML document. After all, an XHTML document is an XML document. As we all know, web authors seldom care about standards-compliant HTML, and the majority of websites have broken HTML. We have to deal with it. But hey, lxml is cool with it. Even if you supply a broken HTML document, lxml's HTML parser can transform it into a well-formed XML tree. However, regular expressions are still useful in web scraping. You can use regular expressions in conjunction with lxml, specifically when you're dealing with text nodes.

What should you know before you start?

You should be familiar with XML and XPath. W3Schools.com has good tutorials on these subjects. Head over to the XML tutorial and the XPath tutorial to brush up your knowledge.

Let's write a Python script to put our newfound skills into practice.

The government of India has a web page where it lists the honourable members of the parliament. The goal of this exercise is to scrape the web page and extract the list of names of members of the parliament.

The web page in question is http://164.100.47.132/LssNew/Members/Alphabaticallist.aspx

Without further ado, let's begin coding.

import urllib
from lxml import etree
import StringIO

We can grab the web page using the urllib module. lxml.etree has the required parser objects.

result = urllib.urlopen("http://164.100.47.132/LssNew/Members/Alphabaticallist.aspx")
html = result.read()

At this point, we have the raw HTML in the html variable.
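The snippet above is Python 2 code: urllib.urlopen does not exist in Python 3. As a rough Python 3 equivalent, the sketch below uses urllib.request instead. The helper name fetch_html and the UTF-8 fallback are assumptions made for illustration, not part of the original script.

```python
import urllib.request


def fetch_html(url, default_charset="utf-8"):
    """Fetch a URL and return the response body as text.

    fetch_html is a hypothetical helper; the charset fallback is an assumption.
    """
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        # Prefer the charset declared in the Content-Type header, if any.
        charset = response.headers.get_content_charset() or default_charset
    # Replace undecodable bytes rather than failing on a badly encoded page.
    return raw.decode(charset, errors="replace")
```

Calling fetch_html with the members page URL should then give the same kind of raw HTML string as the html variable above, assuming the page is still reachable.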

parser = etree.HTMLParser()
tree   = etree.parse(StringIO.StringIO(html), parser)

We create the HTML parser object and then pass the parser to etree.parse. In other words, we tell etree.parse to use the HTML parser object. Since etree.parse expects a file or file-like object, we wrap the HTML string in StringIO.StringIO.

Now, take a look at the source of the document.

The information we want is in the table whose id is "ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1".

Let's begin constructing the XPath expression to drill down the document to those parts we care about.

//table[@id='ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1']

The above XPath expression grabs the table node having the id "ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1" irrespective of its location in the document.

The first row, <tr>, is not required since it contains the table heading. Let's grab all the rows of the table element except the first row.

//table[@id='ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1']/tr[position()>1]

In each table row, the name of the member of the parliament is contained in the second cell, <td>.

Filter the XPath expression to return only the second cell of each row.

//table[@id='ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1']/tr[position()>1]/td[position()=2]

Within our target cell node, the name of the member of the parliament is contained in the anchor, <a>, element.

Further refine the XPath expression to grab the text nodes.

//table[@id='ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1']/tr[position()>1]/td[position()=2]/a/child::text()

Apply the XPath expression to our tree.

xpath = "//table[@id='ctl00_ContPlaceHolderMain_Alphabaticallist1_dg1']/tr[position()>1]/td[position()=2]/a/child::text()"
filtered_html = tree.xpath(xpath)

That's all we need to do to grab the names of members of the parliament.

The filtered_html variable is a Python list. The elements of the list are the names of the members of the parliament.

Try it and see for yourself.

print filtered_html

Here's the sample output:

['Aaroon Rasheed,Shri J.M.', 'Abdul Rahman,Shri ', 'Abdullah,Dr. Farooq', 'Acharia,Shri Basudeb', 'Adhalrao Patil,Shri Shivaji', 'Adhi Sankar,Shri ', 'Adhikari ,Shri Sisir Kumar', ...]
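The sample output above is also a reasonable place to bring regular expressions back in, since each entry is a text node that packs several fields into one string. The sketch below assumes the entries follow a "Surname,Title Given names" layout and that the title is the first whitespace-delimited token after the comma. Both are guesses based on the sample, and split_name is a hypothetical helper.

```python
import re

# Assumed layout: "Surname,Title Given names" with stray spaces tolerated.
NAME_RE = re.compile(
    r"^\s*(?P<surname>[^,]+?)\s*,\s*(?P<title>\S+)\s*(?P<given>.*?)\s*$"
)


def split_name(text):
    """Split one scraped name into (surname, title, given names).

    Returns None when the text does not match the assumed layout.
    """
    match = NAME_RE.match(text)
    if match is None:
        return None
    return match.group("surname", "title", "given")


for entry in ["Abdullah,Dr. Farooq", "Adhikari ,Shri Sisir Kumar", "Abdul Rahman,Shri "]:
    print(split_name(entry))
```

Entries with no given names, such as 'Abdul Rahman,Shri ', come back with an empty string in the last position rather than being rejected.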

By the time you read this document, if the web page is moved or its contents altered, refer to the attached HTML document.

The complete script is posted as a gist.
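If you want to experiment with the drill-down without lxml or network access, the same idea can be approximated with the standard library's xml.etree.ElementTree. This is a swapped-in technique, not the lxml code from the post: ElementTree only accepts well-formed markup and supports a small XPath subset without position()>1 or text(), so list slicing and the .text attribute stand in for them. The table id "members" and the markup are hypothetical stand-ins for the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the members table (not the real page).
SAMPLE = """
<html><body>
<table id="members">
  <tr><th>No.</th><th>Name</th></tr>
  <tr><td>1</td><td><a href="#">Abdullah,Dr. Farooq</a></td></tr>
  <tr><td>2</td><td><a href="#">Acharia,Shri Basudeb</a></td></tr>
</table>
</body></html>
"""


def extract_names(markup, table_id):
    """Return the link text from the second cell of every non-heading row."""
    root = ET.fromstring(markup.strip())
    # Equivalent of //table[@id='...']/tr in the limited ElementTree syntax.
    rows = root.findall(f".//table[@id='{table_id}']/tr")
    # Slicing off the first row replaces the position()>1 predicate.
    return [row.find("td[2]/a").text for row in rows[1:]]


print(extract_names(SAMPLE, "members"))
```

On real-world broken HTML this approach fails at the parsing step, which is exactly the problem lxml's HTML parser solves.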

Event Report - Richard M Stallman Spoke At Reva Institute of Science and Management, Bangalore

written by Sudheer Satyanarayana on 2010-08-09

The topic was the Free Software Movement and the GNU/Linux operating system.

It was a long drive to Reva Institute, 40 kilometers from home. I reached the venue in time thanks to the moderate traffic. The third floor was already full, so I had to go to the fourth floor to listen to the speech. The auditorium stage can be viewed from both the third and fourth floors. The floor had two elevated blocks, one above the other. There were no chairs on the fourth floor, and the floor was a bit dusty. Approximately five hundred people attended the event. The talk was what you would expect. RMS started off by explaining the meaning of free software and the four freedoms. Then he talked about the history of the free software movement, FSF, GNU, Linux and Emacs. Even though I am quite familiar with the topics, it was interesting to hear them from the horse's mouth.

RMS went on to make his case for why you should not use proprietary software. Notable examples he presented were Skype and Microsoft Windows. He mentioned the back doors in Windows and how the software owner takes control of the user's computer. He also mentioned the perils of the Amazon Swindle.

For a few days, I had been wondering what RMS had to say about GNU/Linux other than asking people to call it GNU/Linux or GNU+Linux instead of Linux. To my surprise, there was nothing I hadn't already heard on this topic. Naturally, the flow went to open source and distros. RMS recalled that someone had called him the father of open source. He said it was like calling Mahatma Gandhi the father of the BJP.

As usual, RMS recommended the distros BLAG, gNewSense, Ututo among others.

The man is humorous. He performed the part of the Saint IGNUcius. He wore the robe and the hat and delivered the saint talk. Everyone in the audience had a good laugh, enjoying the performance.

Before concluding his speech, he answered a few questions. The questions had been sent to him prior to the speech. He read the questions from the paper he held in his hand and answered them one by one. People sitting in the front row of the third floor had an opportunity to ask questions, which RMS answered rather quickly.

The mike and speakers were not good enough. After the fans were turned off, the voice was clearer. The speakers squeaked a few times, interrupting the speech for a few seconds each time.

At the end of the speech, the GNU was auctioned to help raise funds for gnu.org.in. The bidding closed at Rs. 5500, approximately 118 US dollars. GNU stickers were sold. I bought a sheet of stickers thus contributing a tiny amount to FSF India.

Renuka Prasad, presumably one of the organizers, invited me and a few others to lunch with RMS. In the lunch room, I had the honour of meeting a few noteworthy people. The lunch and the gossip were rather quick.

My views on the talk

It is hard not to like RMS and his views. His work on free software truly deserves accolades. The principal of Reva Institute said he hoped that RMS would be considered for the Nobel Prize. I am with him on this.

Thinking practically, it is not easy to get people to call it GNU/Linux instead of Linux. The man himself said that GNU not being credited wasn't a big issue. Similarly, using the distros he recommends is not entirely practical either. A free software activist can go to this extent, but not an ordinary mortal like me.

What do you think?

Make Your Own Script Appender In Mako Templates

written by Sudheer Satyanarayana on 2010-07-21

In a recently started Pylons project, I wanted to make an easy script appending facility in Mako templates.

The requirement:

base.mako contains the layout of the web page. Many templates inherit base.mako. Here's a snippet from base.mako

<html>
<head>
    <title>Some title</title>
    <script>...</script>
    <script>...</script>
</head>

my_page.mako inherits base.mako. From within my_page.mako we want to be able to append script tags in the head section of the web page.

base.mako

# -*- coding: utf-8 -*-
<%! scripts = [] %>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
    <title>${self.title()}</title>
    ${self.head_scripts()}
</head>
<body>
     ${self.menu()}
     ${next.body()}
     ${self.footer()}
</body>
</html>
...
<%def name="head_scripts()">
<% 
    all_scripts = []
    t = self
    while t:
      all_scripts = getattr(t.module, 'scripts', []) + all_scripts
      t = t.inherits
%>
% for script in all_scripts:
    <script src="${script}" type="text/javascript"></script>
% endfor
</%def>

Notice the top portion of the template. We define a list variable called scripts. At this point scripts is empty.

<%! scripts = [] %>

We render the script tags by calling the function head_scripts().

 ${self.head_scripts()}

my_page.mako

<%inherit file="/base.mako"/>


<%! scripts = ['some_script.js'] %>

In my_page.mako, we define the variable scripts that contains the URLs. scripts is a list which lets you add any number of scripts to be appended.

<%! scripts = ['one.js', 'two.js', 'three.js'] %>

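Looking back at base.mako, we have the function head_scripts() that grabs the scripts attribute from every template in the inheritance chain. Because each template's list is prepended to the result, scripts from parent templates come before scripts from the templates that inherit them. Once we have the list of all the URLs to be appended, we simply iterate and write the script tags.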

<% 
    all_scripts = []
    t = self
    while t:
      all_scripts = getattr(t.module, 'scripts', []) + all_scripts
      t = t.inherits
%>
% for script in all_scripts:
    <script src="${script}" type="text/javascript"></script>
% endfor

getattr() ensures that if any template in the chain doesn't define scripts, there will be no error.
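To see the chain walk in isolation, here is a plain-Python sketch of the same loop. It does not use Mako at all: the SimpleNamespace objects are hypothetical stand-ins for template namespaces with module and inherits attributes, and the file names are made up for illustration.

```python
from types import SimpleNamespace


def collect_scripts(template):
    """Walk up the inheritance chain, keeping parent scripts before child scripts."""
    all_scripts = []
    t = template
    while t:
        # getattr() falls back to [] when a template defines no scripts list.
        all_scripts = getattr(t.module, "scripts", []) + all_scripts
        t = t.inherits
    return all_scripts


# Hypothetical chain: page -> middle -> base
base = SimpleNamespace(module=SimpleNamespace(scripts=["jquery.js"]), inherits=None)
middle = SimpleNamespace(module=SimpleNamespace(), inherits=base)  # no scripts defined
page = SimpleNamespace(module=SimpleNamespace(scripts=["one.js", "two.js"]), inherits=middle)

print(collect_scripts(page))  # ['jquery.js', 'one.js', 'two.js']
```

The middle template contributes nothing and raises no error, which is the behaviour the getattr() default is there to provide.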

Once base.mako is set up, you can append the script tags by just defining a list in the inheriting templates. You can use the same technique to append title, link, style and other HTML tags.

Becoming Productive In Bash Using The Keyboard Shortcuts

written by Sudheer Satyanarayana on 2010-07-05

Moving around

You can use the arrow keys on the keyboard to move around in the command line. Bash also provides convenient keyboard shortcuts to navigate effectively. Try them out and see for yourself.

To become a Bash pro user you have to get familiar with the keyboard shortcuts. Once you do, you'll find yourself more productive.

CTRL+b move backward one character
CTRL+f move forward one character
ESC+b move one word backward
ESC+f move one word forward
CTRL+a move to beginning of line
CTRL+e move to end of line
CTRL+p move to previous line
CTRL+n move to next line
ESC+< move to first line of history list
ESC+> move to last line of history list

Moving between words using ESC+f and ESC+b is my favourite in this list. Jumping to the first and last lines of the history list is also useful.

Deleting And Undeleting

Bash provides convenient keyboard shortcuts for deleting and retrieving the last deleted item.

CTRL+d delete one character forward
ESC+d delete one word forward
CTRL+k delete forward to end of line
CTRL+u delete the line from the beginning to point
CTRL+y retrieve last item deleted

Searching

CTRL+r search backward

When you hit CTRL+r the prompt changes to (reverse-i-search)`': Type the first few characters of a command you have entered before, and Bash completes the command line for you.

Changing Case

ESC+c Capitalize word after point
ESC+u Change word after point to all capital letters
ESC+l Change word after point to all lowercase letters

This is especially useful when your caps lock is accidentally on and you type something without realizing it. Without the shortcut to change case, you would turn caps lock off, delete the characters you accidentally typed in upper case and then type them again. Now you are empowered with ESC+l.

Miscellaneous

CTRL+l clear screen
CTRL+d logout or close the terminal window
CTRL+c cancel the currently running program or command

Spend some time with these keyboard shortcuts. Become a productive Bash user.

Concluding The Bangalore PHP User Group Meeting - January 30 2010

written by Sudheer Satyanarayana on 2010-02-01

Last Saturday, the Bangalore PHP User Group conducted a meeting. The venue was the same as last time: the Microsoft office, Bangalore! The topic of the meeting was Framework Shootout. The frameworks represented were:

I was glad to get an opportunity to represent the Zend Framework. The slides I presented with Ganesh H S can be downloaded or viewed online at SlideShare.

I liked all the presentations. Personally, I believe Zend Framework and Symfony are the two PHP5 frameworks you would want to seriously consider using in your projects. The strengths and weaknesses of each framework vary. In a previous post we discussed the reasons to use Zend Framework. Sjoerd de Jong has offered to conduct training sessions on Symfony for free.

I would like to thank Vinu Thomas for arranging the meet. I look forward to more BPUG meetings in the future.

Concluding The Bangalore PHP User Group Meeting - Oct 31 2009

written by Sudheer Satyanarayana on 2009-10-31

Today, the Bangalore PHP User Group had a meeting. The meetup.com site reports that sixty-eight people attended the meeting. The venue was at Microsoft. The increasing participation of Microsoft in PHP conferences and meetings has taken many by surprise. Microsoft was kind enough to offer free pizza to all the attendees.

They gave me a copy of Windows 7 Release Candidate, which expires on June 1, 2010. If time permits, I will surely try to install it as a virtual guest.

The attendees. Vinu Thomas, the event organizer, is seen standing.

Janakiram presented "Cloud Computing For PHP Developers". It was nice to see someone making an interesting presentation without slides. Janakiram used a text editor and typed the bullet points in big fonts as he spoke about them. It was indeed a good presentation.

Sudheer's Talk On Building Restful Applications

I gave a talk on "Building RESTful applications with PHP".

I have uploaded my slides "Building Restful Applications Using PHP" so that you can view them online or download the file.

At the meeting, I pitched the idea of starting a PHP podcast. Do you think podcasting is a good idea? We are looking for a panel of volunteer hosts. If you can join us or can recommend someone, let us know.

Pictures: Courtesy, Vinu Thomas.

How To Remove Alpha Channel From The Image Using GIMP

written by Sudheer Satyanarayana on 2008-11-05

According to Wikipedia, "alpha compositing is the process of combining an image with a background to create the appearance of partial transparency". To remove the transparency or the alpha channel:

Fire up GIMP, open the image, and choose Image > Flatten Image from the menu.

Quote from the GIMP docs:

The Flatten Image command merges all of the layers of the image into a single layer with no alpha channel. After the image is flattened, it has the same appearance it had before. The difference is that all of the image contents are in a single layer without transparency. If there are any areas which are transparent through all of the layers of the original image, the background color is visible. This operation makes significant changes to the structure of the image. It is normally only necessary when you would like to save an image in a format which does not support levels or transparency (an alpha channel).

Reference:

  1. http://docs.gimp.org/en/gimp-image-flatten.html
  2. http://en.wikipedia.org/wiki/Alpha_compositing