A while ago, we discussed how to scrape information from websites that don't offer information in a structured format like XML or JSON. We noted that urllib and lxml are indispensable tools in web scraping. While urllib enables us to connect to websites and retrieve information, lxml helps convert HTML, broken or not, to valid XML and parse it. In this post, I will demonstrate how to retrieve information from web pages that require a login session.
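To give a rough idea of what a login session looks like with urllib, here is a minimal sketch. It assumes the site uses a plain form-based login that sets a session cookie; the function names and the `username` and `password` field names are hypothetical and will differ from site to site.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


def make_session():
    """Return an opener that remembers cookies, plus the jar that stores them."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build a POST request carrying the login form fields.

    The field names are an assumption; inspect the real login form to find
    the names the site expects.
    """
    form = urlencode({"username": username, "password": password})
    return Request(login_url, data=form.encode("ascii"))
```

You would pass the login request to `opener.open()` first, then reuse the same opener for the protected pages so the session cookie is sent along with each request.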
Jamie asks on LinkedIn.
The short answer
The question is wrong.
The long answer
A true PHP developer is a programmer who knows PHP. A false PHP developer is someone who doesn't know PHP. That's the closest correct answer I can think of.
I think Jamie wants to ask, "What's your definition of a good PHP developer?" There is no single correct answer to that question. All you can do is highlight some of the good things a PHP developer does.
Let's seize this opportunity to talk about the traits of a good PHP developer. Most of what applies to a good PHP programmer also applies to a good web developer and to a good programmer in general.
The topic was the Free Software Movement and the GNU/Linux operating system.
It was a long drive to Reva Institute, 40 kilometers from home. I reached the venue in time thanks to the moderate traffic. The third floor was already full, so I had to go to the fourth floor to listen to the speech. The auditorium stage can be viewed from both the third and fourth floors. The floor had two elevated blocks, one above the other. There were no chairs on the fourth floor, and the floor was a bit dusty. Approximately five hundred people attended the event.
The talk went as you would expect. RMS started off by explaining the meaning of free software and the four freedoms. Then he talked about the history of the free software movement, FSF, GNU, Linux and Emacs. Even though I am quite familiar with the topics, it was interesting to hear them from the horse's mouth.
More and more websites are offering APIs nowadays. Previously, we've talked about XML-RPC and REST. Even though web services are growing rapidly, there are still a lot of websites out there, government websites especially, that offer information only in an unstructured format. If you want to consume information from those websites, web scraping is your only choice.
What is web scraping?
Web scraping is a technique in which a program mimics a human browsing a website. In order to scrape a website from your programs, you need tools to:
- Make HTTP requests to websites
- Parse the HTTP response and extract content
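The two steps above can be sketched roughly as follows. This is only an illustration: it uses the standard library's `html.parser` rather than lxml so that it runs without extra packages, and the names `LinkCollector`, `fetch` and `extract_links` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Step 1: make the HTTP request and return the body as text."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_links(html):
    """Step 2: parse the response and pull out the content we want."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

Calling `extract_links(fetch(url))` chains the two steps. Links are only one example of content to extract; the same pattern works for any tag or attribute you care about.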
- base.mako contains the layout of the web page. Many templates inherit base.mako. Here's a snippet from base.mako:
<html> <head> <title>Some title</title> <script>...</script> <script>...</script> </head> </%def>
- my_page.mako inherits base.mako. From within my_page.mako we want to be able to append script tags in the head section of the web page.
You can use the arrow keys on the keyboard to move around in the command line. Bash also provides convenient keyboard shortcuts for navigating more effectively. Try them out and see for yourself.
To become a Bash pro user, you have to get familiar with the keyboard shortcuts. Once you do, you'll find yourself more productive.
|Shortcut|Action|
|---|---|
|CTRL+b|move backward one character|
|CTRL+f|move forward one character|
|ESC+b|move one word backward|
|ESC+f|move one word forward|
|CTRL+a|move to beginning of line|
|CTRL+e|move to end of line|
|CTRL+p|move to previous line|
|CTRL+n|move to next line|
|ESC+<|move to first line of history list|
|ESC+>|move to last line of history list|
Moving between words with ESC+f and ESC+b is my favourite in this list. Jumping to the first and last lines of the history list is also useful.
How to create a dijit.form.DateTextBox widget programmatically
There are two ways to create Dojo widgets: declaratively in markup, or programmatically in JavaScript.
Programmatically creating widgets has its advantages. For instance, you may want to create a date picker when a button is clicked.
Let's create a dijit.form.DateTextBox widget programmatically step by step.
I was contacted by Packt to review the book PHP 5 e-commerce Development by Michael Peacock.
The book serves as an introductory tutorial on developing an e-commerce website using PHP. It has 15 chapters spread over 310 pages.
You can grab a sample chapter from the publisher's website.
The publisher's website has a detailed table of contents.
Who should read the book?
You should read the book if you are learning PHP and are new to e-commerce. Beginners trying to use out-of-the-box software like Drupal CMS or OSCommerce tend to get frustrated sooner or later. These content management systems have their own ways of doing things. If you are new to PHP, complex software like Drupal can be intimidating until you thoroughly understand its inner workings. Developers often choose to roll their own software to avoid the steep learning curve of existing open source software. If you have felt the same way, this book is worth a try.
Today, I was looking for a quick way to get the current weather information on my computer. There are many websites out there that offer the information, but I was looking for a program I could install permanently on my computer and launch whenever I want to look up the weather. Oddly, I didn't find a satisfactory program. At the same time, I was also watching a video about network programming. That inspired me to quickly write a program in PHP to print the current weather information for where I live.
I started to look for a web service that offers weather information for free. Did I tell you programmableweb.com is a useful website for finding web services? If you have subscribed to the Tech Chorus blog, you know we've been talking about REST, XML-RPC and web services in general for a while. I landed on the Yahoo! Weather API web page.
I wrote a program that prints the weather information in 7 lines of PHP code. I have published this program in the Code Album GitHub repository. You can grab it and use it.
If you want to know how to write similar programs, read on. If you know a bit of PHP and have heard of XML and RSS before, you can understand the program and start building upon it.
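The core idea, reading a value out of an RSS feed, can be sketched in a few lines. Note that this is a Python illustration, not the PHP program mentioned above, and the feed below is a made-up sample rather than real output from any weather service.

```python
import xml.etree.ElementTree as ET

# Made-up sample feed, used only so the sketch runs without a network call.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example weather feed</title>
    <item>
      <title>Conditions for Example City</title>
      <description>Partly cloudy, 24 C</description>
    </item>
  </channel>
</rss>"""


def first_item(rss_text):
    """Return (title, description) of the first <item>, or None if there is none."""
    root = ET.fromstring(rss_text)
    item = root.find("./channel/item")
    if item is None:
        return None
    title = item.findtext("title", default="")
    description = item.findtext("description", default="")
    return title, description


if __name__ == "__main__":
    print(first_item(SAMPLE_FEED))
```

A real feed would be fetched over HTTP first, and the element names to read depend on the service you choose.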
Having clients send the API key in an HTTP header makes it convenient to handle. We can quickly check the HTTP request header and decide whether to allow or deny the request.
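The check itself is small. Here is a framework-independent sketch in Python (not a front controller plugin); the header name `X-Api-Key` and the key value are assumptions chosen for illustration.

```python
import hmac

API_KEY_HEADER = "X-Api-Key"        # hypothetical header name
VALID_KEYS = {"demo-key-123"}       # hypothetical key store


def is_authorized(headers, valid_keys=VALID_KEYS):
    """Return True if the request headers carry a known API key."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    supplied = lowered.get(API_KEY_HEADER.lower(), "").encode("utf-8")
    # compare_digest avoids leaking key contents through timing differences.
    return any(
        hmac.compare_digest(supplied, key.encode("utf-8")) for key in valid_keys
    )
```

A request that fails this check would typically be answered with a 401 or 403 response before any controller code runs.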
As a prerequisite, you should be familiar with writing front controller plugins. Let's write a front controller plugin that does the following: