Looking for a good tutorial for an HTML parser

0 votes
474 views

I am looking for a good tutorial for an HTML parser. My goal is to extract tables, or information from tables, in web pages.

I searched and found HTMLParser, BeautifulSoup, etc. HTMLParser works fine for me, but I would like a good tutorial to learn it properly.

I could not use BeautifulSoup as I did not find an .exe file.

I am using Python 2.7 on Windows 7 SP1 (64 bit).

I am looking for a good tutorial for HTMLParser, or for any similar parser that has an .exe file for my environment and a good tutorial.

posted Jul 2, 2013 by anonymous

2 Answers

+1 vote

Take a read of the topic "Parsing, Creating, and Manipulating HTML Documents" from chapter five of Text Processing in Python.

http://gnosis.cx/TPiP/chap5.txt

answer Jul 2, 2013 by anonymous

+1 vote

I believe that BeautifulSoup is a pure-Python module, and so does not have a .exe file. However, it does have good tutorials:

https://duckduckgo.com/html/?q=beautifulsoup+tutorial

Why do you care about a .exe file? Most Python libraries are .py files.
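
Along the same lines, HTMLParser ships with Python itself, so there is nothing to install for it. A rough sketch of pulling table cells out of a page with it might look like the following. The names `TableExtractor` and `extract_tables` are made up for illustration, and the sketch assumes simple, non-nested tables. The import falls back to the Python 2.7 module name if the Python 3 one is not available.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser   # Python 2.7


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows; each row is a list of cell strings.

    Illustrative only: nested tables are not handled.
    """

    def __init__(self):
        HTMLParser.__init__(self)
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _close_cell(self):
        # Store the finished cell's text in the current row, if there is one.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        # Store the finished row in the most recent table, if there is one.
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._close_row()   # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def extract_tables(html):
    """Return every table found in the given HTML string."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.tables


sample = (
    "<table>"
    "<tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>apple</td><td>3</td></tr>"
    "</table>"
)
print(extract_tables(sample))
# prints [[['Name', 'Qty'], ['apple', '3']]]
```

The parser only calls back on start tags, end tags, and text, so the class has to remember for itself which row and cell it is currently inside. That bookkeeping is the part BeautifulSoup would otherwise do for you.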

answer Jul 3, 2013 by anonymous