HTML parsing using gem Nokogiri in Rails

parsing html is easy using Nokogiri, to do follow the steps

    Step 1 : Open the terminal and go to your rails application folder.

  • $ cd yourRailsApp
    add  the following line to your Gemfile
    gem  ‘nokogiri’
    First install the dependencies

    $  sudo apt-get install libxml2 libxml2-dev libxslt1-dev
    

    For Fedora do

    $ sudo yum install libxml2-devel libxslt-devel
    

    For CentOS:

    $ sudo yum install -y gcc ruby-devel libxml2 libxml2-devel libxslt libxslt-devel
    

    do bundle install
    /yourRailsApp $ bundle install
    Your nokogiri installation is completed. Parsing HTML documents look like
    doc = Nokogiri::HTML(html_document) and parsing XML documents look like
    doc = Nokogiri::XML(xml_document)
    Nokogiri converts HTML and XML documents into a tree data structure. Nokogiri extracts data using this tree. Xpath and CSS are small languages used for tree traversal. Xpath is a language that was written to traverse an XML tree struture, but we can use it with HTML tree as well.

  • Lets try an example take the folowing html code

    Step 2 : In views/form.erb add

  • Add a text field in your form
    <%= text_field_tag(“my[url]”, @user.url) %> and enter the sample code in this text field.
  • Step 3 : In controller add

  • @user.url = params[“my”][“url”]
    call the function
    @user.get_value_from_url
    write the function in the model
    def get_value_from_url
    node = Nokogiri::XML(self.url)
    node.xpath(‘//param[@name=”pic”]’).each do |node|
    @result = node.attribute(“value”).to_s
    end
    @result
    end

Then you get the value : “http://www.my_site.com/v/xedFamzsaaxo7hc0?fs=1&amp;hl=en_US&#8221; in the result. Look its so easy to use nokogiri. Its Wonderful !!

You can also view Aaron Patterson’s (creator of Nokogiri) blog from here.

Advertisements