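I. Before We Begin
I needed this feature for a web page that collects restaurants from different places. The idea is that once a user finds a food blog post, they paste its URL and the page fetches the post's title for them, saving the trouble of copying the title by hand. Having the URL also lets the page build a link, so anyone interested in that post can go straight to it.
Several gems can grab a page title, such as mechanize [1] and pismo [2]. At first I also tried reading the page with open-uri [3] and extracting the data myself, but I kept running into encoding and other problems and gave up. This post uses the pismo gem, which can extract a page's title, feed URL, lede, author, keywords, datetime, and more, so it is fairly full-featured. For details, see the pismo GitHub page [2].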
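II. Notes
§ Comments addressed to the reader are marked with the % symbol
§ Environment used in this post: Ubuntu 14.04, Rails 4.2.4, Ruby 2.2.3
§ [ ] marks a reference; the links can be clicked directly, and the URLs are also listed at the end
§ A leading $ stands for the terminal prompt; type only what follows the $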
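III. Installation and Usage
1. Install pismo through the Gemfile
(1) Open your project's Gemfile and add
gem 'pismo'
(2) In the terminal (from your project directory), run
$ bundle install
% Once these two steps are done, the pismo gem is available in this project!
2. To use pismo, load it with require in the controller that needs it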
require 'pismo'

# Load a Web page (you could pass an IO object or a string with existing HTML data along, as you prefer)
doc = Pismo::Document.new('http://www.rubyinside.com/cramp-asychronous-event-driven-ruby-web-app-framework-2928.html')

doc.title     # => "Cramp: Asychronous Event-Driven Ruby Web App Framework"
doc.author    # => "Peter Cooper"
doc.keywords  # => [["cramp", 7], ["controllers", 3], ["app", 3], ["basic", 2], ..., ... ]
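% The above is the example from the pismo GitHub page. After pismo is required, Pismo::Document.new opens a web page (the URL passed to new) and stores its content in doc. You can then call methods on doc (such as doc.title) to pull out the item you want. The available methods are also listed on the pismo page [2].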
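% Since the URL comes from whatever the user pastes, it may help to wrap the call in a small helper that returns nil instead of raising when the page cannot be loaded. The following is only a minimal sketch: fetch_page_title and its loader parameter are names I made up for illustration and are not part of pismo. The loader is injectable so that the sketch can be run with a stand-in object and no network access.

```ruby
# Hypothetical helper (not part of pismo): returns the stripped page title
# for a URL, or nil when the page cannot be loaded or has no title.
# `loader` is an assumption added for this sketch: any callable that takes a
# URL and returns an object responding to #title. By default it builds a
# Pismo::Document, as in the README example above.
def fetch_page_title(url, loader: nil)
  loader ||= lambda do |u|
    require 'pismo'
    Pismo::Document.new(u)
  end

  title = loader.call(url).title.to_s.strip
  title.empty? ? nil : title
rescue StandardError
  # Network errors, bad URLs, and parse failures all end up here.
  nil
end

# Usage with a stand-in document, so the sketch runs without network access.
StubDoc = Struct.new(:title)
stub_loader = ->(_url) { StubDoc.new("  Some food blog post \n") }
puts fetch_page_title('http://example.com/post', loader: stub_loader)
# prints "Some food blog post"
```

% In a real controller you would call fetch_page_title(url) without the loader argument, and decide what to show the user when it returns nil.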
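3. In my implementation I used doc.title to grab the title of each food blog post, but for some sites (pixnet, for example) the value returned was not the post title. After looking at the page source, my guess is that pismo picks up every place that contains a title, so I switched to doc.titles (which collects all the candidates into an array) to inspect the results, as shown in the figure below:
[Figure: doc.titles output for each food blog post]
For most posts it captured more than one title (Via's旅行札記 was the only exception), but in every case the last element was the correct post title. So when saving the title I keep only the last element of the array (using .last), that is, doc.titles.last.
% When grabbing the title, collect all the candidates as an array, but save only doc.titles.last. For every site I checked, this gave the correct post name! The result is shown below:
[Figure: saved post titles]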
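% The "take the last candidate" rule can be sketched as a small helper. This is only an illustration under my own assumptions: pick_post_title is a made-up name, the sample titles are invented, and the stand-in struct only mimics the two methods used here (titles and title) so the sketch runs without pismo or a network connection. It also falls back to doc.title when the array is empty, which is an addition of mine.

```ruby
# Hypothetical helper (not part of pismo): chooses the title to store.
# Assumes `doc` responds to #titles (an array of candidates, as observed
# above) and #title (the single default title).
def pick_post_title(doc)
  # Drop nil/blank candidates so that .last never returns an empty string.
  candidates = Array(doc.titles).map { |t| t.to_s.strip }.reject(&:empty?)
  # Prefer the last candidate; fall back to doc.title if there are none.
  candidates.last || doc.title
end

# Stand-in for a Pismo::Document, with invented sample data.
StubTitles = Struct.new(:titles, :title)
doc = StubTitles.new(['Blog name', 'Blog name :: Post title', 'Post title'], 'Blog name')
puts pick_post_title(doc)  # prints "Post title"
```

% With a real document you would pass the object returned by Pismo::Document.new instead of the stand-in. Keep in mind that "the last one is correct" is only what I observed on the sites I tested, not something pismo promises.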
References