
Lu cát

Action, Correction, Perfection !

Strip URLs from a web page


There are many ways to pull the download URLs out of a web page. Below is just one quick way, using a small awk script taken from http://www.gnu.org/software/gawk/manual/gawkinet/html_node/WEBGRAB.html and modified slightly:

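# getlink.awk: print one "wget <url>" line per http:// URL in the input (gawk only, uses RT)
BEGIN { RS = "http://[#%&\\+\\-\\./0-9\\:;\\?A-Z_a-z\\~]*" }   # every URL acts as a record separator
RT != "" {
    command = ("wget " RT)   # RT holds the text that matched RS, i.e. the URL itself
    print command
}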


gawk -f getlink.awk target.html > down.sh


where target.html is the page already saved locally. To work straight from a link you would need one more small awk script, but that is probably not necessary.
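For the direct-link case, one possible sketch is to let wget write the page to standard output and pipe it into awk, so no extra awk networking script is needed. The function name make_wget_lines and the example URL are made up for illustration. This variant also swaps gawk's RT for a plain match() loop, on the assumption that it should run on other awk implementations as well:

```shell
#!/bin/sh
# make_wget_lines (hypothetical name): read HTML on stdin and print
# one "wget <url>" line for every http:// URL found.
make_wget_lines() {
    awk '
    BEGIN { re = "http://[-#%&+./0-9:;?A-Z_a-z~]*" }   # same character set as getlink.awk
    {
        line = $0
        # match() sets RSTART/RLENGTH; keep cutting the line after each hit
        while (match(line, re)) {
            print "wget " substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

# Assumed usage (the URL is only a placeholder):
# wget -q -O - "http://example.com/page.html" | make_wget_lines > down.sh
```

As with the saved-file version, it is worth reading down.sh before running it.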

15/07/2007

Comments

pclouds 16. July 2007, 14:51

my favourite (not an awk fan apparently):
grep 'href="' target.html|sed 's,.*href="\([^"]*\)".*,\1,g' (change href= to src= if you are going to grab images)

it's not perfect (won't work with href=blah without quotes) but simple enough to type it by hand

oh and append the following to wget them all
grep ...|sed ...|while read i;do wget "$i";done
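The grep/sed idea in this comment could also be wrapped in a small function, sketched below. The name list_hrefs is hypothetical. The sketch uses grep -o (a GNU/BSD extension, assumed to be available) so that several links on one line are each printed. Like the one-liner, it still skips href=blah without quotes.

```shell
#!/bin/sh
# list_hrefs (hypothetical name): read HTML on stdin and print the value
# of every double-quoted href attribute, one per line.
list_hrefs() {
    # grep -o prints each match on its own line; sed then strips the href="..." wrapper
    grep -o 'href="[^"]*"' | sed 's/^href="//; s/"$//'
}

# Assumed usage, fetching every link found in a saved page:
# list_hrefs < target.html | while read -r i; do wget "$i"; done
```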

Lu cát 16. July 2007, 15:28

Hix, thanks pclouds for another way using only a bash script! At first I tried sed but did not succeed... so I had to find at least one working way :D
