
【*可可熊D窝*】

cocobear'home

An improved bash script [web page resource extraction]

A small tool for extracting resources linked from a web page.

Usage (assuming the bash script is saved in a file named get):

./get -u http://my.opera.com/blog/ -t pdf

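The command above connects to my.opera.com/blog, automatically searches the page for links to pdf resources, and downloads them (make get executable first).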

If the web page has already been saved to a file, use the following command instead:

./get -f index.html -t pdf

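The result is the same as before. Note that awkfile removes the file it is given once it has been parsed, so index.html is deleted after the run; work on a copy if you want to keep it.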

The code:

#!/bin/bash
#Download files of a given type linked from a web page.
#Author:cocobear
#E-Mail:cocobearc@gmail.com
URL=false
FILE=false
HELP=false
TYPE=false
function help() {
        echo "Usage: $0 [-h] (-f <filename> | -u <url>) -t <type>"
        exit 1;
}
function awkfile() {
        filename="$1.$2"
        #type="$2$" anchors the pattern so the type matches only at the end of the url
        awk -v type="$2$" '
        BEGIN {FS = "\""} 
        {
        for (i=1;i<=NF;i++)
        if (($i ~ /^http:/) && ($i ~ type ))
                {print $i}
        }' $1 > $filename
        echo "Delete temp file."
        rm  $1
        if [ -s $filename ]
        then wget -i $filename
        else echo "Found nothing matching $2"
        fi
        echo "Delete temp file."
        rm  $filename
        exit 
}
function processurl() {
        tempfile="downfile"
        if [ -e $tempfile ]
        then
                echo "$tempfile already exists!"
                exit 1
        fi
        #save the page to the temp file
        wget -O $tempfile $1
        if [ -s $tempfile ]
        then awkfile $tempfile $2
        else echo "Nothing downloaded!"
        fi
        exit 
}

if [ $# -eq 0 ];
then
        help
        exit 1
fi

#deal with option
while getopts :f:hu:t: option
do
case $option in
f)FILE=$OPTARG
;;
h)help
;;
u)URL=$OPTARG
;;
t)TYPE=$OPTARG
;;
?)
echo "Missing arguments!"
help
;;
esac
done
if [ $TYPE = "false" ]
then {
        echo "Missing type"
        help
}
else {
        if [ $FILE = "false" ] && [ $URL = "false" ]
        then {
                echo "Must specify the filename or url"
                help
        }
        else {
                if [ $FILE != "false" ] &&  [ $URL != "false" ]
                then {
                        echo "filename and url can't be specified together"
                        help
                }
                fi
        }
        fi
}
fi
#main
if [ $FILE != "false" ]
then {
        if [ -e $FILE ]
        then awkfile $FILE $TYPE
        else {
                echo "No such file!"
                exit 1
        }
        fi
}
else processurl $URL $TYPE
fi
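
The awk step inside awkfile can be tried on its own, without touching the network. The following is only a minimal sketch under a few assumptions: the sample page, its example.com URLs, and the helper name extract_links are made up for illustration and are not part of the script above.

```shell
#!/bin/bash
# Hypothetical sample page; the URLs are invented for this illustration.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<a href="http://example.com/docs/manual.pdf">Manual</a>
<a href="http://example.com/index.html">Home</a>
<img src="http://example.com/img/logo.png">
<a href="/relative/paper.pdf">Relative link</a>
EOF

# Same idea as awkfile: split every line on double quotes and print the
# fields that start with http: and end with the requested type.
# extract_links is a hypothetical name used only in this sketch.
extract_links() {
        awk -v type="$2\$" '
        BEGIN { FS = "\"" }
        {
                for (i = 1; i <= NF; i++)
                        if (($i ~ /^http:/) && ($i ~ type))
                                print $i
        }' "$1"
}

links=$(extract_links "$sample" pdf)
echo "$links"
rm -f "$sample"
```

Only the absolute manual.pdf link is printed. The relative link is skipped because the pattern requires the field to begin with http:, and for the same reason https: links and attributes written with single quotes would not be picked up either.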

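The real work is done by awk, which analyzes the text; most of the shell code is there just to parse the arguments. I was learning as I wrote this script, and I figured out quite a few things along the way. When I have time I will write a detailed walkthrough.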
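
Since most of the script is argument handling, here is one way the same checks might be written more compactly. This is a sketch, not the script's own code: parse_args is a hypothetical helper, and it uses empty strings rather than the "false" placeholders to mean "not given".

```shell
#!/bin/bash
# parse_args is a hypothetical helper, not part of the original script.
# It sets FILE, URL and TYPE and returns non-zero on invalid input.
parse_args() {
        local OPTIND=1 option
        FILE="" URL="" TYPE=""
        # The leading ':' puts getopts in silent mode, so errors are reported
        # through the ':' (missing argument) and '?' (unknown option) cases.
        while getopts ":f:hu:t:" option "$@"
        do
                case $option in
                f) FILE=$OPTARG ;;
                u) URL=$OPTARG ;;
                t) TYPE=$OPTARG ;;
                h) echo "Usage: get [-h] (-f <filename> | -u <url>) -t <type>"; return 2 ;;
                :) echo "Option -$OPTARG needs an argument" >&2; return 1 ;;
                \?) echo "Unknown option -$OPTARG" >&2; return 1 ;;
                esac
        done
        if [ -z "$TYPE" ]; then
                echo "Missing type" >&2; return 1
        fi
        # Exactly one of -f and -u must be given.
        if [ -n "$FILE" ] && [ -n "$URL" ]; then
                echo "filename and url can't be specified together" >&2; return 1
        fi
        if [ -z "$FILE" ] && [ -z "$URL" ]; then
                echo "Must specify the filename or url" >&2; return 1
        fi
        return 0
}

parse_args -u http://my.opera.com/blog/ -t pdf && echo "url=$URL type=$TYPE"
parse_args -f index.html -u http://my.opera.com/blog/ -t pdf || echo "rejected"
```

Returning a status instead of calling exit inside the helper keeps the checks easy to exercise from an interactive shell; the caller can still decide to print the usage text and exit.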


Comments

可可熊 25. December 2006, 14:46

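If the page does not display correctly, a refresh should fix it. When I browse with Opera the code above sometimes loses its line breaks, but refreshing sorts it out :D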
