Improved bash script [web page resource extraction]
Sunday, 24. December 2006, 15:26:13
A small tool for extracting resources from a web page.
Usage (assuming the bash script is saved in a file named get):
./get -u http://my.opera.com/blog/ -t pdf
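The command above connects to my.opera.com/blog, automatically searches the page for resources in pdf format, and downloads them (give get execute permission first).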
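A minimal sketch of that preparation step, assuming the script has been saved as `get` in the current directory. The scratch directory and the one-line placeholder body written into `get` are hypothetical and exist only so the snippet runs on its own without touching anything real.

```shell
#!/bin/bash
# Work in a scratch directory so nothing real is touched (illustration only).
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical placeholder standing in for the real script saved as "get";
# it only echoes the arguments it receives.
printf '#!/bin/bash\necho "get called with: $*"\n' > get

# Give the script execute permission, as the usage notes require.
chmod +x get

# Now it can be invoked directly; without chmod you would need "bash get ...".
./get -u http://my.opera.com/blog/ -t pdf
```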
If the web page has already been saved, you can use the following command instead:
./get -f index.html -t pdf
The result is the same as with the previous command.
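The two commands give the same result because `-u` only saves the page to a temporary file first; both paths then run the same awk filter over a local HTML file. A minimal sketch of that filter, assuming a small hand-written page saved as `index.html` (the scratch directory and the example.com links are made up for illustration):

```shell
#!/bin/bash
# Scratch directory and sample page are hypothetical, for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > index.html <<'EOF'
<a href="http://example.com/docs/manual.pdf">Manual</a>
<a href="http://example.com/index.html">Home</a>
<img src="http://example.com/logo.png"> <a href="http://example.com/a.pdf">A</a>
EOF

type=pdf
# Split each line on double quotes, so every quoted attribute value becomes a
# field; keep fields that start with http: and end with the requested type
# ("pdf$" anchors the match to the end of the URL).
awk -v type="$type$" '
BEGIN { FS = "\"" }
{
    for (i = 1; i <= NF; i++)
        if (($i ~ /^http:/) && ($i ~ type))
            print $i
}' index.html > "index.html.$type"

# This list file is what the script hands to "wget -i"; here we only print it.
cat "index.html.$type"
```

Only the two `.pdf` links survive; the `.html` and `.png` URLs are filtered out by the end-anchored type pattern.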
The code is as follows:
#!/bin/bash
#Written to download files of a specific type from a website.
#Author:cocobear
#E-Mail:cocobearc@gmail.com
URL=false
FILE=false
HELP=false
TYPE=false
function help() {
echo "Usage: $0 {-f <filename> | -u <url>} -t <type> [-h]"
exit 1;
}
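function awkfile() {
# $1: HTML file to scan, $2: file type (extension) to look for
filename="$1.$2"
#type="$2$" matches the type only at the end of the URL
awk -v type="$2$" '
BEGIN {FS = "\""}
{
for (i=1;i<=NF;i++)
if (($i ~ /^http:/) && ($i ~ type ))
{print $i}
}' "$1" > "$filename"
if [ -s "$filename" ]
then wget -i "$filename"
else echo "Found nothing matching $2"
fi
echo "Delete temp file."
rm -f "$filename"
# Do not delete "$1" here: with -f it is the user's own saved page.
}
function processurl() {
tempfile="downfile"
if [ -e "$tempfile" ]
then
echo "$tempfile exists!"
exit 1
fi
# Save the page into the temp file
wget -O "$tempfile" "$1"
if [ -s "$tempfile" ]
then awkfile "$tempfile" "$2"
else echo "Nothing downloaded!"
fi
# The downloaded page is only a temp file in this mode, so remove it
# (even if it is empty, otherwise the next run stops with "exists!").
echo "Delete temp file."
rm -f "$tempfile"
exit
}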
if [ $# -eq 0 ]
then
help
exit 1
fi
# Parse command-line options
while getopts :f:hu:t: option
do
case $option in
f)FILE=$OPTARG
;;
h)help
;;
u)URL=$OPTARG
;;
t)TYPE=$OPTARG
;;
:)
echo "Option -$OPTARG requires an argument!"
help
;;
\?)
echo "Unknown option: -$OPTARG"
help
;;
esac
done
if [ "$TYPE" = "false" ]
then
echo "Missing type"
help
elif [ "$FILE" = "false" ] && [ "$URL" = "false" ]
then
echo "Must specify the filename or url"
help
elif [ "$FILE" != "false" ] && [ "$URL" != "false" ]
then
echo "Filename and url can't be specified together"
help
fi
#main
if [ "$FILE" != "false" ]
then
if [ -e "$FILE" ]
then awkfile "$FILE" "$TYPE"
else
echo "No such file!"
exit 1
fi
else processurl "$URL" "$TYPE"
fi
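The text analysis is done mainly with awk; most of the shell code is there to parse the arguments. I was learning as I wrote this script and figured out quite a few things along the way. When I have time I will write a detailed analysis and explanation.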
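Since most of the shell code handles arguments, here is a minimal, self-contained sketch of the same getopts pattern. It is wrapped in a hypothetical `parse_args` function that prints the selected mode instead of downloading anything, so the option rules (exactly one of `-f`/`-u`, plus a mandatory `-t`) can be tried in isolation. Using empty strings instead of the script's `false` sentinel is a choice made for this sketch only.

```shell
#!/bin/bash
# parse_args is a hypothetical helper mirroring the script's option rules.
parse_args() {
    # local OPTIND=1 resets getopts so the function can be called repeatedly.
    local OPTIND=1 option file="" url="" type=""
    # Leading ":" in the optstring: getopts reports errors via ":" and "?".
    while getopts :f:hu:t: option; do
        case $option in
            f) file=$OPTARG ;;
            u) url=$OPTARG ;;
            t) type=$OPTARG ;;
            h) echo "usage"; return 1 ;;
            :) echo "option -$OPTARG needs an argument" >&2; return 1 ;;
            \?) echo "unknown option -$OPTARG" >&2; return 1 ;;
        esac
    done
    if [ -z "$type" ]; then
        echo "missing type" >&2; return 1
    elif [ -z "$file" ] && [ -z "$url" ]; then
        echo "must specify a filename or url" >&2; return 1
    elif [ -n "$file" ] && [ -n "$url" ]; then
        echo "filename and url can't be specified together" >&2; return 1
    fi
    if [ -n "$file" ]; then
        echo "file $file $type"
    else
        echo "url $url $type"
    fi
}

parse_args -f index.html -t pdf                  # prints: file index.html pdf
parse_args -u http://my.opera.com/blog/ -t pdf   # prints: url http://my.opera.com/blog/ pdf
```

Returning a status instead of calling `exit` keeps the checks testable; the real script exits because there is nothing left to do after an error.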

可可熊 # 25. December 2006, 14:46