Little stats script - tophits.sh

Simon Lyall simon at darkmere.gen.nz
Fri May 27 03:27:53 CEST 2011


In case anyone finds this useful: it is a little script that outputs the 
URLs that are getting the most hits and using the most bandwidth.

It's a bit of a hack (I can already see bits I could tidy) but it works okay 
for me. The main bug is that URLs served with different sizes 
(gzipped/non-gzipped mainly) are totalled separately.
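
(For reference, the field numbers in the script assume varnishncsa's default 
NCSA combined-style output, so a log line looks roughly like this made-up 
example:

192.0.2.1 - - [27/May/2011:03:27:53 +0200] "GET /somepage HTTP/1.1" 200 5120 "-" "Mozilla/5.0"

Splitting on spaces, field 4 is the start of the timestamp, field 7 is the 
request URL and field 10 is the response size in bytes.)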


#!/bin/bash
# Dump the buffered (old) log entries in NCSA format to a temporary file.
varnishncsa -d > /tmp/vlog
#
# Work out the time span covered by the log: take the timestamp field
# ("[27/May/2011:03:27:53") from the first and last lines, reorder it to
# "May 27 03:27:53" (the year is dropped, so date(1) assumes the current
# one) and convert both to epoch seconds.
START1=`head -1 /tmp/vlog | cut -f4 -d" " | cut -f2 -d"[" | sed "s/\/[0-9]*\:/\//" | awk -F/ ' { print $2" "$1" "$3 } ' `
START=`date +%s --date="$START1"`
FIN1=`tail -1 /tmp/vlog | cut -f4 -d" " | cut -f2 -d"[" | sed "s/\/[0-9]*\:/\//" | awk -F/ ' { print $2" "$1" "$3 } ' `
FIN=`date +%s --date="$FIN1"`
DIFF=` echo " $FIN - $START " | bc `

echo "Data for the last $DIFF seconds"

# Undo the %5F/%2E escaping of "_" and "." so identical URLs collapse together.
cat /tmp/vlog | sed "s/\%5F/_/g"  | sed "s/\%2E/\./g" > /tmp/tophits.tmp
echo ""
echo "Top Hits per second URLs"
echo ""
# Overall hit rate: total number of requests divided by the time span.
cat /tmp/tophits.tmp | awk -v interval=$DIFF ' { COUNT += 1 } END { OFMT = "%f" ; printf "Total Hits/second: %i\n" , COUNT/interval }'
echo ""
# Per-URL hit rate: field 7 is the request URL; count each and keep the top 20.
cat /tmp/tophits.tmp | awk ' { print $7 }' | sort | uniq -c | sort -rn | head -20  | awk -v interval=$DIFF ' { printf "%4.1f Hits/s %s\n" , $1/interval , $2 } '
echo ""
echo ""
echo "URLs using the most bandwidth"
echo ""
# Overall bandwidth: field 10 is the response size in bytes.
cat /tmp/tophits.tmp | awk -v interval=$DIFF ' { SUM += $10} END { OFMT = "%f" ; printf "Total Bits/second: %6.1f Kb/s \n", SUM*8/interval/1000 }'
echo ""
# Per-URL bandwidth: group by size+URL (which is why differently sized copies
# of the same URL are totalled separately), then print Kb/s, hits/min and KB.
cat /tmp/tophits.tmp | awk  ' { print $10 " " $7 }' | sort | uniq -c | awk -v interval=$DIFF ' { printf "%6.1f Kb/s  %i h/min  %i KB  %s\n" , $1*$2/interval*8/1000,$1*60/interval,$2/1000,$3}' | sort -rn | head -20
echo ""
echo ""



-- 
Simon Lyall  |  Very Busy  |  Web: http://www.darkmere.gen.nz/
"To stay awake all night adds a day to your life" - Stilgar | eMT.
