
Analyzing Website Logs with Spark

Frustratingly, my personal website has been raising 504 alerts nonstop since yesterday. I logged in to the machine and found that php-fpm was reporting errors. Restarting php-fpm cleared the error, but the alert came back within a few hours. The site had run for almost a year without trouble, so this was odd.

[28-Sep-2016 11:53:19] NOTICE: ready to handle connections
[28-Sep-2016 11:53:19] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 11:53:26] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:46:35] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it
[28-Sep-2016 13:49:32] WARNING: [pool www] server reached pm.max_children setting (5), consider raising it

I assumed pm.max_children was simply set too low, so I raised it in the configuration (from 5 to 20, as the logs show):

[28-Sep-2016 15:51:43] NOTICE: fpm is running, pid 28179
[28-Sep-2016 15:51:43] NOTICE: ready to handle connections
[28-Sep-2016 15:51:43] NOTICE: systemd monitor interval set to 10000ms
[28-Sep-2016 15:52:12] WARNING: [pool www] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 0 idle, and 7 total children
[28-Sep-2016 16:15:58] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:52:32] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:53:05] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it
[28-Sep-2016 16:55:17] WARNING: [pool www] server reached pm.max_children setting (20), consider raising it

The outcome was the same: a few hours later the 504 alert fired again. Looking at the nginx log next, I noticed a few odd IPs with very high request volumes. I suspected malicious traffic, so it seemed worth checking the per-IP request counts in the access log.

root@iZ28bhfjhgkZ:/var/log/nginx# vim access.log
121.42.53.180 - - [25/Sep/2016:06:26:29 +0800] "POST /wp-cron.php?doing_wp_cron=1474755989.0131719112396240234375 HTTP/1.0" 499 0 "-" "WordPress/4.3.1; http://zhwen.org"
182.92.148.207 - - [25/Sep/2016:06:26:29 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
203.208.60.226 - - [25/Sep/2016:06:28:55 +0800] "GET /?p=675 HTTP/1.1" 200 8204 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/themes/sparkling/inc/css/font-awesome.min.css?ver=4.3.1 HTTP/1.1" 200 26711 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:57 +0800] "GET /wp-content/plugins/wp-pagenavi/pagenavi-css.css?ver=2.70 HTTP/1.1" 200 374 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
203.208.60.226 - - [25/Sep/2016:06:28:58 +0800] "GET /wp-content/plugins/yet-another-related-posts-plugin/style/widget.css?ver=4.3.1 HTTP/1.1" 200 771 "http://zhwen.org/?p=675" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
121.43.107.174 - - [25/Sep/2016:06:29:18 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
115.28.189.208 - - [25/Sep/2016:06:29:33 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
42.156.139.59 - - [25/Sep/2016:06:30:58 +0800] "GET /?paged=14 HTTP/1.1" 200 11164 "-" "YisouSpider"
182.92.148.207 - - [25/Sep/2016:06:31:29 +0800] "GET / HTTP/1.1" 200 41253 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /?p=articles/cscope-tags HTTP/1.1" 200 10681 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12) AppleWebKit/602.1.50 (KHTML, like Gecko)"
61.135.169.81 - - [25/Sep/2016:06:34:14 +0800] "GET /apple-touch-icon-precomposed.png HTTP/1.1" 404 151 "-" "Safari/12602.1.50.0.10 CFNetwork/807.0.4 Darwin/16.0.0 (x86_64)"

So I ran a simple count over the IPs in the access log:

1) First extract the IP column on the server to cut down the data volume (compressing the whole log and downloading it would also work), then download the result to the local machine.

root@iZ28bhfjhgkZ:/var/log/nginx# cat access.log|awk '{print $1}' > tt

Then run the following in spark-shell. The code reads the downloaded IP list from /data1/data/t1, counts the occurrences of each IP, swaps each pair to (count, ip), joins IPs that share the same count with commas, and sorts by count in ascending order:

val line = sc.textFile("/data1/data/t1")
line.flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).map(e => (e._2, e._1)).reduceByKey(_+","+_).sortByKey(true,1).saveAsTextFile("/data1/data/t3")

2) The end of the output in t3 looks like this. A few IPs have very high request counts, 191.96.249.53 above all:

...
(855,182.92.148.207)
(3100,121.8.136.75)
(3889,61.135.169.81)
(53513,191.96.249.53)

3) Finally, add an iptables rule to drop traffic from that IP, and the problem is solved.

root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
root@iZ28bhfjhgkZ:/var/log# iptables -A INPUT -s 191.96.249.53 -j DROP
root@iZ28bhfjhgkZ:/var/log# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination
DROP       all  --  DEDICATED.SERVER     anywhere

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
root@iZ28bhfjhgkZ:/var/log#

Spark makes this kind of statistical analysis very easy: a single line of code does the whole job.
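As a quick cross-check on the Spark result, the same per-IP count can probably be reproduced with standard shell tools directly on the server, without downloading anything. This is only an illustrative sketch, not the method used in this post: the function name top_ips and its two arguments are hypothetical, and it assumes the client IP is the first whitespace-separated field of every line, as in the access.log sample above.

```shell
#!/bin/sh
# top_ips LOGFILE [N]
# Hypothetical helper: prints the N busiest client IPs as "count ip",
# busiest first. Assumes the IP is the first field of each log line.
top_ips() {
    logfile=$1
    limit=${2:-10}   # default to the top 10 when N is not given
    # Count occurrences of field 1, then sort by count (numeric, descending);
    # ties are broken by the IP string so the output order is stable.
    awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' "$logfile" |
        sort -k1,1nr -k2,2 |
        head -n "$limit"
}

# Example call (log path taken from the session above):
# top_ips /var/log/nginx/access.log 5
```

Unlike the Spark job, this lists the busiest IPs first and does not merge IPs that happen to share the same count, so the output is easier to scan when only the top offenders matter.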

