2012-03-15 3 views
-4

I have a Python script to which I pass a log file and a URL. The output file shows which IP addresses accessed that URL. How can I find the number of IPs that pinged on a specific date?

#!/usr/bin/env python 
# 
# Counts the IP addresses of a log file. 
# 
# Assumption: the IP address is logged in the first column. 
# Example line: 117.195.185.130 - - [06/Mar/2012:00:00:00 -0800] \ 
# "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
# 

import sys 

def urlcheck(line, url):
    '''Checks if the url is part of the log line.'''
    lsplit = line.split()
    if len(lsplit) < 7:
        return False
    return url == lsplit[6]

def extract_ip(line):
    '''Extracts the IP address from the line.
    Currently it is assumed that the IP address is logged in
    the first column and the columns are space separated.'''
    return line.split()[0]

def increase_count(ip_dict, ip_addr):
    '''Increases the count of the IP address.
    If an IP address is not in the given dictionary,
    it is initially created and the count is set to 1.'''
    if ip_addr in ip_dict:
        ip_dict[ip_addr] += 1
    else:
        ip_dict[ip_addr] = 1

def read_ips(infilename, url):
    '''Read the IP addresses from the file and store (count)
    them in a dictionary - returns the dictionary.'''
    res_dict = {}
    # open() works in both Python 2 and 3; the with block closes the file.
    with open(infilename) as log_file:
        for line in log_file:
            if line.isspace():
                continue
            if not urlcheck(line, url):
                continue
            ip_addr = extract_ip(line)
            increase_count(res_dict, ip_addr)
    return res_dict

def write_ips(outfilename, ip_dict):
    '''Write out the count and the IP addresses.'''
    # items() works in both Python 2 and 3 (iteritems() is Python 2 only).
    with open(outfilename, "w") as out_file:
        for ip_addr, count in ip_dict.items():
            out_file.write("%s\t%5d\n" % (ip_addr, count))

def parse_cmd_line_args():
    '''Return the input file name, the output file name and the url.
    If there are more or less than three parameters,
    a usage message is printed and the program exits.'''
    if len(sys.argv) != 4:
        print("Usage: %s [infilename] [outfilename] [url]" % sys.argv[0])
        sys.exit(1)
    return sys.argv[1], sys.argv[2], sys.argv[3]

def main(): 
    infilename, outfilename, url = parse_cmd_line_args() 
    ip_dict = read_ips(infilename, url) 
    write_ips(outfilename, ip_dict) 

if __name__ == "__main__": 
    main() 

Now I want to modify the code so that, if I pass a date instead of a URL, the output file contains all the IPs that pinged on that date.

The log file format is as follows:

220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - - 
59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - - 
111.92.9.222 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
120.56.236.46 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - - 
49.138.106.21 - - [06/Mar/2012:00:00:00 -0800] "GET /add.txt HTTP/1.1" 204 214 - - 
117.195.185.130 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
122.160.166.220 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /welcome.html HTTP/1.1" 204 212 - - 
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
122.169.136.211 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
203.217.145.10 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
117.18.231.5 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - - 
59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - - 
203.217.145.10 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /css/epic.css HTTP/1.1" 204 214 "http://www.epicbrowser.com/welcome.html" - 
117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /add.txt HTTP/1.1" 204 214 - - 
117.206.70.4 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - - 
118.97.38.130 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /js/flash_detect_min.js HTTP/1.1" 304 0 "http://www.epicbrowser.com/welcome.html" - 
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/home-page-bottom.jpg HTTP/1.1" 304 0 "http://www.epicbrowser.com/welcome.html" - 
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/Facebook_Like.png HTTP/1.1" 204 214 "http://www.epicbrowser.com/welcome.html" - 
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/Twitter_Follow.png HTTP/1.1" 204 214 "http://www.epicbrowser.com/welcome.html" - 
117.214.20.28 - - [06/Mar/2012:00:00:00 -0800] "GET /images/home-page-top.jpg HTTP/1.1" 304 0 "http://www.epicbrowser.com/welcome.html" - 
49.138.106.21 - - [06/Mar/2012:00:00:01 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - - 
117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
117.18.231.5 - - [06/Mar/2012:00:00:01 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - - 
120.61.182.186 - - [06/Mar/2012:00:00:01 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - - 
+1

Possible duplicate of [How to find how many times a specific IP is pinged the URL?](http://stackoverflow.com/questions/9678609/how-to-find-how-many-times-a-specific-ip-is-pinged-the-url) – Marcin

+2

Please stop using stackoverflow to get entire scripts written for you. – Marcin

Answers

2

What's wrong with grep, cut, sort and uniq?

grep "\[07/Mar/2012" logfile.txt | cut -d " " -f 1 | sort | uniq 
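If you would rather stay in Python, the same idea can be sketched roughly as below. This is only a sketch under the assumption that the timestamp is always the fourth space-separated column, as in the sample log. The helper names `extract_date` and `count_ips_by_date` are made up for illustration and are not part of the original script, and the last sample line uses a hypothetical 07/Mar date so that the filter has something to exclude.

```python
from collections import Counter


def extract_date(line):
    """Return the date part (e.g. '06/Mar/2012') of a log line, or None."""
    parts = line.split()
    # Assumption: column 4 looks like '[06/Mar/2012:00:00:00'.
    if len(parts) < 4 or not parts[3].startswith("["):
        return None
    # Drop the leading '[' and cut at the first ':' to keep only the date.
    return parts[3][1:].split(":", 1)[0]


def count_ips_by_date(lines, date):
    """Count requests per IP address for lines logged on the given date."""
    counts = Counter()
    for line in lines:
        if extract_date(line) == date:
            counts[line.split()[0]] += 1
    return counts


# Three lines taken from the sample log, plus one with a hypothetical date.
sample = [
    '220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -',
    '220.227.40.118 - - [06/Mar/2012:00:00:00 -0800] "GET /hrefadd.xml HTTP/1.1" 204 214 - -',
    '59.95.13.217 - - [06/Mar/2012:00:00:00 -0800] "GET /dbupdates2.xml HTTP/1.1" 404 0 - -',
    '111.92.9.222 - - [07/Mar/2012:00:00:00 -0800] "GET /mysidebars/newtab.html HTTP/1.1" 404 0 - -',
]

counts = count_ips_by_date(sample, "06/Mar/2012")
for ip_addr, count in sorted(counts.items()):
    print("%s\t%5d" % (ip_addr, count))
print("distinct IPs: %d" % len(counts))
```

To plug this into the existing script, `read_ips` could call `extract_date(line) == date` where it currently calls `urlcheck(line, url)`. `len(counts)` then gives the number of distinct IPs for that day, and passing an open file instead of the `sample` list works the same way because both are iterated line by line.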