2013-07-02 5 views

I am trying to go through a dataset and compute features from a database (Postgres). The Python code freezes at random.

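The problem is that every so often it hangs somewhere (I can see this in the database log, where no new queries are executed for a long time), and when I press Ctrl+C the program seems to resume normally (I have not yet verified that the results are correct, because there are so many rows). It does not hang at the same place each time; the stalls appear to be random. I don't know what I am doing wrong.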
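
One way to narrow down where the Python side is waiting during such a stall is to arm a watchdog around each call. The following is a minimal sketch, assuming Python 3.3 or later, where the standard library `faulthandler` module is available. The helper name `call_with_watchdog` and its parameters are hypothetical and not part of the original code.

```python
import faulthandler
import sys


def call_with_watchdog(func, timeout_s, log_file=None):
    """Run func(); if it runs longer than timeout_s, dump all thread stacks.

    The dump only reports where the program is waiting. It does not
    interrupt or cancel the call.
    """
    if log_file is None:
        log_file = sys.stderr
    # Arm the watchdog: after timeout_s seconds the tracebacks of all
    # threads are written to log_file (which must have a file descriptor).
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=log_file)
    try:
        return func()
    finally:
        # Disarm the watchdog so that a fast call produces no output.
        faulthandler.cancel_dump_traceback_later()
```

In the loop of main.py, the call could be wrapped as `call_with_watchdog(lambda: NAC.function(...), 60)`, with the elided arguments filled in as in the original call. The traceback would then show which line was executing when the 60 seconds ran out.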

I have two files, main.py and NAC.py.

main.py :

import time

import NAC
from dateutil.parser import parse
from datetime import timedelta

# fc and input_file are defined elsewhere (not shown in the question)
rows = fc.Read_CSV_to_Dict(input_file)  # just a wrapper around csv.DictReader
i = 0
start_time = time.time()
for row in rows:  # rows has about 600,000 rows
    ret1, ret2 = NAC.function(row['key1'], ...)  # and other parameters
    # new keys
    row['newKey1'], row['newKey2'] = ret1
    row['newKey3'], row['newKey4'] = ret2  # unpacking
    i += 1
    if i % 10000 == 0:  # progress monitor
        print(i)
print((time.time() - start_time) / 60)  # elapsed time in minutes
NAC.db_close()

NAC.py :

from dateutil.parser import parse
from datetime import timedelta
import psycopg2
import psycopg2.extras

def function(param1, ...):
    """
    Returns:
        2 element list, each a list by itself
    """
    nsclist = [0] * param2_count
    naclist = [0] * param2_count
    for i in range(param2_count):
        # bounds of the i-th interval, each intervalPeriod minutes long
        stime = begintime + timedelta(seconds=60 * intervalPeriod * i)
        etime = begintime + timedelta(seconds=60 * intervalPeriod * (i + 1))
        table1_query = "select sum(count) from table1 where column1 = '{0}' and column2 > '{1}'::TIMESTAMP WITH TIME ZONE and column2 <= '{2}'::TIMESTAMP WITH TIME ZONE"
        cur.execute(table1_query.format(param1, stime, etime))
        nsclist[i] = cur.fetchone()[0]
        if nsclist[i] is None:  # sum() over zero rows returns NULL
            nsclist[i] = 0
        table2_query = "select sum(count) from table2 where column1 = '{0}' and column2 > '{1}'::TIMESTAMP WITH TIME ZONE and column2 <= '{2}'::TIMESTAMP WITH TIME ZONE"
        cur.execute(table2_query.format(param1, stime, etime))
        naclist[i] = cur.fetchone()[0]
        if naclist[i] is None:
            naclist[i] = 0
    return nsclist, naclist

def db_close():
    cur.close()
    conn.close()

intervalPeriod = 5  # minutes
# one module-level connection and cursor, shared by every call to function()
conn = psycopg2.connect(cs.local_connstr)
cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)

DB log timestamps:

2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:29:30 Ctrl+C pressed (added manually, not in the log)
2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ... 
2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ... 
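
Stalls like the one between 18:26:01 and 18:29:30 can also be located automatically instead of by eye. The sketch below scans log lines for large gaps between consecutive timestamps. The function name `find_stalls` and the 30-second default threshold are hypothetical choices, and the code assumes every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp in a single time zone, as in the excerpt above.

```python
from datetime import datetime, timedelta


def find_stalls(log_lines, threshold=timedelta(seconds=30)):
    """Return (previous_time, next_time, gap) for every gap above threshold."""
    stalls = []
    previous = None
    for line in log_lines:
        try:
            # The first 19 characters hold the timestamp; the time zone
            # abbreviation that follows is ignored in this sketch.
            current = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if previous is not None and current - previous > threshold:
            stalls.append((previous, current, current - previous))
        previous = current
    return stalls


sample = [
    "2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...",
    "2013-07-01 18:26:01 PDT LOG: statement: select sum(count)from ...",
    "2013-07-01 18:29:30 PDT LOG: statement: select sum(count)from ...",
]
for start, end, gap in find_stalls(sample):
    print(start, "->", end, "gap:", gap)
```

For the three sample lines this reports a single stall of 3 minutes 29 seconds. Running it over the full log file would show whether the stalls cluster around particular times or queries.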

Answer


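It turned out that the problem was with the cursor. I had to open and close the cursor on every function call instead of sharing one module-level cursor. I am not sure why.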

from dateutil.parser import parse
from datetime import timedelta
import psycopg2
import psycopg2.extras

def function(param1, ...):
    """
    Returns:
        2 element list, each a list by itself
    """
    # open a fresh cursor for this call instead of reusing a module-level one
    cur = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    nsclist = [0] * param2_count
    naclist = [0] * param2_count
    for i in range(param2_count):
        # interval bounds, computed as in the question's version
        stime = begintime + timedelta(seconds=60 * intervalPeriod * i)
        etime = begintime + timedelta(seconds=60 * intervalPeriod * (i + 1))
        table1_query = "select sum(count) from table1 where column1 = '{0}' and column2 > '{1}'::TIMESTAMP WITH TIME ZONE and column2 <= '{2}'::TIMESTAMP WITH TIME ZONE"
        cur.execute(table1_query.format(param1, stime, etime))
        nsclist[i] = cur.fetchone()[0]
        if nsclist[i] is None:  # sum() over zero rows returns NULL
            nsclist[i] = 0
        table2_query = "select sum(count) from table2 where column1 = '{0}' and column2 > '{1}'::TIMESTAMP WITH TIME ZONE and column2 <= '{2}'::TIMESTAMP WITH TIME ZONE"
        cur.execute(table2_query.format(param1, stime, etime))
        naclist[i] = cur.fetchone()[0]
        if naclist[i] is None:
            naclist[i] = 0
    cur.close()  # close the cursor before returning
    return nsclist, naclist

def db_close():
    conn.close()

intervalPeriod = 5  # minutes
conn = psycopg2.connect(cs.local_connstr)
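
The per-call cursor pattern can be made a little more robust by closing the cursor even when a query raises, and by passing values as query parameters instead of formatting them into the SQL string. The sketch below illustrates both ideas. It deliberately uses the standard library `sqlite3` module as a stand-in for Postgres so that it runs without a server, and the function name `windowed_sums` and its parameters are hypothetical. With psycopg2 the placeholders would be `%s` rather than `?`. It does not explain why the shared cursor stalled.

```python
import sqlite3
from contextlib import closing
from datetime import datetime, timedelta

INTERVAL_MINUTES = 5


def windowed_sums(conn, key, begin, n_windows):
    """Sum "count" per 5-minute window using a cursor scoped to this call."""
    query = (
        'select sum("count") from table1 '
        "where column1 = ? and column2 > ? and column2 <= ?"
    )
    sums = []
    # closing() guarantees that cur.close() runs even if a query raises.
    with closing(conn.cursor()) as cur:
        for i in range(n_windows):
            stime = begin + timedelta(minutes=INTERVAL_MINUTES * i)
            etime = begin + timedelta(minutes=INTERVAL_MINUTES * (i + 1))
            # Values travel separately from the SQL text, so the driver
            # takes care of quoting them.
            cur.execute(query, (key, stime.isoformat(" "), etime.isoformat(" ")))
            total = cur.fetchone()[0]
            sums.append(0 if total is None else total)  # sum() of no rows is NULL
    return sums


# Small in-memory demo with made-up rows (not data from the question).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute('create table table1 (column1 text, column2 text, "count" integer)')
demo_conn.executemany(
    "insert into table1 values (?, ?, ?)",
    [
        ("a", "2013-07-01 18:01:00", 2),
        ("a", "2013-07-01 18:04:00", 3),
        ("a", "2013-07-01 18:07:00", 5),
        ("b", "2013-07-01 18:02:00", 7),
    ],
)
print(windowed_sums(demo_conn, "a", datetime(2013, 7, 1, 18, 0, 0), 3))
```

The demo stores timestamps as text in `YYYY-MM-DD HH:MM:SS` form, which compares correctly as strings in SQLite. In Postgres the column would be a real timestamp type and the `datetime` objects could be passed as parameters directly.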