2013-12-12 2 views

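Parsing multiple delimiters and embedded brackets in R

I inherited a project from someone who did not think ahead about data analysis. As a result, the data files were written with several kinds of delimiters: multiple types of brackets, varying levels of nesting to group the data, commas separating the numbers inside parentheses, and so on. There are also plain-text sentences thrown in for good measure.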

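Can anyone help with a simple way to turn this nested structure and its delimiters into data frames in R? Ideally, the first line would become a data frame with 3 columns, the second line would be ignored, the third line would become a vector of 4 integers, and so on.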

[(3, None, 1), (1, 0.36, 1), (3, None, 1), (2, 0.41, 1), (5, 0.47, 1), (6, 0.36, 1), (2, 0.45, 1), (2, 0.36, 1), (4, 0.39, 1), (6, 0.34, 1), (1, 0.47, 1), (7, 0.44, 1), (4, 0.39, 1), (6, 0.38, 1), (9, 0.39, 1), (5, 0.37, 1), (8, 0.41, 1), (9, 0.38, 1), (1, 0.44, 1), (9, 0.38, 1), (4, 0.36, 1), (8, 0.41, 1), (7, 0.38, 1), (7, 0.41, 1), (7, 0.36, 1), (7, 0.39, 1), (9, 0.41, 1), (5, 0.36, 1), (8, 0.31, 1), (6, 0.38, 1), (1, 0.44, 1), (3, None, 1), (5, 0.59, 1), (7, 0.52, 1), (7, 0.44, 1), (7, 0.38, 1), (8, 0.34, 1), (9, 0.39, 1), (3, None, 1), (7, 0.44, 1), (7, 0.53, 1), (8, 0.36, 1), (3, 0.36, 0), (8, 0.34, 1), (5, 0.38, 1), (3, None, 1), (5, 0.52, 1), (3, None, 1), (9, 0.55, 1), (9, 0.36, 1), (4, 0.38, 1), (2, 0.73, 1), (9, 0.36, 1), (7, 0.44, 1), (4, 0.45, 1), (4, 0.62, 1), (9, 0.39, 1), (3, 0.31, 0), (1, 0.42, 1), (4, 0.34, 1), (5, 0.53, 1), (8, 0.34, 1), (3, None, 1), (8, 0.47, 1), (6, 0.39, 1), (1, 0.42, 1), (5, 0.53, 1), (1, 0.53, 1), (8, 0.62, 1), (1, 0.39, 1), (8, 0.44, 1), (8, 0.45, 1), (9, 0.38, 1), (1, 0.36, 1), (4, 0.38, 1), (6, 0.36, 1), (7, 0.36, 1), (9, 0.39, 1), (8, 0.41, 1), (8, 0.31, 1), (3, None, 1), (2, 0.36, 1), (4, 0.36, 1), (2, 0.31, 1), (9, 0.36, 1), (1, 0.31, 1), (4, 0.34, 1), (1, 0.56, 1), (7, 0.61, 1), (9, 0.38, 1), (3, None, 1), (1, 0.36, 1), (1, 0.53, 1), (5, 0.33, 1), (3, None, 1), (1, 0.39, 1), (6, 0.34, 1), (9, 0.33, 1), (4, 0.38, 1), (3, None, 1), (5, 0.44, 1), (2, 0.52, 1), (1, 0.42, 1), (6, 0.38, 1), (9, 0.33, 1), (4, 0.38, 1), (5, 0.31, 1), (6, 0.31, 1), (8, 0.31, 1), (2, 0.33, 1), (9, 0.33, 1), (1, 0.56, 1), (6, 0.38, 1), (3, None, 1), (7, 0.34, 1), (5, 0.34, 1), (2, 0.36, 1), (2, 0.47, 1), (3, None, 1), (2, 0.39, 1), (2, 0.36, 1), (6, 0.31, 1), (1, 0.53, 1), (5, 0.45, 1), (7, 0.42, 1), (5, 0.45, 1), (2, 0.39, 1), (2, 0.45, 1), (6, 0.36, 1), (2, 0.45, 1), (1, 0.39, 1), (1, 0.34, 1), (4, 0.39, 1), (2, 0.34, 1), (2, 0.31, 1), (3, 0.31, 0), (8, 0.39, 1), (6, 0.34, 1), (6, 0.31, 1), (5, 0.38, 1), (9, 0.34, 1), (7, 0.31, 1), (1, 0.33, 
1), (4, 0.38, 1), (6, 0.38, 1), (5, 0.38, 1), (9, 0.38, 1), (2, 0.5, 1), (8, 0.44, 1), (8, 0.39, 1), (4, 0.38, 1), (5, 0.5, 1), (9, 0.48, 1), (2, 0.59, 1), (8, 0.41, 1), (7, 0.41, 1), (3, None, 1), (4, 0.5, 1), (4, 0.36, 1), (7, 0.38, 1), (5, 0.44, 1), (6, 0.34, 1), (6, 0.41, 1), (3, None, 1), (7, 0.39, 1), (6, 0.34, 1), (2, 0.34, 1), (9, 0.36, 1), (4, 0.36, 1), (5, 0.38, 1), (3, None, 1), (6, 0.36, 1), (5, 0.33, 1), (4, 0.44, 1), (7, 0.34, 1), (8, 0.48, 1), (6, 0.34, 1), (8, 0.38, 1), (3, None, 1), (4, 0.31, 1), (3, 0.31, 0)] 
Percentage of correctly suppressed responses per five-target section: 
[80, 80, 100, 80] 
Average reaction time per five-target section: 
[0.4, 0.43, 0.39, 0.39] 
Percentage of correctly suppressed responses per ten-target section: 
[80, 90] 
Average reaction time per ten-target section: 
[0.41, 0.39] 

Answers


Use readLines to get your data in, then gsub and strsplit to beat it all into submission. Here is a sample:

#txt <- readLines(textConnection("<insert your text here>")) 
#or probably more appropriately 
txt <- readLines("filename.txt") 

# remove labels 
txt <- txt[-c(2,4,6,8)] 

# remove the enclosing [ and ] characters (and any surrounding whitespace) 
txt <- lapply(txt,function(x) gsub("^\\s*\\[|\\]\\s*$","",x)) 

# reformat element 1 into a 3-column data frame (note: None is replaced by 0 here) 
txt[[1]] <- gsub("[()]","",txt[[1]]) 
txt[[1]] <- gsub("None","0",txt[[1]]) 
txt[[1]] <- as.numeric(unlist(strsplit(txt[[1]],","))) 
txt[[1]] <- data.frame(matrix(txt[[1]],ncol=3,byrow=TRUE)) 

# reformat elements 2-5 
txt[2:5] <- lapply(txt[2:5],function(x) as.numeric(unlist(strsplit(x,",")))) 

Result:

txt 

#[[1]] 
# X1 X2 X3 
#1 3 0.00 1 
#2 1 0.36 1 
#3 3 0.00 1 
#4 2 0.41 1 
#5 5 0.47 1 
#6 6 0.36 1 
# etc... etc... 
# 
#[[2]] 
#[1] 80 80 100 80 
# 
#[[3]] 
#[1] 0.40 0.43 0.39 0.39 
# 
#[[4]] 
#[1] 80 90 
# 
#[[5]] 
#[1] 0.41 0.39 
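
The same steps can also be sketched without hard-coding which lines are labels. This is a minimal sketch under a few assumptions that are not in the original answer: the helper name `parse_bracket_lines` is hypothetical, every bracketed list sits on a single physical line, every tuple has exactly three values, and `None` is mapped to `NA` rather than to `0`. The sample input is an abbreviated excerpt of the data in the question.

```r
# Hypothetical helper (not part of the original answer): keep only the
# lines that look like "[ ... ]" and convert each one to numbers.
parse_bracket_lines <- function(lines) {
  lines <- trimws(lines)
  # label sentences do not start with "[" and end with "]", so they are dropped
  lines <- lines[grepl("^\\[.*\\]$", lines)]
  lapply(lines, function(x) {
    body <- gsub("^\\[|\\]$", "", x)            # strip the outer brackets
    nested <- grepl("(", body, fixed = TRUE)     # does the line hold tuples?
    body <- gsub("[()]", "", body)               # drop the parentheses
    body <- gsub("None", "NA", body, fixed = TRUE)
    vals <- as.numeric(strsplit(body, ",")[[1]])
    if (nested) {
      # assumption: tuples always have three values
      data.frame(matrix(vals, ncol = 3, byrow = TRUE))
    } else {
      vals
    }
  })
}

# Abbreviated excerpt of the file shown in the question
sample_lines <- c(
  "[(3, None, 1), (1, 0.36, 1), (3, None, 1)] ",
  "Percentage of correctly suppressed responses per five-target section: ",
  "[80, 80, 100, 80] ",
  "Average reaction time per five-target section: ",
  "[0.4, 0.43, 0.39, 0.39] "
)

parsed <- parse_bracket_lines(sample_lines)
str(parsed)
```

Using `NA` keeps missing reaction times out of later averages, whereas replacing `None` with `0` would pull them down; which one is appropriate depends on what `None` means in this data set.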

Perfect, thank you. One small edit for anyone who comes along later: if you are reading the text from a file name, it should be 'txt <- readLines("filename.txt")', i.e. remove textConnection(). – jzadra
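
To illustrate the difference the comment describes, here is a small sketch comparing the two ways of calling readLines. The two sample lines are taken from the question; the temporary file is only a stand-in for "filename.txt".

```r
# Two lines from the question, standing in for the real file contents
sample_lines <- c(
  "Average reaction time per ten-target section: ",
  "[0.41, 0.39] "
)

# Reading from text already held in memory needs textConnection()
con <- textConnection(sample_lines)
from_text <- readLines(con)
close(con)

# Reading from a file on disk takes the path directly
tmp <- tempfile(fileext = ".txt")   # stand-in for "filename.txt"
writeLines(sample_lines, tmp)
from_file <- readLines(tmp)
unlink(tmp)

# Both routes should give the same character vector
same <- identical(from_text, from_file)
print(same)
```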
