2017-02-09

Web scraping - R, Error in bind_rows_(x, .id)

My output data should look similar to the attached image, but I am getting this error:

Error in bind_rows_(x, .id) : 
    Can not automatically convert from character to integer in column "Runs" 

This is the sample code I am using:

require(rvest) 
require(tidyverse) 

urls <- c("http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=1;innings_number=1;orderby=start;result=1;template=results;type=batting;view=match", 
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=2;innings_number=2;orderby=start;result=1;template=results;type=batting;view=match", 
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=1;innings_number=1;orderby=start;result=2;template=results;type=batting;view=match", 
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=1;innings_number=2;orderby=start;result=2;template=results;type=batting;view=match", 
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=2;innings_number=1;orderby=start;result=2;template=results;type=batting;view=match", 
"http://stats.espncricinfo.com/ci/engine/player/326016.html?class=2;filter=advanced;floodlit=2;innings_number=2;orderby=start;result=2;template=results;type=batting;view=match" 
) 

extra_cols <- list(tibble("Team"="IND","Player"="B.Kumar","won"=1,"lost"=0,"D"=1,"D/N"=0,"innings"=1,"Format"="ODI"), 
        tibble("Team"="IND","Player"="B.Kumar","won"=1,"lost"=0,"D"=0,"D/N"=1,"innings"=2,"Format"="ODI"), 
        tibble("Team"="IND","Player"="B.Kumar","won"=0,"lost"=1,"D"=1,"D/N"=0,"innings"=1,"Format"="ODI"), 
        tibble("Team"="IND","Player"="B.Kumar","won"=0,"lost"=1,"D"=1,"D/N"=0,"innings"=2,"Format"="ODI"), 
        tibble("Team"="IND","Player"="B.Kumar","won"=0,"lost"=1,"D"=0,"D/N"=1,"innings"=1,"Format"="ODI"), 
        tibble("Team"="IND","Player"="B.Kumar","won"=0,"lost"=1,"D"=0,"D/N"=1,"innings"=2,"Format"="ODI") 
) 

doc <- map(urls, read_html) %>% 
    map(html_node, ".engineTable:nth-child(5)") 



keep <- map_lgl(doc, ~class(.) != "xml_missing") #### condition to exclude when web urls return "NO Records"### 

table<-map(doc[keep], html_table, fill = TRUE) %>% 
    map2_df(extra_cols[keep], cbind) 

The problem is in the last operation, `map2_df`. If you simply replace `map2_df` with `map2`, it works. – GGamba

@GGamba I need a data frame with mixed data types. For example: list(Bat1 = c("0*", "1*", "DNB", "DNB", "DNB", "DNB", : – chdeepak96

Answers

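The problem is that the Runs column sometimes contains a "-". When a table has no "-", html_table interprets Runs as an integer column; when a "-" is present, the column is read as character. Binding the tables together then fails because the same column has two different types.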
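The type mismatch can be reproduced without any scraping. The following is a minimal sketch: `with_dash` and `without_dash` are hypothetical stand-ins for two scraped tables, not data from the site, and the exact wording of the error depends on the installed dplyr version.

```r
library(dplyr)

# Hypothetical stand-ins for two scraped tables (not real data)
with_dash    <- tibble(Runs = c("12", "-", "7"))  # character, because of the "-"
without_dash <- tibble(Runs = c(3L, 25L))         # integer, no "-" present

# The same column has a different type in each table
class(with_dash$Runs)
class(without_dash$Runs)

# Row-binding character and integer columns raises an error;
# tryCatch captures it so the script keeps running
result <- tryCatch(
  bind_rows(with_dash, without_dash),
  error = function(e) e
)
inherits(result, "error")
```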

Clearly, "-" should be interpreted as NA. This can be achieved with type_convert as follows:

table<-map(doc[keep], html_table, fill = TRUE) %>% 
    map(type_convert, na = c("", NA, "-")) %>% 
    map2_df(extra_cols[keep], cbind) 

Here `na = c("", NA, "-")` maps "-" to NA.
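To see the effect in isolation, here is a small sketch on made-up data; `with_dash` and `without_dash` are hypothetical tables. It spells the missing-value vector as `c("", "NA", "-")`, that is, readr's documented default plus "-", which is an assumption of this example rather than the exact call above.

```r
library(readr)
library(dplyr)

# Hypothetical stand-ins for two scraped tables (not real data)
with_dash    <- tibble(Runs = c("12", "-", "7"))  # read as character
without_dash <- tibble(Runs = c(3L, 25L))         # read as integer

# Re-guess column types, treating "-" as a missing value
cleaned <- type_convert(with_dash, na = c("", "NA", "-"))

# Both Runs columns are now numeric, so the tables can be row-bound
combined <- bind_rows(cleaned, without_dash)
combined
```

Whether the converted column comes out as integer or double depends on the readr version, but either one combines with an integer column without error.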

Thank you. – chdeepak96