Naive Bayes classifier error on the Kaggle Titanic dataset (R programming)
I am trying to train a naive Bayes classifier on the Kaggle Titanic dataset ("train.csv" and "test.csv", available at https://www.kaggle.com/c/titanic/data). However, `output` ends up containing nothing at all.
library(e1071)
train_d <- read.csv("train.csv", stringsAsFactors = TRUE)
# columns chosen for the training data
# (colnames(train_d) or names(train_d) lists all available columns)-
# "Survived", "Pclass", "Sex", "Age", "SibSp", "Parch", "Embarked"
train_data <- train_d[, c(2:3, 5:8, 12)]
# to find out which columns contain NA (missing values)-
colnames(train_data)[apply(is.na(train_data), 2, any)]
# mean(train_data$Age, na.rm = TRUE) # mean of 'Age', ignoring the NAs
# which(is.na(train_data$Age))
# fill in missing value (NA) with mean of 'Age' column-
train_data$Age[which(is.na(train_data$Age))] <- mean(train_data$Age, na.rm = TRUE)
# check whether there are any existing NAs-
which(is.na(train_data$Age))
# OR-
colnames(train_data)[apply(is.na(train_data), 2, any)]
test_d <- read.csv("test.csv", stringsAsFactors = TRUE)
# columns chosen for the test data-
# "Pclass", "Sex", "Age", "SibSp", "Parch", "Embarked"
test_data <- test_d[, c(2, 4:7, 11)]
# find out missing values (NA)-
colnames(test_data)[apply(is.na(test_data), 2, any)]
# fill in missing value (NA) with mean of 'Age' column-
test_data$Age[which(is.na(test_data$Age))] <- mean(test_data$Age, na.rm = TRUE)
# check whether there are any existing NAs-
which(is.na(test_data$Age))
# OR-
colnames(test_data)[apply(is.na(test_data), 2, any)]
# training a naive-bayes classifier-
titanic_nb <- naiveBayes(Survived ~ Pclass + Sex + Age + SibSp + Parch + Embarked, data = train_data)
# predict using trained naive-bayes classifier-
output <- predict(titanic_nb, test_data, type = "class")
That is the code I have come up with so far. The `output` variable prints as follows:
> output
factor(0)
Levels:
What is going wrong?
Thanks.
Perhaps [this](https://stackoverflow.com/questions/17904190/why-does-naivebayes-return-all-nas-for-multiclass-classification-in-r) will help, after converting all strings to factors. – akrun
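
One thing worth checking along the lines of that comment: `stringsAsFactors = TRUE` only converts character columns, so `Survived` is still read as a numeric 0/1 column, while `naiveBayes()` is a classifier and is documented for a categorical response. The following is a minimal sketch of converting the response to a factor before training. It assumes the e1071 package is installed and uses a small made-up data frame instead of the Kaggle files; `train_toy`, `test_toy`, `toy_nb`, `toy_out`, and all of their values are hypothetical names and data introduced only for illustration.

```r
library(e1071)

# Hypothetical toy data standing in for a few of the Titanic columns.
train_toy <- data.frame(
  Survived = c(0, 1, 1, 0, 1, 0, 0, 1),
  Sex      = factor(c("male", "female", "female", "male",
                      "female", "male", "male", "female")),
  Age      = c(22, 38, 26, 35, 27, 54, 2, 14)
)

# Make the response a factor instead of leaving it as numeric 0/1.
train_toy$Survived <- as.factor(train_toy$Survived)

# Give the test factor the same levels as the training factor.
test_toy <- data.frame(
  Sex = factor(c("male", "female"), levels = levels(train_toy$Sex)),
  Age = c(30, 25)
)

toy_nb  <- naiveBayes(Survived ~ Sex + Age, data = train_toy)
toy_out <- predict(toy_nb, test_toy, type = "class")

length(toy_out)   # one prediction per row of test_toy
levels(toy_out)   # class labels taken from the factor response
```

Applied to the code in the question, the equivalent step would be `train_data$Survived <- as.factor(train_data$Survived)` before the call to `naiveBayes()`. Whether this alone resolves the empty result may depend on the installed e1071 version.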