2017-09-22 19 views
0

I는 다음과 같습니다 일부 데이터를 분석하는 Jupyter 노트북을 만들고있어 :Jupyter 노트북 - 파이썬 코드

The Data I'm analyzing

나는이 정보를 알 수 있습니다

The Questions

이를 내가 시도한 것이지만 작동하지 않고 part b를하는 방법을 완전히 망각하고 있습니다.

# Import relevant packages/modules 
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
from scipy import stats 

# Import relevant csv data file 
data = pd.read_csv("C:/Users/Hanna/Desktop/Sheridan College/Statistics for Data Science/Assignment 1/MATH37198_Assignment1_Individual/IGN_game_ratings.csv") 

# Part a: Determine the z-score of "Super Mario Kart" and print out result 
superMarioKart_zscore = data[data['Game']=='Super Mario Kart'] ['Score'].stats.zscore() 
print("Z-score of Super Mario Kart: ", superMarioKart_zscore) 

# Part b: The top 20 (most common) platforms 

# Part c: The average score of all the Shooter games 
averageShooterScore = data[data['Group']=='Game']['Score'].mean() 
# Print output 
print("The average score of all the Shooter games is: ", averageShooterScore) 

# Part d: The top two platforms witht the most perfect scores (10) 

# Part e: The probability of a game randomly selected that is an RPG 
# First find the number of games in the list that is an RPG 
numOfRPGGames = 0 
for game in data['Game']: 
    if data['Genre'] == 'RPG': 
     numOfRPGGames += 1 
# Divide this by the total number of games to find the probablility of selecting one 
print("The probability of selecting a game that is an RPG is: ", numOFRPGGames/totalNumGames) 

# Part f: The probability of a game randomly selected with a score less than 5 
# First find the number of games in the list with a score less than 5 using a for loop: 
numScoresLessThan5 = 0 
for game in data['Game']: 
    if data['Score'] < 5: 
     numScoresLessThan5 += 1 
# Divide this by the total number of games to find the probablility of selecting one 
print("The probability of selecting a game with a score less than 5 is: ", numScoresLessThan5/totalNumGames) 
+2

당신은 각각의 질문들로이 질문을 중단 할 수 있습니다 당신 그런 식으로 더 나은 답변을 얻게 될 것입니다. 특정 질문에 아직 초점을 맞추지 않았다면 [MCVE] (https://stackoverflow.com/help/mcve)를보고, 시도한 것은 무엇이며, 왜 작동하지 않으며 출력을 기대할 수 있습니까? 되려고. – johnchase

답변

0

팬더는 이러한 유형의 문제에 대해 뛰어난 내장 기능을 제공합니다. 다음은 CSV에서 가져온 일부 테스트 데이터를 사용하여 파트 b를 해결하기위한 제안입니다. 내가 사용하는 test.csv은 이러한 필드를 가지고 있지만 여전히 새 파일

Sample CSV structure 열 이름을 변경하고 가져 귀하의 경우에 작동합니다,

# Import relevant packages/modules 
import numpy as np 
import pandas as pd 

# Import a dummy csv data file 
data = pd.read_csv("./test.csv") 
# Visualize the file before the process 
print(data) 

# Extract the column you're interesting in counting 
initial_column = data['Name'] 

# Create object for receiving the output of the value_counts function 
count_object = pd.value_counts(initial_column) 

# Create an empty list for receiving the sorted values 
sorted_grouped_column = [] 

# You determine the number of items. In your exercise is 20. 
number_of_items = 3 
counter = 0 

for i in count_object.keys(): 
    if counter == number_of_items: 
     break 
    else: 
     sorted_grouped_column.append(i) 
     counter = counter + 1 

print(sorted_grouped_column) 
관련 문제