Chuanshuoge: python machine learning 2

//cmd

(tensor) C:\Users\bob>cd C:\Users\bob\python-machineLearning

(tensor) C:\Users\bob\python-machineLearning>pip install sklearn
Collecting sklearn
Downloading https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Requirement already satisfied: scikit-learn in c:\users\bob\anaconda3\lib\site-packages (from sklearn) (0.21.2)
Requirement already satisfied: joblib>=0.11 in c:\users\bob\anaconda3\lib\site-packages (from scikit-learn->sklearn) (0.13.2)
Requirement already satisfied: scipy>=0.17.0 in c:\users\bob\anaconda3\lib\site-packages (from scikit-learn->sklearn) (1.2.1)
Requirement already satisfied: numpy>=1.11.0 in c:\users\bob\anaconda3\lib\site-packages (from scikit-learn->sklearn) (1.16.4)
Building wheels for collected packages: sklearn
Building wheel for sklearn (setup.py) ... done
Stored in directory: C:\Users\bob\AppData\Local\pip\Cache\wheels\76\03\bb\589d421d27431bcd2c6da284d5f2286c8e3b2ea3cf1594c074
Successfully built sklearn
Installing collected packages: sklearn
Successfully installed sklearn-0.0

(tensor) C:\Users\bob\python-machineLearning>pip install pandas
Requirement already satisfied: pandas in c:\users\bob\anaconda3\lib\site-packages (0.24.2)
Requirement already satisfied: python-dateutil>=2.5.0 in c:\users\bob\anaconda3\lib\site-packages (from pandas) (2.8.0)
Requirement already satisfied: numpy>=1.12.0 in c:\users\bob\anaconda3\lib\site-packages (from pandas) (1.16.4)
Requirement already satisfied: pytz>=2011k in c:\users\bob\anaconda3\lib\site-packages (from pandas) (2019.1)
Requirement already satisfied: six>=1.5 in c:\users\bob\appdata\roaming\python\python37\site-packages (from python-dateutil>=2.5.0->pandas) (1.12.0)

(tensor) C:\Users\bob\python-machineLearning>pip install numpy
Requirement already satisfied: numpy in c:\users\bob\anaconda3\lib\site-packages (1.16.4)

-------------------------------------
Go to https://archive.ics.uci.edu/ml/datasets/Student+Performance -> click Data Folder -> download student.zip -> extract it -> copy student-mat.csv to the project directory.
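The extract-and-copy step can also be scripted. This is a minimal sketch, assuming student.zip has already been downloaded into the project directory and that student-mat.csv sits at the top level of the archive; the helper name extract_member is made up for illustration.

```python
import os
import zipfile


def extract_member(zip_path, member, dest_dir="."):
    """Extract a single file from a zip archive and return the path it was written to."""
    with zipfile.ZipFile(zip_path) as archive:
        return archive.extract(member, path=dest_dir)


# assumption: student.zip was downloaded manually into the current directory
if os.path.exists("student.zip"):
    print(extract_member("student.zip", "student-mat.csv"))
```

If the archive layout differs, zipfile.ZipFile("student.zip").namelist() lists the real member names.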

#test

import pandas as pd
import numpy as np
import sklearn
#imported explicitly so that sklearn.model_selection.train_test_split is available
import sklearn.model_selection
from sklearn import linear_model

data = pd.read_csv("student-mat.csv", sep=";")
#print the first 5 rows of the original data
print(data.head())

#keep only the columns used for the regression
data = data[['G1','G2','G3','studytime','failures','absences']]
#print the first 5 rows of the reduced data
print(data.head())
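The file uses semicolons between fields, which is why sep=";" is passed to read_csv. A small sketch of the same load-and-select step on an in-memory sample (the three rows are a stand-in in the same layout, not the full data set):

```python
import io

import pandas as pd

# stand-in sample in the same semicolon-separated layout as student-mat.csv
sample = io.StringIO(
    "school;G1;G2;G3;studytime;failures;absences\n"
    "GP;5;6;6;2;0;6\n"
    "GP;5;5;6;2;0;4\n"
    "GP;7;8;10;2;3;10\n"
)

# without sep=";" each line would be parsed as one single column
frame = pd.read_csv(sample, sep=";")

# keep only the numeric columns used for the regression
frame = frame[['G1', 'G2', 'G3', 'studytime', 'failures', 'absences']]
print(frame.shape)
print(frame.dtypes)
```

Checking shape and dtypes right after loading is a quick way to catch a wrong separator before fitting anything.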

-------------------------------------
#logs
G1 - first period grade (numeric: from 0 to 20)
G2 - second period grade (numeric: from 0 to 20)
G3 - final grade (numeric: from 0 to 20, output target)

  school sex  age address famsize Pstatus  ...  Walc  health absences  G1  G2  G3
0     GP   F   18       U     GT3       A  ...     1       3        6   5   6   6
1     GP   F   17       U     GT3       T  ...     1       3        4   5   5   6
2     GP   F   15       U     LE3       T  ...     3       3       10   7   8  10
3     GP   F   15       U     GT3       T  ...     1       5        2  15  14  15
4     GP   F   16       U     GT3       T  ...     2       5        4   6  10  10

[5 rows x 33 columns]
   G1  G2  G3  studytime  failures  absences
0   5   6   6          2         0         6
1   5   5   6          2         0         4
2   7   8  10          2         3        10
3  15  14  15          3         0         2
4   6  10  10          2         0         4

----------------------------------------

#test continued

#'G3' is the column we are predicting from the remaining columns
predict = 'G3'

#x holds every column of data except 'G3' (the features)
x = np.array(data.drop(columns=[predict]))
#y is column 'G3' (the target)
y = np.array(data[predict])

#test_size=0.1 holds out a random 10% of the rows as x_test, y_test
#the other 90% (x_train, y_train) is used to fit the model
#the held-out rows are then used to test the accuracy of the model
x_train, x_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1)

#fit a linear regression model on the training data
linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)

#R^2 score on the test data (1.0 is a perfect fit)
accuracy = linear.score(x_test, y_test)
print(accuracy)

#fitted coefficients (one slope per feature) and intercept
print("Co: \n", linear.coef_)
print("Intercept: \n", linear.intercept_)

predictions = linear.predict(x_test)

#print each predicted value with its input features and the real value
for i in range(len(predictions)):
    print(predictions[i], x_test[i], y_test[i])
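Because the split is random, the score and coefficients change from run to run. A minimal sketch of how test_size=0.1 divides the rows and how random_state makes the split repeatable. The arrays are stand-in data, and the row count of 395 is an assumption (it is consistent with the 40 test rows printed in the log that follows).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# stand-in data: 395 rows, 5 feature columns (assumed size, not the real file)
features = np.arange(395 * 5).reshape(395, 5)
target = np.arange(395)

# random_state fixes the shuffle so the same rows are held out on every run
f_train, f_test, t_train, t_test = train_test_split(
    features, target, test_size=0.1, random_state=0)

# number of rows used for fitting and for testing
print(len(f_train), len(f_test))
```

Leaving random_state out, as the code above does, gives a different split and therefore a slightly different model each time.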

---------------------------------------
#logs

#accuracy (R^2 score on the test data)
0.8131211978854692
Co:
[ 0.16246545 0.97828472 -0.1944719 -0.26809678 0.03586954]
Intercept:
-1.5963742429164807
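The first number in this log is what linear.score returns. For LinearRegression that is the coefficient of determination R^2, not a percentage of correct answers. A sketch of the same quantity computed by hand, on made-up noisy linear data rather than the student data set:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# made-up noisy linear data, only for illustration
rng = np.random.default_rng(0)
features = rng.uniform(0, 20, size=(200, 2))
target = 0.5 * features[:, 0] + 0.4 * features[:, 1] + rng.normal(0, 1, size=200)

# fit on the first 150 rows, evaluate on the remaining 50
model = LinearRegression().fit(features[:150], target[:150])
held_out_x, held_out_y = features[150:], target[150:]

# R^2 = 1 - (sum of squared residuals) / (total sum of squares)
residual = held_out_y - model.predict(held_out_x)
ss_res = np.sum(residual ** 2)
ss_tot = np.sum((held_out_y - held_out_y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

# both numbers should agree
print(r2_manual, model.score(held_out_x, held_out_y))
```

A value of 1.0 means the predictions match the held-out targets exactly. A value near 0 means the model does no better than always predicting the mean.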

#predicted G3, input features [G1 G2 studytime failures absences], real G3
3.6128014305152814 [6 5 2 1 0] 0
7.035051917962379 [9 8 2 1 0] 0
10.249288008311625 [12 10 2 0 14] 11
13.346607601626296 [13 13 2 0 14] 14
6.985943972123177 [9 8 4 0 2] 8
10.031965729151835 [10 10 2 0 17] 10
9.565661730937075 [10 10 2 0 4] 10
12.746005563221631 [14 12 2 0 20] 13
-1.7828092550568728 [5 0 1 3 0] 0
7.01795042719044 [8 8 3 0 2] 10
11.031342787551868 [13 11 2 0 4] 11
8.424911568454709 [9 9 2 0 4] 10
9.74698670281766 [11 10 2 1 12] 10
5.954256371206496 [ 7 6 2 0 26] 6
12.804573674551168 [11 13 1 1 10] 13
8.69112254723661 [9 9 1 0 6] 10
15.360138913766692 [16 15 2 0 2] 15
5.861771966746274 [ 7 6 1 0 18] 6
14.251395206351592 [14 14 1 0 2] 14
9.331457207222437 [ 9 10 2 0 2] 10
8.32116603673902 [10 9 3 0 2] 9
13.934190478733061 [14 14 3 0 4] 14
9.033239773373271 [10 10 4 0 0] 10
7.212422329323918 [8 8 2 0 2] 8
12.183311149567002 [10 13 4 0 6] 13
11.807429434320646 [11 12 1 0 2] 11
13.150377665450387 [14 13 2 0 4] 13
7.335155154808971 [8 8 1 0 0] 11
6.532622843513421 [ 7 8 2 3 10] 10
15.647095220360393 [16 15 2 0 10] 15
7.110434831650663 [ 8 8 4 0 10] 8
9.208724381737385 [ 9 10 3 0 4] 10
9.422183577640226 [10 10 2 0 0] 10
18.78238940121379 [19 18 2 0 2] 18
-0.9216847973657908 [7 0 1 1 0] 0
9.65626033239987 [10 10 2 1 14] 9
7.967787683346893 [10 8 2 0 14] 9
9.291724585641276 [ 8 10 1 0 0] 11
9.331457207222437 [ 9 10 2 0 2] 9
7.465614146807921 [10 8 2 0 0] 9
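Each predicted value in the log is the intercept plus the sum of every coefficient times its feature value. A short sketch that recomputes the first logged prediction from the printed coefficients; the result should agree with the logged 3.6128... up to the rounding of the printed coefficients. The clipping at the end is an optional extra, not part of the original code.

```python
import numpy as np

# values copied from the log above
coef = np.array([0.16246545, 0.97828472, -0.1944719, -0.26809678, 0.03586954])
intercept = -1.5963742429164807

# first logged test row: G1=6, G2=5, studytime=2, failures=1, absences=0
row = np.array([6, 5, 2, 1, 0])

# prediction = intercept + sum(coefficient * feature value)
manual = intercept + np.dot(coef, row)
print(manual)

# grades lie between 0 and 20, so a negative prediction could be clipped into range
clipped = np.clip(-1.7828092550568728, 0, 20)
print(clipped)
```

The large G2 coefficient (about 0.98) shows that the second period grade dominates the prediction of the final grade.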

--------------------------------------
reference:
https://www.youtube.com/watch?v=1BYu65vLKdA

Chuanshuoge

Saturday, 31 August 2019

python machine learning 2 - Linear Regression
