Saturday 31 August 2019

python machine learning 2 - Linear Regression

//cmd

(tensor) C:\Users\bob>cd C:\Users\bob\python-machineLearning

(tensor) C:\Users\bob\python-machineLearning>pip install sklearn
Collecting sklearn
  Downloading https://files.pythonhosted.org/packages/1e/7a/dbb3be0ce9bd5c8b7e3d87328e79063f8b263b2b1bfa4774cb1147bfcd3f/sklearn-0.0.tar.gz
Requirement already satisfied: scikit-learn in c:\users\bob\anaconda3\lib\site-packages (from sklearn) (0.21.2)
Requirement already satisfied: joblib>=0.11 in c:\users\bob\anaconda3\lib\site-packages (from scikit-learn->sklearn) (0.13.2)
Requirement already satisfied: scipy>=0.17.0 in c:\users\bob\anaconda3\lib\site-packages (from scikit-learn->sklearn) (1.2.1)
Requirement already satisfied: numpy>=1.11.0 in c:\users\bob\anaconda3\lib\site-packages (from scikit-learn->sklearn) (1.16.4)
Building wheels for collected packages: sklearn
  Building wheel for sklearn (setup.py) ... done
  Stored in directory: C:\Users\bob\AppData\Local\pip\Cache\wheels\76\03\bb\589d421d27431bcd2c6da284d5f2286c8e3b2ea3cf1594c074
Successfully built sklearn
Installing collected packages: sklearn
Successfully installed sklearn-0.0

(tensor) C:\Users\bob\python-machineLearning>pip install pandas
Requirement already satisfied: pandas in c:\users\bob\anaconda3\lib\site-packages (0.24.2)
Requirement already satisfied: python-dateutil>=2.5.0 in c:\users\bob\anaconda3\lib\site-packages (from pandas) (2.8.0)
Requirement already satisfied: numpy>=1.12.0 in c:\users\bob\anaconda3\lib\site-packages (from pandas) (1.16.4)
Requirement already satisfied: pytz>=2011k in c:\users\bob\anaconda3\lib\site-packages (from pandas) (2019.1)
Requirement already satisfied: six>=1.5 in c:\users\bob\appdata\roaming\python\python37\site-packages (from python-dateutil>=2.5.0->pandas) (1.12.0)

(tensor) C:\Users\bob\python-machineLearning>pip install numpy
Requirement already satisfied: numpy in c:\users\bob\anaconda3\lib\site-packages (1.16.4)
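
Note: the sklearn package on PyPI is only a thin wrapper that pulls in scikit-learn - the log above shows pip resolving it to the already-installed scikit-learn 0.21.2. A quick check that the three libraries import cleanly inside the tensor environment:

import sklearn, pandas, numpy
print(sklearn.__version__, pandas.__version__, numpy.__version__)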

-------------------------------------
go to https://archive.ics.uci.edu/ml/datasets/Student+Performance -> click "Data Folder" -> download student.zip -> extract it -> copy student-mat.csv into the project directory
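
If you would rather fetch the file from a script instead of the browser, a minimal sketch like the one below should work (it assumes the UCI data-folder URL for dataset #320 is still the one serving student.zip):

import io, zipfile, urllib.request

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/00320/student.zip"
with urllib.request.urlopen(url) as resp:
    # pull only student-mat.csv out of the zip into the current directory
    zipfile.ZipFile(io.BytesIO(resp.read())).extract("student-mat.csv")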

#test

import pandas as pd
import numpy as np
import keras   # imported here but not actually used in this script
import sklearn
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

#the UCI file is semicolon-separated, hence sep=";"
data = pd.read_csv("student-mat.csv", sep=";")
#print the first 5 rows of the original data
print(data.head())

#keep only the columns we will use for the model
data = data[['G1','G2','G3','studytime','failures','absences']]
#print the first 5 rows of the reduced data
print(data.head())
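
A quick sanity check on the trimmed DataFrame (the math dataset should have 395 rows, and every grade column should stay in the 0-20 range):

print(data.shape)        # expect (395, 6) after trimming to the 6 columns
print(data.describe())   # min/max of G1, G2, G3 should sit within 0-20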

-------------------------------------
#logs
G1 - first period grade (numeric: from 0 to 20)
G2 - second period grade (numeric: from 0 to 20)
G3 - final grade (numeric: from 0 to 20, output target)

  school sex  age address famsize Pstatus  ...  Walc  health absences  G1  G2  G3
0     GP   F   18       U     GT3       A  ...     1       3        6   5   6   6
1     GP   F   17       U     GT3       T  ...     1       3        4   5   5   6
2     GP   F   15       U     LE3       T  ...     3       3       10   7   8  10
3     GP   F   15       U     GT3       T  ...     1       5        2  15  14  15
4     GP   F   16       U     GT3       T  ...     2       5        4   6  10  10

[5 rows x 33 columns]
   G1  G2  G3  studytime  failures  absences
0   5   6   6          2         0         6
1   5   5   6          2         0         4
2   7   8  10          2         3        10
3  15  14  15          3         0         2
4   6  10  10          2         0         4

----------------------------------------

#test continue

#'G3' is the column we are predicting from the other columns
predict = 'G3'

#x is the array of all columns in data except 'G3'
x = np.array(data.drop([predict], axis=1))
#y is the 'G3' column
y = np.array(data[predict])

#split the data: 90% (x_train, y_train) is used to fit the model,
#and the remaining 10% (x_test, y_test) is held back to test the accuracy of the model on data it has never seen
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1)

#create the linear regression model and fit it to the training data
linear = linear_model.LinearRegression()
linear.fit(x_train, y_train)

#accuracy of the model on the test set (the R^2 score)
accuracy = linear.score(x_test, y_test)
print(accuracy)

#learned coefficients (one per input column) and intercept
print("Co: \n", linear.coef_)
print("Intercept: \n", linear.intercept_)

predictions = linear.predict(x_test)

#print each predicted grade next to its input features and the real grade
for i in range(len(predictions)):
    print(predictions[i], x_test[i], y_test[i])
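
The number printed by linear.score() is the R^2 (coefficient of determination) on the test set, not a percentage of correct answers. As a sanity check, the same value can be recomputed by hand from the predictions:

#R^2 = 1 - (sum of squared residuals) / (total sum of squares)
ss_res = np.sum((y_test - predictions) ** 2)
ss_tot = np.sum((y_test - np.mean(y_test)) ** 2)
print(1 - ss_res / ss_tot)   # should match the value from linear.score(x_test, y_test)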

---------------------------------------
#logs

#accuracy
0.8131211978854692
Co:
 [ 0.16246545  0.97828472 -0.1944719  -0.26809678  0.03586954]
Intercept:
 -1.5963742429164807

#predicted value, input features, real value
3.6128014305152814 [6 5 2 1 0] 0
7.035051917962379 [9 8 2 1 0] 0
10.249288008311625 [12 10  2  0 14] 11
13.346607601626296 [13 13  2  0 14] 14
6.985943972123177 [9 8 4 0 2] 8
10.031965729151835 [10 10  2  0 17] 10
9.565661730937075 [10 10  2  0  4] 10
12.746005563221631 [14 12  2  0 20] 13
-1.7828092550568728 [5 0 1 3 0] 0
7.01795042719044 [8 8 3 0 2] 10
11.031342787551868 [13 11  2  0  4] 11
8.424911568454709 [9 9 2 0 4] 10
9.74698670281766 [11 10  2  1 12] 10
5.954256371206496 [ 7  6  2  0 26] 6
12.804573674551168 [11 13  1  1 10] 13
8.69112254723661 [9 9 1 0 6] 10
15.360138913766692 [16 15  2  0  2] 15
5.861771966746274 [ 7  6  1  0 18] 6
14.251395206351592 [14 14  1  0  2] 14
9.331457207222437 [ 9 10  2  0  2] 10
8.32116603673902 [10  9  3  0  2] 9
13.934190478733061 [14 14  3  0  4] 14
9.033239773373271 [10 10  4  0  0] 10
7.212422329323918 [8 8 2 0 2] 8
12.183311149567002 [10 13  4  0  6] 13
11.807429434320646 [11 12  1  0  2] 11
13.150377665450387 [14 13  2  0  4] 13
7.335155154808971 [8 8 1 0 0] 11
6.532622843513421 [ 7  8  2  3 10] 10
15.647095220360393 [16 15  2  0 10] 15
7.110434831650663 [ 8  8  4  0 10] 8
9.208724381737385 [ 9 10  3  0  4] 10
9.422183577640226 [10 10  2  0  0] 10
18.78238940121379 [19 18  2  0  2] 18
-0.9216847973657908 [7 0 1 1 0] 0
9.65626033239987 [10 10  2  1 14] 9
7.967787683346893 [10  8  2  0 14] 9
9.291724585641276 [ 8 10  1  0  0] 11
9.331457207222437 [ 9 10  2  0  2] 9
7.465614146807921 [10  8  2  0  0] 9
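
Each predicted value above is just the coefficients dotted with the feature row [G1, G2, studytime, failures, absences], plus the intercept (the exact numbers change on every run because train_test_split shuffles randomly). For the first logged row [6 5 2 1 0]:

coef = np.array([0.16246545, 0.97828472, -0.1944719, -0.26809678, 0.03586954])
intercept = -1.5963742429164807
print(np.dot(coef, [6, 5, 2, 1, 0]) + intercept)   # ~3.6128, matching the first prediction line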

--------------------------------------
reference:
https://www.youtube.com/watch?v=1BYu65vLKdA
