크레이지J의 탐구생활

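+ Implementing XOR learning with TensorFlow

Instead of computing the XOR training steps by hand, using the TensorFlow API keeps the code simple.

Even complex networks become easy to implement.

The network is laid out as 2-4-1 (nodes per layer).

The input layer has 2 nodes (the number of features: x1, x2).

The hidden layer has 4 nodes.

The output layer has 1 node (Y).

There is no need to do the tedious work (derivatives) for back-propagation by hand; a single API call handles training. The activation function and the training algorithm can also be swapped easily.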

#!/usr/bin/env python3

# -*- coding: utf-8 -*-

"""

Created on Tue May 23 14:56:53 2017

@author: crazyj

"""

import numpy as np

import tensorflow as tf  # written against the TensorFlow 1.x API (tf.placeholder, tf.Session)

# training set

X_train = np.array( [[0,0], [0,1], [1,0], [1,1]])

T_train = np.array( [[0], [1], [1], [0]] )

# placeholder

X = tf.placeholder(tf.float32, [None, 2])

T = tf.placeholder(tf.float32, [None, 1])

# variable

W1 = tf.Variable(tf.truncated_normal([2,4]))

b1 = tf.Variable(tf.zeros([4]))

W2 = tf.Variable(tf.truncated_normal([4,1]))

b2 = tf.Variable(tf.zeros([1]), dtype=tf.float32)

# model

A1 = tf.matmul(X, W1)+b1

Z1 = tf.sigmoid(A1)

A2 = tf.matmul(Z1, W2)+b2

Z2 = tf.sigmoid(A2)

learn_rate = 0.1

Cost = tf.reduce_mean(tf.reduce_sum(tf.square(Z2-T), 1))

train = tf.train.GradientDescentOptimizer(learn_rate).minimize(Cost)

predict = Z2

sess = tf.Session()

sess.run(tf.global_variables_initializer())

for i in range(5000):
    _train, _Cost = sess.run([train, Cost], feed_dict={X:X_train, T:T_train})
    print( "cost=", _Cost)

_predict = sess.run([predict], feed_dict={X:X_train})

print("predict=", _predict)

# np.int was removed from recent NumPy releases; the built-in int works everywhere
print("result=", (np.array(_predict)>=0.5).astype(int))

Result

cost= 0.021748

cost= 0.0217359

cost= 0.0217239

predict= [array([[ 0.10571096],
       [ 0.86153752],
       [ 0.84178925],
       [ 0.17739831]], dtype=float32)]

result= [[[0]
  [1]
  [1]
  [0]]]

+ Code walkthrough

# training set

X_train = np.array( [[0,0], [0,1], [1,0], [1,1]])

T_train = np.array( [[0], [1], [1], [0]] )

The training data is, naturally, the XOR truth table: each input combination is trained toward its XOR result. (0,0) -> 0, (0,1) -> 1, (1,0) -> 1, (1,1) -> 0
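As a quick sanity check on the targets, the truth table can be generated instead of typed by hand. This is a minimal sketch using Python's bitwise XOR operator; the names `inputs` and `targets` are illustrative and are not part of the original script.

```python
# XOR truth table: the output is 1 only when exactly one input is 1.
inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
targets = [a ^ b for a, b in inputs]  # bitwise XOR on 0/1 integers

for (a, b), t in zip(inputs, targets):
    print(f"({a},{b}) -> {t}")
```

The generated targets match the T_train column used in the script.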

# placeholder

X = tf.placeholder(tf.float32, [None, 2])

T = tf.placeholder(tf.float32, [None, 1])

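Placeholders can be thought of as the input and output variables that are fed into the TensorFlow graph. The input X is declared as a (?,2) matrix (any number of rows, 2 columns for x1 and x2), and the target output T as a (?,1) matrix.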

# variable

W1 = tf.Variable(tf.truncated_normal([2,4]))

b1 = tf.Variable(tf.zeros([4]))

W2 = tf.Variable(tf.truncated_normal([4,1]))

b2 = tf.Variable(tf.zeros([1]), dtype=tf.float32)

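These lines define the TensorFlow variables, that is, the shapes of the trainable parameters that make up the graph.

Since the network is 2-4-1, the weight and bias variables take the following shapes.

The 2 input nodes are the placeholder X. The weight W1 connecting the 2-node layer to the 4-node layer is a matrix with 2 rows (number of input nodes) and 4 columns (number of output nodes), and the bias b1 has 4 elements (number of output nodes).

For the 4-to-1 connection, W2 is 4x1 and b2 has 1 element. The weights are initialized with random values, and the biases are initialized to 0.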
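The shape rule described above (rows = nodes in the previous layer, columns = nodes in the next layer, one bias per output node) can be sketched for any list of layer sizes. The helper `parameter_shapes` is a hypothetical name introduced here for illustration only.

```python
# Layer sizes of the 2-4-1 network described above.
layer_sizes = [2, 4, 1]

def parameter_shapes(sizes):
    """Return (weight_shape, bias_shape) for each pair of adjacent layers."""
    shapes = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        shapes.append(((n_in, n_out), (n_out,)))
    return shapes

shapes = parameter_shapes(layer_sizes)
# Total trainable parameters: weights plus biases of every layer
total = sum(w[0] * w[1] + b[0] for w, b in shapes)
print(shapes)  # [((2, 4), (4,)), ((4, 1), (1,))]
print(total)   # 17
```

Changing `layer_sizes` to, say, a wider hidden layer shows how the variable shapes in the script would have to change with it.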

# model

A1 = tf.matmul(X, W1)+b1

Z1 = tf.sigmoid(A1)

A2 = tf.matmul(Z1, W2)+b2

Z2 = tf.sigmoid(A2)

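Next, the remaining hidden nodes and the output node are built.

In the 2-4-1 network, the 2 input nodes are X and the 4 hidden nodes are A1,

defined as A1 = X x W1 + b1.

Z1 is the sigmoid activation applied to A1.

A2 = Z1 x W2 + b2, the weighted sum of the previous layer's outputs.

Z2 is the sigmoid of A2 and is the final output.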
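To make the four steps concrete without TensorFlow, here is a rough plain-Python sketch of the same forward pass. The helper names `dense` and `forward` are made up for this illustration, and the all-zero parameters are a hypothetical choice picked only so the result can be checked by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(rows, W, b):
    """Compute rows x W + b for a list of input rows (plain-list matrix product)."""
    return [[sum(x * W[i][j] for i, x in enumerate(row)) + b[j]
             for j in range(len(b))]
            for row in rows]

def forward(X, W1, b1, W2, b2):
    A1 = dense(X, W1, b1)                            # hidden pre-activation
    Z1 = [[sigmoid(a) for a in row] for row in A1]   # hidden output
    A2 = dense(Z1, W2, b2)                           # output pre-activation
    Z2 = [[sigmoid(a) for a in row] for row in A2]   # final output
    return Z2

# Hypothetical all-zero parameters, so every pre-activation is 0
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
W1 = [[0.0] * 4 for _ in range(2)]
b1 = [0.0] * 4
W2 = [[0.0] for _ in range(4)]
b2 = [0.0]
out = forward(X, W1, b1, W2, b2)
print(out)  # every output is sigmoid(0) = 0.5
```

With untrained zero weights the network answers 0.5 for every input, which is exactly the undecided state that training has to move away from.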

learn_rate = 0.1

Cost = tf.reduce_mean(tf.reduce_sum(tf.square(Z2-T), 1))

train = tf.train.GradientDescentOptimizer(learn_rate).minimize(Cost)

predict = Z2

Now the training method is chosen.

The cost function is the squared error, and training uses gradient descent to build the training graph.

The prediction is Z2, the output of the final node.
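The cost expression (sum of squared errors per sample, then the mean over samples) can be reproduced by hand. The sketch below plugs in the four predictions printed in the result section of this post; `squared_error_cost` is an illustrative helper name, not something from the script.

```python
def squared_error_cost(Z2, T):
    """Mean over samples of the per-sample sum of squared errors."""
    per_sample = [sum((z - t) ** 2 for z, t in zip(z_row, t_row))
                  for z_row, t_row in zip(Z2, T)]
    return sum(per_sample) / len(per_sample)

# Predictions copied from the printed output of the training run
predictions = [[0.10571096], [0.86153752], [0.84178925], [0.17739831]]
targets = [[0], [1], [1], [0]]
cost = squared_error_cost(predictions, targets)
print(round(cost, 4))  # 0.0217
```

The value lines up with the last cost lines printed during training, which suggests the formula is being read correctly.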

sess = tf.Session()

sess.run(tf.global_variables_initializer())

for i in range(5000):
    _train, _Cost = sess.run([train, Cost], feed_dict={X:X_train, T:T_train})
    print( "cost=", _Cost)

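Training is run with TensorFlow. A session is created, the variables are initialized, and training runs for 5000 iterations. The train node at the top of the graph is passed to sess.run, and feed_dict supplies the training data for the placeholders X and T. The cost is printed on every iteration.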

_predict = sess.run([predict], feed_dict={X:X_train})

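After training, the inputs X_train are fed in again and the outputs are generated to check how well the network learned.

print("result=", (np.array(_predict)>=0.5).astype(int))

For the final result, the sigmoid output is converted to a binary class: 1 if it is at least 0.5, otherwise 0.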
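The thresholding step on its own is small enough to show in plain Python. This sketch applies the 0.5 cutoff to the predictions printed earlier in this post; the list names are illustrative.

```python
# Sigmoid outputs from the trained network (copied from the printed result)
predictions = [0.10571096, 0.86153752, 0.84178925, 0.17739831]

# At least 0.5 becomes class 1, anything below becomes class 0
labels = [1 if p >= 0.5 else 0 for p in predictions]
print(labels)  # [0, 1, 1, 0]
```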


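XOR learning

A single-layer perceptron cannot learn a nonlinear problem such as XOR.

So a multi-layer perceptron is used: two layers, not counting the input layer.

Sigmoid is used as the activation, and the output is treated as a 0/1 binary decision.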
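The claim that a single-layer perceptron cannot handle XOR can be illustrated with a brute-force search. The sketch below tries every (w1, w2, b) on a small grid for a single threshold unit; the grid range is an arbitrary assumption, and the search only demonstrates the point (the general impossibility follows from XOR not being linearly separable).

```python
from itertools import product

POINTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

def separable(targets, grid):
    """Return True if some (w1, w2, b) on the grid reproduces targets with one threshold unit."""
    for w1, w2, b in product(grid, repeat=3):
        outputs = [1 if w1 * x1 + w2 * x2 + b > 0 else 0 for x1, x2 in POINTS]
        if outputs == targets:
            return True
    return False

grid = [i * 0.5 for i in range(-8, 9)]  # -4.0 ... 4.0, illustrative search range
print("AND:", separable([0, 0, 0, 1], grid))  # True
print("OR: ", separable([0, 1, 1, 1], grid))  # True
print("XOR:", separable([0, 1, 1, 0], grid))  # False
```

AND and OR are found immediately, while no weight combination works for XOR, which is why a hidden layer is needed.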

#!/usr/bin/env python3

# -*- coding: utf-8 -*-

"""

Created on Tue May 23 14:10:10 2017

@author: crazyj

"""

import numpy as np

import os

# xor simple network.

# X(2) - 2 - Y(1)

# sigmoid activation function use.

# manual gradient

#

# if fail?, try again!

# local minima problem exists...

# make deep and wide network.

#

X = np.array( [[0,0], [0,1], [1,0], [1,1]])

T = np.array( [[0], [1], [1], [0]] )

np.random.seed(int(os.times()[4]))

W1 = np.random.randn(2,2)

b1 = np.zeros([2])

W2 = np.random.randn(2,1)

b2 = np.zeros([1])

def Sigmoid(X):
    return 1/(1+np.exp(-X))

def Predict(X, W1, b1, W2, b2):
    # forward pass: hidden layer, then output layer
    Z1 = np.dot(X, W1)+b1
    A1 = Sigmoid(Z1)
    Z2 = np.dot(A1, W2)+b2
    A2 = Sigmoid(Z2)
    Y = A2
    return Y

def Cost(X, W1, b1, W2, b2, T):
    # cross-entropy cost; epsil keeps log() away from 0
    epsil = 1e-5
    Z1 = np.dot(X, W1)+b1
    A1 = Sigmoid(Z1)
    Z2 = np.dot(A1, W2)+b2
    A2 = Sigmoid(Z2)
    Y = A2
    return np.mean(-T*np.log(Y+epsil)-(1-T)*np.log(1-Y+epsil))

def Gradient(learning_rate, X, W1, b1, W2, b2, T):
    # forward pass
    Z1 = np.dot(X, W1)+b1
    A1 = Sigmoid(Z1)
    Z2 = np.dot(A1, W2)+b2
    A2 = Sigmoid(Z2)
    # backward pass: output delta, then hidden delta
    deltaY = A2-T
    deltaA1 = np.dot(deltaY, W2.T) * (A1*(1-A1))
    m = len(X)
    gradW2 = np.dot(A1.T, deltaY)
    gradW1 = np.dot(X.T, deltaA1)
    W2 = W2-(learning_rate/m)*gradW2
    b2 = b2-(learning_rate/m)*np.sum(deltaY, axis=0)
    W1 = W1-(learning_rate/m)*gradW1
    # sum over samples only (axis=0) so each hidden unit gets its own bias gradient
    b1 = b1-(learning_rate/m)*np.sum(deltaA1, axis=0)
    return (W1, b1, W2, b2)

for i in range(3000):
    J= Cost(X,W1,b1,W2,b2,T)
    W1,b1,W2,b2 = Gradient(1.0, X, W1, b1, W2, b2, T)

print ("Cost=",J)

Y = Predict(X, W1, b1, W2, b2)

print("predict=", Y)

Result

Cost= 0.351125685078

predict= [[ 0.50057071]

[ 0.49643107]

[ 0.99648031]

[ 0.00640712]]

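Failure?

Re-running it several times, it sometimes succeeds. There is a local minima problem.

To work around it, either restart from scratch several times until the cost gets low (initialization matters), or design the network to be deeper and wider.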

Cost= 0.00403719259697

predict= [[ 0.00475473]

[ 0.99634993]

[ 0.99634975]

[ 0.00409427]]

This is a successful run.
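The "try again from a new initialization" idea can be automated. The following is a rough standard-library sketch of a 2-2-1 sigmoid network trained from several random seeds, keeping the run with the lowest cost. It is not the author's script: the function names, the choice of 5 restarts, and the seeds are all assumptions, and whether any particular seed succeeds is not guaranteed.

```python
import math
import random

X = [(0, 0), (0, 1), (1, 0), (1, 1)]
T = [0, 1, 1, 0]

def sigmoid(z):
    # Numerically stable form, avoids overflow in exp() for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def forward(p, x):
    """Return (hidden activations, output) of a 2-2-1 sigmoid network."""
    h = [sigmoid(x[0] * p["W1"][0][j] + x[1] * p["W1"][1][j] + p["b1"][j])
         for j in range(2)]
    y = sigmoid(h[0] * p["W2"][0] + h[1] * p["W2"][1] + p["b2"])
    return h, y

def cost(p, eps=1e-5):
    """Cross-entropy cost averaged over the four XOR samples."""
    total = 0.0
    for x, t in zip(X, T):
        _, y = forward(p, x)
        total += -t * math.log(y + eps) - (1 - t) * math.log(1 - y + eps)
    return total / len(X)

def train(seed, steps=3000, lr=1.0):
    """Train from one random initialization and return (final cost, parameters)."""
    rng = random.Random(seed)
    p = {"W1": [[rng.gauss(0, 1) for _ in range(2)] for _ in range(2)],
         "b1": [0.0, 0.0],
         "W2": [rng.gauss(0, 1) for _ in range(2)],
         "b2": 0.0}
    m = len(X)
    for _ in range(steps):
        gW1 = [[0.0, 0.0], [0.0, 0.0]]
        gb1 = [0.0, 0.0]
        gW2 = [0.0, 0.0]
        gb2 = 0.0
        for x, t in zip(X, T):          # accumulate gradients over the batch
            h, y = forward(p, x)
            dy = y - t                  # output delta for sigmoid + cross-entropy
            gb2 += dy
            for j in range(2):
                gW2[j] += h[j] * dy
                dh = dy * p["W2"][j] * h[j] * (1 - h[j])  # hidden delta
                gb1[j] += dh
                for i in range(2):
                    gW1[i][j] += x[i] * dh
        for j in range(2):              # gradient-descent update
            p["W2"][j] -= lr / m * gW2[j]
            p["b1"][j] -= lr / m * gb1[j]
            for i in range(2):
                p["W1"][i][j] -= lr / m * gW1[i][j]
        p["b2"] -= lr / m * gb2
    return cost(p), p

# Restart from several seeds (an arbitrary choice of 5) and keep the lowest cost
results = [train(seed) for seed in range(5)]
best_cost, best_params = min(results, key=lambda r: r[0])
print("costs per restart:", [round(c, 4) for c, _ in results])
print("best cost:", round(best_cost, 4))
```

Comparing the per-restart costs shows how much the outcome depends on the initial weights; a stuck run tends to stand out with a clearly higher final cost than the others.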


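Binary classification sorts inputs into two classes.

Combining several binary classifiers makes multiclass classification possible.

That is, given modules isAclass() deciding whether something is A or not, isBclass() deciding B or not, and isCclass() deciding C or not, combining them classifies an input as A, B, or C.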
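One common way to combine such per-class modules is to let each return a score and pick the class with the highest one. The sketch below assumes three sigmoid scorers with made-up weights (bias, w_x1, w_x2); these numbers are purely hypothetical and are not the trained values from this post.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical weights (bias, w_x1, w_x2) for three binary classifiers
CLASSIFIERS = {
    "A": (0.0, -1.0, 1.0),   # scores high when x2 > x1
    "B": (0.0, 1.0, -1.0),   # scores high when x1 > x2
    "C": (-10.0, 1.0, 1.0),  # scores high when x1 + x2 > 10
}

def classify(x1, x2):
    """Score the point with every binary classifier and return the best class."""
    scores = {name: sigmoid(b + w1 * x1 + w2 * x2)
              for name, (b, w1, w2) in CLASSIFIERS.items()}
    return max(scores, key=scores.get), scores

label, scores = classify(1, 5)
print(label)  # A
```

Each entry plays the role of one isXclass() module; taking the maximum resolves cases where more than one module says "yes" or none does.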

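Suppose there is training data in which data X with features (x1, x2) is labeled with one of three classes (A, B, C). When a new X arrives, the trained model can estimate its class.

In the plot, the black points are the training data, whose A, B, C labels are already known. A multiclass model is trained on them, and the test data (red) is then classified and drawn on the same graph.

The points are classified as intended.

The training data consists of (x1, x2) coordinates and the class label encoded as one-hot vectors.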
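For readers unfamiliar with one-hot encoding, here is a minimal sketch of encoding a label and decoding a vector of scores back to a label. The helpers `one_hot` and `decode` are illustrative names; `decode` mirrors what the Onehot function in the R code does with which.max.

```python
CLASSES = ["A", "B", "C"]

def one_hot(label):
    """Encode a class label as a vector with a single 1."""
    return [1 if c == label else 0 for c in CLASSES]

def decode(vector):
    """Return the class whose position holds the largest value."""
    return CLASSES[vector.index(max(vector))]

print(one_hot("B"))             # [0, 1, 0]
print(decode([0.1, 0.2, 0.7]))  # C
```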

Below is the version that uses only sigmoid.

#

# deep learning test

#

# Multinomial Classification... Sigmoid (one classifier per class)

# choose learning.... A,B,C

# A=left side, B=bottom side, C=right,up side

PLOTSHOW=TRUE

# training , X1=1 (bias)

X=rbind( c(1,1,1), c(1,1,5), c(1,2,6), c(1,2,3), c(1,4,6),

c(1,2,1), c(1,3,2), c(1,4,2), c(1,6,3), c(1,8,1),

c(1,1,10), c(1,4,8), c(1,6,6), c(1,7,5), c(1,9,3) )

# training result Y

Y=rbind ( c(1,0,0), c(1,0,0), c(1,0,0), c(1,0,0), c(1,0,0),

c(0,1,0), c(0,1,0), c(0,1,0), c(0,1,0), c(0,1,0),

c(0,0,1), c(0,0,1), c(0,0,1), c(0,0,1), c(0,0,1))

# searching parameter, A's W column, B's, C's

W=cbind( c(1,2,3), c(2,3,2), c(3,4,1) )

# drawing

if ( PLOTSHOW ) {

plot(1:10,1:10,type="n")

pchs = vector(length = nrow(X))

pchs[which(Y[,1]==1)]="A"

pchs[which(Y[,2]==1)]="B"

pchs[which(Y[,3]==1)]="C"

points(X[,2], X[,3], pch=pchs)

}

# select the class with the highest probability

Onehot = function(T) {

OH=matrix(0, nrow=nrow(T), ncol=ncol(T))

ohw=apply(T,1,function(x) return(which.max(x)))

for (i in seq(ohw))

OH[i,ohw[i]]=1

return (OH)

}

# logistic function: sigmoid

# G(X,W)=1/(1+e^-z) , z=WX

G = function (X, W) {

Z=X %*% W

G = 1/(1+exp(-Z))

return (G)

}

Cost =function (X, W, Y) {

m = nrow(X)

return ( (-1)/m * sum(Y*log(G(X,W)) + (1-Y)*log(1-G(X,W))) )

}

Gradient = function (X, W, Y, alpha) {

m = nrow(X)

W = W + alpha/m * ( t(X) %*% (((Y-1)*exp(X%*%W)+Y) / (exp(X%*%W)+1)) )

return (W)

}

print( Cost(X, W, Y) )

#learning

alpha=0.1

for ( i in 1 : 600 ) {
  W = Gradient(X,W,Y,alpha)
  if ( i %% 100==0 ) {
    print(paste("cnt=", i, " Cost=", Cost(X,W,Y), " W1(b)=", W[1,1], " W2=", W[2,1], " W3=", W[3,1] ))
  }
}

# test

# classify

xmat = matrix( c(1,1,1, 1,2,4, 1,4,1, 1,9,2, 1,6,8, 1,3,4,

1,8,8, 1,6,6, 1,2,8, 1,9,5), byrow = T, ncol=3 )

qy = G( xmat, W )

print (xmat)

print (qy)

qy2=Onehot(qy)

print(qy2)

# drawing

if ( PLOTSHOW ) {

pchs = vector(length = nrow(xmat))

pchs[which(qy2[,1]==1)]="A"

pchs[which(qy2[,2]==1)]="B"

pchs[which(qy2[,3]==1)]="C"

points(xmat[,2], xmat[,3], pch=pchs, col="red")

}

#dev.off()
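The update inside Gradient in the sigmoid version looks more complicated than it is. Writing z = XW, the fraction ((Y-1)*e^z + Y) / (e^z + 1) simplifies to Y - e^z/(e^z + 1), which is Y - G(X,W). So the step is W = W + alpha/m * t(X) %*% (Y - G), ordinary gradient descent on the cross-entropy cost. The short check below confirms the identity numerically for a few arbitrary sample values; `update_term` is a name made up for this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def update_term(y, z):
    """The fraction used in the R Gradient function, for a single element."""
    return ((y - 1) * math.exp(z) + y) / (math.exp(z) + 1)

# Compare against the simplified form y - sigmoid(z) on a few sample points
max_diff = max(abs(update_term(y, z) - (y - sigmoid(z)))
               for y in (0, 1)
               for z in (-3.0, -0.5, 0.0, 0.5, 3.0))
print(max_diff < 1e-12)  # True
```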

The softmax probability function and the cost function are as follows.

# softmax: turns scores into probabilities; S(y_i) = e^(y_i) / Sigma_j e^(y_j)

# cross-entropy cost function

# D(S, L) = -Sigma_i L_i * log(S_i)
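The two formulas can be tried on a single sample in a few lines. This is a sketch under stated assumptions: the scores [2.0, 1.0, 0.1] are arbitrary example values, and subtracting the maximum score before exponentiating is a numerical-stability habit added here that the R code in this post does not use.

```python
import math

def softmax(scores):
    m = max(scores)  # subtracting the max leaves the result unchanged but avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot_label):
    """D(S, L) = -sum_i L_i * log(S_i) for one sample."""
    return -sum(l * math.log(p) for p, l in zip(probs, one_hot_label))

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]

loss_right = cross_entropy(probs, [1, 0, 0])  # label agrees with the top score
loss_wrong = cross_entropy(probs, [0, 0, 1])  # label disagrees with the top score
print(loss_right < loss_wrong)  # True
```

The cost is small when the one-hot label points at the class with the highest probability and grows as the probability assigned to the true class shrinks.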

The version trained with softmax and cross entropy

#

# deep learning test

#

# Multinomial Classification... Softmax

# choose learning.... A,B,C

# A=left side, B=bottom side, C=right,up side

PLOTSHOW=TRUE

# training , X1=1 (bias)

X=rbind( c(1,1,1), c(1,1,5), c(1,2,6), c(1,2,3), c(1,4,6),

c(1,2,1), c(1,3,2), c(1,4,2), c(1,6,3), c(1,8,1),

c(1,1,10), c(1,4,8), c(1,6,6), c(1,7,5), c(1,9,3) )

# training result Y

Y=rbind ( c(1,0,0), c(1,0,0), c(1,0,0), c(1,0,0), c(1,0,0),

c(0,1,0), c(0,1,0), c(0,1,0), c(0,1,0), c(0,1,0),

c(0,0,1), c(0,0,1), c(0,0,1), c(0,0,1), c(0,0,1))

# searching parameter, A's W column, B's, C's

W=cbind( c(1,2,3), c(2,3,2), c(3,4,1) )

# drawing

if ( PLOTSHOW ) {

plot(1:10,1:10,type="n")

pchs = vector(length = nrow(X))

pchs[which(Y[,1]==1)]="A"

pchs[which(Y[,2]==1)]="B"

pchs[which(Y[,3]==1)]="C"

points(X[,2], X[,3], pch=pchs)

}

# softmax ; make probability ; S(yi)=e^yi / Sigma(e^yi)

# yi = xw

Softmax = function(X, W) {

T=exp(X%*%W)

sume=apply(T, 1, sum)

return (T/sume)

}

# select the class with the highest probability

Onehot = function(T) {

OH=matrix(0, nrow=nrow(T), ncol=ncol(T))

ohw=apply(T,1,function(x) return(which.max(x)))

for (i in seq(ohw))

OH[i,ohw[i]]=1

return (OH)

}

# cross entropy cost function

Cost =function (X, W, Y) {

# D(S,L) = Sigma Li.* -log(y^)

m = nrow(X)

return ( (-1)/m * sum(Y*log(Softmax(X,W)) ) )

}

Gradient = function (X, W, Y, alpha) {

m = nrow(X)

W = W - alpha/m * ( t(X) %*% (Softmax(X,W)-Y) )

return (W)

}

print( Cost(X, W, Y) )

#learning

alpha=0.1

for ( i in 1 : 2000 ) {
  W = Gradient(X,W,Y,alpha)
  if ( i %% 100==0 ) {
    print(paste("cnt=", i, " Cost=", Cost(X,W,Y), " W1(b)=", W[1,1], " W2=", W[2,1], " W3=", W[3,1] ))
  }
}

# test

# classify

xmat = matrix( c(1,1,1, 1,2,4, 1,4,1, 1,9,2, 1,6,8, 1,3,4,

1,8,8, 1,6,6, 1,2,8, 1,9,5), byrow = T, ncol=3 )

qy = Softmax( xmat, W )

print (xmat)

print (qy)

qy2=Onehot(qy)

print(qy2)

# drawing

if ( PLOTSHOW ) {

pchs = vector(length = nrow(xmat))

pchs[which(qy2[,1]==1)]="A"

pchs[which(qy2[,2]==1)]="B"

pchs[which(qy2[,3]==1)]="C"

points(xmat[,2], xmat[,3], pch=pchs, col="red")

}

#dev.off()
