Preface
Softmax regression is a generalization of logistic regression.
Logistic regression estimates two posterior probabilities:
$P(y=0|x;\theta)$ and $P(y=1|x;\theta)$
Softmax regression instead estimates $K$ probabilities, where $K$ is the number of classes, and the predicted class is the one with the largest probability.
Softmax Function
Before writing down the softmax regression formulas, let's first introduce the softmax function.
The softmax function, also known as the normalized exponential function, is a generalization of the logistic function. It is widely used in probability-based multi-class models such as multinomial logistic regression, multi-class linear discriminant analysis, naive Bayes classifiers, artificial neural networks, and deep learning.
Its expression is:
$
\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}, \quad j = 1, 2, \dots, K.
$
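As a quick sanity check (a minimal NumPy sketch, not part of the original post), the function takes only a few lines; subtracting the maximum before exponentiating is a common trick to avoid overflow:

```python
import numpy as np

def softmax(z):
    """Softmax of a 1-D score vector; the max-subtraction avoids overflow."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))        # ~[0.090, 0.245, 0.665]
print(softmax(np.array([1.0, 2.0, 3.0])).sum())  # 1.0
```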
Hypothesis
The hypothesis is:
$
h_\theta(x) = \left[
\begin{matrix}
P(y=1|x; \theta) \\
P(y=2|x; \theta) \\
\vdots \\
P(y=K|x; \theta)
\end{matrix}
\right]
= \frac{1}{\sum_{j=1}^{K} e^{\theta_j^T x}} \left[
\begin{matrix}
e^{\theta_1^T x} \\
e^{\theta_2^T x} \\
\vdots \\
e^{\theta_K^T x}
\end{matrix}
\right]
$
- The shared denominator normalizes the outputs so that the probabilities sum to 1 (checked numerically in the sketch below).
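To see the hypothesis in action, here is a minimal sketch for a batch of inputs; the data and the parameter matrix `theta` (shape: features × K) are random and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 4))       # 5 samples, 4 features (toy data)
theta = rng.normal(size=(4, 3))   # K = 3 classes, one parameter column per class

scores = X @ theta                            # theta_j^T x for every sample and class
scores -= scores.max(axis=1, keepdims=True)   # numerical stability
h = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

print(h.shape)        # (5, 3): one probability per class for each sample
print(h.sum(axis=1))  # every row sums to 1
```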
Cost
- As with logistic regression, the log (cross-entropy) loss is used.
The loss is:
$
J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{K} I\{y^{(i)}=j\} \log{\frac{e^{\theta_j^T x^{(i)}}}{\sum_{k=1}^{K} e^{\theta_k^T x^{(i)}}}} \right] + \frac{\lambda}{2} \|\theta\|^2
$
- $I\{y^{(i)}=j\}$ is the indicator function: it equals 1 when the expression inside the braces is true and 0 when it is false.
- The term ${\frac{e^{\theta_j^T x^{(i)}}}{\sum_{k=1}^{K} e^{\theta_k^T x^{(i)}}}}$ is the $j$-th component of $h_\theta(x^{(i)})$.
- $\frac{\lambda}{2}\|\theta\|^2$ is the added regularization term (a short sketch of the full loss follows this list).
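The following sketch evaluates this loss on a toy batch; `one_hot`, `theta`, and `lam` are illustrative names used here for clarity, not part of any library API:

```python
import numpy as np

def one_hot(y, K):
    """Indicator matrix: row i has a single 1 in column y[i]."""
    out = np.zeros((y.shape[0], K))
    out[np.arange(y.shape[0]), y] = 1
    return out

def softmax_cost(X, y, theta, lam=0.1):
    """Cross-entropy loss with an L2 penalty, as in the formula above."""
    scores = X @ theta
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    Y = one_hot(y, theta.shape[1])
    data_term = -np.sum(Y * np.log(probs)) / X.shape[0]   # -1/m * sum_i sum_j I{y_i=j} log p_ij
    return data_term + lam / 2.0 * np.sum(theta ** 2)     # + lambda/2 * ||theta||^2

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))
y = np.array([0, 1, 2, 0, 1, 2])
theta = rng.normal(size=(4, 3))
print(softmax_cost(X, y, theta))   # a single scalar loss value
```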
Optimization
The problem is now to find the value of $\theta$ that minimizes the cost.
Gradient descent is used here to solve $\min_\theta J(\theta)$.
Let:
$W = (\omega_1, \omega_2, \omega_3, \dots, \omega_n, b)^T$
$X = (x_1, x_2, x_3, \dots, x_n, 1)^T$
$Y = (y_1, y_2, y_3, \dots, y_n)^T$
That is, the bias $b$ is absorbed into the parameter vector by augmenting each input with a constant 1.
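In code, absorbing the bias simply means adding a constant-1 column to the data so that $b$ becomes one more row of the parameter matrix (the implementation below does this with `np.c_`); a tiny sketch:

```python
import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0]])
X_aug = np.c_[np.ones(X.shape[0]), X]   # prepend a constant-1 column for the bias
print(X_aug)
# [[1. 1. 2.]
#  [1. 3. 4.]]
```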
For gradient descent we first need the gradient; differentiating $J(\theta)$ with respect to $\theta_j$ gives:
$
\frac{\partial J(\theta)}{\partial \theta_j} = - \frac{1}{m} \sum_{i=1}^{m} \left[ x^{(i)} \left( I\{y^{(i)} = j\} - P(y^{(i)}=j|x^{(i)}; \theta) \right) \right] + \lambda \theta_j
$
where:
$
P(y^{(i)}=j|x^{(i)}; \theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{n=1}^{K} e^{\theta_n^T x^{(i)}}}
$
The parameters are then updated in the direction of the negative gradient:
$
\theta_j = \theta_j - \alpha \frac{\partial J(\theta)}{\partial \theta_j}
$
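Combining the gradient and the update rule, a single gradient-descent step might look like this sketch (assuming `X` already contains the bias column and `Y_onehot` holds one-hot labels; `alpha` and `lam` stand for the learning rate and regularization strength):

```python
import numpy as np

def gradient_step(X, Y_onehot, theta, alpha=0.01, lam=0.1):
    """One softmax-regression update: theta <- theta - alpha * dJ/dtheta."""
    m = X.shape[0]
    scores = X @ theta
    scores -= scores.max(axis=1, keepdims=True)
    probs = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    # grad_j = -1/m * sum_i x_i * (I{y_i = j} - P(y_i = j | x_i)) + lam * theta_j
    grad = -X.T @ (Y_onehot - probs) / m + lam * theta
    return theta - alpha * grad
```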
Code Implementation
# coding=UTF-8
'''
@Author: Yewbs Wei
@Date: 2019-08-06 15:54:59
@LastEditors: Yewbs Wei
@LastEditTime: 2019-08-06 23:22:17
@Description: Softmax Regression implemented with gradient descent
'''
import numpy as np
from sklearn import datasets


class SoftmaxRegression():
    def __init__(self, X_train, Y_train, learning_rate=0.01, epochs=10000, threshold=0.0001, lam=0.1):
        self.X_train = X_train
        self.Y_train = Y_train
        # One parameter column per class; the extra row is for the bias term.
        self.theta = np.random.rand(self.X_train.shape[1] + 1, len(np.unique(self.Y_train)))
        self.grad = None
        self.learning_rate = learning_rate
        self.epochs = epochs
        self.threshold = threshold
        self.cost = None
        self.lam = lam

    def softmax(self, x):
        '''
        @description: compute the softmax function row-wise
        '''
        # Subtract the row-wise max for numerical stability before exponentiating.
        x = x - np.max(x, axis=1, keepdims=True)
        return (np.exp(x).T / np.sum(np.exp(x), axis=1)).T

    def softmax_hypo(self, x):
        '''
        @description: compute the hypothesis h_theta(x)
        x = (x1, x2, x3, ..., xn, 1)
        '''
        # Prepend a constant-1 column so the bias is learned as part of theta.
        xx = np.dot(np.c_[np.ones(x.shape[0]), x], self.theta)
        return self.softmax(xx)

    def onthot(self, y):
        '''
        @description: convert the label vector y into one-hot vectors
        '''
        n = len(np.unique(y))
        m = y.shape[0]
        b = np.zeros((m, n))
        for i in range(m):
            b[i, y[i]] = 1
        return b

    def softmax_cost(self, x, y):
        '''
        @description: compute the loss
        '''
        y_onehot = self.onthot(y)
        # Cross-entropy: only the log-probability of the true class counts,
        # hence the element-wise product with the one-hot labels.
        self.cost = - 1.0 / y.shape[0] * np.sum(y_onehot * np.log(self.softmax_hypo(x))) \
            + self.lam / 2.0 * np.sum(self.theta * self.theta)

    def softmax_grad(self, x, y):
        '''
        @description: compute the gradient
        '''
        y_onehot = self.onthot(y)
        self.grad = - 1.0 / y.shape[0] * \
            np.dot(np.c_[np.ones(x.shape[0]), x].T, y_onehot - self.softmax_hypo(x)) \
            + self.lam * self.theta

    def softmax_grad_desc(self):
        '''
        @description: optimize with gradient descent
        '''
        self.softmax_cost(self.X_train, self.Y_train)
        cost_change = 1.0
        i = 1
        while i < self.epochs and cost_change > self.threshold:
            pre_cost = self.cost
            self.softmax_grad(self.X_train, self.Y_train)
            self.theta = self.theta - self.learning_rate * self.grad
            self.softmax_cost(self.X_train, self.Y_train)
            cost_change = abs(self.cost - pre_cost)
            i += 1
            if i % 100 == 0:
                # Print the loss every 100 iterations.
                print("-------%d ---- loss = %f" % (i, self.cost))

    def fit(self):
        '''
        @description: fit, in the sklearn naming style
        '''
        self.softmax_grad_desc()


def main():
    # Load the digits dataset.
    dataset = datasets.load_digits()
    X = dataset.data[:, :]
    Y = dataset.target[:, None]
    sr = SoftmaxRegression(X, Y)
    sr.fit()


if __name__ == '__main__':
    main()
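The class above stops at training; as the preface notes, prediction just takes the class with the largest probability, so a hypothetical helper (not part of the original code) could be added like this:

```python
def predict(model, x):
    """Hypothetical helper: return the class with the largest softmax probability."""
    return np.argmax(model.softmax_hypo(x), axis=1)
```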