Hyperparameter tuning with HyperOpt in Python

HyperOpt provides an optimization interface that takes an objective (evaluation) function and a parameter space, and evaluates the loss at points drawn from that space. The user also specifies how each parameter is distributed within the space.

A HyperOpt run has four ingredients: the function to minimize, the search space, the trials database that records the evaluated points (optional), and the search algorithm (optional).

First, define an objective function that accepts a point from the parameter space and returns the loss at that point. For example, to minimize the function q(x, y) = x**2 + y**2:

def q(args):
    x, y = args
    return x ** 2 + y ** 2

Then, define the parameter space. For example, let x be uniform on the interval 0-1 and y a normally distributed real number:

from hyperopt import hp
space = [hp.uniform('x', 0, 1), hp.normal('y', 0, 1)]

Third, specify the search algorithm through the algo parameter of hyperopt's fmin function. The supported algorithms are random search (hyperopt.rand.suggest), simulated annealing (hyperopt.anneal.suggest), and TPE (hyperopt.tpe.suggest). For example:

from hyperopt import hp, fmin, rand, tpe, space_eval
# max_evals bounds the number of evaluations
best = fmin(q, space, algo=rand.suggest, max_evals=100)
print(space_eval(space, best))
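To make concrete what random search does with this space, here is a minimal hand-rolled sketch in plain Python. It is not hyperopt's implementation of rand.suggest; the helper name random_search and the fixed seed are assumptions made only for illustration.

```python
import random

def q(args):
    x, y = args
    return x ** 2 + y ** 2

def random_search(objective, n_evals, seed=0):
    """Hypothetical helper: draw points at random and keep the best one."""
    rng = random.Random(seed)
    best_point, best_loss = None, float("inf")
    for _ in range(n_evals):
        # Same distributions as the space above: x ~ U(0, 1), y ~ N(0, 1)
        point = (rng.uniform(0, 1), rng.gauss(0, 1))
        loss = objective(point)
        if loss < best_loss:
            best_point, best_loss = point, loss
    return best_point, best_loss

best_point, best_loss = random_search(q, n_evals=100)
print(best_point, best_loss)
```

Smarter algorithms such as TPE differ only in how the next point is proposed; the evaluate-and-keep-the-best loop stays the same.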

The search algorithm has its own parameters that control how it optimizes the objective function, and these can be set with functools.partial. For example, TPE's n_startup_jobs sets how many random evaluations are run before the TPE model takes over:

from functools import partial
from hyperopt import hp, fmin, tpe, space_eval
algo = partial(tpe.suggest, n_startup_jobs=10)
best = fmin(q, space, algo=algo, max_evals=100)
print(space_eval(space, best))

On setting up the parameter space: for an objective function q, a call looks like fmin(q, space=hp.uniform('a', 0, 1)). The first argument of hp.uniform is the label, and every hyperparameter in the parameter space must have a unique label. hp.uniform specifies the distribution of the parameter. Among the other distributions, hp.choice(label, options) returns one of the options, which are given as a list or tuple. The options can themselves be nested expressions, which is how conditional parameters are formed.

hp.pchoice(label, p_options) returns one of the options in p_options, a list of (probability, option) pairs, so the options are not equally likely during the search.

hp.uniform(label, low, high) draws the parameter uniformly between low and high. hp.quniform(label, low, high, q) returns round(uniform(low, high) / q) * q, which suits parameters that take discrete values.

hp.loguniform(label, low, high) draws exp(uniform(low, high)), so the variable ranges over [exp(low), exp(high)].

hp.randint(label, upper) returns a random integer in the half-open interval [0, upper).
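The formulas for these distributions can be tried out without hyperopt. The functions below are plain-Python stand-ins written from the descriptions above, not hyperopt's own code; their names and the fixed seed are assumptions for illustration.

```python
import math
import random

rng = random.Random(0)  # fixed seed, chosen only for repeatability

def pchoice(p_options):
    # p_options is a list of (probability, option) pairs
    probs, options = zip(*p_options)
    return rng.choices(options, weights=probs, k=1)[0]

def uniform(low, high):
    return rng.uniform(low, high)

def quniform(low, high, q):
    return round(uniform(low, high) / q) * q

def loguniform(low, high):
    return math.exp(uniform(low, high))

def randint(upper):
    return rng.randrange(upper)  # integer in [0, upper)

print(pchoice([(0.8, 'a'), (0.2, 'b')]))  # 'a' about four times as often as 'b'
print(quniform(0, 10, 2))                 # a multiple of 2 between 0 and 10
print(loguniform(0, 1))                   # between exp(0) = 1 and exp(1)
print(randint(5))                         # one of 0, 1, 2, 3, 4
```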

The search space can be a list, a tuple, or a dictionary of such expressions, and these containers can be nested.

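from hyperopt import hp
list_space = [
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1)]
tuple_space = (
    hp.uniform('a', 0, 1),
    hp.loguniform('b', 0, 1))
dict_space = {
    'a': hp.uniform('a', 0, 1),
    'b': hp.loguniform('b', 0, 1)}
# Containers and literals can be nested; this is the nested_space sampled below
nested_space = [
    [{'case': 1, 'a': hp.uniform('a', 0, 1)},
     {'case': 2, 'b': hp.loguniform('b', 0, 1)}],
    'extra_literal_string',
    3]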

Use the sample function to sample from the parameter space:

from hyperopt.pyll.stochastic import sample
print(sample(list_space))
# => a pair of values such as 0.13, 2.35 (values vary per run)
print(sample(nested_space))
# => a nested structure such as
#    [[{'case': 1, 'a': 0.12}, {'case': 2, 'b': 2.3}],
#     'extra_literal_string',
#     3]

Use functions in parameter space:

from hyperopt import hp
from hyperopt.pyll import scope
def foo(x):
    return str(x) * 3
expr_space = {
    'a': 1 + hp.uniform('a', 0, 1),
    'b': scope.minimum(hp.loguniform('b', 0, 1), 10),
    'c': scope.call(foo, args=(hp.randint('c', 5),)),
}


I found a piece of code on a blog that uses a perceptron to classify the iris data. It uses a learning rate of 0.1 and 40 iterations, and reaches 82% accuracy on the test set. After tuning the parameters with hyperopt, the accuracy rose to 91%.

from functools import partial
from sklearn import datasets
from sklearn.linear_model import Perceptron
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split  # formerly sklearn.cross_validation
from sklearn.preprocessing import StandardScaler
from hyperopt import fmin, tpe, hp, space_eval

iris = datasets.load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

sc = StandardScaler()
sc.fit(X_train)  # the scaler must be fitted before transform is called
X_train_std = sc.transform(X_train)
X_test_std = sc.transform(X_test)

# Baseline: 40 iterations, learning rate 0.1 (max_iter was named n_iter in old scikit-learn)
ppn = Perceptron(max_iter=40, eta0=0.1, random_state=0)
ppn.fit(X_train_std, y_train)
y_pred = ppn.predict(X_test_std)
print(accuracy_score(y_test, y_pred))

def percept(args):
    # Objective: negative test accuracy, because fmin minimizes
    ppn = Perceptron(max_iter=int(args["n_iter"]), eta0=args["eta"] * 0.01, random_state=0)
    ppn.fit(X_train_std, y_train)
    y_pred = ppn.predict(X_test_std)
    return -accuracy_score(y_test, y_pred)

space = {"n_iter": hp.choice("n_iter", list(range(30, 50))),
         "eta": hp.uniform("eta", 0.05, 0.5)}  # assumed range: this entry is missing from the original listing
algo = partial(tpe.suggest, n_startup_jobs=10)
best = fmin(percept, space, algo=algo, max_evals=100)
print(best)  # for hp.choice, best holds the index of the chosen option, not its value
print(percept(space_eval(space, best)))  # map the index back to a real n_iter first
# Output reported by the author:
# 0.822222222222
# {'n_iter': 14, 'eta': 0.12877033763511717}
# -0.911111111111

XGBoost has many parameters. Wrap the XGBoost training code in a function, pass it to fmin, and use the cross-validated AUC as the optimization target. A larger AUC is better, but fmin minimizes, so the function returns -AUC. The data set has 202 columns: the first is the sample id, the last is the label, and the 200 in between are the attributes.
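The sign flip can be looked at in isolation before reading the full listing. A minimal sketch, assuming scikit-learn's make_classification and LogisticRegression as stand-ins for the author's data and model, which are not available here; the parameter name C is likewise only illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (an assumption; the article's CSV is not available)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def objective(args):
    model = LogisticRegression(C=args["C"], max_iter=1000)
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    return -auc  # fmin minimizes, so a higher AUC must give a lower loss

loss = objective({"C": 1.0})
print(loss)
```

Since AUC lies between 0 and 1, the returned loss lies between -1 and 0, and the best model is the one with the most negative loss.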

# coding: utf-8
from functools import partial
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import cross_val_score  # formerly sklearn.cross_validation
import xgboost as xgb
from hyperopt import fmin, tpe, hp

def loadFile(fileName="E://zalei//browsetop200Pca.csv"):
    data = pd.read_csv(fileName, header=None)
    return data.values

data = loadFile()
label = data[:, -1]
attrs = data[:, 1:-1]  # drop the sample-id column (column 0) and the label column
minmaxscaler = MinMaxScaler()
attrs = minmaxscaler.fit_transform(attrs)

# First 70% of the rows for training, the rest for testing
split = int(len(label) * 0.7)
attr_train, attr_test = attrs[:split, :], attrs[split:, :]
label_train, label_test = label[:split].tolist(), label[split:].tolist()
print(attr_train.shape)
print(attr_test.shape)
print(len(label_train))
print(len(label_test))
print(np.asarray(label_train).reshape((-1, 1)).shape)

def GBM(argsDict):
    # Map the integer draws from the search space to actual parameter values
    max_depth = int(argsDict["max_depth"]) + 5
    n_estimators = int(argsDict["n_estimators"]) * 5 + 50
    learning_rate = argsDict["learning_rate"] * 0.02 + 0.05
    subsample = argsDict["subsample"] * 0.1 + 0.7
    min_child_weight = int(argsDict["min_child_weight"]) + 1
    print("max_depth:" + str(max_depth))
    print("n_estimator:" + str(n_estimators))
    print("learning_rate:" + str(learning_rate))
    print("subsample:" + str(subsample))
    print("min_child_weight:" + str(min_child_weight))
    gbm = xgb.XGBClassifier(n_jobs=4,                           # number of parallel threads
                            max_depth=max_depth,                # maximum tree depth
                            n_estimators=n_estimators,          # number of trees
                            learning_rate=learning_rate,        # learning rate
                            subsample=subsample,                # row subsampling ratio
                            min_child_weight=min_child_weight,  # minimum sum of instance weights in a child
                            max_delta_step=10)                  # cap on each tree's weight update
    # Mean 5-fold cross-validated AUC, negated because fmin minimizes
    metric = cross_val_score(gbm, attr_train, label_train, cv=5, scoring="roc_auc").mean()
    print(metric)
    return -metric
space = {"max_depth": hp.randint("max_depth", 15),               # 0..14 -> 5..19
         "n_estimators": hp.randint("n_estimators", 10),         # 0..9 -> 50, 55, ..., 95
         "learning_rate": hp.randint("learning_rate", 6),        # 0..5 -> 0.05, 0.07, ..., 0.15
         "subsample": hp.randint("subsample", 4),                # 0..3 -> 0.7, 0.8, 0.9, 1.0
         "min_child_weight": hp.randint("min_child_weight", 5)}  # 0..4 -> 1..5
algo = partial(tpe.suggest, n_startup_jobs=1)
best = fmin(GBM, space, algo=algo, max_evals=4)
print(best)
print(GBM(best))
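Because the search space holds small integers rather than real parameter values, the best result printed above has to be read through the arithmetic at the top of GBM. A minimal sketch of that mapping in plain Python; the helper name decode is hypothetical, and the rounding is added here only to tidy the printed floats.

```python
def decode(args):
    """Hypothetical helper mirroring the arithmetic at the top of GBM."""
    return {
        "max_depth": args["max_depth"] + 5,
        "n_estimators": args["n_estimators"] * 5 + 50,
        "learning_rate": round(args["learning_rate"] * 0.02 + 0.05, 2),
        "subsample": round(args["subsample"] * 0.1 + 0.7, 1),
        "min_child_weight": args["min_child_weight"] + 1,
    }

# Smallest and largest draws that hp.randint can produce for each label
lowest = decode({"max_depth": 0, "n_estimators": 0, "learning_rate": 0,
                 "subsample": 0, "min_child_weight": 0})
highest = decode({"max_depth": 14, "n_estimators": 9, "learning_rate": 5,
                  "subsample": 3, "min_child_weight": 4})
print(lowest)
print(highest)
```

The two printed dictionaries give the bounds of the real search range for each parameter.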