sklearn Machine Learning Algorithm Cheat Sheet

zz/2024/7/13 11:09:05

Common Machine Learning Algorithms

The following are the most commonly used machine learning algorithms; most data problems can be solved with them:

  1. Linear Regression

  2. Logistic Regression

  3. Decision Tree

  4. Support Vector Machine (SVM)

  5. Naive Bayes

  6. K-Nearest Neighbors (KNN)

  7. K-Means

  8. Random Forest

  9. Dimensionality Reduction Algorithms

  10. Gradient Boosting and AdaBoost

[Figure 1: a classification of the main methods in sklearn]

[Figure 2: a list of dimensionality-reduction and parameter-search methods]

[Figure 3: common data-preprocessing methods]

1. Linear Regression

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import linear_model
#Load train and test datasets
#Identify feature and response variable(s); values must be numeric numpy arrays
x_train = input_variables_values_training_datasets
y_train = target_variables_values_training_datasets
x_test = input_variables_values_test_datasets
# Create linear regression object
linear = linear_model.LinearRegression()
# Train the model using the training sets and check score
linear.fit(x_train, y_train)
linear.score(x_train, y_train)
# Equation coefficient and intercept
print('Coefficient: \n', linear.coef_)
print('Intercept: \n', linear.intercept_)
# Predict output
predicted = linear.predict(x_test)
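The template above uses placeholder variable names. A minimal runnable sketch, assuming a synthetic `make_regression` dataset and a train/test split stand in for your own data:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic regression data (illustrative only)
X, y = make_regression(n_samples=200, n_features=3, noise=0.1, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear = LinearRegression()
linear.fit(x_train, y_train)
score = linear.score(x_test, y_test)  # R^2 on the held-out test set
predicted = linear.predict(x_test)
```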

2. Logistic Regression

#Import Library
from sklearn.linear_model import LogisticRegression
#Assumed you have X (predictor) and y (target) for the training set and x_test (predictor) for the test set
# Create logistic regression object
model = LogisticRegression()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Equation coefficient and intercept
print('Coefficient: \n', model.coef_)
print('Intercept: \n', model.intercept_)
# Predict output
predicted = model.predict(x_test)
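A runnable sketch of the same template, assuming the iris dataset as illustrative input (`max_iter` is raised so the default solver converges):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # accuracy on the held-out test set
predicted = model.predict(X_test)
```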

3. Decision Tree

#Import Library
#Import other necessary libraries like pandas, numpy...
from sklearn import tree
#Assumed you have X (predictor) and y (target) for the training set and x_test (predictor) for the test set
# Create tree object; the split criterion can be 'gini' (default) or 'entropy' (information gain)
model = tree.DecisionTreeClassifier(criterion='gini')  # for classification
# model = tree.DecisionTreeRegressor()  # for regression
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Predict output
predicted = model.predict(x_test)
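A runnable sketch on iris (an illustrative choice). Note how an unpruned tree scores near-perfectly on its own training data, which is why scoring on a held-out test set matters:

```python
from sklearn.datasets import load_iris
from sklearn import tree
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = tree.DecisionTreeClassifier(criterion='gini', random_state=0)
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)  # near 1.0: the tree memorises training data
test_acc = model.score(X_test, y_test)     # the honest estimate
```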

4. Support Vector Machine (SVM)

#Import Library
from sklearn import svm
#Assumed you have X (predictor) and y (target) for the training set and x_test (predictor) for the test set
# Create SVM classification object; SVC() takes many options, this is the simplest form
model = svm.SVC()
# Train the model using the training sets and check score
model.fit(X, y)
model.score(X, y)
# Predict output
predicted = model.predict(x_test)
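A runnable sketch, again assuming iris as illustrative data:

```python
from sklearn.datasets import load_iris
from sklearn import svm
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = svm.SVC(kernel='rbf')  # 'rbf' is the default; 'linear' and 'poly' are common alternatives
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
predicted = model.predict(X_test)
```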

5. Naive Bayes

#Import Library
from sklearn.naive_bayes import GaussianNB
#Assumed you have X (predictor) and y (target) for the training set and x_test (predictor) for the test set
# Create Gaussian Naive Bayes object; other variants exist for other feature distributions,
# e.g. MultinomialNB and BernoulliNB
model = GaussianNB()
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
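A runnable sketch with iris as illustrative data (GaussianNB suits continuous features like these):

```python
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
predicted = model.predict(X_test)
```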

6. K-Nearest Neighbors (KNN)

#Import Library
from sklearn.neighbors import KNeighborsClassifier
#Assumed you have X (predictor) and y (target) for the training set and x_test (predictor) for the test set
# Create KNeighbors classifier object; the default value for n_neighbors is 5
model = KNeighborsClassifier(n_neighbors=6)
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
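A runnable sketch, assuming iris as illustrative data:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = KNeighborsClassifier(n_neighbors=6)  # default is 5
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
predicted = model.predict(X_test)
```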

7. K-Means

#Import Library
from sklearn.cluster import KMeans
#Assumed you have X (attributes) for the training set and x_test (attributes) for the test set
# Create KMeans object
model = KMeans(n_clusters=3, random_state=0)
# Train the model using the training set (clustering is unsupervised, so no y)
model.fit(X)
# Predict output (cluster index for each sample)
predicted = model.predict(x_test)
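A runnable sketch on three well-separated synthetic blobs (`make_blobs` is an illustrative stand-in; `n_init=10` is set explicitly for consistent behaviour across sklearn versions):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated clusters (illustrative data)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)

model = KMeans(n_clusters=3, random_state=0, n_init=10)
model.fit(X)
labels = model.predict(X)  # cluster index (0, 1, or 2) for each sample
```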

8.随机森林 (Random Forest)

#random forest
#import library
from sklearn.ensemble import  RandomForestClassifier
#assumed you have x(predictor)and y(target) for training data set and x_test(predictor)of test_dataset
#create random forest object
model=RandomForestClassifier()
#train the model using the training sets and chek score
model.fit(x,y)
#predict output
predict=model.presort(x_test)
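A runnable sketch of the corrected template (the original had a typo, `model.presort(x_test)`, where `model.predict(x_test)` was intended), assuming iris as illustrative data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
predicted = model.predict(X_test)
```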

9. Dimensionality Reduction Algorithms

#Import Library
from sklearn import decomposition
#Assumed you have training and test data sets as train and test
# Create PCA object; the default n_components = min(n_samples, n_features)
pca = decomposition.PCA(n_components=k)
# For factor analysis:
# fa = decomposition.FactorAnalysis()
# Reduce the dimension of the training set using PCA
train_reduced = pca.fit_transform(train)
# Reduce the dimension of the test set
test_reduced = pca.transform(test)
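A runnable sketch projecting the 4-feature iris data (an illustrative choice) down to 2 principal components; `explained_variance_ratio_` shows how much information the projection keeps:

```python
from sklearn import decomposition
from sklearn.datasets import load_iris

X, _ = load_iris(return_X_y=True)  # 150 samples, 4 features

pca = decomposition.PCA(n_components=2)
X_reduced = pca.fit_transform(X)                   # shape (150, 2)
explained = pca.explained_variance_ratio_.sum()    # variance retained by 2 components
```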

10. Gradient Boosting and AdaBoost

#Import Library
from sklearn.ensemble import GradientBoostingClassifier
# For AdaBoost: from sklearn.ensemble import AdaBoostClassifier
#Assumed you have X (predictor) and y (target) for the training set and x_test (predictor) for the test set
# Create Gradient Boosting Classifier object
model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
# Train the model using the training sets and check score
model.fit(X, y)
# Predict output
predicted = model.predict(x_test)
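A runnable sketch with the same hyperparameters, assuming iris as illustrative data (100 depth-1 trees, i.e. boosted decision stumps):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0,
                                   max_depth=1, random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
predicted = model.predict(X_test)
```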


Note: in these template examples, predict is sometimes called on the training data just to check goodness of fit. In practice you should predict on a held-out test set; don't be misled!

Reference: http://blog.csdn.net/han_xiaoyang/article/details/51191386



