Date: 2021-05-19
This article shares working code for training a random forest classifier with Spark MLlib. The details are as follows:
import java.io.Serializable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import junit.framework.TestCase;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.tree.RandomForest;
import org.apache.spark.mllib.tree.model.RandomForestModel;
import org.apache.spark.mllib.util.MLUtils;

public class RandomForestClassficationTest extends TestCase {

    /** Pairs the true label of a test point with the model's prediction. */
    static class PredictResult implements Serializable {
        private static final long serialVersionUID = -168308887976477219L;
        final double label;
        final double prediction;

        PredictResult(double label, double prediction) {
            this.label = label;
            this.prediction = prediction;
        }

        @Override
        public String toString() {
            return label + " : " + prediction;
        }
    }

    public void test_randomForest() {
        SparkConf sparkConf = new SparkConf().setAppName("RandomForest").setMaster("local");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        try {
            // sample_libsvm_data.txt is expected at the root of the test classpath
            String dataPath = RandomForestClassficationTest.class.getResource("/sample_libsvm_data.txt").getPath();
            JavaRDD<LabeledPoint> dataSet = MLUtils.loadLibSVMFile(jsc.sc(), dataPath).toJavaRDD();

            // 70% training data, 30% test data; seed 1 makes the split reproducible
            JavaRDD<LabeledPoint>[] splits = dataSet.randomSplit(new double[]{0.7, 0.3}, 1L);
            JavaRDD<LabeledPoint> trainingData = splits[0];
            JavaRDD<LabeledPoint> testData = splits[1];

            // numClasses: number of classes; the labels are 0.0 and 1.0, so 2
            int numClasses = 2;
            // Empty map: every feature is treated as continuous
            Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>();
            // numTrees: number of trees in the random forest
            int numTrees = 3;
            // featureSubsetStrategy: how many features are considered at each split;
            // "auto" becomes "sqrt" for a classification forest with more than one tree
            String featureSubsetStrategy = "auto";
            // impurity: split criterion ("gini" or "entropy" for classification)
            String impurity = "gini";
            // maxDepth: maximum depth of each tree
            int maxDepth = 4;
            // maxBins: maximum number of bins used for splitting continuous features
            int maxBins = 32;
            // seed: random seed for bootstrapping and choosing feature subsets
            int seed = 1;

            final RandomForestModel model = RandomForest.trainClassifier(trainingData, numClasses,
                    categoricalFeaturesInfo, numTrees, featureSubsetStrategy, impurity, maxDepth, maxBins, seed);

            // Predict every test point and keep the true label next to the prediction
            JavaRDD<PredictResult> predictRddResult = testData.map(
                    point -> new PredictResult(point.label(), model.predict(point.features())));

            List<PredictResult> predictResultList = predictRddResult.collect();
            for (PredictResult result : predictResultList) {
                System.out.println(result);
            }
            System.out.println(model.toDebugString());
        } finally {
            jsc.stop();
        }
    }
}

The random forest printed at the end by model.toDebugString() looks like this:
TreeEnsembleModel classifier with 3 trees

  Tree 0:
    If (feature 435 <= 0.0)
     If (feature 516 <= 0.0)
      Predict: 0.0
     Else (feature 516 > 0.0)
      Predict: 1.0
    Else (feature 435 > 0.0)
     Predict: 1.0
  Tree 1:
    If (feature 512 <= 0.0)
     Predict: 1.0
    Else (feature 512 > 0.0)
     Predict: 0.0
  Tree 2:
    If (feature 377 <= 1.0)
     Predict: 0.0
    Else (feature 377 > 1.0)
     If (feature 455 <= 0.0)
      Predict: 1.0
     Else (feature 455 > 0.0)
      Predict: 0.0

That is all for this article. I hope it helps with your studies.
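To see how the three printed trees turn into one prediction: for classification, a Spark random forest combines its trees by majority vote, so each tree votes for a label and the label with the most votes wins. The sketch below hand-codes the three trees exactly as printed above and takes that vote in plain Java, with no Spark dependency. It is an illustration only, not Spark's internal implementation; the class name ForestVoteSketch and the vector length of 700 are assumptions chosen for the example. Feature indices are used exactly as they appear in the debug output.

```java
/**
 * Hypothetical helper class (not part of Spark): the three trees from the
 * toDebugString() output, written by hand and combined by majority vote.
 * A feature vector is a plain double array indexed by feature number.
 */
public class ForestVoteSketch {

    // Tree 0: splits on feature 435, then on feature 516
    static double tree0(double[] x) {
        if (x[435] <= 0.0) {
            return x[516] <= 0.0 ? 0.0 : 1.0;
        }
        return 1.0;
    }

    // Tree 1: a single split on feature 512
    static double tree1(double[] x) {
        return x[512] <= 0.0 ? 1.0 : 0.0;
    }

    // Tree 2: splits on feature 377, then on feature 455
    static double tree2(double[] x) {
        if (x[377] <= 1.0) {
            return 0.0;
        }
        return x[455] <= 0.0 ? 1.0 : 0.0;
    }

    // Binary majority vote: predict 1.0 when more than half of the trees vote 1.0.
    // With an odd number of trees (3 here) a tie cannot occur.
    static double predict(double[] x) {
        double[] votes = {tree0(x), tree1(x), tree2(x)};
        int onesVotes = 0;
        for (double v : votes) {
            if (v == 1.0) {
                onesVotes++;
            }
        }
        return onesVotes * 2 > votes.length ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        // Assumed length: large enough for every feature index the trees use
        double[] zeros = new double[700];
        System.out.println(predict(zeros));

        double[] withFeature435 = new double[700];
        withFeature435[435] = 1.0;
        System.out.println(predict(withFeature435));
    }
}
```

For an all-zero vector, Tree 1 votes 1.0 while Trees 0 and 2 vote 0.0, so the forest predicts 0.0. Setting feature 435 to 1.0 flips Tree 0 to 1.0, giving two votes for 1.0 and a final prediction of 1.0.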