MapReduce Top K: Word Counting plus Sorting

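The book Hadoop技术内幕 points out that the Top K algorithm has two steps: first, count word frequencies; second, find the K words with the highest frequencies. I looked at many MapReduce Top K examples online, but all of them only did the sorting, so I wrote my own example.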
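As a cluster-free illustration of those two steps, here is a minimal in-memory sketch in plain Java: a HashMap counts word frequencies, and a PriorityQueue used as a size-K min-heap keeps the K most frequent words. The min-heap is a common selection technique swapped in for illustration, not necessarily what the book describes; the class TopKSketch and its methods are hypothetical, and breaking ties alphabetically is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: in-memory version of the two Top K steps
public class TopKSketch {

    // Step 1: count how often each whitespace-separated word occurs
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.split("\\s+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Step 2: keep the K most frequent words in a size-K min-heap
    public static List<String> topK(Map<String, Integer> counts, int k) {
        // Heap head = weakest candidate: lowest count, then alphabetically last word (assumed tie rule)
        Comparator<Map.Entry<String, Integer>> weakestFirst = (x, y) -> {
            int byCount = Integer.compare(x.getValue(), y.getValue());
            return byCount != 0 ? byCount : y.getKey().compareTo(x.getKey());
        };
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(weakestFirst);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // evict the weakest so the heap never exceeds K entries
            }
        }
        // Drain weakest-first, then reverse so the most frequent word comes first
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("a b a c b a d");
        System.out.println(topK(counts, 2)); // [a, b]
    }
}
```

The heap never holds more than K entries, so step two needs O(K) extra memory beyond the counts themselves.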

This post's example consists of two jobs: the first is the classic WordCount job, and the second does the sorting.

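// Step 1: WordCount. Reads the raw text and writes one "word<TAB>count" line per word.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: split each line into words, strip quotes, apostrophes and periods, emit (word, 1)
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final IntWritable count = new IntWritable(1);
        private final Text outKey = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer st = new StringTokenizer(value.toString());
            while (st.hasMoreTokens()) {
                String word = st.nextToken().replaceAll("\"", "").replace("'", "").replace(".", "");
                if (word.isEmpty()) {
                    continue; // the token was only punctuation
                }
                outKey.set(word);
                context.write(outKey, count);
            }
        }
    }

    // Reducer: sum the values instead of counting them, so the result stays correct if a combiner is added
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable v : values) {
                count += v.get();
            }
            context.write(key, new IntWritable(count));
        }
    }

    // Entry point called by the driver (method name assumed; the original signature was lost)
    public static boolean run(String in, String out) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "WordCount"); // replaces the deprecated new Job(conf)
        job.setJarByClass(WordCount.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));
        return job.waitForCompletion(true);
    }
}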
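To check what the WordCount job produces without a cluster, the following sketch replays its map, shuffle, and reduce logic locally: tokenize with StringTokenizer, strip double quotes, apostrophes, and periods the same way the WordCount mapper does, group and sum per word, and emit lines in TextOutputFormat's default word<TAB>count layout. WordCountLocal is a hypothetical helper, skipping empty tokens is an assumption, and the TreeMap only approximates Hadoop's key ordering (the two agree for ASCII words).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper: local stand-in for the WordCount job
public class WordCountLocal {

    // Same cleanup as the mapper: drop double quotes, apostrophes and periods
    static String clean(String token) {
        return token.replaceAll("\"", "").replace("'", "").replace(".", "");
    }

    // Map + shuffle + reduce in one pass: the TreeMap groups equal words and
    // keeps keys sorted, similar to how the reducer receives them
    public static List<String> run(List<String> lines) {
        Map<String, Integer> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) {
                String word = clean(st.nextToken());
                if (!word.isEmpty()) {
                    grouped.merge(word, 1, Integer::sum); // reduce step: sum the 1s
                }
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : grouped.entrySet()) {
            out.add(e.getKey() + "\t" + e.getValue()); // word<TAB>count
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("He said \"stop.\"", "he didn't stop.");
        run(lines).forEach(System.out::println);
    }
}
```

Note that this cleanup turns contractions such as didn't into didnt, and that words differing only in case (He, he) are counted separately.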

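// Step 2: Sort. Reads the WordCount output, writes every (word, count) sorted by count,
// and writes the K most frequent words through MultipleOutputs.
import java.io.IOException;
import java.util.Map.Entry;
import java.util.Set;
import java.util.StringTokenizer;
import java.util.TreeMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class Sort {

    // K is not recoverable from the original listing; 10 is an assumed value
    private static final int K = 10;

    // TreeMap key (reconstructed, the original body was lost): larger counts first,
    // and compareTo never returns 0 so several words with the same count are all kept
    public static class MyInt implements Comparable<MyInt> {
        final int value;

        MyInt(int value) {
            this.value = value;
        }

        @Override
        public int compareTo(MyInt o) {
            return value > o.value ? -1 : 1;
        }
    }

    // Mapper: turns a WordCount line "word<TAB>count" into (count, word)
    public static class Map extends Mapper<LongWritable, Text, IntWritable, Text> {
        private final IntWritable outKey = new IntWritable();
        private final Text outValue = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer st = new StringTokenizer(value.toString());
            if (st.countTokens() < 2) {
                return; // skip malformed lines
            }
            String element = st.nextToken();
            outValue.set(element);
            outKey.set(Integer.parseInt(st.nextToken()));
            context.write(outKey, outValue);
        }
    }

    // Reducer: keys arrive in ascending count order; every word goes to the main output,
    // and the K largest counts are kept in tm and written out in cleanup()
    public static class Reduce extends Reducer<IntWritable, Text, Text, IntWritable> {
        private final TreeMap<MyInt, String> tm = new TreeMap<>();
        private MultipleOutputs<Text, IntWritable> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void reduce(IntWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text v : values) {
                context.write(v, key);
                tm.put(new MyInt(key.get()), v.toString()); // copy: Hadoop reuses the Text object
                if (tm.size() > K) {
                    tm.pollLastEntry(); // drop the smallest count; remove(key) cannot find MyInt keys
                }
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            String path = context.getConfiguration().get("topKout");
            Set<Entry<MyInt, String>> set = tm.entrySet();
            for (Entry<MyInt, String> e : set) {
                mos.write("topKMOS", new Text(e.getValue()), new IntWritable(e.getKey().value), path);
            }
            mos.close();
        }
    }

    // Entry point called by the driver (method name assumed): in = WordCount output,
    // out = full sorted list, topKout = base output path for the top K words
    public static boolean run(String in, String out, String topKout) throws Exception {
        Path outPath = new Path(out);
        Configuration conf = new Configuration();
        conf.set("topKout", topKout);
        Job job = Job.getInstance(conf, "Sort"); // replaces the deprecated new Job(conf)
        job.setJarByClass(Sort.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setNumReduceTasks(1); // a global top K needs every key in one reducer
        job.setMapOutputKeyClass(IntWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        MultipleOutputs.addNamedOutput(job, "topKMOS", TextOutputFormat.class, Text.class, IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, outPath);
        return job.waitForCompletion(true);
    }
}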
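The Sort reducer collects candidates with tm.put into a TreeMap keyed by MyInt and later iterates a Set<Entry<MyInt, String>>, but MyInt's body is not shown. A plausible reason for a custom key instead of Integer is that a plain TreeMap would let a later word overwrite an earlier one with the same count. This local sketch assumes MyInt sorts counts in descending order and never returns 0 from compareTo, so ties survive; TopKTreeMapDemo and its topK method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical demo of a bounded TreeMap that keeps the K largest counts, ties included
public class TopKTreeMapDemo {

    // Assumed MyInt ordering: larger counts first; never returns 0, so equal
    // counts are inserted as separate entries instead of overwriting each other
    static final class MyInt implements Comparable<MyInt> {
        final int value;

        MyInt(int value) {
            this.value = value;
        }

        @Override
        public int compareTo(MyInt o) {
            return value > o.value ? -1 : 1;
        }
    }

    // Takes "word<TAB>count" lines, keeps at most k entries, returns them largest count first
    public static List<String> topK(List<String> lines, int k) {
        TreeMap<MyInt, String> tm = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            tm.put(new MyInt(Integer.parseInt(parts[1])), parts[0]);
            if (tm.size() > k) {
                // pollLastEntry walks to the rightmost node without calling compareTo,
                // so it works even though no MyInt key ever compares equal to itself
                tm.pollLastEntry();
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<MyInt, String> e : tm.entrySet()) {
            out.add(e.getValue() + "\t" + e.getKey().value);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a\t1", "b\t3", "c\t3", "d\t2");
        topK(lines, 3).forEach(System.out::println);
    }
}
```

Because compareTo never reports equality, tm.remove(tm.lastKey()) would not find the entry it just looked up; trimming with pollLastEntry() avoids that. The trade-off is that such a key breaks the Comparable contract, so the map should only be used for insertion, trimming, and iteration.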

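// Driver: runs WordCount, then Sort on its output (class and method names assumed)
public class TopKDriver {
    public static void main(String[] args) throws Exception {
        String in = "hdfs://localhost:9000/input/MaDing.text";
        String wordCout = "hdfs://localhost:9000/out/wordCout";
        String sort = "hdfs://localhost:9000/out/sort";
        String topK = "hdfs://localhost:9000/out/topK";
        // Step 1 writes "word<TAB>count" lines to wordCout
        if (!WordCount.run(in, wordCout)) {
            System.exit(1);
        }
        // Step 2 writes the full sorted list to sort and the top K words under topK
        System.exit(Sort.run(wordCout, sort, topK) ? 0 : 1);
    }
}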
