MongoDB mapReduce Summary
Published: 2019-05-26


Official reference: https://docs.mongodb.org/manual/reference/method/db.collection.mapReduce/#mapreduce-map-mtd

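1. Syntax:

db.collection.mapReduce(
    <map>,
    <reduce>,
    {
        out: <string or document>,
        query: <document>,
        sort: <document>,
        limit: <number>,
        finalize: <function>,
        scope: <document>,
        jsMode: <boolean>,
        verbose: <boolean>,
        bypassDocumentValidation: <boolean>
    }
);

Details: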

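The "clean function" concept:

The map, reduce, and finalize functions should be clean: they must not connect to the database or perform similar operations. They can, however, use the following properties and functions:

Available Properties: args, MaxKey, MinKey

Available Functions: assert(), BinData(), DBPointer(), DBRef(), doassert(), emit(), gc(), HexData(), hex_md5(), isNumber(), isObject(), ISODate(), isString(), Map(), MD5(), NumberInt(), NumberLong(), ObjectId(), print(), printjson(), printjsononeline(), sleep(), Timestamp(), tojson(), tojsononeline(), tojsonObject(), UUID(), version()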

1.map

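Format:

function() {
    ...
    emit(key, value);
}

The map function turns each document into zero or more emit calls. The emitted pairs are the input data for the reduce step.

Emitting zero or one pair for a document:

function() {
    if (this.status == 'A')
        emit(this.cust_id, 1);
}

Emitting multiple pairs for a document:

function() {
    this.items.forEach(function(item) { emit(item.sku, 1); });
}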
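To make the "zero, one, or many" behaviour concrete, here is a minimal sketch that runs both map functions outside MongoDB. The emit() stand-in, the runMap helper, and the sample documents are assumptions made for illustration; on the server, emit() is provided by MongoDB itself.

```javascript
// In-memory stand-in for MongoDB's emit(); the real one is supplied by the server.
const emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// Emits at most one pair per document.
function mapByStatus() {
  if (this.status === 'A') emit(this.cust_id, 1);
}

// Emits one pair per element of the items array.
function mapBySku() {
  this.items.forEach(function (item) { emit(item.sku, 1); });
}

// Hypothetical sample documents.
const docs = [
  { cust_id: 'c1', status: 'A', items: [{ sku: 'apple' }, { sku: 'pear' }] },
  { cust_id: 'c2', status: 'B', items: [] }
];

// MongoDB binds each document to `this`; Function.prototype.call mimics that here.
function runMap(mapFn, documents) {
  emitted.length = 0;
  documents.forEach(function (doc) { mapFn.call(doc); });
  return emitted.slice();
}

const byStatus = runMap(mapByStatus, docs); // only c1 has status 'A'
const bySku = runMap(mapBySku, docs);       // c1 emits twice, c2 emits nothing
console.log(byStatus);
console.log(bySku);
```

The second document produces no pairs in either run, which is perfectly valid for a map function.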

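REQUIREMENTS:

1. Inside the map function, this refers to the current document.

2. The function must not access the database.

3. The function must not interact with anything defined outside it (no side effects). It can, however, read values passed in through scope.

4. A single emit can hold at most half of MongoDB's maximum BSON document size. The maximum BSON document size is 16 megabytes, so the data in one emit cannot exceed 8 MB.

5. A single document may produce zero, one, or many emit calls.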
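The scope option mentioned in requirement 3 can be pictured with the sketch below. It is a simulation only: the bindScope helper, the minQty variable, and the sample documents are hypothetical, and the way MongoDB actually injects scope variables is not shown in this article. The idea it illustrates is that names from the scope document are visible inside the map function as if they were globals.

```javascript
// Stand-in for MongoDB's emit().
const pairs = [];
function emit(key, value) { pairs.push([key, value]); }

// Hypothetical scope document, as it might appear in the options:
//   db.orders.mapReduce(map, reduce, { out: "result", scope: { minQty: 5 } })
const scope = { minQty: 5 };

// The map function refers to minQty without declaring it.
const mapSource = 'function () { if (this.qty >= minQty) emit(this.sku, this.qty); }';

// Simulation only: rebuild the function inside a wrapper whose parameters
// are the scope names, so those names resolve when the function runs.
function bindScope(fnSource, scopeDoc) {
  const names = Object.keys(scopeDoc);
  const values = names.map(function (name) { return scopeDoc[name]; });
  const factory = new Function(...names, 'return (' + fnSource + ');');
  return factory(...values);
}

// emit is passed alongside the scope values because new Function
// cannot see module-level declarations.
const mapFn = bindScope(mapSource, { emit: emit, minQty: scope.minQty });

[{ sku: 'apple', qty: 10 }, { sku: 'pear', qty: 2 }].forEach(function (doc) {
  mapFn.call(doc);
});
console.log(pairs); // only the document with qty >= minQty is emitted
```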

2.reduce

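function(key, values) {
    ...
    return result;
}

REQUIREMENTS:

1. The function must not access the database or external functions.

2. When a key has only one value, the reduce function is not called; that value is used directly as the reduce result.

3. The reduce function may be called more than once for the same key, for example when partial results from several shards have to be combined. The output of reduce must therefore be usable as input to the next reduce call. The official wording:

  • MongoDB can invoke the reduce function more than once for the same key. In this case, the previous output from the reduce function for that key will become one of the input values to the next reduce function invocation for that key.

4. The function can read values from scope.

In short, the value returned by reduce must have the same shape as the value emitted by the map function. Only then can reduce be executed repeatedly on its own output.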
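The re-reduce requirement can be checked with plain JavaScript, no database needed. The sketch below compares reducing three values in one call with reducing them in two steps; the reducer names and sample values are made up for illustration.

```javascript
// Reducer whose output has the same shape as its inputs: { count, qty }.
function goodReduce(key, values) {
  var result = { count: 0, qty: 0 };
  values.forEach(function (v) {
    result.count += v.count;
    result.qty += v.qty;
  });
  return result;
}

// Broken reducer: it counts array entries, so an already-reduced
// partial result is counted as a single item.
function badReduce(key, values) {
  var result = { count: values.length, qty: 0 };
  values.forEach(function (v) { result.qty += v.qty; });
  return result;
}

// Hypothetical emitted values for one key.
var a = { count: 1, qty: 5 };
var b = { count: 1, qty: 10 };
var c = { count: 1, qty: 20 };

// One call over all values versus a partial reduce followed by a re-reduce.
var goodOnce = goodReduce('sku', [a, b, c]);
var goodTwice = goodReduce('sku', [goodReduce('sku', [a, b]), c]);
var badOnce = badReduce('sku', [a, b, c]);
var badTwice = badReduce('sku', [badReduce('sku', [a, b]), c]);

console.log(goodOnce, goodTwice); // identical results
console.log(badOnce, badTwice);   // counts differ: 3 versus 2
```

A reducer that passes this kind of check gives the same answer no matter how MongoDB batches the values for a key.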

3. OPTIONS

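3.1 out has two formats

out: <collectionName>

out: { <action>: <collectionName>
       [, db: <dbName>]
       [, sharded: <boolean> ]
       [, nonAtomic: <boolean> ] }

The first format is shorthand for:

out: { replace: <collectionName> [, db: <dbName>] [, sharded: false ] [, nonAtomic: false ] }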
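Values for action:

   replace: replaces the whole collection. If the collection already exists, it is emptied before the results are inserted.

   merge: if a key in the new results already exists in the collection, the existing document is overwritten; existing documents whose keys are not in the new results are kept.

   reduce: combines the new results with the collection. If a key already exists, the reduce function is applied to the new value and the existing value.

The reduce action suits a cron job that processes one time window of data at a regular interval. The results accumulate, so many incremental runs give the same result as a single run over all the data, and a partial result is available to query at any time.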
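The difference between the three actions can be sketched by modelling the output collection as a Map from key to value. This is a simulation under that assumption, not MongoDB's implementation; applyOut, sum, and the sample data are hypothetical names introduced for the example.

```javascript
// Simulation only: `existing` is the current output collection,
// `results` holds the new map-reduce results, both as Map(key -> value).
function applyOut(action, existing, results, reduceFn) {
  if (action === 'replace') {
    return new Map(results); // old contents are discarded
  }
  if (action !== 'merge' && action !== 'reduce') {
    throw new Error('unknown action: ' + action);
  }
  const out = new Map(existing);
  results.forEach(function (value, key) {
    if (action === 'reduce' && out.has(key)) {
      // Existing key: combine old and new values with the reduce function.
      out.set(key, reduceFn(key, [out.get(key), value]));
    } else {
      // merge overwrites existing keys; new keys are simply added.
      out.set(key, value);
    }
  });
  return out;
}

function sum(key, values) {
  return values.reduce(function (a, b) { return a + b; }, 0);
}

const existing = new Map([['apple', 3], ['pear', 1]]);
const results = new Map([['apple', 2], ['plum', 7]]);

const replaced = Object.fromEntries(applyOut('replace', existing, results, sum));
const merged = Object.fromEntries(applyOut('merge', existing, results, sum));
const reduced = Object.fromEntries(applyOut('reduce', existing, results, sum));

console.log(replaced); // { apple: 2, plum: 7 }
console.log(merged);   // { apple: 2, pear: 1, plum: 7 }
console.log(reduced);  // { apple: 5, pear: 1, plum: 7 }
```

Only the reduce action keeps the earlier total for apple and adds the new one to it, which is why it fits incremental runs.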

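Values for db:

Defaults to the database of the input collection. Set it to write the output to a different database.

Values for sharded:

Set to true to shard the output. You need to enable sharding on the output database; map-reduce then shards the output collection using _id as the shard key.

Values for nonAtomic:

Means "non-atomic". Defaults to false, i.e. atomic: the map-reduce operation locks the database during post-processing.

It can be set to true only when the action is merge or reduce.

If set to true, no lock is held, and clients may read intermediate states of the output collection.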

3.2 finalize Function

function(key, reducedValue) {
    ...
    return modifiedObject;
}

The finalize function must not access the database or other functions.

It can read the variables defined in scope.

Example:

// Emit one { count, qty } value per item, keyed by the item's sku.
var mapFunction2 = function() {
    for (var idx = 0; idx < this.items.length; idx++) {
        var key = this.items[idx].sku;
        var value = {
            count: 1,
            qty: this.items[idx].qty
        };
        emit(key, value);
    }
};

// Sum count and qty; the result has the same shape as the emitted values.
var reduceFunction2 = function(keySKU, countObjVals) {
    // Declared with var so it does not leak into the global scope.
    var reducedVal = { count: 0, qty: 0 };
    for (var idx = 0; idx < countObjVals.length; idx++) {
        reducedVal.count += countObjVals[idx].count;
        reducedVal.qty += countObjVals[idx].qty;
    }
    return reducedVal;
};

// Add the average quantity per order to the reduced value.
var finalizeFunction2 = function(key, reducedVal) {
    reducedVal.avg = reducedVal.qty / reducedVal.count;
    return reducedVal;
};

db.orders.mapReduce(
    mapFunction2,
    reduceFunction2,
    {
        out: { merge: "map_reduce_example" },
        query: { ord_date: { $gt: new Date('01/01/2012') } },
        finalize: finalizeFunction2
    }
)

This operation uses the query field to select only those documents with ord_date greater than new Date('01/01/2012'). Then it outputs the results to a collection map_reduce_example. If the map_reduce_example collection already exists, the operation will merge the existing contents with the results of this map-reduce operation.
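To see what the example computes, the same map, reduce, and finalize logic can be run over a few in-memory documents. Everything around the three functions is an assumption for illustration: the emit() stand-in, the grouping Map, the sample orders, and the plain filter that plays the role of the query option. It mimics the data flow only, not how the server executes the job.

```javascript
// Stand-in for the server side: emit() groups values by key.
const groups = new Map();
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}

const mapFunction2 = function () {
  for (var idx = 0; idx < this.items.length; idx++) {
    emit(this.items[idx].sku, { count: 1, qty: this.items[idx].qty });
  }
};

const reduceFunction2 = function (keySKU, countObjVals) {
  var reducedVal = { count: 0, qty: 0 };
  for (var idx = 0; idx < countObjVals.length; idx++) {
    reducedVal.count += countObjVals[idx].count;
    reducedVal.qty += countObjVals[idx].qty;
  }
  return reducedVal;
};

const finalizeFunction2 = function (key, reducedVal) {
  reducedVal.avg = reducedVal.qty / reducedVal.count;
  return reducedVal;
};

// Hypothetical sample orders; the last one predates the cutoff.
const orders = [
  { ord_date: new Date(2012, 2, 1), items: [{ sku: 'mmm', qty: 5 }, { sku: 'nnn', qty: 5 }] },
  { ord_date: new Date(2012, 3, 1), items: [{ sku: 'mmm', qty: 15 }] },
  { ord_date: new Date(2011, 11, 1), items: [{ sku: 'mmm', qty: 100 }] }
];

// Plays the role of query: { ord_date: { $gt: ... } } (1 January 2012, local time).
const cutoff = new Date(2012, 0, 1);
orders
  .filter(function (order) { return order.ord_date > cutoff; })
  .forEach(function (order) { mapFunction2.call(order); });

const output = [];
groups.forEach(function (values, key) {
  // A key with a single value skips reduce, as described in the reduce section.
  var reduced = values.length === 1 ? values[0] : reduceFunction2(key, values);
  output.push({ _id: key, value: finalizeFunction2(key, reduced) });
});

console.log(JSON.stringify(output));
```

With these sample orders, mmm ends up with count 2, qty 20, avg 10, and nnn with count 1, qty 5, avg 5; the 2011 order contributes nothing because the filter drops it.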

db.collection.mapReduce() takes the following parameters. Each entry gives the field and its type, followed by its description:

map (function)
A JavaScript function that associates or "maps" a value with a key and emits the key and value pair. See section 1 (map) above for more information.

reduce (function)
A JavaScript function that "reduces" to a single object all the values associated with a particular key. See section 2 (reduce) above for more information.

options (document)
A document that specifies additional parameters to db.collection.mapReduce().

bypassDocumentValidation (boolean)
Optional. Enables mapReduce to bypass document validation during the operation. This lets you insert documents that do not meet the validation requirements. New in version 3.2.

The following entries describe additional arguments that db.collection.mapReduce() can accept. Each entry gives the field and its type, followed by its description:

out (string or document)
Specifies the location of the result of the map-reduce operation. You can output to a collection, output to a collection with an action, or output inline. You may output to a collection when performing map-reduce operations on the primary members of the set; on secondary members you may only use the inline output. See section 3.1 (out) above for more information.

query (document)
Specifies the selection criteria using query operators for determining the documents input to the map function.

sort (document)
Sorts the input documents. This option is useful for optimization. For example, specify the sort key to be the same as the emit key so that there are fewer reduce operations. The sort key must be in an existing index for this collection.

limit (number)
Specifies a maximum number of documents for the input into the map function.

finalize (function)
Optional. Follows the reduce method and modifies the output. See section 3.2 (finalize Function) above for more information.

scope (document)
Specifies global variables that are accessible in the map, reduce and finalize functions.

jsMode (boolean)
Specifies whether to convert intermediate data into BSON format between the execution of the map and reduce functions. Defaults to false.

If false:

  • Internally, MongoDB converts the JavaScript objects emitted by the map function to BSON objects. These BSON objects are then converted back to JavaScript objects when calling the reduce function.
  • The map-reduce operation places the intermediate BSON objects in temporary, on-disk storage. This allows the map-reduce operation to execute over arbitrarily large data sets.

If true:

  • Internally, the JavaScript objects emitted during the map function remain as JavaScript objects. There is no need to convert the objects for the reduce function, which can result in faster execution.
  • You can only use jsMode for result sets with fewer than 500,000 distinct key arguments to the mapper's emit() function.

verbose (boolean)
Specifies whether to include the timing information in the result information. The verbose defaults to true to include the timing information.

Reposted from: http://dodli.baihongyu.com/
