Changeset [28f91f1b016d0b5c569d87b154daddd426413a22] by Shayan Pooya
January 8th, 2014 @ 06:10 AM
Add a test to distinguish between group_node_label and group_label.
When we run this test on a Disco cluster with more than one node, we
can see the improvement of group_node_label over group_label. With
group_node_label, for each intermediate step, one task is created on
each node and the inputs are processed locally. With the latter, only
one task is created on one of the nodes and all of the input is
shipped to that node. If the input files are large enough, we can see
significant improvements.
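
The difference between the two groupings can be illustrated with a small standalone simulation. This is only a sketch and does not use the Disco API or reproduce tests/test_pipe.py: the node names, file names, and helper functions below are hypothetical, and all inputs are assumed to share a single label, as in the scenario the commit message describes.

```python
from collections import defaultdict

# Hypothetical intermediate results, written as (node, label, url).
# These names are made up for illustration; they are not from the test.
INPUTS = [
    ("node1", 0, "file-a"),
    ("node1", 0, "file-b"),
    ("node2", 0, "file-c"),
    ("node3", 0, "file-d"),
]


def group_label(inputs):
    """One task per label: inputs from every node end up in the same task."""
    tasks = defaultdict(list)
    for node, label, url in inputs:
        tasks[label].append((node, url))
    return dict(tasks)


def group_node_label(inputs):
    """One task per (node, label): each task only sees inputs on its own node."""
    tasks = defaultdict(list)
    for node, label, url in inputs:
        tasks[(node, label)].append(url)
    return dict(tasks)


def shipped_inputs(inputs, chosen_node):
    """Count the inputs that must be moved if a single task runs on chosen_node."""
    return sum(1 for node, _label, _url in inputs if node != chosen_node)


if __name__ == "__main__":
    print("group_label tasks:     ", len(group_label(INPUTS)))
    print("group_node_label tasks:", len(group_node_label(INPUTS)))
    # With group_label, the single task runs on one node (node1 assumed here),
    # so every input stored elsewhere has to be shipped to it.
    print("inputs shipped with group_label:", shipped_inputs(INPUTS, "node1"))
```

In this toy setup group_label produces a single task that has to pull inputs from the other nodes, while group_node_label produces one task per node and moves nothing, which is the effect the test is meant to make visible on a real multi-node cluster.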
https://github.com/discoproject/disco/commit/28f91f1b016d0b5c569d87...
Committed by Shayan Pooya
- A tests/test_pipe.py
Disco is an open-source implementation of the Map-Reduce framework for distributed computing. Like the original framework, Disco supports parallel computations over large data sets on an unreliable cluster of computers.