Changeset [28f91f1b016d0b5c569d87b154daddd426413a22] by Shayan Pooya

January 8th, 2014 @ 06:10 AM

Add a test to distinguish between group_node_label and group_label.

When we run this test on a Disco cluster with more than one node, we can
see the improvement of group_node_label over group_label. With
group_node_label, for each intermediate step, one task is created on each
node and the inputs are processed locally. With group_label, only one
task is created on one of the nodes and all of the input is shipped to
that node. If the input files are large enough, we can see
significant improvements.
https://github.com/discoproject/disco/commit/28f91f1b016d0b5c569d87...
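
The difference described in the commit message can be illustrated with a small pure-Python model. This is only a sketch and does not use the Disco API: the `simulate` function, the `(label, node, size)` input tuples, and the assumption that a group_label task runs on the node holding its first input are all hypothetical and chosen for illustration.

```python
from collections import defaultdict

# Hypothetical inputs: (label, node, size_in_bytes). Not real Disco data.
INPUTS = [
    (0, "node1", 100),
    (0, "node2", 100),
    (0, "node3", 100),
]


def simulate(inputs, grouping):
    """Return (number_of_tasks, bytes_shipped) for a grouping mode.

    grouping is either "group_label" or "group_node_label".
    """
    tasks = defaultdict(list)
    for label, node, size in inputs:
        if grouping == "group_node_label":
            key = (node, label)   # one task per label on each node
        else:
            key = label           # one task per label for the whole cluster
        tasks[key].append((node, size))

    shipped = 0
    for members in tasks.values():
        # Assumption: the task runs on the node of its first input.
        run_node = members[0][0]
        # Every input stored on another node must be sent over the network.
        shipped += sum(size for node, size in members if node != run_node)
    return len(tasks), shipped


if __name__ == "__main__":
    for mode in ("group_label", "group_node_label"):
        n_tasks, shipped = simulate(INPUTS, mode)
        print(mode, "tasks:", n_tasks, "bytes shipped:", shipped)
```

In this model, group_node_label creates more tasks but moves no data between nodes, while group_label creates a single task and ships every input that is not already on the node where it runs.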

Committed by Shayan Pooya

  • A tests/test_pipe.py

Disco is an open-source implementation of the Map-Reduce framework for distributed computing. Like the original framework, Disco supports parallel computations over large data sets on an unreliable cluster of computers.
