How to get a block-cyclic distribution?

Your implementation looks correct so far. The problem is in how the local array for the first processor is printed: because of the way the file is read and the data is distributed among the processes, the local array ends up stored column-wise.
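To make the distribution itself concrete, here is a minimal sketch of the index mapping. It assumes a block-cyclic layout in which blocks of nb consecutive indices are dealt out round-robin to nprocs processes along one dimension. The names bc_location, bc_locate, nb, and nprocs are hypothetical; they are not part of MPI or of the code in the question. For a two-dimensional array, the same mapping would be applied to the row index and the column index separately.

```c
/* Hypothetical helper types and names, for illustration only. */
typedef struct {
    int owner;        /* process coordinate that holds the element */
    int local_index;  /* position inside that process's local array */
} bc_location;

/* Map global index g to its owner and local index for a one-dimensional
 * block-cyclic distribution with block size nb over nprocs processes. */
bc_location bc_locate(int g, int nb, int nprocs)
{
    bc_location loc;
    int block = g / nb;              /* which block g falls in */
    loc.owner = block % nprocs;      /* blocks are dealt out round-robin */
    /* Full blocks already stored locally, plus the offset inside this block. */
    loc.local_index = (block / nprocs) * nb + g % nb;
    return loc;
}
```

For example, with nb = 2 and nprocs = 3, global index 6 falls in block 3, which wraps around to process 0 and lands at local index 2.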

To print the local array for the first processor row by row, you can change the printing code as follows:

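if (rank == 0) {
    printf("\nLocal Array for Processor 0\n");
    /* Walk A as P * Q consecutive groups of dargs[0] * dargs[1] values;
       A must hold at least that many elements. */
    for (i = 0; i < P * Q; i++) {
        for (j = 0; j < dargs[0] * dargs[1]; j++) {
            printf("%.0f ", A[i * dargs[0] * dargs[1] + j]);
            /* End the line after every dargs[1] values. */
            if ((j + 1) % dargs[1] == 0)
                printf("\n");
        }
        printf("\n");  /* blank line between groups */
    }
}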

This will print the local array for the first processor in row-wise fashion as shown below:

Local Array for Processor 0
1  2  3  4 
13 14 15 16

49 50 51 52
61 62 63 64

97 98 99 100
109 110 111 112

2  3  4  5 
14 15 16 17

50 51 52 53
62 63 64 65

103 104 105 106
115 116 117 118

Note that this prints only the local array for the first processor. To print the local array for every processor, run the same loop on each rank instead of only under the rank == 0 test, and print the rank in the header so the outputs can be told apart.
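One way to reuse the printing logic on every rank is to wrap it in a helper that knows the storage order. The following is only a sketch: local_at and print_local are hypothetical names, and the caller is assumed to already know the number of local rows and columns and whether the buffer is stored column-wise.

```c
#include <stdio.h>

/* Element (r, c) of a rows x cols local array stored in a flat buffer.
 * column_major != 0 means consecutive buffer entries run down a column. */
double local_at(const double *a, int rows, int cols, int r, int c,
                int column_major)
{
    return column_major ? a[c * rows + r] : a[r * cols + c];
}

/* Print the local array one matrix row per line, with the rank in the header. */
void print_local(FILE *out, int rank, const double *a, int rows, int cols,
                 int column_major)
{
    fprintf(out, "\nLocal Array for Processor %d\n", rank);
    for (int r = 0; r < rows; r++) {
        for (int c = 0; c < cols; c++)
            fprintf(out, "%.0f ", local_at(a, rows, cols, r, c, column_major));
        fprintf(out, "\n");
    }
}
```

Each rank could then call something like print_local(stdout, rank, A, local_rows, local_cols, 1), where local_rows and local_cols are placeholders for the local dimensions. Output from different ranks may still interleave on the terminal; letting the ranks print one at a time, for instance in a loop over ranks separated by MPI_Barrier, usually makes it easier to read, although MPI does not strictly guarantee the order in which stdout is flushed.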
