BodoError: distributed return of array not valid since it is replicated

This error usually occurs when you mark a return variable as distributed, but Bodo has determined that the variable is replicated.

Here’s an example:

import pandas as pd
import bodo

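# 'res' is flagged as distributed, but the input df is replicated
@bodo.jit(distributed=['res'])
def test(df):
    res = df.groupby('A').B.mean()
    return res

# df is created outside the JIT function and passed in as replicated data
df = pd.DataFrame({'A': range(10), 'B': range(10, 20)})
res = test(df)  # raises BodoError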

In this case, the input df is replicated, so the return value, res, is replicated as well. Therefore, we get the error below:

BodoError: distributed return of array $res.126 not valid since it is replicated

There are two ways to fix this error:

  1. In the simplest case, remove the distributed flag for res.
    OR
  2. If you want res to be distributed, make sure the input is distributed as well. Here, this can be achieved by moving the df definition inside the bodo.jit function:
import pandas as pd
import bodo

@bodo.jit(distributed=['res'])
def test():
    df = pd.DataFrame({'A' : range(10), 'B' : range(10,20)})
    res = df.groupby('A').B.mean()
    return res

res = test()