The partitioner and backend work together as follows: the partitioner tags the nodes to be lowered to the backend, and the backend receives all tagged nodes and preprocesses them as a delegate.
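The tag-then-preprocess flow can be illustrated with a toy model. This is a minimal sketch in plain Python, not the real partitioner or backend API: `Node`, `SUPPORTED_OPS`, `partition`, `preprocess`, and the tag name `toy_backend` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Toy stand-in for a graph node (hypothetical, not a real framework class)."""
    name: str
    op: str
    delegation_tag: Optional[str] = None


# Assumption: the toy backend supports only these ops.
SUPPORTED_OPS = {"conv2d", "relu"}


def partition(nodes: List[Node], tag: str = "toy_backend") -> List[Node]:
    """Partitioner step: tag every node the backend can take."""
    for node in nodes:
        if node.op in SUPPORTED_OPS:
            node.delegation_tag = tag
    return nodes


def preprocess(nodes: List[Node], tag: str = "toy_backend") -> dict:
    """Backend step: collect all tagged nodes and turn them into one delegate."""
    tagged = [n for n in nodes if n.delegation_tag == tag]
    return {"backend": tag, "ops": [n.name for n in tagged]}


graph = [Node("conv", "conv2d"), Node("act", "relu"), Node("sm", "softmax")]
delegate = preprocess(partition(graph))
```

Here `softmax` is left untagged, so it stays outside the delegate and would run on the default path.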
Some operators may perform better in a memory format other than contiguous. One way to support this is to insert to_dim_op nodes
that describe the memory format permutation, and then merge two of them when they are opposite permutations next to each other.
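The insert-and-merge idea can be sketched on a linear list of ops. This is a minimal illustration under stated assumptions, not the actual pass: ops are modeled as `(name, permutation)` tuples, `to_dim` stands in for to_dim_op, and `PREFERS_NHWC`, `insert_to_dim_ops`, `compose`, and `merge_opposite_to_dim_ops` are hypothetical helpers. Two adjacent permutations are treated as opposite when their composition is the identity.

```python
# Dimension orders for a 4-D tensor: output dim i takes input dim perm[i].
NCHW_TO_NHWC = (0, 2, 3, 1)
NHWC_TO_NCHW = (0, 3, 1, 2)

# Assumption: these ops run faster in NHWC (channels-last) layout.
PREFERS_NHWC = {"conv2d", "max_pool2d"}


def insert_to_dim_ops(ops):
    """Wrap each layout-sensitive op in a to_dim pair (into NHWC, back to NCHW)."""
    out = []
    for op in ops:
        if op in PREFERS_NHWC:
            out += [("to_dim", NCHW_TO_NHWC), (op, None), ("to_dim", NHWC_TO_NCHW)]
        else:
            out.append((op, None))
    return out


def compose(p, q):
    """Permutation equivalent to applying p first, then q."""
    return tuple(p[j] for j in q)


def merge_opposite_to_dim_ops(ops):
    """Drop adjacent to_dim pairs whose combined permutation is the identity."""
    out = []
    for name, perm in ops:
        if (
            name == "to_dim"
            and out
            and out[-1][0] == "to_dim"
            and compose(out[-1][1], perm) == tuple(range(len(perm)))
        ):
            out.pop()  # the two permutations cancel each other
        else:
            out.append((name, perm))
    return out


ops = ["conv2d", "max_pool2d", "add"]
lowered = merge_opposite_to_dim_ops(insert_to_dim_ops(ops))
```

In this sketch the NHWC-to-NCHW permutation after `conv2d` and the NCHW-to-NHWC permutation before `max_pool2d` cancel, so the two ops share a single channels-last region with one permutation on each side.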