Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] when should Colon : keep offset information #246

Open
johnnychen94 opened this issue Jun 16, 2021 · 3 comments
Open

[RFC] when should Colon : keep offset information #246

johnnychen94 opened this issue Jun 16, 2021 · 3 comments
Milestone

Comments

@johnnychen94
Copy link
Member

johnnychen94 commented Jun 16, 2021

In a normal 1-based indexing array world, this is clear: : serves as a length placeholder.

During #220 #228 and #245, I've realized that we haven't yet had a clear and consistent definition of the role of : in the OffsetArray world. Let's take reshape as an example, it might also apply to all other operations where : is allowed, e.g., getindex, setindex!.

I propose the rule of thumb is to keep offset information if it's unambiguous. It comes with one and only one extra rule: if all inds inputs are range type, keep offset information for the corresponding dimension where : is placed at.

A = OffsetArray(rand(4, 4, 4), -1, -2, -3)

reshape(A, :) # (0:64, )

reshape(A, 1:8, :) # (1:8, -1:6)
reshape(A, :, 1:8) # (0:7, 1:8)

reshape(A, 1:8, 1:8, :) # (1:8, 1:8, -2:-2)
reshape(A, 1:8, :, 1:8) # (1:8, -1:-1, 1:8)
reshape(A, :, 1:8, 1:8) # (0:0, 1:8, 1:8)

reshape(A, 1:8, 1:2, :) # (1:8, 1:2, -2:1)
reshape(A, 1:8, :, 1:2) # (1:8, -1:2, 1:2)
reshape(A, :, 1:8, 1:2) # (0:3, 1:8, 1:2)

All other cases should be consistent with the Base case. For example:

reshape(A, 8, :) # (1:8, 1:8)

In this case, it's ambiguous whether : is used as a length placeholder or axes placeholder so we should stick to the Base case; otherwise, I can foresee a lot of type piracy involved.

@johnnychen94 johnnychen94 added this to the v2.0 milestone Jun 16, 2021
@mcabbott
Copy link

mcabbott commented Jul 15, 2021

The present behaviour of making : always start from 1 struck me as sensible. Here are some examples where the proposed rule (if I understood right) would give offsets that seem counter-intuitive:

M = reshape(1.0:16, 0:3, 3:6) .+ 0.0

reshape(M, 0:7, :)  # does this 2nd dimension share an identity with the 3:6 one?

reshape(M, 0:3, 1, :)  # ... yet this 3rd dimension (with the same length) does not?

reshape(M, axes(M,1), :, axes(M,2))  # here I'm inserting a trivial dimension

BTW TensorCast now handles offsets, and some kinds of reshaping. My 2nd and 3rd examples would be written like this, it knows that the last index of B was the last index of M, and thus it will preserve its axis:

@cast B[i,_,j] := M[i,j]

The closest you can get to my 1st example above is @cast A[(i,j),k] := M[i,(j,k)] k in 1:2 but this is more like reshape(M, :, OneTo(2)): the combined index (i,j) always starts at 1 (as does the trivial index created by _).

@jishnub
Copy link
Member

jishnub commented Jul 17, 2021

This might be relevant:

reshaping is essentially an appeal to the linear index representation, and that differs for 1-dimensional objects (for which the linear index can start anywhere) and higher-dimensional objects (for which linear indexing always starts at 1).

In this sense reshaping for arrays should always use linear indices of the parent, and we view the reshape methods defined here as re-indexing the reshaped array (and unrelated to the Cartesian indices of the parent array). If ranges of indices are specified as arguments to reshape, these become the indices along the corresponding axes of the reshaped array. A colon in that case implies that we retain the axes of the reshaped array along that dimension. This is also consistent with what the OffsetArray constructor does, where a colon retains the axis of the parent array along the corresponding dimension.

@timholy
Copy link
Member

timholy commented Jul 18, 2021

Agreed with @jishnub:

  • starts with 1 if there is more than one remaining axis in the parent array (that's the linearindices rule for multidimensional arrays)
  • starts with 1 if the child cannot have same axis as the parent (e.g., like reshaping a 2x6 into a 3x4 with : in the second slot)
  • adopts the parent axis otherwise.

That last bit could merit a bit of thinking during actual implementation wrt what types get created.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants