-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Enable models for sparsely sampled fMRI series #376
base: master
Are you sure you want to change the base?
Conversation
1374b89
to
90b861d
Compare
Codecov Report
@@ Coverage Diff @@
## master #376 +/- ##
===========================================
- Coverage 73.79% 45.78% -28.02%
===========================================
Files 23 23
Lines 2492 2514 +22
Branches 621 628 +7
===========================================
- Hits 1839 1151 -688
- Misses 471 1240 +769
+ Partials 182 123 -59
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #376 +/- ##
==========================================
- Coverage 62.29% 61.98% -0.32%
==========================================
Files 27 27
Lines 4554 4611 +57
Branches 1173 1189 +16
==========================================
+ Hits 2837 2858 +21
- Misses 1433 1462 +29
- Partials 284 291 +7
Continue to review full report at Codecov.
|
@satra @tyarkoni I think this is ready for review. There's almost certainly a better way to do the integration, but this should work. In passing, I had to fix the thing where |
else: | ||
from scipy.interpolate import interp1d | ||
x = np.arange(n) | ||
f = interp1d(x, self.values.values.ravel(), kind=kind) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is ok for upsampling not fine for downsampling. depending on frequency content this will introduce aliasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure avoiding aliasing is in scope... or at least, I'm not sure we want to implicitly start low-pass filtering the input signal without the user's explicit consent. I'd be okay adding an explicit TemporalFilter
transformation (actually I think this is already in the spec and just not yet implemented), and the user can then call that themselves before the resampling step if they like. But doing some magic internally to pick a suitable filter or otherwise avoid aliasing means (a) reproducibility across packages is limited, and (b) we're taking on the responsibility for producing sensible results, and to my mind this really falls on the user.
# Use a unit that fits an whole number of times into both | ||
# the interscan interval (TR) and the integration window (TA) | ||
dt = gcd(TR, TA) | ||
if dt > SR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm probably missing something, but if we have to give up on a value that neatly bins TR
and TA
, why not just use SR
at that point? E.g., if dt = 50 and SR = 10, then this will give us dt = 10, so we're at the SR anyway. If dt = 50 and SR = 17, we get dt = 25, which both prevents neat binning and is bigger than we wanted. Is the idea that at least every dt//SR
^th bin will align nicely this way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
25 also permits neat binning...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, duh. The inversion tripped me up. That is indeed the thing I was missing. :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That said, I'm still not sure we shouldn't just use the SR
. Since the user could have set the sr
on the collection manually, the compromise solution here risks explicitly ignoring their intentions. This isn't an entirely benign decision, because in subsequent transformations, if a mismatch in sampling rates is detected, that will trigger a resampling in at least one variable, when potentially none might have been needed. They could also have code that depends on knowing what the SR is.
I think maybe we need to come up with a general policy about this, because it crops up in other places too (e.g., the issue about what to do if event files contain very short durations). I.e., the general question is, in a situation where the user appears to be picking sampling rates that are objectively inefficient or result in loss of information, should we be paternalistic and fix it for them, or just let them know they can probably do better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing we could also do, though it's a bit annoying, is internally maintain a flag that indicates whether or not the user has explicitly overridden the default sampling rate. If SR=10
purely because that's the default, overriding it seems fine. If the user explicitly set it themselves, maybe we don't want to pick a different value for them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels like we're trying to thread a needle where we make smart default choices based on our expertise on the one hand, and on the other, load a gun, cock it and point it at the foot of anyone who looks like they might know what they're doing...
Currently we reserve the footgun for people who use ToDense
to get themselves a sampling interval that is relatively prime to gcd(TR, TA)
. If someone knows enough to manipulate the default sampling rate, they can learn that Convolve
with a sparse acquisition paradigm will choose a slightly different interval than with a continuous acquisition paradigm unless the variable is already dense.
But perhaps there's a better general approach here? I admit that I chose this simply because building a boxcar function and taking a mean is easier than learning how to do this with interp1d
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The general issue potentially affects anyone who sets the sampling_rate
explicitly in BIDSLayout.get_collections
or load_variables
. A sophisticated user could certainly know enough about their dataset to think "I'll use a sampling rate of 50, because that's high enough to account for my short-duration events, but still manageable computationally". Thereafter, I think that user could reasonably expect that if they call ToDense
or Convolve
without explicitly specifying a different sampling rate, the resulting variable will have an effective sampling rate of 50.
I think making decisions for the user internally is fine if there's an explicit argument in the transformation that implies as much. I.e., if ToDense
has default sampling_rate='auto'
, then the docstring can just explain that we will act intelligently unless a numeric value is passed, and the result may not equal what's currently set in the collection. That seems fine. The issue is that in this case, there's no provision to specify the sampling rate in Convolve
. I was against doing that because I don't think the spec should have that argument, but I guess we can make it a pybids-only argument, if only for the sake of making it clear to a sophisticated user what's going on.
Hey, unrelated to the above thread, doesn't this implementation run counter to the suggestion previously that we handle as much of the densification logic as possible in the (implicit) |
I'm trying to think through this. Variables have no notion of a "default" sampling rate, as currently written, so moving this logic into Just to make clear, my distinction between typical and power users is pretty much whether they're doing this in JSON or Python. For the latter, most things are possible, so getting to the right behaviors for model writers is my priority. |
It doesn't necessarily have to live in If you look at the logic in the base As far as as I can tell, this would elegantly solve the problem while adhering to the general principle that the transformation-specific logic should be kept as simple as possible, and the complexity should be hidden in the base class as much as possible (to make it easier to implement new transformations and minimize the maintenance burden). What do you think? |
I think that seems like a decent approach. So a question: Should I clean up the tests here and get this merged, or start on that refactor? I think for @satra's purposes, it might be best to get a working sparse implementation in ASAP. |
Merging this as-is seems fine. The refactor will probably be pretty straightforward, and if any problems arise, they'll likely be coming from integration with 0.8 (though I think it should be fine), so we may as well hold off until that's ready. |
@mgxd Have you had a chance to give this a shot? Anything I can do to help? |
I've merged the 0.8 upgrade in, so any work on this going forward should be done in the context of poldracklab/fitlins@7ee02a2. @mgxd @satra Ping me for anything I can do to help. |
Okay. Once that's in, I'll try to re-assess and see if I can see the path forward your way. (I can't remember if it ever clicked, so I guess it didn't click very well...) |
The general idea was that computing the Maybe it makes sense to add a helper method in Assuming the densification is done implicitly, I think you then won't need to do much else in |
I just realized that I hadn't tested with Dense variables yet, and it will probably fail because I'm getting the onsets from var, not df. I'll change that in a sec. |
Yet another good reason to prioritize #320 (not that I needed more)! Will try to get to that shortly. |
Yep, I am adding at least a few tests for convolve as part of #411 |
Closes #252.