Feature: support timm features_only functionality #373
Comments
Also, the functionality is part of my model CI so it should be fairly stable: https://github.com/rwightman/pytorch-image-models/blob/master/tests/test_models.py#L177
Hello @rwightman, big respect for the fantastic work you have done on timm.
@JulienMaille basically yes, but you should look at the links, including usage for my efficientdet impl, and the docs. You can specify specific `out_indices` for any backbone created with `features_only=True`. There is a bit of variation in what `out_indices` means, as it's a 0-based index and some models have different numbers of possible feature taps; compare models with different striding, like vgg (which has stride 1-32 features) with most other nets (which have stride 2-32). I plan to extend the API in the future to support str-based indices as well.

One gotcha is that some models use forward hooks to grab the features from deep activations in the net. These will not work with torchscript (that will be fixed in PyTorch someday soon). But it's limited to the Xception and TF NasNet models right now, and I may change the default to use non-activated output from a higher level in the net instead for those.
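For concreteness, a minimal sketch of picking specific taps with `out_indices` (the model name and values are illustrative, not taken from this thread):

```python
import timm
import torch

# Select only the last three feature levels; out_indices is 0-based and the set of
# valid indices depends on the model.
backbone = timm.create_model("resnet50", features_only=True, out_indices=(2, 3, 4))

feats = backbone(torch.randn(1, 3, 224, 224))
print([tuple(f.shape) for f in feats])  # three maps, at strides 8, 16 and 32 for resnet50
```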
To try it out, without breaking any of the existing models or timm usage, one could create something like a TimmGeneric/TimmFeaturesEncoder that acts as an adapter for this and lets the user pass any timm model as a string and choose which stage features they want.
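A rough sketch of what such an adapter could look like (the class name follows the suggestion above; the constructor arguments and everything else here are assumptions, not the eventual implementation):

```python
import timm
import torch.nn as nn

class TimmFeaturesEncoder(nn.Module):  # hypothetical name, per the suggestion above
    def __init__(self, name, in_channels=3, depth=5, pretrained=False):
        super().__init__()
        # Let timm build the backbone as a feature-pyramid extractor.
        self.model = timm.create_model(
            name,
            pretrained=pretrained,
            features_only=True,
            in_chans=in_channels,
            out_indices=tuple(range(depth)),
        )
        # Channel counts reported by timm; a decoder can size its blocks from these.
        self.out_channels = self.model.feature_info.channels()

    def forward(self, x):
        # Returns a list of feature maps, shallowest to deepest.
        return self.model(x)
```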
I did exactly this over the past 50 minutes. Will share something so you can tell if I'm heading in the right direction. EDIT: JulienMaille@096aff4, tested with a Unet-EfficientNet-b0.
@JulienMaille is get_stages() used by any functions besides forward()? That won't really work, as there aren't consistent layer/stage names from model to model. The idea with features_only is that it's constructed knowing the structure of the original backbone, so it just spits out a list of the feature maps specified by out_indices for you. The feature wrapper already cuts off the head and later modules that aren't used.
@rwightman I thought that `get_stages()` was still needed, e.g. for `make_dilated`:

```python
self.encoder.make_dilated(
    stage_list=[4, 5],
    dilation_list=[2, 4]
)
```
@JulienMaille Hmm, yeah, make_dilated depends on get_stages and that would not necessarily work universally. Although I could probably make it work for a number of models by generating stage indices... hmm. The equivalent functionality for the timm factory is to specify the output_stride as one of 32 (the default), 16, or 8. I think 4 might work on a few but I haven't tested. A number of models are limited to only output_stride=32 though, so it may error out on some backbones.
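For illustration, a hedged sketch of the output_stride equivalent (resnet50 is just an example of a backbone that supports dilation):

```python
import timm
import torch

# Ask the timm factory for a reduced output stride; models that only support
# stride 32 will raise an error here.
backbone = timm.create_model("resnet50", features_only=True, output_stride=8)

feats = backbone(torch.randn(1, 3, 224, 224))
print([f.shape[-1] for f in feats])  # deepest maps stay at 224 / 8 = 28 instead of 7
```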
@rwightman I had a look at all the failures from qubvel's test module:

Unet+timm-u-adv_inception_v3: Exception: Calculated padded input size per channel: (2 x 2). Kernel size: (3 x 3). Kernel size can't be greater than actual input size
Unet+timm-u-cspdarknet53: Exception: assert torch.Size([128, 128]) == torch.Size([64, 64])
Unet+timm-u-cspdarknet53_iabn: Exception: Please install InplaceABN: 'pip install git+https://github.com/mapillary/inplace_abn.git@v1.0.12'
Unet+timm-u-cspresnext50_iabn: Exception: Please install InplaceABN: 'pip install git+https://github.com/mapillary/inplace_abn.git@v1.0.12'
Unet+timm-u-darknet53: Exception: assert torch.Size([128, 128]) == torch.Size([64, 64])
Unet+timm-u-densenet264d_iabn: Exception: Please install InplaceABN: 'pip install git+https://github.com/mapillary/inplace_abn.git@v1.0.12'
Unet+timm-u-dla***: Exception: assert torch.Size([128, 128]) == torch.Size([64, 64])
Unet+timm-u-ecaresnet50d_pruned: Exception: Given groups=1, weight of size [16, 3072, 3, 3], expected input[1, 2840, 4, 4] to have 3072 channels, but got 2840 channels instead

inplace_abn fails to install on my system, and some models seem to return a tensor with an unexpected size.
Hi @rwightman and @JulienMaille
@JulienMaille Yes, there are some models that are a bit different in terms of the output strides they support (or that require an extra module, like the '*iabn'/tresnet models). It's a small number relative to the whole that have those exceptions. I feel it'd be a net benefit to have the universal timm encoder, but maybe indicate it's beta/alpha or that there could be issues with specific backbones. For the tests, a subset of known 'good' timm models could be specified (via inclusion/exclusion of model names, wildcards/regex, etc.), as sketched below. While it's being vetted, manual encoders could still be defined if desired. Once the corner cases have been dealt with in the universal encoder (both here and by driving some changes on my end), there shouldn't be any need to define custom encoders for timm models in the future.
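One way such test filtering could look (a sketch only; the exclusion patterns below are hypothetical and merely echo the failures reported earlier in the thread):

```python
import fnmatch
import timm

# Hypothetical exclusion patterns for a vetted CI subset.
EXCLUDE_PATTERNS = ["*_iabn", "tresnet*", "dla*", "*darknet*"]

def is_vetted(name):
    # Keep a model only if it matches none of the exclusion patterns.
    return not any(fnmatch.fnmatch(name, pattern) for pattern in EXCLUDE_PATTERNS)

vetted_models = [m for m in timm.list_models() if is_vetted(m)]
```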
@rwightman which models should be excluded from tests? Right now this is what I'm filtering
@rwightman Creating my model through
merged to master |
Can you please add to the API a way to make the network output logits and last-layer features?
Maybe I did not understand you, but the model returns the logits of the last layer if the activation function argument is not provided.
I understand, but I couldn't find a way in the current API that allows something like getting both the logits and the last-layer features. It would be really helpful if there was such a thing; it comes up in many research areas, including training models with custom losses. One could use PyTorch hooks to achieve this for each model, but there wouldn't be a consistent API for all models, and it would require looking up the names that each model gives to the final features layer. Thank you for the quick reply!
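A sketch of the hook-based workaround mentioned above (assuming the usual segmentation_models.pytorch layout of encoder, decoder and segmentation_head; names may differ elsewhere):

```python
import torch
import segmentation_models_pytorch as smp

model = smp.Unet("resnet34", encoder_weights=None, classes=2, activation=None)

captured = {}

def save_decoder_features(module, inputs, output):
    # Output of the decoder, i.e. the features right before the segmentation head.
    captured["features"] = output

handle = model.decoder.register_forward_hook(save_decoder_features)
logits = model(torch.randn(1, 3, 256, 256))  # raw logits, since activation=None
features = captured["features"]
handle.remove()
```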
@rwightman what would be needed to use ViT in segmentation models?
I've noticed more and more `timm` backbones being added here, which is great, but a lot of the effort is currently duplicating some features of `timm`, i.e. tracking channel numbers, modifying the networks, etc. `timm` has a `features_only` arg in the model factory that will return a model set up as a backbone to produce pyramid features. It has a `.feature_info` attribute you can query to understand what the channels of each output are, the approximate reduction factor, etc. I've adapted the unet and deeplab impl here in the past to use this successfully, although it was quick hack-and-train work, nothing to serve as a clean example.

If this was supported, any timm model (vit excluded right now) could be used as a backbone in a generic fashion, just by a model name string passed to the creation fn, possibly with a small config mapping of model types to index specifications (some models have slightly different `out_indices` alignment to strides if they happen to be a stride 64 model, or don't have a stride=2 feature, etc.). All tap points are the latest possible point for a given feature map stride. Some, but not all, of the timm backbones also support an `output_stride=` arg that will dilate the blocks appropriately for 8 or 16 network strides.

Some references:

For most of the models, the features are extracted by flattening part of the backbone model via a wrapper. A few models where the feature taps are embedded deep within the model use hooks, which causes some issues with torchscript, but that will likely be fixed soon in PyTorch.
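A minimal sketch of the factory usage described above (the model name and printed values are illustrative):

```python
import timm
import torch

# Any supported model name can be passed as a string; features_only returns a backbone
# that yields a list of pyramid feature maps instead of classification logits.
backbone = timm.create_model("efficientnet_b0", features_only=True)

# feature_info can be queried for the channel count and reduction factor of each tap.
print(backbone.feature_info.channels())   # e.g. [16, 24, 40, 112, 320]
print(backbone.feature_info.reduction())  # e.g. [2, 4, 8, 16, 32]

features = backbone(torch.randn(1, 3, 224, 224))
print([tuple(f.shape) for f in features])
```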