
Incorporate SelecSLS Models #65


Merged
merged 3 commits into huggingface:master on Dec 30, 2019

Conversation

mehtadushy
Contributor

Hi Ross

I have ported my SelecSLS Net (https://github.com/mehtadushy/SelecSLS-Pytorch) implementation to your framework, and have also trained a couple of variants using your training setup.
These should be of interest to you because they have a significantly smaller GPU memory footprint than ResNets and much faster inference, while SelecSLS60/60_B match ResNet50 in accuracy.

The URLs for the pre-trained models will take a couple of days to go online. Meanwhile, you can get the models for stopgap testing from http://people.mpi-inf.mpg.de/~dmehta/xnect_models/SelecSLS42_B.pth and http://people.mpi-inf.mpg.de/~dmehta/xnect_models/SelecSLS60_B.pth.
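Until the release links are live, a minimal sketch of loading the stopgap weights into the ported model, assuming the lower-cased timm entry point name (selecsls60b here is an assumption) and standard torch.hub utilities:

```python
import timm
import torch

# Entry point name assumed; this PR adds the SelecSLS models to timm.
model = timm.create_model('selecsls60b', num_classes=1000)

# Pull the stopgap weights from the URL above (no hash check,
# since the filename does not embed one yet).
state_dict = torch.hub.load_state_dict_from_url(
    'http://people.mpi-inf.mpg.de/~dmehta/xnect_models/SelecSLS60_B.pth',
    map_location='cpu', check_hash=False)
model.load_state_dict(state_dict)
model.eval()
```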

Best

@rwightman
Collaborator

@mehtadushy thanks, I'll take a look at this next week

@rwightman
Collaborator

@mehtadushy Looks good, compelling GPU resource utilization for the accuracy levels. I'm going to merge as is and then tweak a few style/consistency things myself (lower-case model strings/entry point fns, etc.).

I downloaded your weight files. I plan to add the hash to the filename and host a copy in a separate GitHub release within this repo that mentions the origin, so that it works with the model zoo downloader, just like HRNet and Res2Net (https://github.com/rwightman/pytorch-image-models/releases). Is that okay?
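For context, the zoo downloader can verify a short hash embedded in the weight filename. A minimal sketch of producing such a filename with plain hashlib (illustrative only, not the repo's actual release tooling):

```python
import hashlib
from pathlib import Path

def add_hash_to_filename(path, digits=8):
    """Rename a checkpoint to <name>-<sha256 prefix>.pth so hash-checking
    downloaders (e.g. torch.hub with check_hash=True) can verify it."""
    path = Path(path)
    sha256 = hashlib.sha256(path.read_bytes()).hexdigest()
    new_path = path.with_name(f'{path.stem}-{sha256[:digits]}{path.suffix}')
    path.rename(new_path)
    return new_path

# e.g. add_hash_to_filename('SelecSLS60_B.pth') -> SelecSLS60_B-<hash>.pth
```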

@rwightman rwightman merged commit fb3a0f4 into huggingface:master Dec 30, 2019
@mehtadushy
Contributor Author

Yes, that would be OK as long as a note refers back to the original repository (https://github.com/mehtadushy/SelecSLS-Pytorch) for license terms, and to the paper (https://arxiv.org/abs/1907.00837) for citation.

@rwightman
Collaborator

rwightman commented Dec 31, 2019

@mehtadushy okay, done... it's on master now, and I created a release with the requested links and info for the weights. I made a few changes to bring things in line with some naming prefs; checkpoint compatibility is maintained.

https://github.com/rwightman/pytorch-image-models/releases/tag/v0.1-selecsls

@mehtadushy
Contributor Author

Thanks!

@rwightman
Collaborator

@mehtadushy Do you happen to have any of the hparams used for training the SelecSLS models? I was going to run some experiments with them for some new augmentations, since they have faster throughput / bigger batches than my usual ResNets, but my first pass with my usual ResNet LR and hparams wasn't quite as good.

@mehtadushy
Contributor Author

mehtadushy commented Jan 8, 2020

I trained them a long time ago and do not seem to have saved the exact hyperparameters, but the following ones, which I have been using for recent related experiments, are very close.

First, train with RMSProp for around 100 epochs:

```
CUDA_VISIBLE_DEVICES=0,1 ./distributed_train.sh 2 <dataset/path> --model <model_name> --sched step --epochs 150 --warmup-epochs 7 --lr 0.011 --opt rmsproptf --opt-eps 0.001 --decay-rate 0.92 --decay-epochs 3 --min-lr 5e-5 --batch-size 256 -j 16 --reprob 0.4 --remode pixel --amp --output ../Pytorch/logs/imagenet_selecsls_fancy/ --model-ema --model-ema-force-cpu --color-jitter 0.2
```

(I initialized this with a checkpoint from one of my previous attempts with SGD, but it should be OK without an initial checkpoint. If there is an initial checkpoint, use --warmup-epochs 0 --lr 0.01.)

I don't let the RMSProp training run through the entire epoch range; I stop it close to 100 epochs and have SGD with a smaller batch size do the tail end of training. I let this run for around 30 epochs:

```
CUDA_VISIBLE_DEVICES=1 ./distributed_train.sh 1 <dataset/path> --model <model_name> --sched cosine --epochs 40 --warmup-epochs 0 --lr 1e-3 --min-lr 5e-5 --batch-size 256 -j 16 --reprob 0.4 --remode pixel --amp --output ../Pytorch/logs/imagenet_selecsls_fancy/ --model-ema --model-ema-force-cpu --initial-checkpoint <path_to_checkpoint_from_rmsproptf_run> --color-jitter 0.1
```

Then, so that the EMA has a slightly more diverse history, I lower the batch size further and run for around 10 epochs (I stop at 10 even though this runs for 20):

```
CUDA_VISIBLE_DEVICES=0 ./distributed_train.sh 1 <dataset/path> --model <model_name> --sched cosine --epochs 10 --warmup-epochs 0 --lr 1e-4 --min-lr 5e-5 --batch-size 128 -j 16 --reprob 0.2 --remode pixel --amp --output ../Pytorch/logs/imagenet_selecsls_fancy/ --model-ema --initial-checkpoint <path_to_checkpoint_from_sgd_run> --color-jitter 0.05
```
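For reference, --model-ema keeps an exponential moving average of the model weights alongside training. A minimal sketch of the underlying update in generic PyTorch (not timm's actual ModelEma implementation; the decay value is illustrative):

```python
import copy
import torch

def make_ema(model):
    """Frozen copy of the model that will hold the averaged weights."""
    ema = copy.deepcopy(model)
    for p in ema.parameters():
        p.requires_grad_(False)
    return ema

@torch.no_grad()
def update_ema(ema, model, decay=0.9999):
    """Per-tensor update: ema = decay * ema + (1 - decay) * model."""
    for ema_t, t in zip(ema.state_dict().values(), model.state_dict().values()):
        if ema_t.dtype.is_floating_point:
            ema_t.mul_(decay).add_(t, alpha=1.0 - decay)
        else:
            ema_t.copy_(t)  # integer buffers, e.g. BatchNorm num_batches_tracked
```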

In addition to EMA, I also used SWA with a couple of runs with Adam, starting from the EMA and non-EMA weights of the previous run, but I don't remember the exact details and have not been using it in my recent experiments. Without this step, you should be able to get within 0.2 of the reported top-1 performance.
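For anyone wanting to try that last step, a minimal sketch of SWA with Adam using torch.optim.swa_utils (which ships with later PyTorch releases; the epoch count and learning rates here are assumptions, not the original settings):

```python
import torch
from torch.optim.swa_utils import AveragedModel, SWALR, update_bn

def swa_finetune(model, train_loader, epochs=5, lr=1e-4, swa_lr=5e-5):
    """Short SWA run starting from an existing (EMA or non-EMA) checkpoint."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    swa_model = AveragedModel(model)            # running average of the weights
    swa_scheduler = SWALR(optimizer, swa_lr=swa_lr)
    for _ in range(epochs):
        for images, targets in train_loader:
            optimizer.zero_grad()
            loss = torch.nn.functional.cross_entropy(model(images), targets)
            loss.backward()
            optimizer.step()
        swa_model.update_parameters(model)      # fold current weights into the average
        swa_scheduler.step()
    update_bn(train_loader, swa_model)          # recompute BatchNorm running stats
    return swa_model
```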

@rwightman
Collaborator

@mehtadushy thanks for the details, I'll try something along those lines and see what I get.
