Spaces:
Runtime error
Runtime error
| # FAQ | |
| ___ | |
| Q1: What if I want to use other network backbones, such as ResNet [1], instead of only those provided ones (e.g., Xception)? | |
| A: The users could modify the provided core/feature_extractor.py to support more network backbones. | |
| ___ | |
| Q2: What if I want to train the model on other datasets? | |
| A: The users could modify the provided dataset/build_{cityscapes,voc2012}_data.py and dataset/segmentation_dataset.py to build their own dataset. | |
| ___ | |
| Q3: Where can I download the PASCAL VOC augmented training set? | |
| A: The PASCAL VOC augmented training set is provided by Bharath Hariharan et al. [2] Please refer to their [website](http://home.bharathh.info/pubs/codes/SBD/download.html) for details and consider citing their paper if using the dataset. | |
| ___ | |
| Q4: Why the implementation does not include DenseCRF [3]? | |
| A: We have not tried this. The interested users could take a look at Philipp Krähenbühl's [website](http://graphics.stanford.edu/projects/densecrf/) and [paper](https://arxiv.org/abs/1210.5644) for details. | |
| ___ | |
| Q5: What if I want to train the model and fine-tune the batch normalization parameters? | |
| A: If given the limited resource at hand, we would suggest you simply fine-tune | |
| from our provided checkpoint whose batch-norm parameters have been trained (i.e., | |
| train with a smaller learning rate, set `fine_tune_batch_norm = false`, and | |
| employ longer training iterations since the learning rate is small). If | |
| you really would like to train by yourself, we would suggest | |
| 1. Set `output_stride = 16` or maybe even `32` (remember to change the flag | |
| `atrous_rates` accordingly, e.g., `atrous_rates = [3, 6, 9]` for | |
| `output_stride = 32`). | |
| 2. Use as many GPUs as possible (change the flag `num_clones` in train.py) and | |
| set `train_batch_size` as large as possible. | |
| 3. Adjust the `train_crop_size` in train.py. Maybe set it to be smaller, e.g., | |
| 513x513 (or even 321x321), so that you could use a larger batch size. | |
| 4. Use a smaller network backbone, such as MobileNet-v2. | |
| ___ | |
| Q6: How can I train the model asynchronously? | |
| A: In the train.py, the users could set `num_replicas` (number of machines for training) and `num_ps_tasks` (we usually set `num_ps_tasks` = `num_replicas` / 2). See slim.deployment.model_deploy for more details. | |
| ___ | |
| Q7: I could not reproduce the performance even with the provided checkpoints. | |
| A: Please try running | |
| ```bash | |
| # Run the simple test with Xception_65 as network backbone. | |
| sh local_test.sh | |
| ``` | |
| or | |
| ```bash | |
| # Run the simple test with MobileNet-v2 as network backbone. | |
| sh local_test_mobilenetv2.sh | |
| ``` | |
| First, make sure you could reproduce the results with our provided setting. | |
| After that, you could start to make a new change one at a time to help debug. | |
| ___ | |
| Q8: What value of `eval_crop_size` should I use? | |
| A: Our model uses whole-image inference, meaning that we need to set `eval_crop_size` equal to `output_stride` * k + 1, where k is an integer and set k so that the resulting `eval_crop_size` is slightly larger the largest | |
| image dimension in the dataset. For example, we have `eval_crop_size` = 513x513 for PASCAL dataset whose largest image dimension is 512. Similarly, we set `eval_crop_size` = 1025x2049 for Cityscapes images whose | |
| image dimension is all equal to 1024x2048. | |
| ___ | |
| Q9: Why multi-gpu training is slow? | |
| A: Please try to use more threads to pre-process the inputs. For, example change [num_readers = 4](https://github.com/tensorflow/models/blob/master/research/deeplab/train.py#L457). | |
| ___ | |
| ## References | |
| 1. **Deep Residual Learning for Image Recognition**<br /> | |
| Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun<br /> | |
| [[link]](https://arxiv.org/abs/1512.03385), In CVPR, 2016. | |
| 2. **Semantic Contours from Inverse Detectors**<br /> | |
| Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, Jitendra Malik<br /> | |
| [[link]](http://home.bharathh.info/pubs/codes/SBD/download.html), In ICCV, 2011. | |
| 3. **Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials**<br /> | |
| Philipp Krähenbühl, Vladlen Koltun<br /> | |
| [[link]](http://graphics.stanford.edu/projects/densecrf/), In NIPS, 2011. | |