Update README.md

## Getting started

### Get some data

Let us first write an auxiliary function to download a chest X-ray.

```python
...
```
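
The helper's body is elided above. As a rough sketch of what such a function might look like (the URL is a placeholder, not the one from the model card; `requests` and `Pillow` are assumed dependencies):

```python
>>> import requests
>>> from PIL import Image
>>> def download_sample_image() -> Image.Image:
...     """Download a sample chest X-ray and return it as a PIL image."""
...     image_url = "https://example.com/chest-x-ray.jpg"  # placeholder URL
...     response = requests.get(image_url, stream=True)
...     response.raise_for_status()
...     return Image.open(response.raw)
...
```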

### Load the model

Now let us download the model and encode an image.

```python
>>> import torch
>>> from transformers import AutoImageProcessor, AutoModel
>>>
>>> # Download the model
>>> repo = "microsoft/rad-dino"
>>> rad_dino = AutoModel.from_pretrained(repo)
>>>
>>> # The processor takes a PIL image, performs resizing, center-cropping, and
>>> # intensity normalization using stats from MIMIC-CXR, and returns a
>>> # dictionary with a PyTorch tensor ready for the encoder
>>> processor = AutoImageProcessor.from_pretrained(repo)
```
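
The processor's output is just a dictionary holding a `pixel_values` tensor. A quick check of its shape, assuming a 518×518 center crop (a 37×37 token grid times patch size 14; the blank image below is a stand-in, not a real radiograph):

```python
>>> from PIL import Image
>>> dummy = Image.new("RGB", (1024, 1024))  # blank stand-in for a chest X-ray
>>> inputs = processor(images=dummy, return_tensors="pt")
>>> inputs["pixel_values"].shape  # assumed: (batch_size, channels, height, width)
torch.Size([1, 3, 518, 518])
```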

### Encode an image

```python
>>> # Download and preprocess a chest X-ray
>>> image = download_sample_image()
>>> image.size  # (width, height)
...
>>> inputs = processor(images=image, return_tensors="pt")
>>>
>>> # Encode the image!
>>> with torch.inference_mode():
...     outputs = rad_dino(**inputs)
>>>
>>> # Look at the CLS embeddings
>>> cls_embeddings = outputs.pooler_output
...
torch.Size([1, 768, 37, 37])
```
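
The elided lines above reshape the flat patch embeddings into a spatial grid using [`einops`](https://einops.rocks/) (install with `pip install einops`). A minimal sketch of the idea, assuming the standard ViT token layout (first token is CLS, followed by a 37×37 patch grid):

```python
>>> from einops import rearrange
>>> flat_patch_embeddings = outputs.last_hidden_state[:, 1:]  # drop the CLS token
>>> patch_grid = rearrange(flat_patch_embeddings, "b (h w) c -> b c h w", h=37, w=37)
>>> patch_grid.shape  # (batch_size, num_channels, height, width)
torch.Size([1, 768, 37, 37])
```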

### Weights for fine-tuning

We have released a checkpoint compatible with
[the original DINOv2 code](https://github.com/facebookresearch/dinov2) to help
researchers fine-tune our model.

First, let us write code to load a
[`safetensors` checkpoint](https://huggingface.co/docs/safetensors).

```python
>>> import torch
>>> from safetensors import safe_open
>>> def safetensors_to_state_dict(checkpoint_path: str) -> dict[str, torch.Tensor]:
...     state_dict = {}
...     with safe_open(checkpoint_path, framework="pt") as ckpt_file:
...         for key in ckpt_file.keys():
...             state_dict[key] = ckpt_file.get_tensor(key)
...     return state_dict
...
```
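
The paths below assume the checkpoint files sit in the working directory. One way to fetch them, assuming they are hosted in the model repository (`hf_hub_download` is the standard `huggingface_hub` utility):

```python
>>> from huggingface_hub import hf_hub_download
>>> backbone_path = hf_hub_download(repo_id="microsoft/rad-dino", filename="backbone_compatible.safetensors")
>>> head_path = hf_hub_download(repo_id="microsoft/rad-dino", filename="dino_head.safetensors")
```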

We can now use the hub model and load the RAD-DINO weights.
Let's clone the DINOv2 repository so we can import the code for the head.

```shell
git clone https://github.com/facebookresearch/dinov2.git
cd dinov2
```

```python
>>> import torch
>>> rad_dino_gh = torch.hub.load(".", "dinov2_vitb14", source="local")  # load from the cloned repo
>>> backbone_state_dict = safetensors_to_state_dict("backbone_compatible.safetensors")
>>> rad_dino_gh.load_state_dict(backbone_state_dict, strict=True)
<All keys matched successfully>
```
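
As a quick sanity check (a sketch, assuming the DINOv2 hub models return the final CLS-token features), a 518×518 input should yield a 768-dimensional vector:

```python
>>> with torch.inference_mode():
...     features = rad_dino_gh(torch.rand(1, 3, 518, 518))
>>> features.shape
torch.Size([1, 768])
```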

The weights of the head are also released:

```python
>>> from dinov2.layers import DINOHead
>>> rad_dino_head_gh = DINOHead(
...     in_dim=768,
...     out_dim=65536,
...     hidden_dim=2048,
...     bottleneck_dim=256,
...     nlayers=3,
... )
>>> head_state_dict = safetensors_to_state_dict("dino_head.safetensors")
>>> rad_dino_head_gh.load_state_dict(head_state_dict, strict=True)
<All keys matched successfully>
```

### Configs and augmentation

The configuration files [`ssl_default_config.yaml`](./ssl_default_config.yaml) and [`vitb14_cxr.yaml`](./vitb14_cxr.yaml), and the [`augmentations`](./augmentations.py) module are also available in the repository to help researchers reproduce the training procedure with our hyperparameters.
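
For reference, a hypothetical way to point the DINOv2 training entry point at these configs (the command skeleton follows the DINOv2 README; the exact flags and paths are assumptions, not taken from this model card):

```shell
# Run from the cloned dinov2 repository, with the RAD-DINO configs copied in
python dinov2/run/train/train.py \
    --config-file vitb14_cxr.yaml \
    --output-dir ./rad_dino_output
```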

## Training details

### Training data

...

## Model card contact

Fernando Pérez-García ([`fperezgarcia@microsoft.com`](mailto:fperezgarcia@microsoft.com)).