import streamlit as st
from streamlit_extras.switch_page_button import switch_page
st.title("VITMAE")
st.success("""[Original tweet](https://twitter.com/mervenoyann/status/1740688304784183664) (December 29, 2023)""", icon="ℹ️")
st.markdown(""" """)
| st.markdown("""Just read VitMAE paper, sharing some highlights 🧶 | |
| ViTMAE is a simply yet effective self-supervised pre-training technique, where authors combined vision transformer with masked autoencoder. | |
| The images are first masked (75 percent of the image!) and then the model tries to learn about the features through trying to reconstruct the original image! | |
| """) | |
st.markdown(""" """)
st.image("pages/VITMAE/image_1.jpeg", use_column_width=True)
st.markdown(""" """)
| st.markdown("""The image is not masked, but rather only the visible patches are fed to the encoder (and that is the only thing encoder sees!). | |
| Next, a mask token is added to where the masked patches are (a bit like BERT, if you will) and the mask tokens and encoded patches are fed to decoder. | |
| The decoder then tries to reconstruct the original image. | |
| """) | |
st.markdown(""" """)
st.image("pages/VITMAE/image_2.jpeg", use_column_width=True)
st.markdown(""" """)
| st.markdown("""As a result, the authors found out that high masking ratio works well in fine-tuning for downstream tasks and linear probing 🤯🤯 | |
| """) | |
st.markdown(""" """)
st.image("pages/VITMAE/image_3.jpeg", use_column_width=True)
st.markdown(""" """)
| st.markdown("""If you want to try the model or fine-tune, all the pre-trained VITMAE models released released by Meta are available on [Huggingface](https://t.co/didvTL9Zkm). | |
| We've built a [demo](https://t.co/PkuACJiKrB) for you to see the intermediate outputs and reconstruction by VITMAE. | |
| Also there's a nice [notebook](https://github.com/NielsRogge/Transformers-Tutorials/blob/master/ViTMAE/ViT_MAE_visualization_demo.ipynb) by [@NielsRogge](https://twitter.com/NielsRogge). | |
| """) | |
st.markdown(""" """)
st.image("pages/VITMAE/image_4.jpeg", use_column_width=True)
st.markdown(""" """)
| st.info(""" | |
| Ressources: | |
| [Masked Autoencoders Are Scalable Vision Learners](https://arxiv.org/abs/2111.06377v3) | |
| by LKaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick (2021) | |
| [GitHub](https://github.com/facebookresearch/mae) | |
| [Hugging Face documentation](https://huggingface.co/docs/transformers/model_doc/vit_mae)""", icon="📚") | |
st.markdown(""" """)
st.markdown(""" """)
st.markdown(""" """)
col1, col2, col3 = st.columns(3)
with col1:
    if st.button('Previous paper', use_container_width=True):
        switch_page("OneFormer")
with col2:
    if st.button('Home', use_container_width=True):
        switch_page("Home")
with col3:
    if st.button('Next paper', use_container_width=True):
        switch_page("DINOV2")