Spaces: Running on Zero
Update README.md

README.md CHANGED
@@ -14,7 +14,7 @@ fullWidth: true
 ---
 
 <p align="center">
-  <img src="figs/logo.png" width="50%" />
 </p>
 
 
@@ -33,135 +33,14 @@ fullWidth: true
 
 Chunbo Hao<sup>*</sup>, Ruibin Yuan<sup>*</sup>, Jixun Yao, Qixin Deng, Xinyi Bai, Wei Xue, Lei Xie<sup>†</sup>
 
-
 ----
 
 SongFormer is a music structure analysis framework that leverages multi-resolution self-supervised representations and heterogeneous supervision, accompanied by the large-scale multilingual dataset SongFormDB and the high-quality benchmark SongFormBench to foster fair and reproducible research.
 
-
-## News and Updates
-
-## 📋 To-Do List
-
-- [x] Complete and push inference code to GitHub
-- [x] Upload model checkpoint(s) to Hugging Face Hub
-- [ ] Upload the paper to arXiv
-- [x] Fix readme
-- [ ] Deploy an out-of-the-box inference version on Hugging Face (via Inference API or Spaces)
-- [ ] Publish the package to PyPI for easy installation via `pip`
-- [ ] Open-source evaluation code
-- [ ] Open-source training code
-
-## Installation
-
-### Setting up Python Environment
-
-```bash
-git clone https://github.com/ASLP-lab/SongFormer.git
-
-# Get MuQ and MusicFM source code
-git submodule update --init --recursive
-
-conda create -n songformer python=3.10 -y
-conda activate songformer
-```
-
-For users in mainland China, you may need to set up a pip mirror source:
-
-```bash
-pip config set global.index-url https://pypi.mirrors.ustc.edu.cn/simple
-```
-
-Install dependencies:
-
-```bash
-pip install -r requirements.txt
-```
-
-We tested this on Ubuntu 22.04.1 LTS and it works normally. If installation fails, you may need to remove the version constraints in `requirements.txt`.
-
-### Download Pre-trained Models
-
-```bash
-cd src/SongFormer
-# For users in mainland China, follow the instructions in the py file to download via hf-mirror.com
-python utils/fetch_pretrained.py
-```
-
-After downloading, verify that the md5sum values in `src/SongFormer/ckpts/MusicFM/md5sum.txt` match the downloaded files:
-
-```bash
-md5sum ckpts/MusicFM/msd_stats.json
-md5sum ckpts/MusicFM/pretrained_msd.pt
-md5sum ckpts/SongFormer.safetensors
-# md5sum ckpts/SongFormer.pt
-```
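The bare `md5sum` commands in the removed instructions print digests you must compare by eye. A minimal sketch of automating the comparison with `md5sum -c`, which reads "digest path" lines and re-checks each listed file; a throwaway file stands in for a checkpoint here, and this assumes the repository's `md5sum.txt` uses the standard manifest format (for the real checkpoints you would run `md5sum -c` from `src/SongFormer`):

```shell
# Create a stand-in "checkpoint" and a manifest for it
workdir=$(mktemp -d)
printf 'dummy checkpoint' > "$workdir/SongFormer.safetensors"
(cd "$workdir" && md5sum SongFormer.safetensors > md5sum.txt)

# md5sum -c re-reads the manifest and prints OK/FAILED per file;
# its exit status tells you whether every file matched
(cd "$workdir" && md5sum -c md5sum.txt) && status=ok || status=mismatch
```

The exit status makes this easy to use as a gate in a setup script.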
-
-## Inference
-
-### 1. One-Click Inference with HuggingFace Space (coming soon)
-
-Available at: [https://huggingface.co/spaces/ASLP-lab/SongFormer](https://huggingface.co/spaces/ASLP-lab/SongFormer)
-
-### 2. Gradio App
-
-First, cd to the project root directory and activate the environment:
-
-```bash
-conda activate songformer
-```
-
-You can change the server port and listening address in the last line of `app.py` as you prefer.
-
-> If you're using an HTTP proxy, please ensure you include:
->
-> ```bash
-> export no_proxy="localhost, 127.0.0.1, ::1"
-> export NO_PROXY="localhost, 127.0.0.1, ::1"
-> ```
->
-> Otherwise, Gradio may incorrectly assume the service hasn't started and exit immediately.
-
-When first run, `app.py` connects to Hugging Face to download the MuQ-related weights. We recommend creating an empty folder in a suitable location and pointing `export HF_HOME=XXX` at it, so the cache is stored there for easy cleanup and transfer.
-
-For users in mainland China, you may also need `export HF_ENDPOINT=https://hf-mirror.com`. For details, refer to https://hf-mirror.com/
-
-```bash
-python app.py
-```
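Gathered together, the environment setup described in the removed Gradio notes might look like this before launching the app; the `HF_HOME` location is just an example path, and the mirror endpoint is only needed for users in mainland China:

```shell
# Keep Gradio's local health check off the HTTP proxy
export no_proxy="localhost, 127.0.0.1, ::1"
export NO_PROXY="localhost, 127.0.0.1, ::1"

# Store the Hugging Face cache in a dedicated, easy-to-clean folder
# (example location; any empty folder works)
mkdir -p "$HOME/hf_cache"
export HF_HOME="$HOME/hf_cache"

# Mainland China users may also need the mirror endpoint
export HF_ENDPOINT="https://hf-mirror.com"

# python app.py
```

Putting these in a small `env.sh` you `source` before each run keeps the settings reproducible.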
-### 3. Python Code
-
-See `src/SongFormer/infer/infer.py`; the corresponding execution script is `src/SongFormer/infer.sh`. This is a ready-to-use, single-machine, multi-process annotation script.
-
-Below are the configurable parameters of the `src/SongFormer/infer.sh` script. You can set `CUDA_VISIBLE_DEVICES` to control which GPUs are used:
-
-```bash
--i              # Input SCP file path, each line containing the absolute path to one audio file
--o              # Output directory for annotation results
---model         # Annotation model; the default is 'SongFormer', change it if using a fine-tuned model
---checkpoint    # Path to the model checkpoint file
---config_pat    # Path to the configuration file
--gn             # Total number of GPUs to use; should match the number specified in CUDA_VISIBLE_DEVICES
--tn             # Number of processes to run per GPU
-```
-
-### 4. CLI Inference
-
-Coming soon
-
-### 5. Pitfall
-
-- You may need to modify line 121 in `src/third_party/musicfm/model/musicfm_25hz.py` to:
-  `S = torch.load(model_path, weights_only=False)["state_dict"]`
-
-## Training
 
 ## Citation
 
@@ -180,15 +59,4 @@ If our work and codebase is useful for you, please cite as:
 ````
 ## License
 
-Our code is released under the CC-BY-4.0 License.
-
-## Contact Us
-
-<p align="center">
-    <a href="http://www.nwpu-aslp.org/">
-        <img src="figs/aslp.png" width="400"/>
-    </a>
-</p>
@@ -14,7 +14,7 @@ fullWidth: true
 ---
 
 <p align="center">
+  <img src="https://github.com/ASLP-lab/SongFormer/blob/main/figs/logo.png?raw=true" width="50%" />
 </p>
 
 
@@ -33,135 +33,14 @@ fullWidth: true
 
 Chunbo Hao<sup>*</sup>, Ruibin Yuan<sup>*</sup>, Jixun Yao, Qixin Deng, Xinyi Bai, Wei Xue, Lei Xie<sup>†</sup>
 
 ----
 
+**For more information, please visit our [github repository](https://github.com/ASLP-lab/SongFormer)**
 
 SongFormer is a music structure analysis framework that leverages multi-resolution self-supervised representations and heterogeneous supervision, accompanied by the large-scale multilingual dataset SongFormDB and the high-quality benchmark SongFormBench to foster fair and reproducible research.
 
+
 
 ## Citation
 
@@ -180,15 +59,4 @@ If our work and codebase is useful for you, please cite as:
 ````
 ## License
 
+Our code is released under the CC-BY-4.0 License.
