# BirdNET Audio Prediction Script
This script loads a WAV file and uses the BirdNET ONNX model to predict bird species from audio recordings. It supports both single-window analysis (first 3 seconds) and moving window analysis (entire file) with species name mapping.
## Features
- **Species Name Mapping**: Uses `BirdNET_GLOBAL_6K_V2.4_Labels.txt` to display actual bird species names instead of class indices
- **Moving Window Analysis**: Analyzes entire audio files using overlapping 3-second windows
- **Single Window Mode**: Quick analysis of just the first 3 seconds
- **Configurable Parameters**: Adjustable confidence thresholds, overlap ratios, and result counts
- **Detection Summary**: Comprehensive overview of all detections with timestamps and confidence scores
## Requirements
- Python 3.7+
- The model takes input of exactly 3 seconds at a 48 kHz sample rate (144,000 samples); the script handles resampling and windowing
- BirdNET labels file: `BirdNET_GLOBAL_6K_V2.4_Labels.txt`
- BirdNET ONNX model file (default path: `model.onnx`)
## Installation
Install the required dependencies:
```bash
pip install -r requirements.txt
```
Required packages:
- `numpy>=1.21.0`
- `librosa>=0.9.0`
- `onnxruntime>=1.12.0`
## Usage
### Moving Window Analysis (Full File)
Analyze the entire audio file with overlapping windows:
```bash
python predict_audio.py audio.wav
```
### Single Window Analysis (First 3 seconds only)
Quick analysis of just the beginning:
```bash
python predict_audio.py audio.wav --single-window
```
### Advanced Usage Examples
```bash
# High-sensitivity analysis (threshold below the 0.1 default) with more results
python predict_audio.py audio.wav --confidence 0.05 --top-k 15
# Fine-grained analysis with 75% window overlap
python predict_audio.py audio.wav --overlap 0.75 --confidence 0.3
# Custom model and labels files
python predict_audio.py audio.wav --model custom_model.onnx --labels custom_labels.txt
```
### Command Line Arguments
- `audio_file`: Path to the WAV audio file (required)
- `--model`: Path to the ONNX model file (default: `model.onnx`)
- `--labels`: Path to the species labels file (default: `BirdNET_GLOBAL_6K_V2.4_Labels.txt`)
- `--top-k`: Number of top predictions to show (default: 5)
- `--overlap`: Window overlap ratio, from 0.0 up to but not including 1.0 (default: 0.5 = 50% overlap)
- `--confidence`: Minimum confidence threshold for detections (default: 0.1)
- `--batch-size`: Batch size for inference processing (default: 128)
- `--single-window`: Analyze only first 3 seconds instead of full file
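For reference, the flags above map onto a standard `argparse` parser roughly as follows. This is a hypothetical reconstruction from the documented names and defaults (`build_parser` is an illustrative name), not the script's actual source. Note that `argparse` exposes `--top-k`, `--batch-size` and `--single-window` as `args.top_k`, `args.batch_size` and `args.single_window`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI flags and defaults."""
    parser = argparse.ArgumentParser(
        description="Predict bird species from audio with a BirdNET ONNX model."
    )
    parser.add_argument("audio_file", help="Path to the WAV audio file")
    parser.add_argument("--model", default="model.onnx", help="Path to the ONNX model file")
    parser.add_argument("--labels", default="BirdNET_GLOBAL_6K_V2.4_Labels.txt",
                        help="Path to the species labels file")
    parser.add_argument("--top-k", type=int, default=5, help="Number of top predictions to show")
    parser.add_argument("--overlap", type=float, default=0.5,
                        help="Window overlap ratio in [0.0, 1.0)")
    parser.add_argument("--confidence", type=float, default=0.1,
                        help="Minimum confidence threshold for detections")
    parser.add_argument("--batch-size", type=int, default=128,
                        help="Number of windows per inference batch")
    # store_true flags default to False when omitted.
    parser.add_argument("--single-window", action="store_true",
                        help="Analyze only the first 3 seconds")
    return parser
```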
## Output Examples
### Single Window Output
```
Loading labels from: BirdNET_GLOBAL_6K_V2.4_Labels.txt
Loaded 6522 species labels
Loading ONNX model: model.onnx
Loading first 3 seconds of audio file: bird_recording.wav
Audio loaded successfully. Shape: (144000,)
Running inference on single window...
Top 5 predictions for first 3 seconds:
1. American Robin: 0.892456
2. Song Sparrow: 0.234567
3. House Finch: 0.123789
4. Northern Cardinal: 0.089234
5. Blue Jay: 0.056789
```
### Moving Window Output
```
Loading labels from: BirdNET_GLOBAL_6K_V2.4_Labels.txt
Loaded 6522 species labels
Loading ONNX model: model.onnx
Loading full audio file: long_recording.wav
Audio loaded successfully. Duration: 45.32 seconds
Creating windows with 50% overlap...
Created 28 windows of 3 seconds each
Running inference on all windows...
Processing window 1/28 (t=0.0s)
Processing window 11/28 (t=15.0s)
Processing window 21/28 (t=30.0s)
Completed inference on 28 windows
Analyzing detections with confidence threshold 0.1...
=== DETECTION SUMMARY ===
Audio duration: 45.32 seconds
Windows analyzed: 28
Species detected (>0.10 confidence): 4
Top detections:
American Robin
Max confidence: 0.892456
Detections: 12
Time range: 0.0s - 18.0s
1.5s: 0.892456
3.0s: 0.845231
4.5s: 0.723456
Song Sparrow
Max confidence: 0.567890
Detections: 6
Time range: 22.5s - 36.0s
24.0s: 0.567890
25.5s: 0.445678
27.0s: 0.334567
House Finch
Max confidence: 0.345678
Detections: 3
Time range: 38.5s - 42.0s
39.0s: 0.345678
```
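The detection summary can be derived from the per-window score matrix by grouping every above-threshold score by species. The sketch below is an assumption about how such a summary could be computed, not the script's code: `summarize_detections` is a hypothetical name, scores strictly above the threshold count as detections (matching the `>0.10` wording in the output), and the time range runs from the start of the first detecting window to the end of the last one.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

import numpy as np


def summarize_detections(scores: np.ndarray, start_times: Sequence[float],
                         labels: Sequence[str], threshold: float = 0.1,
                         window_s: float = 3.0) -> List[Dict]:
    """Group per-window scores above `threshold` by species.

    `scores` has shape [n_windows, n_classes]; `start_times[i]` is the start of window i.
    Returns one dict per detected species, sorted by maximum confidence (descending).
    """
    hits = defaultdict(list)  # class index -> [(window start time, score), ...]
    window_idx, class_idx = np.nonzero(scores > threshold)
    for w, c in zip(window_idx, class_idx):
        hits[c].append((float(start_times[w]), float(scores[w, c])))

    summary = []
    for c, events in hits.items():
        times = [t for t, _ in events]
        summary.append({
            "species": labels[c],
            "max_confidence": max(s for _, s in events),
            "detections": len(events),
            # Assumed convention: first window start to last window end.
            "time_range": (min(times), max(times) + window_s),
            "events": sorted(events, key=lambda e: e[1], reverse=True),
        })
    return sorted(summary, key=lambda d: d["max_confidence"], reverse=True)
```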
## Technical Details
### Model Input/Output
- **Input**: Audio array of shape `[batch_size, 144000]` (3 seconds at 48kHz)
- **Output**: Classification scores for 6522 classes (mostly bird species), one per line of the labels file
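A minimal sketch of one inference call with `onnxruntime`, assuming the model has a single input that accepts a float32 array of shape `[batch_size, 144000]` and that its first output holds the per-class scores. `run_batch` and `top_k` are hypothetical helper names, not part of the script.

```python
from typing import List, Sequence, Tuple

import numpy as np


def run_batch(session, batch: np.ndarray) -> np.ndarray:
    """Run one batch of 3-second windows through an onnxruntime InferenceSession."""
    # Look up the input name from the model instead of hard-coding it.
    input_name = session.get_inputs()[0].name
    # ONNX Runtime expects a contiguous array of the declared input type (assumed float32).
    batch = np.ascontiguousarray(batch, dtype=np.float32)
    # output_names=None returns every output; assume the first holds the class scores.
    return session.run(None, {input_name: batch})[0]


def top_k(scores: np.ndarray, labels: Sequence[str], k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs for a single window."""
    order = np.argsort(scores)[::-1][:k]
    return [(labels[i], float(scores[i])) for i in order]
```

In practice the session would be created with something like `onnxruntime.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])`, and `top_k(scores[0], labels, k=5)` would give a ranked list like the single-window output above.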
### Audio Preprocessing
The script automatically handles:
- Loading audio files with librosa (supports WAV, MP3, FLAC, etc.)
- Resampling to 48kHz if necessary
- Padding with zeros or truncating to exactly 3 seconds (144,000 samples)
- Converting to float32 format
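The padding and truncation step can be sketched with NumPy. This assumes the audio has already been loaded as a mono array at 48 kHz, for example with `librosa.load(path, sr=48000, mono=True)` (passing `duration=3.0` reads only the first 3 seconds); `fit_to_window` is a hypothetical name.

```python
import numpy as np

SAMPLE_RATE = 48_000              # BirdNET's expected sample rate (Hz)
WINDOW_SAMPLES = 3 * SAMPLE_RATE  # 3 seconds -> 144,000 samples


def fit_to_window(audio: np.ndarray, length: int = WINDOW_SAMPLES) -> np.ndarray:
    """Zero-pad or truncate a mono signal to exactly `length` float32 samples."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.ndim != 1:
        raise ValueError("expected a mono (1-D) signal")
    if audio.shape[0] >= length:
        return audio[:length]  # truncate long input
    return np.pad(audio, (0, length - audio.shape[0]))  # zero-pad short input
```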
### Moving Window Analysis
- Creates overlapping 3-second windows from the full audio
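- Successive windows start `3 s × (1 - overlap)` apart, so the default 50% overlap gives windows at 0 s, 1.5 s, 3 s, 4.5 s, and so on
- Higher overlap (e.g., 75%, a 0.75 s step) gives finer time resolution but roughly doubles the number of windows, and hence the inference time, compared with 50%
- Each window is analyzed independently, and the per-window results are then aggregated into the detection summary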
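One way to build the windows is NumPy's `sliding_window_view`, keeping every `hop`-th window. This is a sketch under assumptions: `make_windows` is a hypothetical helper, only full windows are kept, and a clip shorter than 3 seconds is zero-padded into a single window; the script may treat the final partial window differently.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def make_windows(audio, sr=48_000, window_s=3.0, overlap=0.5):
    """Split mono audio into overlapping fixed-length windows.

    Returns (windows, start_times): windows has shape [n_windows, window_samples],
    start_times holds each window's start in seconds.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0.0, 1.0)")  # 1.0 would mean a zero step
    audio = np.asarray(audio, dtype=np.float32)
    win = int(round(window_s * sr))                        # 144,000 samples by default
    hop = max(1, int(round(win * (1.0 - overlap))))        # 72,000 samples at 50% overlap
    if audio.shape[0] < win:
        audio = np.pad(audio, (0, win - audio.shape[0]))   # short clip -> one padded window
    # All full-length windows at a 1-sample step (a view), then keep every hop-th one.
    windows = sliding_window_view(audio, win)[::hop]
    starts = np.arange(windows.shape[0]) * hop / sr
    return np.ascontiguousarray(windows), starts
```

Materializing every window copies the audio roughly `1 / (1 - overlap)` times, so for very long recordings it can be cheaper to slice windows batch by batch instead.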
### Batch Processing
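- Windows are sent to the model in batches of `--batch-size` windows (default: 128)
- Batching improves throughput because each model call scores many windows at once instead of one at a time
- Memory use grows with batch size: the input tensor alone is batch size × 144,000 float32 samples (about 74 MB at 128), plus the model's intermediate activations
- Progress is reported as batches are processed; the best batch size depends on available memory and hardware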
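The batching loop amounts to slicing the window array into chunks and stacking the results. In this sketch `predict` stands for any callable mapping a `[batch, 144000]` array to `[batch, n_classes]` scores (for example a thin wrapper around `InferenceSession.run`); `iter_batches`, `predict_all` and `input_bytes` are hypothetical names. `input_bytes` counts only the input tensor: 128 windows × 144,000 samples × 4 bytes = 73,728,000 bytes.

```python
from typing import Callable, Iterator

import numpy as np


def iter_batches(windows: np.ndarray, batch_size: int = 128) -> Iterator[np.ndarray]:
    """Yield consecutive slices of at most `batch_size` windows (views, no copies)."""
    for start in range(0, windows.shape[0], batch_size):
        yield windows[start:start + batch_size]


def predict_all(windows: np.ndarray,
                predict: Callable[[np.ndarray], np.ndarray],
                batch_size: int = 128) -> np.ndarray:
    """Score every window batch by batch and stack rows into [n_windows, n_classes]."""
    if windows.shape[0] == 0:
        raise ValueError("no windows to score")
    return np.concatenate([predict(b) for b in iter_batches(windows, batch_size)], axis=0)


def input_bytes(batch_size: int, window_samples: int = 144_000) -> int:
    """Bytes occupied by one float32 input batch (model activations not included)."""
    return batch_size * window_samples * np.dtype(np.float32).itemsize
```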
### Species Labels
- Uses the official BirdNET labels file with 6522 entries, one per model output class
- Format: `Scientific name_Common Name` per line (for example, `Turdus migratorius_American Robin`)
- The script extracts and displays the common name (the part after the underscore)
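Parsing the labels file only needs a split on the first underscore, since scientific names contain a space rather than an underscore. A minimal sketch, assuming UTF-8 encoding and one label per line in model-output order; `parse_labels` and `load_labels` are hypothetical names.

```python
from typing import Iterable, List


def parse_labels(lines: Iterable[str]) -> List[str]:
    """Turn 'Scientific name_Common Name' lines into common names, keeping file order."""
    names = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. a trailing empty line
        _scientific, sep, common = line.partition("_")  # split on the first underscore only
        names.append(common if sep else line)  # fall back to the full label if no underscore
    return names


def load_labels(path: str) -> List[str]:
    """Read a labels file (assumed UTF-8, one label per line in model-output order)."""
    with open(path, encoding="utf-8") as f:
        return parse_labels(f)
```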
## Performance Tips
- Use `--single-window` for quick identification of prominent species
- Increase `--overlap` (0.75-0.9) for detailed analysis of complex recordings
- Lower `--confidence` below the default of 0.1 (e.g., 0.05) to catch weaker signals
- Higher `--confidence` (0.3-0.5) for only very confident detections
- Use `--top-k 1` to see only the most confident detection per analysis
- **Batch Processing**: the default `--batch-size 128` is a reasonable starting point
  - Increase the batch size (256, 512) if you have spare GPU or system memory
  - Decrease it (32, 64) if you run out of memory
  - Batching matters most for long recordings, which produce many windows
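To estimate the cost of a higher `--overlap` before running it, the number of full 3-second windows can be computed directly. This hypothetical helper assumes a sliding window that keeps only full windows, i.e. `(N - W) // hop + 1`, with shorter clips counted as one padded window; the script's exact count may differ slightly.

```python
def count_windows(duration_s: float, overlap: float,
                  sr: int = 48_000, window_s: float = 3.0) -> int:
    """Number of full windows for a recording of `duration_s` seconds."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0.0, 1.0)")
    n = int(round(duration_s * sr))                  # total samples
    win = int(round(window_s * sr))                  # samples per window
    hop = max(1, int(round(win * (1.0 - overlap))))  # samples between window starts
    if n <= win:
        return 1                                     # short clip -> one padded window
    return (n - win) // hop + 1


for ov in (0.5, 0.75, 0.9):
    print(f"overlap={ov}: {count_windows(60, ov)} windows for a 60 s recording")
```

For a 60-second recording this prints 39, 77 and 191 windows for overlaps of 0.5, 0.75 and 0.9, so inference time grows roughly fivefold between the default and 0.9.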