# BirdNET Audio Prediction Script

This script loads a WAV file and uses the BirdNET ONNX model to predict bird species from audio recordings. It supports both single-window analysis (first 3 seconds) and moving window analysis (entire file) with species name mapping.

## Features

- **Species Name Mapping**: Uses `BirdNET_GLOBAL_6K_V2.4_Labels.txt` to display actual bird species names instead of class indices
- **Moving Window Analysis**: Analyzes entire audio files using overlapping 3-second windows
- **Single Window Mode**: Quick analysis of just the first 3 seconds
- **Configurable Parameters**: Adjustable confidence thresholds, overlap ratios, and result counts
- **Detection Summary**: Comprehensive overview of all detections with timestamps and confidence scores

## Requirements

- Python 3.7+
- The model expects audio input of exactly 3 seconds duration at 48kHz sample rate (144,000 samples)
- BirdNET labels file: `BirdNET_GLOBAL_6K_V2.4_Labels.txt`

## Installation

Install the required dependencies:

```bash
pip install -r requirements.txt
```

Required packages:

- `numpy>=1.21.0`
- `librosa>=0.9.0`
- `onnxruntime>=1.12.0`

## Usage

### Moving Window Analysis (Full File)

Analyze the entire audio file with overlapping windows:

```bash
python predict_audio.py audio.wav
```

### Single Window Analysis (First 3 seconds only)

Quick analysis of just the beginning:

```bash
python predict_audio.py audio.wav --single-window
```

### Advanced Usage Examples

```bash
# High sensitivity analysis with more results
python predict_audio.py audio.wav --confidence 0.1 --top-k 15

# Fine-grained analysis with 75% window overlap
python predict_audio.py audio.wav --overlap 0.75 --confidence 0.3

# Custom model and labels files
python predict_audio.py audio.wav --model custom_model.onnx --labels custom_labels.txt
```

### Command Line Arguments

- `audio_file`: Path to the WAV audio file (required)
- `--model`: Path to the ONNX model file (default: `model.onnx`)
- `--labels`: Path to the species labels file (default: `BirdNET_GLOBAL_6K_V2.4_Labels.txt`)
- `--top-k`: Number of top predictions to show (default: 5)
- `--overlap`: Window overlap ratio 0.0-1.0 (default: 0.5 = 50% overlap)
- `--confidence`: Minimum confidence threshold for detections (default: 0.1)
- `--batch-size`: Batch size for inference processing (default: 128)
- `--single-window`: Analyze only first 3 seconds instead of full file
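
The arguments above could be declared with `argparse` roughly as follows. This is a minimal sketch built only from the documented names and defaults; the function name `build_parser` and the help strings are assumptions, and the actual parser in `predict_audio.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(
        description="Predict bird species from audio with a BirdNET ONNX model"
    )
    parser.add_argument("audio_file", help="Path to the WAV audio file")
    parser.add_argument("--model", default="model.onnx",
                        help="Path to the ONNX model file")
    parser.add_argument("--labels", default="BirdNET_GLOBAL_6K_V2.4_Labels.txt",
                        help="Path to the species labels file")
    parser.add_argument("--top-k", type=int, default=5,
                        help="Number of top predictions to show")
    parser.add_argument("--overlap", type=float, default=0.5,
                        help="Window overlap ratio, 0.0-1.0")
    parser.add_argument("--confidence", type=float, default=0.1,
                        help="Minimum confidence threshold for detections")
    parser.add_argument("--batch-size", type=int, default=128,
                        help="Number of windows per inference batch")
    parser.add_argument("--single-window", action="store_true",
                        help="Analyze only the first 3 seconds")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["audio.wav", "--overlap", "0.75"])
print(args.audio_file, args.overlap, args.top_k, args.single_window)
```

Note that `argparse` exposes `--top-k` and `--batch-size` as `args.top_k` and `args.batch_size`.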

## Output Examples

### Single Window Output

```
Loading labels from: BirdNET_GLOBAL_6K_V2.4_Labels.txt
Loaded 6522 species labels
Loading ONNX model: model.onnx
Loading first 3 seconds of audio file: bird_recording.wav
Audio loaded successfully. Shape: (144000,)
Running inference on single window...

Top 5 predictions for first 3 seconds:
 1. American Robin: 0.892456
 2. Song Sparrow: 0.234567
 3. House Finch: 0.123789
 4. Northern Cardinal: 0.089234
 5. Blue Jay: 0.056789
```
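
A ranked list like the one above can be produced by sorting the model's per-class scores and mapping the indices to label names. The sketch below assumes `scores` is a flat sequence of confidences aligned with `labels`; the helper name `top_k_predictions` is hypothetical and not taken from the script.

```python
from typing import List, Sequence, Tuple


def top_k_predictions(scores: Sequence[float], labels: Sequence[str],
                      k: int = 5) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (label, score) pairs, best first."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Sort class indices by descending score and keep the first k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(labels[i], float(scores[i])) for i in ranked[:k]]


# Toy example with three classes (values are illustrative only).
toy_scores = [0.12, 0.89, 0.23]
toy_labels = ["House Finch", "American Robin", "Song Sparrow"]
for rank, (name, score) in enumerate(
        top_k_predictions(toy_scores, toy_labels, k=2), start=1):
    print(f"{rank:2d}. {name}: {score:.6f}")
```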

### Moving Window Output

```
Loading labels from: BirdNET_GLOBAL_6K_V2.4_Labels.txt
Loaded 6522 species labels
Loading ONNX model: model.onnx
Loading full audio file: long_recording.wav
Audio loaded successfully. Duration: 45.32 seconds
Creating windows with 50% overlap...
Created 28 windows of 3 seconds each
Running inference on all windows...
Processing window 1/28 (t=0.0s)
Processing window 11/28 (t=15.0s)
Processing window 21/28 (t=30.0s)
Completed inference on 28 windows
Analyzing detections with confidence threshold 0.1...

=== DETECTION SUMMARY ===
Audio duration: 45.32 seconds
Windows analyzed: 28
Species detected (>0.10 confidence): 4

Top detections:

American Robin
  Max confidence: 0.892456
  Detections: 12
  Time range: 0.0s - 18.0s
      1.5s: 0.892456
      3.0s: 0.845231
      4.5s: 0.723456

Song Sparrow
  Max confidence: 0.567890
  Detections: 6
  Time range: 22.5s - 36.0s
     24.0s: 0.567890
     25.5s: 0.445678
     27.0s: 0.334567

House Finch
  Max confidence: 0.345678
  Detections: 3
  Time range: 39.0s - 42.0s
     39.0s: 0.345678
```
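
A summary of this shape can be assembled by grouping every score that passes the threshold by species, then tracking the maximum confidence, the detection count, and the first and last window start time. The sketch below is one assumed way to do that, not the script's actual implementation; `summarize_detections` and its input layout (a list of `(start_time, {species: score})` pairs) are hypothetical, as is treating the threshold as inclusive.

```python
from typing import Dict, List, Tuple


def summarize_detections(
    windows: List[Tuple[float, Dict[str, float]]],
    threshold: float = 0.1,
) -> Dict[str, dict]:
    """Aggregate per-window scores into one summary entry per species."""
    summary: Dict[str, dict] = {}
    for start_time, scores in windows:
        for species, score in scores.items():
            if score < threshold:
                continue  # below the threshold (assumed inclusive), skip
            entry = summary.setdefault(
                species,
                {"max_confidence": 0.0, "detections": 0,
                 "first_time": start_time, "last_time": start_time},
            )
            entry["max_confidence"] = max(entry["max_confidence"], score)
            entry["detections"] += 1
            entry["first_time"] = min(entry["first_time"], start_time)
            entry["last_time"] = max(entry["last_time"], start_time)
    return summary


# Illustrative input: three windows at 50% overlap.
demo = [
    (0.0, {"American Robin": 0.85, "Song Sparrow": 0.05}),
    (1.5, {"American Robin": 0.89}),
    (3.0, {"Song Sparrow": 0.40}),
]
for species, info in summarize_detections(demo).items():
    print(species, info)
```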

## Technical Details

### Model Input/Output

- **Input**: Audio array of shape `[batch_size, 144000]` (3 seconds at 48kHz)
- **Output**: Classification scores for 6522 bird species

### Audio Preprocessing

The script automatically handles:

- Loading audio files with librosa (supports WAV, MP3, FLAC, etc.)
- Resampling to 48kHz if necessary
- Padding with zeros or truncating to exactly 3 seconds (144,000 samples)
- Converting to float32 format
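
The pad-or-truncate step can be sketched with NumPy as below. It assumes the audio has already been loaded and resampled to 48 kHz mono (for instance with `librosa.load(path, sr=48000, mono=True)`); the function name `fit_to_window` is hypothetical.

```python
import numpy as np

SAMPLE_RATE = 48_000
WINDOW_SAMPLES = 3 * SAMPLE_RATE  # 144,000 samples


def fit_to_window(audio: np.ndarray, length: int = WINDOW_SAMPLES) -> np.ndarray:
    """Zero-pad or truncate a 1-D signal to exactly `length` float32 samples."""
    audio = np.asarray(audio, dtype=np.float32)
    if audio.shape[0] >= length:
        return audio[:length]
    # Append zeros at the end so short clips still fill the window.
    return np.pad(audio, (0, length - audio.shape[0]))


short_clip = np.ones(SAMPLE_RATE, dtype=np.float32)  # 1 second of audio
print(fit_to_window(short_clip).shape)
```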

### Moving Window Analysis

- Creates overlapping 3-second windows from the full audio
- With the default 50% overlap, windows start at 0s, 1.5s, 3s, 4.5s, and so on
- Higher overlap (e.g., 75%) gives finer time resolution but takes longer, because more windows are analyzed
- Each window is analyzed independently, then results are aggregated
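
The window positions follow from the overlap ratio: the hop between consecutive windows is `window_length * (1 - overlap)`. The sketch below computes start times under that assumption. `window_starts` is a hypothetical helper, and how the script treats a trailing partial window is not stated here, so this version simply stops at the last window that fits entirely.

```python
from typing import List


def window_starts(duration_s: float, window_s: float = 3.0,
                  overlap: float = 0.5) -> List[float]:
    """Start times (seconds) of overlapping windows that fit in the audio."""
    # An overlap of 1.0 would give a hop of zero, so it is rejected here.
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0.0, 1.0)")
    hop_s = window_s * (1.0 - overlap)  # 1.5 s at 50% overlap
    starts = []
    index = 0
    # Multiply instead of accumulating to avoid floating-point drift.
    while index * hop_s + window_s <= duration_s:
        starts.append(index * hop_s)
        index += 1
    return starts


print(window_starts(9.0))                # 50% overlap
print(window_starts(9.0, overlap=0.75))  # 75% overlap, hop of 0.75 s
```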

### Batch Processing

- Windows are processed in configurable batches (default: 128 windows per batch)
- Improves throughput on long recordings by running inference on a whole batch of windows in one call instead of one window at a time
- Automatically handles memory management and progress reporting
- Optimal batch size depends on available system memory and model complexity
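
Batching amounts to slicing the list of windows into fixed-size chunks and running the model once per chunk. The sketch below shows only the chunking; `iter_batches` is a hypothetical helper, and the inference call itself is left out because it depends on the model's input name.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int = 128) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 300 placeholder windows split into batches of 128, 128 and 44.
windows = list(range(300))
for batch in iter_batches(windows):
    print(len(batch))
```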

### Species Labels

- Uses the official BirdNET labels file with 6522 species
- Format: `Scientific name_Common Name`, one entry per line (the scientific and common names are separated by an underscore)
- The script extracts and displays the common name (the part after the underscore)
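
Extracting the common name can be as simple as splitting each line at its last underscore. The sketch below assumes that layout, and the sample lines are illustrative; `common_name` and `load_labels` are hypothetical helpers, not functions from the script.

```python
from typing import Iterable, List


def common_name(label_line: str) -> str:
    """Return the part after the last underscore, or the whole line if none."""
    line = label_line.strip()
    _, sep, name = line.rpartition("_")
    return name if sep else line


def load_labels(lines: Iterable[str]) -> List[str]:
    """Map each non-empty label line to its common name, preserving order."""
    return [common_name(line) for line in lines if line.strip()]


# Illustrative lines in the assumed "scientific_common" layout.
sample = ["Turdus migratorius_American Robin\n", "Melospiza melodia_Song Sparrow\n"]
print(load_labels(sample))
```

Because `load_labels` accepts any iterable of strings, an open file handle can be passed directly, for example `with open(path, encoding="utf-8") as f: labels = load_labels(f)`.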

## Performance Tips

- Use `--single-window` for quick identification of prominent species
- Increase `--overlap` (0.75-0.9) for detailed analysis of complex recordings
- Lower `--confidence` (0.05-0.1) to catch weaker signals
- Higher `--confidence` (0.3-0.5) for only very confident detections
- Use `--top-k 1` to see only the most confident detection per analysis
- **Batch Processing**: The default `--batch-size 128` is a reasonable starting point
  - Increase the batch size (256, 512) if you have spare GPU memory or RAM
  - Decrease the batch size (32, 64) if you encounter memory issues
  - Batching helps most on longer audio files, where there are many windows to process