justinchuby committed on
Commit 7b7cd7f · verified · 1 Parent(s): 840da38

Upload folder using huggingface_hub

Files changed (7)
  1. BirdNET_GLOBAL_6K_V2.4_Labels.txt +0 -0
  2. LICENSE +19 -0
  3. README.md +9 -3
  4. USAGE.md +188 -0
  5. model.onnx +3 -0
  6. predict_audio.py +446 -0
  7. requirements.txt +3 -0
BirdNET_GLOBAL_6K_V2.4_Labels.txt ADDED
The diff for this file is too large to render.
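The labels file is not shown in this diff. `load_labels` in `predict_audio.py` makes three assumptions about it:

- There is one label per line, in the form `Scientific name_Common Name`.
- The common name is the text after the first underscore.
- Blank lines are skipped, so the i-th non-blank line is the label for class index i.

The sketch below parses the same format but keeps both names. `parse_label_line` and `load_label_pairs` are hypothetical helpers, and the sample lines are illustrative rather than copied from the file.

```python
from __future__ import annotations


def parse_label_line(line: str) -> tuple[str, str]:
    """Split one 'Scientific name_Common Name' line at the first underscore.

    Lines without an underscore come back as (line, line), mirroring
    load_labels, which keeps such lines unchanged.
    """
    scientific, sep, common = line.strip().partition("_")
    if not sep:
        return scientific, scientific
    return scientific, common


def load_label_pairs(lines: list[str]) -> list[tuple[str, str]]:
    """Parse label lines, skipping blank ones the same way load_labels does.

    The list index of each pair is assumed to be the model's class index.
    """
    return [parse_label_line(line) for line in lines if line.strip()]


# Illustrative lines in the expected format (not read from the real file).
sample = ["Turdus migratorius_American Robin\n", "Cyanocitta cristata_Blue Jay\n"]
for class_index, (scientific, common) in enumerate(load_label_pairs(sample)):
    print(f"{class_index}: {common} ({scientific})")
```

Using `str.partition` instead of `split("_", 1)` avoids the separate `"_" in line` check and still splits only at the first underscore.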
 
LICENSE ADDED
@@ -0,0 +1,19 @@
+ Copyright (c) 2024 birdnet-team
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,3 +1,9 @@
- ---
- license: mit
- ---
+ ---
+ license: mit
+ ---
+
+ # BirdNET ONNX
+
+ ONNX model converted and optimized from `BirdNET_GLOBAL_6K_V2.4_Model_FP32.tflite`.
+
+ Source: https://github.com/birdnet-team/BirdNET-Analyzer
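The README says the model was converted from BirdNET-Analyzer's `BirdNET_GLOBAL_6K_V2.4_Model_FP32.tflite`. BirdNET-Analyzer passes that model's raw outputs through a sigmoid before it reports and thresholds confidence values. `predict_audio.py`, by contrast, thresholds whatever `session.run` returns.

This commit does not say whether the conversion folded a sigmoid into the ONNX graph. If the script prints scores below 0 or above 1, they are raw logits rather than probabilities.

Below is a minimal sketch of the conversion. It assumes the outputs are logits and uses a plain logistic sigmoid; BirdNET-Analyzer also exposes a sensitivity setting, which this sketch ignores. Both helper names are hypothetical.

```python
import numpy as np


def scores_look_like_logits(scores: np.ndarray) -> bool:
    """Heuristic: probabilities lie in [0, 1], so any value outside suggests logits."""
    return bool(np.any(scores < 0.0) or np.any(scores > 1.0))


def logits_to_probabilities(logits: np.ndarray) -> np.ndarray:
    """Element-wise logistic sigmoid; inputs are clipped so np.exp cannot overflow."""
    clipped = np.clip(logits, -20.0, 20.0)
    return 1.0 / (1.0 + np.exp(-clipped))


# Made-up raw outputs for one window and three classes (illustrative only).
raw = np.array([[-3.0, 0.0, 2.5]], dtype=np.float32)
if scores_look_like_logits(raw):
    probabilities = logits_to_probabilities(raw)
    print(probabilities.round(3))
```

The check only works in one direction. It flags logits when some value falls outside [0, 1], but an in-range result does not prove the outputs are probabilities. In `predict_audio.py`, a conversion like this would go right after `session.run`, before `analyze_detections` compares scores with `--confidence`.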
USAGE.md ADDED
@@ -0,0 +1,188 @@
+ # BirdNET Audio Prediction Script
+
+ This script loads a WAV file and uses the BirdNET ONNX model to predict bird species from audio recordings. It supports both single-window analysis (first 3 seconds) and moving-window analysis (entire file), and maps class indices to species names.
+
+ ## Features
+
+ - **Species Name Mapping**: Uses `BirdNET_GLOBAL_6K_V2.4_Labels.txt` to display species names instead of class indices
+ - **Moving Window Analysis**: Analyzes entire audio files using overlapping 3-second windows
+ - **Single Window Mode**: Quick analysis of just the first 3 seconds
+ - **Configurable Parameters**: Adjustable confidence threshold, window overlap, number of results, and batch size
+ - **Detection Summary**: Per-species overview with maximum confidence, number of detections, time range, and strongest timestamps
+
+ ## Requirements
+
+ - Python 3.7+ for the script itself, although `onnxruntime>=1.20.0` only provides wheels for newer Python versions
+ - The model expects audio input of exactly 3 seconds at a 48 kHz sample rate (144,000 samples)
+ - BirdNET labels file: `BirdNET_GLOBAL_6K_V2.4_Labels.txt`
+
+ ## Installation
+
+ Install the required dependencies:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ Required packages:
+
+ - `numpy>=1.21.0`
+ - `librosa>=0.9.0`
+ - `onnxruntime>=1.20.0`
+
+ ## Usage
+
+ ### Moving Window Analysis (Full File)
+
+ Analyze the entire audio file with overlapping windows:
+
+ ```bash
+ python predict_audio.py audio.wav
+ ```
+
+ ### Single Window Analysis (First 3 seconds only)
+
+ Quick analysis of just the beginning:
+
+ ```bash
+ python predict_audio.py audio.wav --single-window
+ ```
+
+ ### Advanced Usage Examples
+
+ ```bash
+ # Higher sensitivity (threshold below the 0.1 default) with more results
+ python predict_audio.py audio.wav --confidence 0.05 --top-k 15
+
+ # Fine-grained analysis with 75% window overlap
+ python predict_audio.py audio.wav --overlap 0.75 --confidence 0.3
+
+ # Custom model and labels files
+ python predict_audio.py audio.wav --model custom_model.onnx --labels custom_labels.txt
+ ```
+
+ ### Command Line Arguments
+
+ - `audio_file`: Path to the WAV audio file (required)
+ - `--model`: Path to the ONNX model file (default: `model.onnx`)
+ - `--labels`: Path to the species labels file (default: `BirdNET_GLOBAL_6K_V2.4_Labels.txt`)
+ - `--top-k`: Number of top predictions to show in single-window mode, or number of species to list in moving-window mode (default: 5)
+ - `--overlap`: Window overlap ratio, at least 0.0 and strictly less than 1.0 (default: 0.5 = 50% overlap)
+ - `--confidence`: Minimum confidence threshold for detections (default: 0.1)
+ - `--batch-size`: Batch size for inference processing (default: 128)
+ - `--single-window`: Analyze only the first 3 seconds instead of the full file
+
+ ## Output Examples
+
+ ### Single Window Output
+
+ ```
+ Loading labels from: BirdNET_GLOBAL_6K_V2.4_Labels.txt
+ Loaded 6522 species labels
+ Loading ONNX model: model.onnx
+ Loading first 3 seconds of audio file: bird_recording.wav
+ Audio loaded successfully. Shape: (144000,)
+ Running inference on single window...
+
+ Top 5 predictions for first 3 seconds:
+  1. American Robin: 0.892456
+  2. Song Sparrow: 0.234567
+  3. House Finch: 0.123789
+  4. Northern Cardinal: 0.089234
+  5. Blue Jay: 0.056789
+ ```
+
+ ### Moving Window Output
+
+ ```
+ Loading labels from: BirdNET_GLOBAL_6K_V2.4_Labels.txt
+ Loaded 6522 species labels
+ Loading ONNX model: model.onnx
+ Loading full audio file: long_recording.wav
+ Audio loaded successfully. Duration: 45.32 seconds
+ Creating windows with 50% overlap...
+ Created 29 windows of 3 seconds each
+ Running batch inference on 29 windows (batch size: 128)...
+ Processing 1 batches...
+  Batch 1: processing windows 1-29 (100.0%)
+ Completed batch inference on 29 windows
+ Analyzing detections with confidence threshold 0.1...
+
+ === DETECTION SUMMARY ===
+ Audio duration: 45.32 seconds
+ Windows analyzed: 29
+ Species detected (>0.10 confidence): 4
+
+ Top detections:
+
+ American Robin
+  Max confidence: 0.892456
+  Detections: 12
+  Time range: 0.0s - 18.0s
+     1.5s: 0.892456
+     3.0s: 0.845231
+     4.5s: 0.723456
+
+ Song Sparrow
+  Max confidence: 0.567890
+  Detections: 6
+  Time range: 22.5s - 36.0s
+    24.0s: 0.567890
+    25.5s: 0.445678
+    27.0s: 0.334567
+
+ House Finch
+  Max confidence: 0.345678
+  Detections: 3
+  Time range: 38.5s - 42.0s
+    39.0s: 0.345678
+ ```
+
+ ## Technical Details
+
+ ### Model Input/Output
+
+ - **Input**: float32 audio array of shape `[batch_size, 144000]` (3 seconds at 48 kHz)
+ - **Output**: Scores of shape `[batch_size, 6522]`, one per entry in the labels file
+
+ ### Audio Preprocessing
+
+ The script automatically handles:
+
+ - Loading audio files with librosa (supports WAV, MP3, FLAC, etc.)
+ - Resampling to 48 kHz if necessary and mixing multi-channel audio down to mono
+ - In `--single-window` mode, padding with zeros or truncating to exactly 3 seconds (144,000 samples)
+ - Converting to float32 format
+
+ ### Moving Window Analysis
+
+ - Creates overlapping 3-second windows from the full audio
+ - Default 50% overlap means windows at 0s, 1.5s, 3s, 4.5s, etc.
+ - Only complete windows are analyzed: audio after the last full window (less than one step, under 1.5 s at the default overlap) is skipped, and a file shorter than 3 seconds produces no windows, so the run ends with an error (use `--single-window` for such clips)
+ - Higher overlap (e.g., 75%) provides more fine-grained analysis but takes longer
+ - Each window is analyzed independently, then results are aggregated; reported timestamps are window start times
+
+ ### Batch Processing
+
+ - Windows are passed to ONNX Runtime in batches (default: 128 windows per `session.run` call) instead of one at a time
+ - Fewer, larger calls usually run faster than one call per window, especially for long recordings
+ - Progress is printed for the first batch and every fifth batch after that
+ - Larger batches need more memory, so the best batch size depends on the memory available
+
+ ### Species Labels
+
+ - Uses the official BirdNET labels file with 6522 entries
+ - Format: `Scientific name_Common Name` per line, e.g. `Turdus migratorius_American Robin`
+ - The script displays the common name (the part after the first underscore)
+
+ ## Performance Tips
+
+ - Use `--single-window` for quick identification of prominent species
+ - Increase `--overlap` (0.75-0.9) for detailed analysis of complex recordings
+ - Lower `--confidence` below the 0.1 default (e.g. 0.05) to catch weaker signals
+ - Raise `--confidence` (0.3-0.5) to keep only very confident detections
+ - Use `--top-k 1` to show only the top prediction (single-window mode) or the most confident species (moving-window mode)
+ - **Batch Processing**: the default `--batch-size 128` is a reasonable starting point
+   - Increase the batch size (256, 512) if more memory is available
+   - Decrease the batch size (32, 64) if you run out of memory
+   - Batching helps most on longer audio files, which produce many windows
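In moving-window mode, the number of windows follows directly from `create_audio_windows`. The step between window starts is `int(144000 * (1 - overlap))` samples, and a window is produced only if it fits entirely inside the recording.

The sketch below replays that arithmetic without loading any audio, which makes it easy to predict window and batch counts for a given file length. `window_starts` and `num_batches` are hypothetical helpers. The 48 kHz rate and 144,000-sample window are the values hard-coded in `predict_audio.py`.

```python
SAMPLE_RATE = 48_000  # Hz, the rate predict_audio.py resamples to
WINDOW = 144_000      # samples per window (3 s at 48 kHz)


def window_starts(num_samples: int, overlap: float = 0.5) -> list[float]:
    """Start times (seconds) of the full windows create_audio_windows would emit."""
    if not 0.0 <= overlap < 1.0:
        # overlap == 1.0 would give a step of 0, so the window would never advance
        raise ValueError("overlap must satisfy 0.0 <= overlap < 1.0")
    step = int(WINDOW * (1 - overlap))
    return [start / SAMPLE_RATE for start in range(0, num_samples - WINDOW + 1, step)]


def num_batches(num_windows: int, batch_size: int = 128) -> int:
    """Ceiling division, the same formula main() uses for its progress message."""
    return (num_windows + batch_size - 1) // batch_size


starts = window_starts(round(45.32 * SAMPLE_RATE))
print(len(starts), starts[:4], starts[-1], num_batches(len(starts)))
```

For a 45.32-second recording, the last line prints `29 [0.0, 1.5, 3.0, 4.5] 42.0 1`:

- There are 29 windows.
- The last window covers 42.0 to 45.0 s, so the final 0.32 s is not analyzed.
- All windows fit in a single batch at the default batch size.

A clip shorter than 3 seconds gives an empty list.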
model.onnx ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:03f58f51eec117866e4896ceb90dda4723d3d3d9eb9a3be0e82a6e626274ce40
+ size 51722453
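`model.onnx` is stored with Git LFS, so the diff shows only the pointer. The pointer records three things:

- the spec version,
- the SHA-256 object ID of the real file,
- the file's size in bytes (51,722,453, roughly 52 MB).

If the repository is cloned without Git LFS installed, the working copy contains this small text file instead of the model. `load_onnx_model` then fails when onnxruntime tries to parse it.

Below is a minimal sketch for checking a local `model.onnx` against the pointer values, using only the standard library. `parse_lfs_pointer` and `verify_against_pointer` are hypothetical helpers, not part of Git LFS or this repository.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if line.strip():
            key, _, value = line.partition(" ")
            fields[key] = value.strip()
    return fields


def verify_against_pointer(path: str | Path, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` matches the pointer's size and SHA-256."""
    fields = parse_lfs_pointer(pointer_text)
    algorithm, _, expected_hex = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm}")
    path = Path(path)
    # Cheap check first: a leftover pointer file is only about 130 bytes.
    if path.stat().st_size != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Hash in chunks so the ~52 MB model is never held in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex


# Values copied from the model.onnx pointer in this commit.
MODEL_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:03f58f51eec117866e4896ceb90dda4723d3d3d9eb9a3be0e82a6e626274ce40
size 51722453
"""

if Path("model.onnx").exists():
    print("model.onnx matches pointer:", verify_against_pointer("model.onnx", MODEL_POINTER))
```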
predict_audio.py ADDED
@@ -0,0 +1,446 @@
+ #!/usr/bin/env python3
+ """BirdNET Audio Classification Script
+
+ This script loads a WAV file and uses the BirdNET ONNX model to predict bird species.
+ The model expects audio input of shape [batch_size, 144000] (3 seconds at 48kHz).
+
+ Created using Copilot.
+ """
+
+ from __future__ import annotations
+
+ import numpy as np
+ import librosa
+ import onnxruntime as ort
+ import argparse
+ import os
+ from collections import defaultdict
+
+
+ def load_audio(
+     file_path: str, target_sr: int = 48000, duration: float = 3.0
+ ) -> np.ndarray:
+     """
+     Load and preprocess audio file for BirdNET model.
+
+     Args:
+         file_path (str): Path to the audio file
+         target_sr (int): Target sample rate (48kHz for BirdNET)
+         duration (float): Duration in seconds (3.0 for BirdNET)
+
+     Returns:
+         np.ndarray: Preprocessed audio array of shape [144000]
+     """
+     try:
+         # Load audio file
+         audio, sr = librosa.load(file_path, sr=target_sr, duration=duration)
+
+         # Ensure we have exactly 144000 samples (3 seconds at 48kHz)
+         target_length = int(target_sr * duration)
+
+         if len(audio) < target_length:
+             # Pad with zeros if too short
+             audio = np.pad(audio, (0, target_length - len(audio)))
+         elif len(audio) > target_length:
+             # Truncate if too long
+             audio = audio[:target_length]
+
+         return audio.astype(np.float32)
+
+     except Exception as e:
+         raise RuntimeError(f"Error loading audio file {file_path}: {str(e)}")
+
+
+ def load_labels(labels_path: str) -> list[str]:
+     """
+     Load BirdNET species labels from the labels file.
+
+     Args:
+         labels_path (str): Path to the labels file
+
+     Returns:
+         list[str]: Common names, one per model output class
+     """
+     try:
+         labels = []
+         with open(labels_path, "r", encoding="utf-8") as f:
+             for line in f:
+                 line = line.strip()
+                 if line:
+                     # Format: "Scientific name_Common Name"
+                     # Keep the common name, i.e. the part after the first underscore
+                     if "_" in line:
+                         common_name = line.split("_", 1)[1]
+                         labels.append(common_name)
+                     else:
+                         labels.append(line)
+         return labels
+     except Exception as e:
+         raise RuntimeError(f"Error loading labels file {labels_path}: {str(e)}")
+
+
+ def load_audio_full(file_path: str, target_sr: int = 48000) -> np.ndarray:
+     """
+     Load full audio file for moving window analysis.
+
+     Args:
+         file_path (str): Path to the audio file
+         target_sr (int): Target sample rate (48kHz for BirdNET)
+
+     Returns:
+         np.ndarray: Full audio array
+     """
+     try:
+         # Load entire audio file
+         audio, sr = librosa.load(file_path, sr=target_sr)
+         return audio.astype(np.float32)
+     except Exception as e:
+         raise RuntimeError(f"Error loading audio file {file_path}: {str(e)}")
+
+
+ def create_audio_windows(
+     audio: np.ndarray, window_size: int = 144000, overlap: float = 0.5
+ ) -> tuple[np.ndarray, list[float]]:
+     """
+     Create overlapping windows from audio for analysis.
+
+     Args:
+         audio (np.ndarray): Full audio array
+         window_size (int): Size of each window (144000 for 3 seconds at 48kHz)
+         overlap (float): Overlap ratio, 0.0 <= overlap < 1.0 (0.5 = 50% overlap)
+
+     Returns:
+         tuple[np.ndarray, list[float]]: (windows array, timestamps)
+     """
+     step_size = int(window_size * (1 - overlap))
+     windows = []
+     timestamps = []
+
+     for start in range(0, len(audio) - window_size + 1, step_size):
+         end = start + window_size
+         window = audio[start:end]
+
+         # Only full-length windows are kept (the range() bound already guarantees this)
+         if len(window) == window_size:
+             windows.append(window)
+             # Window start time in seconds (assumes the 48 kHz sample rate)
+             timestamps.append(start / 48000.0)
+
+     return np.array(windows), timestamps
+
+
+ def load_onnx_model(model_path: str) -> ort.InferenceSession:
+     """
+     Load ONNX model for inference.
+
+     Args:
+         model_path (str): Path to the ONNX model file
+
+     Returns:
+         ort.InferenceSession: Loaded ONNX model session
+     """
+     try:
+         # Create inference session
+         session = ort.InferenceSession(model_path)
+         return session
+
+     except Exception as e:
+         raise RuntimeError(f"Error loading ONNX model {model_path}: {str(e)}")
+
+
+ def predict_audio(session: ort.InferenceSession, audio_data: np.ndarray) -> np.ndarray:
+     """
+     Run inference on audio data using the ONNX model.
+
+     Args:
+         session (ort.InferenceSession): ONNX model session
+         audio_data (np.ndarray): Audio data of shape [144000] or [batch, 144000]
+
+     Returns:
+         np.ndarray: Model predictions
+     """
+     try:
+         # Ensure we have batch dimension
+         if len(audio_data.shape) == 1:
+             input_data = np.expand_dims(audio_data, axis=0)
+         else:
+             input_data = audio_data
+
+         # Get input name from the model
+         input_name = session.get_inputs()[0].name
+
+         # Run inference
+         outputs = session.run(None, {input_name: input_data})
+
+         return outputs[0]
+
+     except Exception as e:
+         raise RuntimeError(f"Error during model inference: {str(e)}")
+
+
+ def predict_audio_batch(
+     session: ort.InferenceSession,
+     windows_batch: np.ndarray,
+     batch_size: int = 128,
+     show_progress: bool = True,
+ ) -> np.ndarray:
+     """
+     Run inference on batches of audio windows for better performance.
+
+     Args:
+         session (ort.InferenceSession): ONNX model session
+         windows_batch (np.ndarray): Array of windows, shape [num_windows, 144000]
+         batch_size (int): Number of windows to process in each batch
+         show_progress (bool): Whether to show progress updates
+
+     Returns:
+         np.ndarray: All predictions concatenated, shape [num_windows, num_classes]
+     """
+     try:
+         all_predictions = []
+         num_windows = len(windows_batch)
+
+         # Get input name from the model
+         input_name = session.get_inputs()[0].name
+
+         # Process in batches
+         batch_num = 0
+         for start_idx in range(0, num_windows, batch_size):
+             end_idx = min(start_idx + batch_size, num_windows)
+             current_batch = windows_batch[start_idx:end_idx]
+             batch_num += 1
+
+             if show_progress and (batch_num % 5 == 0 or batch_num == 1):
+                 progress = (end_idx / num_windows) * 100
+                 print(
+                     f" Batch {batch_num}: processing windows {start_idx + 1}-{end_idx} ({progress:.1f}%)"
+                 )
+
+             # Run inference on current batch
+             outputs = session.run(None, {input_name: current_batch})
+             batch_predictions = outputs[0]
+
+             all_predictions.append(batch_predictions)
+
+         # Concatenate all batch results
+         return np.concatenate(all_predictions, axis=0)
+
+     except Exception as e:
+         raise RuntimeError(f"Error during batch model inference: {str(e)}")
+
+
+ def analyze_detections(
+     all_predictions: np.ndarray,
+     timestamps: list[float],
+     labels: list[str],
+     confidence_threshold: float = 0.1,
+ ) -> dict[str, list[dict[str, float | int]]]:
+     """
+     Analyze predictions across all windows and summarize detections.
+
+     Args:
+         all_predictions (np.ndarray): Predictions from all windows, shape [num_windows, num_classes]
+         timestamps (list[float]): Timestamps for each window
+         labels (list[str]): Species labels
+         confidence_threshold (float): Scores must exceed this value to count as a detection
+
+     Returns:
+         dict[str, list[dict[str, float | int]]]: Summary of detections with timestamps
+     """
+     detections = defaultdict(list)
+
+     # all_predictions has shape [num_windows, num_classes] (output of batch processing)
+     for i, (predictions, timestamp) in enumerate(zip(all_predictions, timestamps)):
+         # predictions is a 1D array of scores for this window
+         scores = predictions
+
+         # Find all detections above threshold
+         above_threshold = np.where(scores > confidence_threshold)[0]
+
+         for idx in above_threshold:
+             confidence = float(scores[idx])
+             species_name = labels[idx] if idx < len(labels) else f"Class {idx}"
+
+             detections[species_name].append(
+                 {"timestamp": timestamp, "confidence": confidence, "window": i}
+             )
+
+     return dict(detections)
+
+
+ def main() -> int:
+     parser = argparse.ArgumentParser(
+         description="BirdNET Audio Classification with Moving Window"
+     )
+     parser.add_argument("audio_file", help="Path to the WAV audio file")
+     parser.add_argument(
+         "--model", default="model.onnx", help="Path to the ONNX model file"
+     )
+     parser.add_argument(
+         "--labels",
+         default="BirdNET_GLOBAL_6K_V2.4_Labels.txt",
+         help="Path to the labels file",
+     )
+     parser.add_argument(
+         "--top-k",
+         type=int,
+         default=5,
+         help="Number of top predictions (single window) or species (moving window) to show",
+     )
+     parser.add_argument(
+         "--overlap", type=float, default=0.5, help="Window overlap ratio (0.0 <= overlap < 1.0)"
+     )
+     parser.add_argument(
+         "--confidence",
+         type=float,
+         default=0.1,
+         help="Minimum confidence threshold for detections",
+     )
+     parser.add_argument(
+         "--batch-size",
+         type=int,
+         default=128,
+         help="Batch size for inference (default: 128)",
+     )
+     parser.add_argument(
+         "--single-window",
+         action="store_true",
+         help="Analyze only first 3 seconds (single window)",
+     )
+
+     args = parser.parse_args()
+
+     # Check if files exist
+     if not os.path.exists(args.audio_file):
+         print(f"Error: Audio file '{args.audio_file}' not found.")
+         return 1
+
+     if not os.path.exists(args.model):
+         print(f"Error: Model file '{args.model}' not found.")
+         return 1
+
+     if not os.path.exists(args.labels):
+         print(f"Error: Labels file '{args.labels}' not found.")
+         return 1
+
+     try:
+         # Load labels
+         print(f"Loading labels from: {args.labels}")
+         labels = load_labels(args.labels)
+         print(f"Loaded {len(labels)} species labels")
+
+         # Load ONNX model
+         print(f"Loading ONNX model: {args.model}")
+         session = load_onnx_model(args.model)
+
+         # Print model info
+         input_info = session.get_inputs()[0]
+         output_info = session.get_outputs()[0]
+         print(f"Model input: {input_info.name}, shape: {input_info.shape}")
+         print(f"Model output: {output_info.name}, shape: {output_info.shape}")
+
+         if args.single_window:
+             # Single window analysis (original behavior)
+             print(f"Loading first 3 seconds of audio file: {args.audio_file}")
+             audio_data = load_audio(args.audio_file)
+             print(f"Audio loaded successfully. Shape: {audio_data.shape}")
+
+             print("Running inference on single window...")
+             predictions = predict_audio(session, audio_data)
+
+             # Get scores
+             predictions = np.array(predictions)
+             if len(predictions.shape) > 1:
+                 scores = predictions[0]
+             else:
+                 scores = predictions
+
+             # Get top-k predictions
+             top_indices = np.argsort(scores)[-args.top_k :][::-1]
+
+             print(f"\nTop {args.top_k} predictions for first 3 seconds:")
+             for i, idx in enumerate(top_indices):
+                 confidence = float(scores[idx])
+                 species_name = labels[idx] if idx < len(labels) else f"Class {idx}"
+                 print(f"{i + 1:2d}. {species_name}: {confidence:.6f}")
+
+         else:
+             # Moving window analysis
+             print(f"Loading full audio file: {args.audio_file}")
+             full_audio = load_audio_full(args.audio_file)
+             audio_duration = len(full_audio) / 48000.0
+             print(f"Audio loaded successfully. Duration: {audio_duration:.2f} seconds")
+
+             # Create windows
+             print(f"Creating windows with {args.overlap * 100:.0f}% overlap...")
+             windows, timestamps = create_audio_windows(full_audio, overlap=args.overlap)
+             print(f"Created {len(windows)} windows of 3 seconds each")
+
+             # Run batch inference on all windows
+             print(
+                 f"Running batch inference on {len(windows)} windows (batch size: {args.batch_size})..."
+             )
+             num_batches = (len(windows) + args.batch_size - 1) // args.batch_size
+             print(f"Processing {num_batches} batches...")
+
+             # Use batch prediction for better performance
+             all_predictions = predict_audio_batch(session, windows, args.batch_size)
+             print(f"Completed batch inference on {len(windows)} windows")
+
+             # Analyze detections across all windows
+             print(
+                 f"Analyzing detections with confidence threshold {args.confidence}..."
+             )
+             detections = analyze_detections(
+                 all_predictions, timestamps, labels, args.confidence
+             )
+
+             # Sort species by maximum confidence
+             sorted_species = sorted(
+                 detections.items(),
+                 key=lambda x: max(det["confidence"] for det in x[1]),
+                 reverse=True,
+             )
+
+             print("\n=== DETECTION SUMMARY ===")
+             print(f"Audio duration: {audio_duration:.2f} seconds")
+             print(f"Windows analyzed: {len(windows)}")
+             print(
+                 f"Species detected (>{args.confidence:.2f} confidence): {len(sorted_species)}"
+             )
+
+             if sorted_species:
+                 print("\nTop detections:")
+                 for species, detections_list in sorted_species[: args.top_k]:
+                     max_conf = max(det["confidence"] for det in detections_list)
+                     num_detections = len(detections_list)
+                     first_detection = min(det["timestamp"] for det in detections_list)
+                     last_detection = max(det["timestamp"] for det in detections_list)
+
+                     print(f"\n{species}")
+                     print(f" Max confidence: {max_conf:.6f}")
+                     print(f" Detections: {num_detections}")
+                     print(
+                         f" Time range: {first_detection:.1f}s - {last_detection:.1f}s"
+                     )
+
+                     # Show strongest detections for this species
+                     strong_detections = sorted(
+                         detections_list, key=lambda x: x["confidence"], reverse=True
+                     )[:3]
+                     for det in strong_detections:
+                         print(f" {det['timestamp']:6.1f}s: {det['confidence']:.6f}")
+             else:
+                 print(
+                     f"No detections found above confidence threshold {args.confidence}"
+                 )
+
+         return 0
+
+     except Exception as e:
+         print(f"Error: {str(e)}")
+         return 1
+
+
+ if __name__ == "__main__":
+     raise SystemExit(main())
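As committed, `create_audio_windows` only emits windows that fit entirely inside the recording. This has two consequences:

- The audio after the last full window is never analyzed.
- A clip shorter than 3 seconds yields an empty array. `predict_audio_batch` then has nothing to concatenate, and `main()` prints an error.

One possible fix is to zero-pad the end of the recording so that the last window reaches the final sample, much as `load_audio` pads short clips in single-window mode. The function below is a hypothetical drop-in alternative with the same return types. It uses `numpy.lib.stride_tricks.sliding_window_view`, which needs NumPy 1.20+ and is covered by the `numpy>=1.21.0` requirement.

```python
from __future__ import annotations

import numpy as np

SAMPLE_RATE = 48_000  # assumption: the same fixed rate predict_audio.py uses
WINDOW = 144_000      # 3 s at 48 kHz


def create_padded_windows(
    audio: np.ndarray, window_size: int = WINDOW, overlap: float = 0.5
) -> tuple[np.ndarray, list[float]]:
    """Like create_audio_windows, but zero-pads the tail so every sample is covered."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must satisfy 0.0 <= overlap < 1.0")
    # Guard against a zero step when overlap is extremely close to 1.0.
    step = max(1, int(window_size * (1 - overlap)))
    # Samples left after the first window; ceiling-divide to count the extra windows.
    extra = max(len(audio) - window_size, 0)
    num_windows = 1 + (extra + step - 1) // step
    # Zero-pad so the last window ends exactly at the padded length.
    padded = np.zeros((num_windows - 1) * step + window_size, dtype=np.float32)
    padded[: len(audio)] = audio
    # sliding_window_view exposes every possible window as a read-only view;
    # [::step] keeps one per hop, and .copy() makes a contiguous, writable batch.
    views = np.lib.stride_tricks.sliding_window_view(padded, window_size)
    windows = views[::step].copy()
    # Timestamps are window start times, as in the original function.
    timestamps = [i * step / SAMPLE_RATE for i in range(num_windows)]
    return windows, timestamps
```

The trade-off is that the last window may be mostly silence, which can lower its scores. In exchange, the final partial step (under 1.5 s at the default overlap), or an entire short clip, is no longer silently skipped.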
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ numpy>=1.21.0
+ librosa>=0.9.0
+ onnxruntime>=1.20.0