# Real Time Object Detection from a Webcam Stream with WebRTC

Tags: VISION, STREAMING, WEBCAM

In this guide, we'll use YOLOv10 to perform real-time object detection in Gradio from a user's webcam feed. We'll utilize the latest streaming features introduced in Gradio 5.0. You can see the finished product in action below:
| <video src="https://github.com/user-attachments/assets/4584cec6-8c1a-401b-9b61-a4fe0718b558" controls | |
| height="600" width="600" style="display: block; margin: auto;" autoplay="true" loop="true"> | |
| </video> | |
## Setting up

Start by installing all the dependencies. Add the following lines to a `requirements.txt` file and run `pip install -r requirements.txt`:
```bash
opencv-python
twilio
gradio>=5.0
gradio-webrtc
onnxruntime-gpu
```
We'll use the ONNX runtime to speed up YOLOv10 inference. This guide assumes you have access to a GPU. If you don't, change `onnxruntime-gpu` to `onnxruntime`. Without a GPU, the model will run slower, resulting in a laggy demo.
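To confirm that ONNX Runtime can actually see your GPU, you can check which execution providers it exposes. This is a minimal sketch; `CUDAExecutionProvider` only shows up if the GPU build and the matching CUDA libraries are installed correctly:

```python
import onnxruntime as ort

# List the execution providers ONNX Runtime can use on this machine.
providers = ort.get_available_providers()
print(providers)

if "CUDAExecutionProvider" in providers:
    print("GPU inference is available.")
else:
    print("Falling back to CPU inference; expect higher latency.")
```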
We'll use OpenCV for image manipulation and the [Gradio WebRTC](https://github.com/freddyaboulton/gradio-webrtc) custom component to use [WebRTC](https://webrtc.org/) under the hood, achieving near-zero latency.

**Note**: If you want to deploy this app on any cloud provider, you'll need to use the free Twilio API for their [TURN servers](https://www.twilio.com/docs/stun-turn). Create a free account on Twilio. If you're not familiar with TURN servers, consult this [guide](https://www.twilio.com/docs/stun-turn/faq#faq-what-is-nat).
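The demo code later in this guide passes an `rtc_configuration` object to the `WebRTC` component. The sketch below shows one way to build it from Twilio credentials; the `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` environment variable names are our own choice, not something the component requires, and passing `None` is fine when running locally:

```python
import os

from twilio.rest import Client

account_sid = os.environ.get("TWILIO_ACCOUNT_SID")
auth_token = os.environ.get("TWILIO_AUTH_TOKEN")

if account_sid and auth_token:
    # Ask Twilio for short-lived STUN/TURN credentials.
    client = Client(account_sid, auth_token)
    token = client.tokens.create()
    rtc_configuration = {"iceServers": token.ice_servers}
else:
    # No TURN servers are needed when running on localhost.
    rtc_configuration = None
```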
## The Inference Function

We'll download the YOLOv10 model from the Hugging Face hub and instantiate a custom inference class to use this model.

The implementation of the inference class isn't covered in this guide, but you can find the source code [here](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n/blob/main/inference.py#L9) if you're interested. This implementation borrows heavily from this [github repository](https://github.com/ibaiGorordo/ONNX-YOLOv8-Object-Detection).

We're using the `yolov10-n` variant because it has the lowest latency. See the [Performance](https://github.com/THU-MIG/yolov10?tab=readme-ov-file#performance) section of the README in the YOLOv10 GitHub repository.
```python
import cv2
from huggingface_hub import hf_hub_download

from inference import YOLOv10

model_file = hf_hub_download(
    repo_id="onnx-community/yolov10n", filename="onnx/model.onnx"
)

model = YOLOv10(model_file)


def detection(image, conf_threshold=0.3):
    # Resize the webcam frame to the model's expected input size.
    image = cv2.resize(image, (model.input_width, model.input_height))
    # Run inference and draw bounding boxes on the frame.
    new_image = model.detect_objects(image, conf_threshold)
    return new_image
```
Our inference function, `detection`, accepts a numpy array from the webcam and a desired confidence threshold. Object detection models like YOLO identify many objects and assign a confidence score to each; the lower the confidence, the higher the chance of a false positive. We'll let users adjust the confidence threshold.

The function returns a numpy array of the same input image with bounding boxes drawn around all detected objects.
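Before wiring the function into a streaming demo, it can help to sanity-check it on a single frame. This is a minimal sketch; `sample.jpg` is a placeholder path for any test image you have on disk:

```python
import cv2

# Load a test image as a numpy array and run the detector once.
frame = cv2.imread("sample.jpg")
annotated = detection(frame, conf_threshold=0.3)

# Write the annotated frame to disk for inspection.
cv2.imwrite("annotated.jpg", annotated)
```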
## The Gradio Demo

The Gradio demo is straightforward, but we'll implement a few specific features:

1. Use the `WebRTC` custom component to ensure input and output are sent to/from the server with WebRTC.
2. The [WebRTC](https://github.com/freddyaboulton/gradio-webrtc) component will serve as both an input and output component.
3. Utilize the `time_limit` parameter of the `stream` event. This parameter sets a processing time for each user's stream. In a multi-user setting, such as on Spaces, we'll stop processing the current user's stream after this period and move on to the next.

We'll also apply custom CSS to center the webcam and slider on the page.
```python
import gradio as gr
from gradio_webrtc import WebRTC

css = """.my-group {max-width: 600px !important; max-height: 600px !important;}
.my-column {display: flex !important; justify-content: center !important; align-items: center !important;}"""

with gr.Blocks(css=css) as demo:
    gr.HTML(
        """
        <h1 style='text-align: center'>
        YOLOv10 Webcam Stream (Powered by WebRTC ⚡️)
        </h1>
        """
    )
    with gr.Column(elem_classes=["my-column"]):
        with gr.Group(elem_classes=["my-group"]):
            # rtc_configuration holds the TURN credentials set up earlier (or None locally).
            image = WebRTC(label="Stream", rtc_configuration=rtc_configuration)
            conf_threshold = gr.Slider(
                label="Confidence Threshold",
                minimum=0.0,
                maximum=1.0,
                step=0.05,
                value=0.30,
            )

        image.stream(
            fn=detection, inputs=[image, conf_threshold], outputs=[image], time_limit=10
        )

if __name__ == "__main__":
    demo.launch()
```
## Conclusion

Our app is hosted on Hugging Face Spaces [here](https://huggingface.co/spaces/freddyaboulton/webrtc-yolov10n).

You can use this app as a starting point to build real-time image applications with Gradio. Don't hesitate to open issues in the space or in the [WebRTC component GitHub repo](https://github.com/freddyaboulton/gradio-webrtc) if you have any questions or encounter problems.