pion / mediadevices
Go implementation of the MediaDevices API.
Home Page: https://pion.ly/
License: MIT License
On Darwin, we can get access to the camera through https://developer.apple.com/av-foundation/. To access this API, we need to use either Swift or Objective-C. In fact, a project has done this before (https://github.com/dialup-inc/ascii/blob/master/camera/cam_avfoundation.mm), so we can probably learn how they did it and adapt it to this project.
When calling GetUserMedia with both audio and video constraints, if either the audio or the video track fails to initialize, the other one is left open. Every subsequent GetUserMedia call then fails with invalid state: driver is already opened.
Receive RTP from a WebRTC remote track, decode, process (via Audio/VideoTransform), and re-encode.
This enables tiling (or picture-in-picture) of multiple streams into one stream, as well as audio mixing.
Due to the differences between audio and video and the lack of generics in Go, there is a lot of duplicated code in https://github.com/pion/mediadevices/blob/38deddc4f0bb0ceae8391ad33b474c3ecdb0c267/track.go.
While this is acceptable, reducing the duplication would make the code easier to maintain.
Select a fallback codec implementation if the first one fails to initialize.
For example, with vaapi (higher priority) and vpx: try vaapi first, and if the environment doesn't have video-acceleration hardware, use vpx.
It could be something like:
codec.Register(webrtc.VP8, codec.VideoEncoderFallbacks(
	codec.VideoEncoderBuilder(vaapi.NewVP8Encoder),
	codec.VideoEncoderBuilder(vpx.NewVP8Encoder),
))
One problem is that user code has no way to know which implementation ends up being used, so it's difficult to pass codec-specific parameters when a fallback codec is selected.
One option: add prop.Codec.ImplementationName string and pass one CodecParams per ImplementationName as map[string]interface{}?
Currently, audio device selection is random (Go maps are unordered).
An audio source device may be a monitor, i.e. a loopback of an audio output.
Giving slightly higher priority to non-monitor devices and/or the system default device would stabilize selection and suit typical use.
In the Web API, media settings can be updated by applyConstraints().
https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack/applyConstraints
Example:
Track (ID: video, Label: 10afd1bd-dee0-452a-85eb-b49cbea60194) ended with error: EOF
Pixel formats are currently scored by exact match only.
For example, I may want to select either YUY2 or UYVY, but never JPEG.
Lines 99 to 112 in 949e850
Any other improvements are welcomed!
As of now, our video encoders, openh264 and vpx, only support the I420 format. Therefore, we have a ToI420 converter that helps handle other image formats.
However, ToI420 can only handle YCbCr images at this moment. The motivation for adding more image formats:
For VideoTransform, accepting flexible input also opens the door to libraries such as https://godoc.org/github.com/disintegration/imaging, which relies heavily on NRGBA.
PeerConnection is only used to list the supported codecs.
A constructor that directly takes a list of codecs would make the package usable for non-WebRTC purposes, just like the Web API.
At the moment, only the selected prop is passed to Video/AudioRecord.
For example, on the screen-capture driver, FrameRate is not discrete.
It would be nice to pass both the selected prop and the requested prop so such parameters can be read.
It would be useful if a custom image processor (func CustomImageProcessor(r video.Reader) video.Reader) could be inserted between the device and the codec.
Personally, I would like to use this package as a replacement for gstreamer; a clock overlay on the image is what I want to insert this way.
cvendor/lib/openh264/libopenh264.a is compiled for x86_64 Linux.
It would be nice to have a more portable way to support multiple environments.
I haven't dug into the details yet.
The same source code works on go1.13.8.
Linux 5.4.19-100.fc30.x86_64 #1 SMP Tue Feb 11 22:27:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
go version go1.14 linux/amd64
Some scaling kernels are available in the x/image/draw package.
https://pkg.go.dev/golang.org/x/image/draw?tab=doc#pkg-variables
Maybe as map[string]interface{}? prop.Codec.Quality could be dropped instead.
For Linux, libx11's XOpenDisplay, XDefaultRootWindow, XShmGetImage, and XShmPutImage could be used.
As of now, mediadevices uses many interfaces.
While interfaces make the design very flexible, that benefit isn't free. The following are some of the downsides:
While I've laid out some of the downsides of using interfaces above, I still think they're great and should be used where appropriate. So, we should replace the interfaces that don't actually need that flexibility with structs.
In my opinion, we should convert the MediaDevices and MediaStream interfaces to structs.
Note: hopefully, when pion/webrtc v3 is ready, the Tracker and LocalTrack interfaces can be merged into pion/webrtc.
Hello,
I want to make a video call without a browser by connecting a microphone and a webcam to my Raspberry Pi.
I'm using Janus as the SFU and am modifying the video-room example; I'd like some advice.
How can I set the input from the microphone as an audio track in pion?
Do you have any samples?
=== RUN TestMeasureBitRateDynamic
##[error] TestMeasureBitRateDynamic: measurement_test.go:95: expected: 25600.000000 (with 8.000000 precision), but got 25585.959280
--- FAIL: TestMeasureBitRateDynamic (5.00s)
https://github.com/pion/mediadevices/pull/139/checks?check_run_id=598870347 (restarted)
Drop frames to limit the output framerate.
Not sure what API we can use on Windows yet. We need to do some research.
In the Web API, codec bitrates are controlled through SDP, like:
a=mid:audio
b=AS:000
but that's too complicated for this package.
Configuring them directly on GetUserMedia would be better for us.
For example:
s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
Audio: func(c *mediadevices.MediaTrackConstraints) {
c.Codec = webrtc.Opus
c.BitRate = 32000 // 32kbps
...
},
Video: func(c *mediadevices.MediaTrackConstraints) {
c.Codec = videoCodecName
c.BitRate = 100000 // 100kbps
c.KeyFrameInterval = 100
...
},
})
or
s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
Audio: func(c *mediadevices.MediaTrackConstraints, c2 *mediadevices.CodecParameters) {
c.Codec = webrtc.Opus
c2.BitRate = 32000 // 32kbps
...
},
Video: func(c *mediadevices.MediaTrackConstraints, c2 *mediadevices.CodecParameters) {
c.Codec = videoCodecName
c2.BitRate = 100000 // 100kbps
c2.KeyFrameInterval = 100
...
},
})
As of now, VideoEncoderBuilder and AudioEncoderBuilder only return io.ReadCloser and error:
BuildAudioEncoder(r audio.Reader, p prop.Media) (io.ReadCloser, error)
BuildVideoEncoder(r video.Reader, p prop.Media) (io.ReadCloser, error)
While returning io.ReadCloser is very idiomatic, it isn't enough for our needs. The main limitation is rate control: we can't adjust the codec parameters on the fly, decreasing or increasing the bitrate as needed depending on the current network speed and quality.
So, instead of returning io.ReadCloser, it's better to return a new interface that embeds io.ReadCloser and adds a method that updates the BaseParams:
package codec
import "io"
type ReadCloser interface {
io.ReadCloser
Update(params BaseParams) error
}
Currently, the source media properties are passed directly to the encoder; however, a Video/AudioTransformer may change them.
For example, a VideoTransformer may change the frame rate and size, and an AudioTransformer may change the number of channels.
A pure Go implementation of this through the image.Image interface requires a huge amount of overhead.
It's not a problem for now, but I'd like to leave a note.
The following test just checks the OnEnded callback.
package main
import (
"testing"
"time"
"github.com/pion/mediadevices"
_ "github.com/pion/mediadevices/pkg/codec/vpx"
"github.com/pion/mediadevices/pkg/frame"
"github.com/pion/webrtc/v2"
)
func TestMain(t *testing.T) {
configs := map[string]webrtc.Configuration{
"WithSTUN": {
ICEServers: []webrtc.ICEServer{
{URLs: []string{"stun:stun.l.google.com:19302"}},
},
},
"WithoutSTUN": {
ICEServers: []webrtc.ICEServer{},
},
}
for name, config := range configs {
t.Run(name, func(t *testing.T) {
peerConnection, err := webrtc.NewPeerConnection(config)
if err != nil {
t.Fatal(err)
}
md := mediadevices.NewMediaDevices(peerConnection)
s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
Video: func(c *mediadevices.MediaTrackConstraints) {
c.CodecName = videoCodecName
c.FrameFormat = frame.FormatI420
c.Enabled = true
c.Width = 640
c.Height = 480
},
})
if err != nil {
t.Fatal(err)
}
trackers := s.GetTracks()
if len(trackers) != 1 {
t.Fatal("wrong number of the tracks")
}
peerConnection.AddTrack(trackers[0].Track())
trackers[0].OnEnded(func(err error) {
t.Error(err)
})
time.Sleep(10 * time.Second)
trackers[0].OnEnded(func(err error) {})
peerConnection.Close()
trackers[0].Stop()
time.Sleep(time.Second)
})
}
}
with a camera read timeout treated as an error:
diff --git a/pkg/driver/camera/camera_linux.go b/pkg/driver/camera/camera_linux.go
index cee43b2..f7202f8 100644
--- a/pkg/driver/camera/camera_linux.go
+++ b/pkg/driver/camera/camera_linux.go
@@ -4,6 +4,7 @@ package camera
import "C"
import (
+ "errors"
"image"
"io"
@@ -97,6 +98,7 @@ func (c *camera) VideoRecord(p prop.Media) (video.Reader, error) {
switch err.(type) {
case nil:
case *webcam.Timeout:
+ return nil, errors.New("read timeout")
continue
default:
// Camera has been stopped.
It fails in the STUN-server case, and only on go1.14rc1.
$ go1.14rc1 test . -v
=== RUN TestMain
=== RUN TestMain/WithSTUN
TestMain/WithSTUN: main_test.go:51: read timeout
=== RUN TestMain/WithoutSTUN
--- FAIL: TestMain (32.97s)
--- FAIL: TestMain/WithSTUN (21.91s)
--- PASS: TestMain/WithoutSTUN (11.06s)
FAIL
FAIL github.com/pion/mediadevices/examples/simple 32.986s
FAIL
$ go1.13 test . -v
=== RUN TestMain
=== RUN TestMain/WithSTUN
=== RUN TestMain/WithoutSTUN
--- PASS: TestMain (27.74s)
--- PASS: TestMain/WithSTUN (16.67s)
--- PASS: TestMain/WithoutSTUN (11.07s)
PASS
ok github.com/pion/mediadevices/examples/simple 27.756s
I will check again once the next RC of Go 1.14 becomes available.
The goal is to make cross-OS development from Linux easy by containerizing each OS's dependencies in its own Docker image.
The runtime:
It is MediaTrackConstraints.deviceId in the Web API.
In order to specify what codecs should be used, users need to:
import (
...
_ "github.com/pion/mediadevices/pkg/codec/openh264" // This is required to register h264 video encoder
...
)
and then reference the registered name when calling GetUserMedia:
// From github.com/pion/webrtc
package webrtc
const (
PCMU = "PCMU"
PCMA = "PCMA"
G722 = "G722"
Opus = "OPUS"
VP8 = "VP8"
VP9 = "VP9"
H264 = "H264"
)
// From example
md.GetUserMedia(mediadevices.MediaStreamConstraints{
Audio: func(c *mediadevices.MediaTrackConstraints) {
c.CodecName = webrtc.Opus
c.Enabled = true
c.BitRate = 32000 // 32kbps
},
Video: func(c *mediadevices.MediaTrackConstraints) {
c.CodecName = webrtc.H264
c.FrameFormat = frame.FormatYUY2
c.Enabled = true
c.Width = 640
c.Height = 480
c.BitRate = 100000 // 100kbps
},
})
The points above show that the current design (importing purely for the side effect of registering a codec) requires implicit knowledge from users: they have to know which codec name gets registered, because they must pass the exact CodecName registered by the import.
Not only is this design confusing and error-prone, it's also inflexible and doesn't scale. What if we want to specify codec-specific parameters (#106)? What about a fallback mechanism (#108), e.g. falling back to a software encoder when hardware acceleration is unavailable?
These needs seem solvable with an empty interface, but with empty interfaces we lose static type checking.
Profile of #102
flat flat% sum% cum cum%
0.52s 33.77% 33.77% 0.52s 33.77% runtime.cgocall
0.47s 30.52% 64.29% 0.53s 34.42% github.com/pion/mediadevices/pkg/frame.decodeYUY2
0.09s 5.84% 70.13% 0.09s 5.84% runtime.usleep
decodeYUY2 occupies almost the same amount of CPU time as hardware-accelerated VP8 encoding.
Add something like image.Image for audio, to support variable channel counts and sample formats.
It would make conversion between channel counts, sampling rates, and sample formats easy.
I think we should try to decouple mediadevices from the webrtc-specific parts so that it can be more generic and useful to a wider audience. Also, if we look at Mozilla's definition of the MediaDevices API, it is never described as being solely for webrtc:
The MediaDevices interface provides access to connected media input devices like cameras and microphones, as well as screen sharing. In essence, it lets you obtain access to any hardware source of media data.
Reference: https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices
Broadcast a webcam stream from a server to the browser.
This feature definitely should be in the examples section.
It has many use cases, from home security to live video streaming; nowadays it's a must-have feature.
Alternatives: Python (aiortc), Java (Kurento), linux-projects (UV4L)
This feature is very much in demand, but the alternatives have limitations: Kurento requires a Java VM to be installed, aiortc is difficult to set up due to Python versioning and package support, and UV4L is unstable, pretty opinionated, and not open source.
So this is a case where Go will shine.
jsfiddle can load code from a GitHub repository, like:
https://jsfiddle.net/gh/get/library/pure/pion/example-webrtc-applications/tree/master/save-to-webm/jsfiddle
https://github.com/pion/example-webrtc-applications/tree/master/save-to-webm/jsfiddle
The demo page contains an extra transceiver.
// Offer to receive 1 audio, and 2 video tracks
pc.addTransceiver('audio', {'direction': 'recvonly'})
pc.addTransceiver('video', {'direction': 'recvonly'})
pc.addTransceiver('video', {'direction': 'recvonly'})
It would be cleaner to have the audio/video demo and a video-only demo separately.
The current code assumes that the pixel format is 32-bit RGBA.
Add one more example with raspivid support (instead of gstreamer) to provide hardware encoding on the Raspberry Pi.
Since the Raspberry Pi is widely used in many projects, it would be nice to have a Go webrtc implementation for it; no alternative as good as Go & webrtc exists yet.
The goal is just to send video from a Raspberry Pi to the browser, with raspivid doing the hardware encoding.
In the Web API, the MediaStreamTrack.ended event is fired and the MediaStreamTrack.onended handler is called on such errors.
https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack/onended
This event occurs when the track will no longer provide data to the stream for any reason, including the end of the media input being reached, the user revoking needed permissions, the source device being removed, or the remote peer ending a connection.
Lines 115 to 124 in e4da8fa
Lines 181 to 185 in e4da8fa
Create https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia interface for Linux.
Not sure what API we can use on Windows yet. We need to do some research.
As of now, the microphone and camera adapters live in the same folder, driver. To make them more organized and modular, it would be good to group them by category.
Before:
driver
-- microphone_linux.go
-- camera_linux.go
After:
driver
-- camera
-- camera_linux.go
-- microphone
-- microphone_linux.go
This way, the separation between devices is clear, and it puts less cognitive load on driver implementors.
On Darwin, we can get access to the microphone through https://developer.apple.com/av-foundation/. To access this API, we need to use either Swift or Objective-C.
As of now, GetUserMedia accepts a single parameter, MediaStreamConstraints, defined as follows:
func (m *mediaDevices) GetUserMedia(constraints MediaStreamConstraints) (MediaStream, error) {
...
}
type MediaStreamConstraints struct {
Audio MediaOption
Video MediaOption
}
// MediaTrackConstraints represents https://w3c.github.io/mediacapture-main/#dom-mediatrackconstraints
type MediaTrackConstraints struct {
prop.Media
Enabled bool
// VideoEncoderBuilders are codec builders that are used for encoding the video
// and later being used for sending the appropriate RTP payload type.
//
// If one encoder builder fails to build the codec, the next builder will be used,
// repeating until a codec builds. If no builders build successfully, an error is returned.
VideoEncoderBuilders []codec.VideoEncoderBuilder
// AudioEncoderBuilders are codec builders that are used for encoding the audio
// and later being used for sending the appropriate RTP payload type.
//
// If one encoder builder fails to build the codec, the next builder will be used,
// repeating until a codec builds. If no builders build successfully, an error is returned.
AudioEncoderBuilders []codec.AudioEncoderBuilder
// VideoTransform will be used to transform the video that's coming from the driver.
// So, basically it'll look like the following: driver -> VideoTransform -> codec
VideoTransform video.TransformFunc
// AudioTransform will be used to transform the audio that's coming from the driver.
// So, basically it'll look like the following: driver -> AudioTransform -> codec
AudioTransform audio.TransformFunc
}
type MediaOption func(*MediaTrackConstraints)
From the type definitions above, we can see that we're using MediaTrackConstraints for unrelated concerns such as:
I think we should somehow move them away from MediaTrackConstraints because:
The purpose of this issue thread is to talk about possible designs that can solve the problems above.