pion / mediadevices
Go implementation of the MediaDevices API.
Home Page: https://pion.ly/
License: MIT License
On Darwin, we can get access to the camera through https://developer.apple.com/av-foundation/. To access this API, we need to use either Swift or Objective-C. In fact, a project has done this before (https://github.com/dialup-inc/ascii/blob/master/camera/cam_avfoundation.mm), so we can probably learn how they did it and adapt it to this project.
When calling GetUserMedia with both audio and video constraints, if either the audio or the video track fails to initialize, the other one is left open. Every subsequent GetUserMedia call then fails with invalid state: driver is already opened.
Receive RTP from a WebRTC remote track, decode, process (via Audio/VideoTransform), and re-encode.
This enables tiling (or picture-in-picture) of multiple streams into one stream, as well as audio mixing.
Due to the differences between audio and video and the lack of generics in Go, there is a lot of duplicated code in https://github.com/pion/mediadevices/blob/38deddc4f0bb0ceae8391ad33b474c3ecdb0c267/track.go.
While this is acceptable, reducing the duplication would make the code easier to maintain.
Select a fallback codec implementation if the first one fails to initialize.
For example, with vaapi (higher priority) and vpx: try vaapi first, and if the environment doesn't have video-acceleration hardware, use vpx.
It could be something like:
codec.Register(webrtc.VP8, codec.VideoEncoderFallbacks(
	codec.VideoEncoderBuilder(vaapi.NewVP8Encoder),
	codec.VideoEncoderBuilder(vpx.NewVP8Encoder),
))
One problem is that user code has no way to know which implementation ends up being used, so it's difficult to pass codec-specific parameters when a fallback codec is selected.
One option: add prop.Codec.ImplementationName string and pass one CodecParams per ImplementationName as map[string]interface{}?
Currently, audio device selection is random (Go maps are unordered).
An audio source device may be a monitor, i.e. a loopback of an audio output.
Giving slightly higher priority to non-monitor devices and/or the system default device would stabilize selection and suit typical use.
In the Web API, media settings can be updated by applyConstraints().
https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack/applyConstraints
Example:
Track (ID: video, Label: 10afd1bd-dee0-452a-85eb-b49cbea60194) ended with error: EOF
Pixel formats are currently scored by exact match only.
For example, I may want to select either YUY2 or UYVY, but never JPEG.
Lines 99 to 112 in 949e850
Any other improvements are welcomed!
As of now, our video encoders, openh264 and vpx, only support the I420 format. Therefore, we have a ToI420 converter that helps handle other image formats.
However, ToI420 can only handle YCbCr images at this moment. The motivation for adding more image formats:
For VideoTransform, accepting flexible input also opens the door to libraries such as https://godoc.org/github.com/disintegration/imaging, which relies heavily on NRGBA.
PeerConnection is only used to list the supported codecs.
A constructor that directly takes a list of codecs would make the package usable for non-WebRTC purposes, just like the Web API.
At the moment, only the selected prop is passed to Video/AudioRecord.
For example, on the screen-capture driver, FrameRate is not discrete.
It would be nice to pass both the selected prop and the requested prop so such parameters can be read.
It would be useful if a custom image processor (func CustomImageProcessor(r video.Reader) video.Reader) could be inserted between the device and the codec.
Personally, I would like to use this package as a replacement for gstreamer; a clock overlay on the image is what I want to insert this way.
cvendor/lib/openh264/libopenh264.a is compiled for x86_64 Linux.
It would be nice to have a more portable way to support multiple environments.
I haven't dug into the details yet.
The same source code works on go1.13.8.
Linux 5.4.19-100.fc30.x86_64 #1 SMP Tue Feb 11 22:27:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
go version go1.14 linux/amd64
Some scaling kernels are available in the x/image/draw package.
https://pkg.go.dev/golang.org/x/image/draw?tab=doc#pkg-variables
Maybe as map[string]interface{}? prop.Codec.Quality could be dropped instead.
For Linux, libx11's XOpenDisplay, XDefaultRootWindow, XShmGetImage, and XShmPutImage could be used.
As of now, mediadevices uses many interfaces.
While interfaces make the design very flexible, that benefit isn't free. The following are some of the downsides:
While I've laid out some of the downsides of using interfaces above, I still think they're great and should be used where appropriate. So, we should replace the interfaces that don't actually need that flexibility with structs.
In my opinion, we should convert the MediaDevices and MediaStream interfaces to structs.
Note: hopefully, when pion/webrtc v3 is ready, the Tracker and LocalTrack interfaces can be merged into pion/webrtc.
Hello,
I want to make a video call without a browser by connecting a microphone and a webcam to my Raspberry Pi.
I'm using Janus as the SFU and am modifying the video-room example; I'd like some advice.
How can I set the input from the microphone as an audio track in pion?
Do you have any samples?
=== RUN TestMeasureBitRateDynamic
##[error] TestMeasureBitRateDynamic: measurement_test.go:95: expected: 25600.000000 (with 8.000000 precision), but got 25585.959280
--- FAIL: TestMeasureBitRateDynamic (5.00s)
https://github.com/pion/mediadevices/pull/139/checks?check_run_id=598870347 (restarted)
Drop frames to limit the output framerate.
Not sure what API we can use on Windows yet. We need to do some research.
In the Web API, codec bitrates are controlled through SDP, like:
a=mid:audio
b=AS:000
but that's too complicated for this package.
Configuring them directly on GetUserMedia would be better for us.
For example:
s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
Audio: func(c *mediadevices.MediaTrackConstraints) {
c.Codec = webrtc.Opus
c.BitRate = 32000 // 32kbps
...
},
Video: func(c *mediadevices.MediaTrackConstraints) {
c.Codec = videoCodecName
c.BitRate = 100000 // 100kbps
c.KeyFrameInterval = 100
...
},
})
or
s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
Audio: func(c *mediadevices.MediaTrackConstraints, c2 *mediadevices.CodecParameters) {
c.Codec = webrtc.Opus
c2.BitRate = 32000 // 32kbps
...
},
Video: func(c *mediadevices.MediaTrackConstraints, c2 *mediadevices.CodecParameters) {
c.Codec = videoCodecName
c2.BitRate = 100000 // 100kbps
c2.KeyFrameInterval = 100
...
},
})
As of now, VideoEncoderBuilder and AudioEncoderBuilder only return io.ReadCloser and error:
BuildAudioEncoder(r audio.Reader, p prop.Media) (io.ReadCloser, error)
BuildVideoEncoder(r video.Reader, p prop.Media) (io.ReadCloser, error)
While returning io.ReadCloser is very idiomatic, it isn't enough for our needs. The main limitation is rate control: we can't adjust the codec parameters on the fly, decreasing or increasing the bitrate as needed depending on the current network speed and quality.
So, instead of returning io.ReadCloser, it's better to return a new interface that embeds io.ReadCloser and adds a method that updates the BaseParams:
package codec
import "io"
type ReadCloser interface {
io.ReadCloser
Update(params BaseParams) error
}
Currently, the source media properties are passed directly to the encoder; however, a Video/AudioTransformer may change them.
For example, a VideoTransformer may change the frame rate and size, and an AudioTransformer may change the number of channels.
A pure Go implementation of this through the image.Image interface requires a huge amount of overhead.
It's not a problem for now, but I'd like to leave a note.
The following test just checks the OnEnded callback.
package main
import (
"testing"
"time"
"github.com/pion/mediadevices"
_ "github.com/pion/mediadevices/pkg/codec/vpx"
"github.com/pion/mediadevices/pkg/frame"
"github.com/pion/webrtc/v2"
)
func TestMain(t *testing.T) {
configs := map[string]webrtc.Configuration{
"WithSTUN": {
ICEServers: []webrtc.ICEServer{
{URLs: []string{"stun:stun.l.google.com:19302"}},
},
},
"WithoutSTUN": {
ICEServers: []webrtc.ICEServer{},
},
}
for name, config := range configs {
t.Run(name, func(t *testing.T) {
peerConnection, err := webrtc.NewPeerConnection(config)
if err != nil {
t.Fatal(err)
}
md := mediadevices.NewMediaDevices(peerConnection)
s, err := md.GetUserMedia(mediadevices.MediaStreamConstraints{
Video: func(c *mediadevices.MediaTrackConstraints) {
c.CodecName = videoCodecName
c.FrameFormat = frame.FormatI420
c.Enabled = true
c.Width = 640
c.Height = 480
},
})
if err != nil {
t.Fatal(err)
}
trackers := s.GetTracks()
if len(trackers) != 1 {
t.Fatal("wrong number of the tracks")
}
peerConnection.AddTrack(trackers[0].Track())
trackers[0].OnEnded(func(err error) {
t.Error(err)
})
time.Sleep(10 * time.Second)
trackers[0].OnEnded(func(err error) {})
peerConnection.Close()
trackers[0].Stop()
time.Sleep(time.Second)
})
}
}
with a camera read timeout treated as an error:
diff --git a/pkg/driver/camera/camera_linux.go b/pkg/driver/camera/camera_linux.go
index cee43b2..f7202f8 100644
--- a/pkg/driver/camera/camera_linux.go
+++ b/pkg/driver/camera/camera_linux.go
@@ -4,6 +4,7 @@ package camera
import "C"
import (
+ "errors"
"image"
"io"
@@ -97,6 +98,7 @@ func (c *camera) VideoRecord(p prop.Media) (video.Reader, error) {
switch err.(type) {
case nil:
case *webcam.Timeout:
+ return nil, errors.New("read timeout")
continue
default:
// Camera has been stopped.
It fails in the STUN-server case, and only on go1.14rc1.
$ go1.14rc1 test . -v
=== RUN TestMain
=== RUN TestMain/WithSTUN
TestMain/WithSTUN: main_test.go:51: read timeout
=== RUN TestMain/WithoutSTUN
--- FAIL: TestMain (32.97s)
--- FAIL: TestMain/WithSTUN (21.91s)
--- PASS: TestMain/WithoutSTUN (11.06s)
FAIL
FAIL github.com/pion/mediadevices/examples/simple 32.986s
FAIL
$ go1.13 test . -v
=== RUN TestMain
=== RUN TestMain/WithSTUN
=== RUN TestMain/WithoutSTUN
--- PASS: TestMain (27.74s)
--- PASS: TestMain/WithSTUN (16.67s)
--- PASS: TestMain/WithoutSTUN (11.07s)
PASS
ok github.com/pion/mediadevices/examples/simple 27.756s
I will check again once the next RC of Go 1.14 becomes available.
The goal is to make cross-OS development from Linux easy by containerizing each OS's dependencies in its own Docker image.
The runtime:
It is MediaTrackConstraints.deviceId in the Web API.
In order to specify what codecs should be used, users need to:
import (
...
_ "github.com/pion/mediadevices/pkg/codec/openh264" // This is required to register h264 video encoder
...
)
and then reference the registered name when calling GetUserMedia:
// From github.com/pion/webrtc
package webrtc
const (
PCMU = "PCMU"
PCMA = "PCMA"
G722 = "G722"
Opus = "OPUS"
VP8 = "VP8"
VP9 = "VP9"
H264 = "H264"
)
// From example
md.GetUserMedia(mediadevices.MediaStreamConstraints{
Audio: func(c *mediadevices.MediaTrackConstraints) {
c.CodecName = webrtc.Opus
c.Enabled = true
c.BitRate = 32000 // 32kbps
},
Video: func(c *mediadevices.MediaTrackConstraints) {
c.CodecName = webrtc.H264
c.FrameFormat = frame.FormatYUY2
c.Enabled = true
c.Width = 640
c.Height = 480
c.BitRate = 100000 // 100kbps
},
})
The points above show that the current design (importing purely for the side effect of registering a codec) requires implicit knowledge from users: they have to know which codec name gets registered, because they must pass the exact CodecName registered by the import.
Not only is this design confusing and error-prone, it's also inflexible and doesn't scale. What if we want to specify codec-specific parameters (#106)? What about a fallback mechanism (#108), e.g. falling back to a software encoder when hardware acceleration is unavailable?
These needs seem solvable with an empty interface, but with empty interfaces we lose static type checking.
Profile of #102
flat flat% sum% cum cum%
0.52s 33.77% 33.77% 0.52s 33.77% runtime.cgocall
0.47s 30.52% 64.29% 0.53s 34.42% github.com/pion/mediadevices/pkg/frame.decodeYUY2
0.09s 5.84% 70.13% 0.09s 5.84% runtime.usleep
decodeYUY2 occupies almost the same amount of CPU time as hardware-accelerated VP8 encoding.
Add something like image.Image for audio, to support variable channel counts and sample formats.
It would make conversion between channel counts, sampling rates, and sample formats easy.
I think we should try to decouple mediadevices from the webrtc-specific parts so that it can be more generic and useful to a wider audience. Also, if we look at Mozilla's definition of the MediaDevices API, it is never described as being solely for webrtc:
The MediaDevices interface provides access to connected media input devices like cameras and microphones, as well as screen sharing. In essence, it lets you obtain access to any hardware source of media data.
Reference: https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices
Broadcast a webcam stream from a server to the browser.
This feature definitely should be in the examples section.
It has many use cases, from home security to live video streaming; nowadays it's a must-have feature.
Alternatives: Python (aiortc), Java (Kurento), linux-projects (UV4L)
This feature is very much in demand, but the alternatives have limitations: Kurento requires a Java VM to be installed, aiortc is difficult to set up due to Python versioning and package support, and UV4L is unstable, pretty opinionated, and not open source.
So this is a case where Go will shine.
jsfiddle can load code from a GitHub repository, like:
https://jsfiddle.net/gh/get/library/pure/pion/example-webrtc-applications/tree/master/save-to-webm/jsfiddle
https://github.com/pion/example-webrtc-applications/tree/master/save-to-webm/jsfiddle
The demo page contains an extra transceiver.
// Offer to receive 1 audio, and 2 video tracks
pc.addTransceiver('audio', {'direction': 'recvonly'})
pc.addTransceiver('video', {'direction': 'recvonly'})
pc.addTransceiver('video', {'direction': 'recvonly'})
It would be cleaner to have the audio/video demo and a video-only demo separately.
The current code assumes that the pixel format is 32-bit RGBA.
Add one more example with raspivid support (instead of gstreamer) to provide hardware encoding on the Raspberry Pi.
Since the Raspberry Pi is widely used in many projects, it would be nice to have a Go webrtc implementation for it; no alternative as good as Go & webrtc exists yet.
The goal is just to send video from a Raspberry Pi to the browser, with raspivid doing the hardware encoding.
In the Web API, the MediaStreamTrack.ended event is fired and the MediaStreamTrack.onended handler is called on such errors.
https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack/onended
This event occurs when the track will no longer provide data to the stream for any reason, including the end of the media input being reached, the user revoking needed permissions, the source device being removed, or the remote peer ending a connection.
Lines 115 to 124 in e4da8fa
Lines 181 to 185 in e4da8fa
Create https://developer.mozilla.org/en-US/docs/Web/API/MediaDevices/getUserMedia interface for Linux.
Not sure what API we can use on Windows yet. We need to do some research.
As of now, the microphone and camera adapters live in the same folder, driver. To make them more organized and modular, it would be good to group them by category.
Before:
driver
-- microphone_linux.go
-- camera_linux.go
After:
driver
-- camera
-- camera_linux.go
-- microphone
-- microphone_linux.go
This way, the separation between devices is clear, and it puts less cognitive load on driver implementors.
On Darwin, we can get access to the microphone through https://developer.apple.com/av-foundation/. To access this API, we need to use either Swift or Objective-C.
As of now, GetUserMedia accepts a single parameter, MediaStreamConstraints, defined as follows:
func (m *mediaDevices) GetUserMedia(constraints MediaStreamConstraints) (MediaStream, error) {
...
}
type MediaStreamConstraints struct {
Audio MediaOption
Video MediaOption
}
// MediaTrackConstraints represents https://w3c.github.io/mediacapture-main/#dom-mediatrackconstraints
type MediaTrackConstraints struct {
prop.Media
Enabled bool
// VideoEncoderBuilders are codec builders that are used for encoding the video
// and later being used for sending the appropriate RTP payload type.
//
// If one encoder builder fails to build the codec, the next builder will be used,
// repeating until a codec builds. If no builders build successfully, an error is returned.
VideoEncoderBuilders []codec.VideoEncoderBuilder
// AudioEncoderBuilders are codec builders that are used for encoding the audio
// and later being used for sending the appropriate RTP payload type.
//
// If one encoder builder fails to build the codec, the next builder will be used,
// repeating until a codec builds. If no builders build successfully, an error is returned.
AudioEncoderBuilders []codec.AudioEncoderBuilder
// VideoTransform will be used to transform the video that's coming from the driver.
// So, basically it'll look like the following: driver -> VideoTransform -> codec
VideoTransform video.TransformFunc
// AudioTransform will be used to transform the audio that's coming from the driver.
// So, basically it'll look like the following: driver -> AudioTransform -> codec
AudioTransform audio.TransformFunc
}
type MediaOption func(*MediaTrackConstraints)
From the type definitions above, we can see that we're using MediaTrackConstraints for unrelated concerns such as:
I think we should somehow move them away from MediaTrackConstraints because:
The purpose of this issue thread is to talk about possible designs that can solve the problems above.