w3c / mediacapture-main Goto Github PK

View Code? Open in Web Editor NEW

121.0 56.0 61.0 19.47 MB

Media Capture and Streams specification (aka getUserMedia)

Home Page: https://w3c.github.io/mediacapture-main/

License: Other

HTML 99.92% CSS 0.03% JavaScript 0.05%

webrtc

mediacapture-main's Introduction

Media Capture and Streams Specification

This document defines a set of JavaScript APIs that allow local media, including audio and video, to be requested from a platform.

The document is developed by the WebRTC Working Group.

Published Versions

Please review and report bugs against the latest editor's draft.

Latest editor's draft Latest published version

Useful Links

The content of this document is discussed at the public-media-capture mailing list.

Find more information about the joint task force at our wiki.

Contribution Guidelines

mediacapture-main's People

Contributors

Stargazers

Watchers

mediacapture-main's Issues

Reconsider the text on streaming from a file

This:

When a MediaStream object is being generated from a local file (as opposed to a live audio/video source), the user agent SHOULD stream the data from the file in real time, rather than sending the entire file immediately. The MediaStream object is also used in contexts outside getUserMedia, such as [WEBRTC10]. In both cases, ensuring a realtime stream reduces the ease with which pages can distinguish live video from pre-recorded video, which can help protect the user's privacy.

Is not especially actionable without more specification guidance about what it means to stream from a file. It is also unclear what the purpose of this paragraph is: to establish the potential for a file-based input device, or to describe a way to provide a fake device for the purposes of privacy. Either is grossly under-specified.

I'd recommend removing this paragraph entirely. Implementations that care to spoof gUM for the purposes of protecting privacy know how to do that and we have other specifications coming for producing content from "files" if that is the goal.

Note also that spoofing for privacy purposes needs to also generate appropriate device information to provide when devices are enumerated. That information will need to match up with the actual content that is chosen. That suggests that there could be a need to preempt the desire to spoof to ensure that it is even possible to do a believable job at it.

typo: "enabled" ->"disabled"

[Section 4.3.1] has the following - "but may be replaced by zero-information-content if the MediaStreamTrack is muted or enabled, see below".

"enabled" should be "disabled"

explanation of kind in getSupportedConstraints (DOMString kind);

The getSupportedConstraints (DOMString kind) takes a kind parameter but it looks like the only place you can figure out what that is used for is by reading the examples.

Test issue - ignore this

Ignore this

The iana reg draft points at wrong draft now

Premature optimization in enumerateDevices steps

I see no functional impact from this statement:

If this method has been called previously within this application session, let oldList be the list of MediaDeviceInfo objects that was produced at that call (resultList); otherwise, let oldList be an empty list.

Lets remove it and leave optimizations to implementations.

constraint properties should not be scoped to "source" only

[Section 14.1] In the “video” property table, the spec has the following for the "width" property, “The width or width range, in pixels, of the video source.”

Since constraints can be applied to track (not directly to the “source”), I’d suggest removing “of the video source”, so the definition becomes, "The width or width range in pixels."

There are two other places in the table, i.e. for "height" and "frameRate", which have the same issue.

All the 2119 language

A lot of the document uses the following form:

<em class="rfc2119" title="must">must</em>

But Respec does this for you. All of those can be replaced with:

MUST

Which is a whole lot less tedious.

Simplify model: Remove "detach source" from descriptions

Copied from list email, April 15:

In the current specification, we have two concepts related to sources and tracks:

A track can be stop()ed, in which case it is ended.
A track can be detached from its source.

The text says:

A) in terminology for "source", we have:

Sources are detached from a track when the track is ended for any reason.

B) Under "Life-cycle and Media Flow", we have:

A MediaStreamTrack can be detached from its source. It means that the track is no longer dependent on the source for media data. If no other MediaStreamTrack is using the same source, the source will be stopped. MediaStreamTrack attributes such as kind and label must not change values when the source is detached.

C) Under the "enabled" attribute of a track, we have:

On getting, the attribute must return the value to which it was last set. On setting, it must be set to the new value, regardless of whether the MediaStreamTrack object has been detached from its source or not.

Under the "stop" function for a track, we have:

Set track's readyState attribute to ended.
Detach track's source.

It seems to me that this is one concept more than we need.
Whether there is a relationship between a stopped track and its source or not is an implementation detail, and we shouldn't be constraining it in our API description.

So my suggestion:

In A, C and D, simply remove the text that refers to "Detach".

In B, instead say:

If all MediaStreamTracks that are using the same source are ended, the source will be stopped.

I think that simplifies the terminology, and doesn't change any observable property of the API.

Please use [Exposed]

That way it is much clearer what is exposed to Window and/or Worker.

getUserMedia() from local files?

The current spec mentions that MediaStream object can be generated from a local file, and use the "local file" example in multiple places. I’d suggest removing the “local file” scenario from the main spec. It is undefined otherwise.

The scenario can be addressed in a separate spec when indeed needed. It seems that the new spec effort for getting Media Stream from the MediaElement will be able to address most of the need.

Define "task source" on first use

The following statement without reference or context is extremely hard to parse:

If the stream's activity status changed due to a user request, the task source for this task is the user interaction task source. Otherwise the task source for this task is the networking task source.

MediaStreamTrackEvent's track member is not nullable

Either you need to make this nullable or you need to require the dictionary argument and not give that a default value of null.

"gain" vs "volume"

A couple editorial issues in Example 6, "gain" should be changed to "volume".

Consider specifying a minimum length for deviceId

In reviewing the Audio Output API, they are attempting to reserve a value for use in APIs that would accept a deviceId value. That is made fraught with the potential for collisions because the definition of deviceId doesn't constrain the value-space of the device identifier at all. A minimum length - or any other sort of constraint on the value-space - would allow the Audio Output API to choose a value that is guaranteed not to collide.

Need to specify action on constraints that give no value.

The action to be taken on constraints { height: {} } needs to be clear from the spec.

Spec currently says it MUST specify a value, but leaves open what happens if you try to call the API with a constraint that doesn't.

MediaStream: 'input", "output", and number of "consumers"

[Section 4.1] has the following, "A MediaStream object has an input and an output that represent the combined input and output of all the object's tracks. The output of the MediaStream controls how the object is rendered".

First, the “input” and “output” concepts are not defined explicitly. They are not part of the MediaStream object directly.

Secondly, the consumer should be the object that controls the rendering not the "output".

More important, the spec should be explicit if a MediaStream object can be assigned to more than one consumers simultaneously. The block diagrams in this section currently shows 2 "outputs".

Lack of timeout / cancellation leads to UI inconsistencies

(Long time listener, first time caller - please forgive any errors with regard to process or form.)

The current standard does not appear to include a requirement that implementations provide a means of either:

a) allowing application code to cancel a pending getUserMedia() request, thus removing the permissions UI element from the browser;

b) allow application code to set a timeout on the permission request for a getUserMedia() call, thus allowing the request to fail with a timeout error and the browser to withdraw the permission request element.

This leads to an inconsistent state where the application code has given up but with the Navigator UI still displaying the permissions request dialog. This can lead to user confusion.

I realize that to some degree this is a matter of implementation, but I think it would be worthwhile for the standard to include either a means of cancellation or a timeout.

Language does not seem tamper-free

E.g. if I overwrite MediaStream.prototype.addTrack does that affect MediaStream's constructor? I think both addTrack() and the constructor are meant to invoke a "private" algorithm. This happens throughout the specification.

DeviceID, kind and label needs to be aligned with capabilities

Initially reported in bugzilla by Harald Alvestrand 2014-10-02 14:36:10 UTC

The document needs to be reviewed such that we align deviceId, kind and label with the new definition of capabilities.

enumerateDevices should be getAll() to match other APIs

mediaDevices.getAll() is pretty clear and matches other APIs like the Cache API in SW.

MediaStreamError "error" should default to null

The following in section 7.1.1

dictionary MediaStreamErrorEventInit : EventInit {
    MediaStreamError? error;
};

should be changed to

dictionary MediaStreamErrorEventInit : EventInit {
    MediaStreamError? error = null;
};

Fitness distance must interpret plain values in advanced array as exact.

Right now, the advanced array in Example 3 does squat, unless it's re-written as:

var constraints =
{
  width: {min: 640, ideal: 1280},
  height: {min: 480, ideal: 720},
  advanced: [{width: {exact: 1920}, height: {exact: 1280 } },
             {aspectRatio: {exact: 1.3333333333} }]
};

Do we need MediaStreamError.constraintName?

If we remove “constraintName”, the MediaStreamError will be consistent with DOMError.

UA can potentially use the "message" property to deliver the constraintName or constraintNames to the apps and developers.

This is a compromise the EME spec has taken recently for a similar problem in that spec.

Normative text in "Terminology" section

From Philippes review:
"The paragraph:

"Other than the source identifier (defined in MediaDeviceInfo.deviceId), other bits of source identity are never directly available to the application until the user agent connects a source to a track. Once a source has been "released" to the application (either via a permissions UI, pre-configured allow-list, or some other release mechanism) the application will be able discover additional source-specific capabilities.“

goes beyond terminology and should come with more details in a further part, such as Section 9. Enumerating Local Media Devices"

Filed issue to keep track.

clearly state that "channel" is not defined in the spec

[Section 4.1] has the following, "A channel is the smallest unit considered in this API specification".

This should be replaced with something like this, "It is out of scope for this API specification to provide any control or any information of channels in the MediaStreamTrack".

Aren't not understood constraints ignored?

In section 10.2.1 it is said:
"If the constraint is not supported by the browser, jump to the step labeled Constraint Failure below."
I may be confused, but isn't the situation now that constraints the browser does not support/understand are just ignored?

w3c.github.io version lags behind master

Which greatly diminishes the value of that version. I can generate a PR that would have Travis do the work, but that would still need someone with push access to the repo to finalize the whole thing (@dontcallmedom).

Examples in the spec can use a consistent coding style

for example, the examples sometimes use width: (without double quotation marks) and then “width”: in other places (i.e. with the double quotation marks) in the dictionaries.

Both should work, but it might be nice for the spec to use a consistent coding style.

What's the fitness distance if no ideal is specified?

I assume 0 but I couldn't find any text on this.

A practical algorithm for constraint resolution needs to be described.

The approach of generating all possible combinations of settings and then culling the set is obviously not practical - the set of possible combinations is too large to be practical to enumerate. We should have one example in the document of a way to implement the algorithm that is more practical.

Improve description of EventHandler attributes

Statements of this form are almost completely devoid of information:

This event handler, of type active, MUST be supported by all objects implementing the MediaStream interface.

A simple rewrite makes a big difference:

This event handler, of type active, is fired when the MediaStream becomes active.

And this is throughout.

Chasing down what the conditions are for firing an event is surprisingly difficult without something like this.

As for what I recommend removing, I have no idea why you would have strict normative requirements around support for this and not other things.

html5 player playing remote video

Hi there,

I am not sure what this lib is exactly, but I assume it have something with HTML5 player.

My question is it possible to play remote video in html 5 player ?

MediaStream's active attribute

The link from the constructor to "active" indicates a MediaStream state of active/inactive. However, the prose doesn't reference to this as a state but rather as something that is a boolean. This is rather confusing.

Is [WEBRTC10] a normative reference?

GUM can be implemented without [WEBRTC10].

Should attach the media element to the MediaStream object

In http://w3c.github.io/mediacapture-main/#loading-and-playing-a-mediastream-in-a-media-element, it says 'When the MediaStream state moves from the active to the inactive state, the User Agent MUST raise an ended event on the media element and set its ended attribute to true.' But currently, there is no directly relationship between mediastream and media element. In mediastream, we can control active to inactive and trigger the 'inactive' event. But we can't raise an ended event on the media element. Should we add the 'attach to element algorithm' like mediasource does?
(http://w3c.github.io/media-source/#mediasource-attach )

MediaStreamTrack object can also be consumed directly

MediaStreamTrack object can be consumed directly, for example, by the ORTC RTCRtpSender object. With the object model being introduced into WebRTC 1.0, there will give us another example for this.

[Section 4.1] has the following, "Both MediaStream and MediaStreamTrack objects can be cloned. This allows for greater control since the separate instances can be manipulated and consumed individually".

We should add MediaStreamTrack explicitly, so this becomes, “…the separate instances of MediaStream and MediaStreamTrack can be…”

prohibit multiple getUserMedia() calls sharing the same capture device

The spec should explicitly prohibit multiple getUserMedia() calls sharing the same source/capture device. If multiple objects from the same source are needed, apps must clone the MediaSteeam or MediaStreamTrack instead .

Section 4.3 of the current draft has the following -
"Several MediaStreamTrack objects can represent the same media source, e.g., when the user chooses the same camera in the UI shown by two consecutive calls to getUserMedia()".

There is no algorithm defined explicitly on how to resolve conflict in constraints when sharing capture device with a consecutive getUserMedia() call.

Clarification in "Introduction" section

From Philippes review:
"It writes “local devices (video cameras, microphones, Web cams) “ and "local devices that can generate multimedia stream data.” and therefore it leads to think it deals only with input devices while it deals also with audio output devices.

The use cases includes "uses, such as real-time communication, recording, and surveillance” and I suggest to include music applications as well to make clear it includes the audio output."
Filed issue to keep track.

MediaDeviceInfo.label and groupId are unnecessarily nullable

The spec lists MediaDeviceInfo.label and groupId as nullable:

interface MediaDeviceInfo {
    readonly    attribute DOMString       deviceId;
    readonly    attribute MediaDeviceKind kind;
    readonly    attribute DOMString?      label;
    readonly    attribute DOMString?      groupId;
};

But the enumerateDevices algorithm never ever returns null for these (instead it returns empty string). Even the description of label says it returns empty string, so we should remove nullable here.

Cababilities need to clearly specify that they are constant over time

Initially reported in bugzilla by Harald Alvestrand 2014-10-16 13:07:45 UTC

From Jan-Ivar, in bug 25777:

While examples are nice, I would argue they're no replacement for specification.

Are capabilities accurate?

While it is possible to deduce from the applyConstraints algorithm [1] that capabilities are allowed to be a super-set of what the UA supports, I find no mention of this where Capabilities is defined [2].

I've read the definition a couple of times, and even though it uses the word "subset" four times in one paragraph, it still seems to equate what the capabilities read with what "the UA supports", which doesn't allow for capabilities to be either-or (e.g. super framerate OR super resolution).

I think it would help implementations if the spec stated that returned capabilities must a set or super-set of what the UA supports.

Are capabilities constant?

I find conflicting text on this in the spec:

"The UA may choose new settings for the Capabilities of the object at any time. When it does so it must attempt to satisfy the current Constraints, in the manner described in the algorithm above." [2]

vs.

"Source capabilities are effectively constant. Applications should be able to depend on a specific source having the same capabilities for any session." [3]

[1] http://w3c.github.io/mediacapture-main/getusermedia.html#dfn-applyconstraints
[2] http://w3c.github.io/mediacapture-main/getusermedia.html#capabilities
[3] http://w3c.github.io/mediacapture-main/getusermedia.html#terminology

WebIDL types needed in Constrainable application

Initially reported in bugzilla by Harald Alvestrand 2014-08-25 10:33:22 UTC

From Jan-Ivar, pointing out several issues with the Constrainable pattern as applied to MediaTrack:

The spec still leaves several types to be inferred by the implementer around constraints:

In [1]:

partial interface MediaDevices {
static Dictionary getSupportedConstraints (DOMString kind);

"Dictionary" is not defined. To implement this, we'd need something like:

dictionary SupportedMediaTrackConstraintSet {
boolean width;
boolean height;
boolean aspectRatio;
boolean frameRate;
boolean facingMode;
boolean volume;
boolean sampleRate;
boolean sampleSize;
boolean echoCancelation;
boolean sourceId;
boolean groupId;
// basically a boolean rehash of MediaTrackConstraintSet
};

static SupportedMediaTrackConstraintSet getSupportedConstraints (DOMString kind);

or simply:

static MediaTrackConstraintSet getSupportedConstraints (DOMString kind);

since UAs can return non-zero values in MediaTrackConstraintSet just fine!

In [2]:

interface MediaStreamTrack : EventTarget {
...
Capabilities getCapabilities ();
MediaTrackConstraints getConstraints ();
Settings getSettings ();

Capabilities and Settings are not defined in this use of the Constrainable pattern. To implement this, we'd need something like:

dictionary MediaTrackSettingSet {
long width;
long height;
double aspectRatio;
double frameRate;
VideoFacingMode facingMode;
double volume;
long sampleRate;
long sampleSize;
boolean echoCancelation;
DOMString sourceId;
DOMString groupId;
// basically a bare-value rehash of MediaTrackConstraintSet
};

interface MediaStreamTrack : EventTarget {
...
MediaTrackConstraintSet getCapabilities ();
MediaTrackConstraints getConstraints ();
MediaTrackSettingSet getSettings ();

or simply:

interface MediaStreamTrack : EventTarget {
...
MediaTrackConstraintSet getCapabilities ();
MediaTrackConstraints getConstraints ();
MediaTrackConstraintSet getSettings ();

since UAs can return bare values in MediaTrackConstraintSet just fine!

In the Constrainable Pattern in [3]:

Capabilities are dictionary containing one or more key-value pairs, where each key must be a constrainable property defined in the registry, and each value must be a subset of the set of values defined for that property in the registry. The exact syntax of the value expression depends on the type of the property but is of type ConstraintValues . The Capabilities dictionary specifies the subset of the constrainable properties and values from the registry that the UA supports.

Typo: "Capabilities are dictionary" -> "Capabilities is a dictionary". Importantly, all capabilities are in one dictionary.

The type "ConstraintValues" is not defined anywhere. Do we mean "members of ConstraintSet" here? If so, that makes Capabilities of type ConstraintSet, yet we're leaving that deduction to the reader. IMHO prose and examples are no substitute for definitions.

In the Constrainable Pattern in [4]:

A Setting is a dictionary containing one or more key-value pairs. It must contain each key returned in getCapabilities(). There must be a single value for each key and the value must a member of the set defined for that property by capabilities(). The Settings dictionary contains the actual values that the UA has chosen for the object's Capabilities. The exact syntax of the value depends on the type of the property.

Typo: "A Setting is a dictionary" -> "Settings is a dictionary". Importantly, all settings are in one dictionary.
Typo: "the value must a member" -> "the value must be a member"

We're leaving the reader to deduce that Settings is a bare-values-only subset of ConstraintSet. IMHO prose and examples are no substitute for definitions.

In the Constrainable Pattern in [5]:

typedef Dictionary ConstraintSet;

"Dictionary" is not defined, and even if it were, it wouldn't have the right members, so a typedef infers the wrong thing. We could say:

dictionary ConstraintSet { /* members */ };

In [6]:

dictionary MediaTrackConstraintSet {
ConstrainLong width;
ConstrainLong height;
ConstrainDouble aspectRatio;
ConstrainDouble frameRate;
ConstrainVideoFacingMode facingMode;
ConstrainDouble volume;
ConstrainLong sampleRate;
ConstrainLong sampleSize;
boolean echoCancelation;
ConstrainDOMString sourceId;
DOMString groupId;
};

groupId seems to have the wrong type. Why not ConstrainDOMString groupId ?

.: Jan-Ivar :.

[1] http://dev.w3.org/2011/webrtc/editor/archives/20140817/getusermedia.html#mediadevices-interface-extensions
[2] http://dev.w3.org/2011/webrtc/editor/archives/20140817/getusermedia.html#media-stream-track-interface-definition
[3] http://dev.w3.org/2011/webrtc/editor/archives/20140817/getusermedia.html#capabilities
[4] http://dev.w3.org/2011/webrtc/editor/archives/20140817/getusermedia.html#settings
[5] http://dev.w3.org/2011/webrtc/editor/archives/20140817/getusermedia.html#constraints
[6] http://dev.w3.org/2011/webrtc/editor/archives/20140817/getusermedia.html#dictionary-mediatrackconstraints-members

Need to be clear how future extensions add new constraints

Explanation of constraints in GUM call

Initially reported in bugzilla by Cullen Jennings 2014-05-18 13:47:59 UTC

step 4 of GMD is really hard to understand what is going on an why

The whole algorithm is hard to convince people it is correct because it is hard to understand what is going on. Prefer to see it rewritten in style of Constrainable section.

In gum constructor around step 8.5, it seems wrong as the advanced is not a set of key -value pairs but is an array of sets of k/v pairs

Step 10 in gum constructor is wacky - what is this for? need to be more specific than randomly fail if you feel like it

in step 11 gum constructor, should be clear that call MSUT not return things that are not in the finalSet

add hotlinks to the definition of "constraint" under applyConstraints

From comments in #122. Specifically, the text under SelectSettings() could use a link to introduce "constraint", since SelectSettings() is a likely jump target.

enumerateDevices' access model is unclear

From the spec:

If none of the local devices are attached to an active MediaStreamTrack, let filteredList be a copy of resultList, and all its elements, where the label member is the empty string.

Taken literally, this exposes the label to all concurrently open tabs during active gUM use in any tab regardless of site, which is not what we mean, because it further says:

The algorithm described above means that the access to media device information depends on whether or not permission has been granted to the page's origin to use media devices.

This narrows access down to a page's origin, but is fuzzy about the duration of access, wrt whether access is afforded outside active use (MediaStreamTracks are stopped), or when persistent permission has been (previously) granted.

These statements seem quite contradictory in their description about what matters for access, and neither matches gUM permissions which are afforded, not even per page, but, per instance and not per origin (unless permanent access is granted).

Questions:

Open same jsfiddle in two tabs, grant access in one and deny in the other. Who sees label?
Open different jsfiddles in two tabs, grant access in one and deny in the other. Who sees label?
Stop the MediaStream. Can you still see label?
Never start a MediaStream, but persistent permissions were granted on a previous visit. See label?
Never start a MediaStream, but persistent permissions were granted on a previous visit to another page on the same host. See label?

No way to represent a boolean Capability that can be turned on or off

The doc in #118 talks about "a microphone that can be opened with echo cancelation turned on or off".

What should getCapabilities() return for such a mic? The document seems to assume:

{ echoCancelation: { min: false, max: true } }

which I think is a great idea!

This requires a spec-change to ConstrainBooleanParameters.

Remove "Direct Assignment to Media Elements"

It conflicts with the definition given in the HTML Standard, which also allows for setting a Blob object and such. Given that it's integrated there, providing a pointer seems better.

should getSupportedContraints be static?

If getSupportedContraints is "static" as in the current draft, all the examples need to be fixed.

For example,
var supports = navigator.mediaDevices.getSupportedConstraints("video");

should be changed to
var supports = MediaDevices.getSupportedConstraints("video");

MediaStreamError should not be an interface

Errors in the platform are represented by JavaScript Error object subclasses.