
pinning-services-api-spec's Introduction

Pinning Service API Spec

This repository contains the specs for the vendor-agnostic pinning service API for the IPFS ecosystem.

(diagram: pinning-services-api-contex.png)

About

A pinning service is a service that accepts CIDs from a user in order to host the data associated with them.

The rationale behind defining a generic pinning service API is to have a baseline functionality and interface that can be provided by pinning services, so that tools can be built on top of a common base of functionality.

In this presentation, IPFS creator Juan Benet discusses current and potential pinning use cases, and how a standardized IPFS pinning API can meet these envisioned needs.

The API spec in this repo is the first step towards that future.

Specification

This API is defined as an OpenAPI spec in YAML format:

Documentation

You can find human-readable API documentation generated from the YAML file here:

Code generation

https://openapi-generator.tech allows generation of API client libraries (SDK generation), server stubs, documentation and configuration automatically, given the OpenAPI spec at ipfs-pinning-service.yaml.

Give it a try before you resort to implementing things from scratch.

Adoption

Built-in support for pinning services exposing this API is coming to IPFS tooling:

Client libraries

Server implementations

CI/CD

Online services

Timeline

Contribute

Suggestions, contributions, and criticisms are welcome! However, please make sure to familiarize yourself deeply with IPFS, the models it adopts, and the principles it follows.

This repository falls under the IPFS Code of Conduct.

Spec lifecycle

We use the following label system to identify the state of aspects of this spec:

  • — A work-in-progress, possibly to describe an idea before actually committing to a full draft of the spec
  • — A draft that is ready to review, and should be implementable
  • — A spec that has been adopted (implemented) and can be used as a reference to learn how the system works
  • — We consider this spec to be close to final; it might be improved, but the system it specifies should not fundamentally change
  • — This spec will not change
  • — This spec is no longer in use

pinning-services-api-spec's People

Contributors

2color avatar dependabot[bot] avatar ipfs-mgmt-read-write[bot] avatar jessicaschilling avatar lidel avatar olizilla avatar rektyfikowany avatar web-flow avatar


pinning-services-api-spec's Issues

Arrays of things

There are APIs that take and return arrays of things: "Get all pins" and "Add an array of pins", for example.

How big are we expecting these arrays to be? If unbounded, it would be better to accept/return ndjson so we avoid (a) holding everything in memory at once and (b) waiting for the full request/response to be sent before we can start processing the message, without needing streaming JSON parsers.

Pagination could be used to limit response sizes, but if you want to paginate requests you may need some way of holding state on the server if the pages relate to each other and you'd still need to buffer each request before you could start processing it in order to parse the JSON contained in the body.

If ndjson is used, some way of conveying error messages during a stream is required. The approach we agreed on for the IPFS HTTP API (though it has yet to be implemented) was to wrap each JSON line in something like { result: ... } or { error: ... } so the caller could differentiate between an error and a valid request/response entity.
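As an illustrative sketch (not part of the spec), a client-side parser for such a wrapped ndjson stream could look like this; the `{result: ...}` / `{error: ...}` wrapping is the convention described above:

```python
import json

def parse_ndjson_stream(lines):
    """Yield ("result", obj) or ("error", obj) per ndjson line,
    using the {result: ...} / {error: ...} wrapping discussed above."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank keep-alive lines
        obj = json.loads(line)
        if "error" in obj:
            yield "error", obj["error"]
        else:
            yield "result", obj.get("result", obj)
```

A caller can then stop, retry, or surface errors mid-stream without buffering the whole response first.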

Multidevice use case

From what I understand, the API user token is currently going to be used to identify who requested a pin. However, a user may have multiple devices, and sharing the same token across them has a few problems:

  1. Pins could be added or removed from different devices, and who wins is unclear:

    • Device A pins CID-A.
    • Device B pins CID-A.
    • Device A unpins CID-A.

    Does that mean CID-A should be removed, or should it stay because device B still holds a pin?

  2. If the access token is shared across multiple devices, it is impossible to audit which device added or removed pins, or to revoke access for one specific device.

For the above reasons, I think it would be wise to move away from manual endpoint + token entry and instead perform a device link/unlink flow, similar to how e.g. Keybase does this. While under the hood this could still use tokens (although signing requests would be a better option IMO), it could solve the problems listed above and provide better UX, as described below:

  1. If each device pins/unpins with a unique token/key associated with it, the pinning service could keep a pin as long as at least one authorized device still holds it. A service could still choose a different policy, but it would have enough information to implement either.
  2. Since each device has a unique token/key, it would be possible to audit all API calls and identify which device they came from. Additionally, revoking access for a lost or compromised device would not require re-authorizing all other devices.
  3. Authorizing a device could have much better UX that doesn't involve copying and pasting things into the WebUI (at least). Device authorization could be performed e.g. via a custom protocol handler that the WebUI reacts to.

Add Pin.created?

Knowing when the Pin object was created sounds useful and generic.
Filtering by creation date was raised in various conversations.

Q:

  • Should we add it as part of the mandatory spec?
    • Note: this does not impact WebUI integration, so not a blocker.
    • We won't include it, unless we hear feedback that it is needed
  • If we want to add it, what would be the mechanics of this field?
    • Proposal: Pin.created is optional during pin creation; if the user does not pass a timestamp, the Pinning Service fills it with the current UTC time.
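A minimal sketch of the proposed mechanics (server-side, function name illustrative): fill Pin.created with the current UTC time only when the client omits it:

```python
from datetime import datetime, timezone

def normalize_pin(pin: dict) -> dict:
    """Sketch of the proposal: Pin.created is optional on creation;
    the service fills in the current UTC time when it is omitted."""
    if not pin.get("created"):
        pin = {**pin, "created": datetime.now(timezone.utc).isoformat()}
    return pin
```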

Providing content with a pin request

We were evaluating protocol/web3-dev-team#58 in the context of protocol/web3-dev-team#62, and the subject of "where is the pinning service going to get content from" came up. The assumption that the pinning service will fetch content from the IPFS network raises some concerns:

  1. What if the node that just added the content is behind a NAT that the pinning service can't punch through?
  2. What if the service is used from a web browser, in which case it is highly unlikely the pinning service will be able to deal with it?
  3. If all you want is to add some content to IPFS and get it pinned, spinning up a full IPFS node and waiting until the content is fetched from it is overkill.

I remember @lidel telling me about a de facto hack of encoding content in an identity-hashed CID, which might overcome some of the concerns listed above but raises whole new ones:

  1. Given that it is not specified, how reasonable is it to expect that this would even be supported by a pinning service?
  2. Is that supposed to work in multi-block scenarios?

    I thought it was, but the more I think about it, the less sense it makes to me.

  3. Do we have a CID size limit / request payload size limit to consider ?

Either way, uploading content as an identity-hashed CID encoded as a base64 string in JSON feels like a very impractical way to meet specific requirements. It seems we need to consider extending this specification to support this use case, or it will not be practical when simply putting content on IPFS is desired.
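For reference, the identity-CID hack boils down to embedding the raw bytes directly in the CID. A minimal sketch, assuming CIDv1 with the raw codec (0x55), the identity multihash (0x00), and base32 multibase; multi-byte varints are omitted, so payloads are capped at 127 bytes here:

```python
import base64

RAW_CODEC = 0x55  # multicodec "raw"
IDENTITY = 0x00   # multihash "identity": digest == the data itself

def identity_cid(data: bytes) -> str:
    if len(data) > 127:
        raise ValueError("multi-byte varints omitted in this sketch")
    # CIDv1 = <version><codec><multihash-code><digest-length><digest>
    cid_bytes = bytes([0x01, RAW_CODEC, IDENTITY, len(data)]) + data
    b32 = base64.b32encode(cid_bytes).decode().lower().rstrip("=")
    return "b" + b32  # "b" = base32 multibase prefix

def decode_identity_cid(cid: str) -> bytes:
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore base32 padding
    raw = base64.b32decode(body)
    assert raw[0] == 0x01 and raw[1] == RAW_CODEC and raw[2] == IDENTITY
    return raw[4:4 + raw[3]]
```

This illustrates why the approach only fits tiny payloads: the entire content has to travel inside a string that normally identifies, rather than contains, the data.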

It is also worth pointing out that e.g. Pinata has its own API for such a use case: https://pinata.cloud/documentation#PinFileToIPFS

/cc @alanshaw @mikeal @jnthnvctr

Better UX in GUI apps

Continued from #6, see also #7

Now that #14 is merged, and we got the basic minimum (support for Authorization: Bearer <key>) we could discuss ways to improve UX of adding services, namely avoid manual copying of authorization token from pinning service to GUI app interface.

To illustrate UX needs, adding pinning service to WebUI in IPFS Desktop app could look like this:

  1. User goes to Settings screen and clicks on "Add Pinning Service"
  2. List of predefined services is shown
    (there is also "Custom" option for manual config)
  3. User clicks on one of predefined providers
  4. WebUI opens the authorization page at the Pinning Service (PS) using a well-known API endpoint
    (passing any data that is required)
  5. PS provides interface for "granting pinning permissions to the app X"
    (page could also enable user to create a new pin space, or attach WebUI to existing one)
  6. Upon user approval via PS interface, WebUI is able to use configured pinning service
    (without copying anything manually)

Qs:

  • Any prior art comes to mind? We'd rather not reinvent the wheel, if possible
  • Do we need to return anything back to WebUI in step 6,
    if we generate sufficiently random token on the client and pass it in step 4?
    (randomness could be augmented by drand)

Reusable client / server libraries for Pinning Service API

Let's discuss if it would be useful to have reusable libraries.

AFAIK go-ipfs and js-ipfs plan to implement clients for Pinning Service API.
It is very likely that some pinning services will use Golang or JS on the server side too.

Q:

  • Would it be useful to join forces and have vendor-agnostic client/server libraries for Go and JS?

cc @ipfs/wg-pinning-services @obo20 @GregTheGreek @priom @jsign @sanderpick @andrewxhill @MichaelMure @aschmahmann @achingbrain @Gozala @jacobheun

Generate API docs

Needs:

  • API docs generated from ipfs-pinning-service.yaml
    • in master
    • PRs
  • Link to generated API docs added to README

Document need for /wss on delegates list

Given ipfs/js-ipfs#3588, we should ensure pinning services are useful in a browser context.
While we could add an endpoint for direct DAG upload, we already have all the necessary infrastructure for running a bitswap session over WebSockets.

We should add text to the spec indicating that a /wss multiaddr on the PinStatus.delegates list is highly recommended for public pinning services, as it maximizes utility beyond regular TCP/UDP and enables interaction with thin clients running on web pages.
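For illustration only (the multiaddrs below are made up), a browser client could filter PinStatus.delegates for websocket-capable addresses like so:

```python
def wss_delegates(pin_status: dict) -> list:
    """Return only the delegates a web page can dial:
    multiaddrs that contain a /wss (secure websocket) component."""
    return [
        maddr for maddr in pin_status.get("delegates", [])
        if "wss" in maddr.split("/")
    ]
```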

Document self-hosting options

Needs

  • At least a mention in the README that points people in the right direction
    • could be an article on docs.ipfs.io, or a link to a docker image, etc.
  • is there any open-source implementation of a self-hosted, AWS S3-backed IPFS pinning service that conforms to the Pinning Service API?
    or one backed by any high-scale centralized storage

Solutions

Options to investigate:

If anyone has time to comment on their experiences, or even document a self-hosting setup end-to-end, that would be very useful.

Adjusting limits to match real-life URL length

@obo20 noted that in theory, v0.0.5 of the spec allows a client to pass up to 1000 CIDs as a filter:

(screenshot omitted)

In practice, each CID is ~60 characters long, and the real-life URL limit in web browsers is around 2000 characters, which means ~30 CIDs is the actual limit in the browser context.
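The arithmetic behind the ~30 figure, as a sketch (the base URL and constants are assumptions, not spec values):

```python
BASE = "https://pinning.example.com/pins?cid="
URL_BUDGET = 2000  # practical URL limit in web browsers
CID_LEN = 60       # typical CID length in characters

def max_cids_per_request(base=BASE, budget=URL_BUDGET, cid_len=CID_LEN):
    room = budget - len(base)
    # n CIDs cost n * cid_len characters plus (n - 1) separator commas
    return (room + 1) // (cid_len + 1)
```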

Non-browser clients may allow longer URLs; additional research is needed.

Feature proposal: ability to get bulk pin statuses by request IDs

Context

Right now, it is possible to check only one ID at a time via HTTP GET to /pins/{requestid}.

Proposed enhancement

@whyrusleeping suggested adding ability to query status for multiple requestids.

It sounds like a sensible enhancement that potentially lowers the number of HTTP requests that need to be sent to the pinning service when tracking the pinning status of multiple datasets at the same time.

To align with existing conventions, we could add it as a new filter, similar to the cid one:

/pins?requestid=123123,43234,234237

cc @ipfs/wg-pinning-services @obo20 @hsanjuan thoughts?

Open questions

  • is this the best way to do this?
  • is this useful enough to add to the spec? (i think yes, but lmk)
  • what should be max number of ids allowed per request?
    • the existing cid filter allows a max of 10 CIDs per request; I think this was suggested by Pinata and we may want to use the same limit for requestid
    • if we define no limit, this falls back to the maximum of the limit parameter, which is implicitly 1000
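Whatever limit is chosen, clients can batch on their side. A sketch assuming the same 10-per-request cap as the existing cid filter, with the endpoint shape proposed above:

```python
def chunked(ids, size=10):
    """Split requestids into batches that fit a per-request cap."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def status_urls(ids, size=10):
    """Build one /pins?requestid=... URL per batch."""
    return ["/pins?requestid=" + ",".join(batch) for batch in chunked(ids, size)]
```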

Pinning API over libp2p?

I think it is worth considering whether a pinning API over libp2p would be a better option than an HTTP-based one. Here are some of my thoughts on the subject:

Pinning API over libp2p

  1. 👍 This would essentially turn any IPFS node into a pinning service (if it chooses to provide one). The symmetry here is really compelling, because I could authorize my phone to pin anything on my laptop.
  2. 👍 Transport agnostic.
  3. 👍 If I am asking to pin content from my laptop via libp2p, I already have an open channel with the node, so it could be leveraged to fetch that content from me.
  4. 👎 It's far more involved than a simple HTTP request; then again, if the assumption is that this is used from an IPFS node, that doesn't really change much.
  5. 👍 It would be trivial to subscribe to pin changes instead of having to poll for updates.

Apply feedback from Pinning Summit

A small iteration happened during a workshop at the Pinning Summit, May 7th, 2020.

We should update ipfs-pinning-service.yaml if needed, or document why it is out of scope of MVP.

type pin {
    cid: <cid>
    pin: <pin-type>
    meta: { // all of this is application specific
        created: <date>
        modified: <date>
        replication: ...
    }
}


GET    /pins                     -- lists all pins
GET    /pins/?for-cid={cid}      -- get all pins that link to {cid}
POST   /pins                     -- add an array of pin objects
GET    /pins/{cid-of-pin-object} -- get pin object
DELETE /pins/{cid-of-pin-object} -- deletes pin object
POST   /pins/{cid-of-pin-object} -- updates pin object

Questions:

  • Is there any additional metadata you think MUST be in the Pin object?
  • Is there any additional API/RPC endpoint for this simple Pin API that MUST be there?

Pinning/Unpinning Policies

Not related to the spec itself, but something to consider in terms of what functionality we need.

It's fairly easy to reason about the API when it's triggered by user actions like a CLI command or a UI operation. However, there have been requests for pinning "policies" such as "remotely pin MFS files" or "remotely pin all locally pinned files". In the case of these policies it's less clear what happens in an automated setting. For example, a user has two machines A and B:

  1. A pins cid C
  2. B unpins C
  3. How does the pinning service know if the pin or unpin happened first?

Perhaps if clients could track the latest "state" of the pinset, they would be able to update their local understanding of what's pinned on the service provider before sending an "unpin" command.
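One way to model that idea is optimistic concurrency: clients act against a pinset version, and the service rejects mutations made against a stale view. A purely illustrative sketch (none of these names are part of the spec):

```python
class PinSet:
    """Toy pinset with compare-and-swap semantics, so two devices
    racing to pin/unpin the same CID cannot silently clobber each other."""
    def __init__(self):
        self.version = 0
        self.pins = set()

    def snapshot(self):
        return self.version, set(self.pins)

    def pin(self, cid, seen_version):
        if seen_version != self.version:
            raise RuntimeError("stale view; re-sync the pinset first")
        self.pins.add(cid)
        self.version += 1

    def unpin(self, cid, seen_version):
        # Refuse the unpin if the client acted on an outdated state.
        if seen_version != self.version:
            raise RuntimeError("stale view; re-sync the pinset first")
        self.pins.discard(cid)
        self.version += 1
```

With this, machine B's unpin in the example above would be rejected until B has seen A's pin, making the ordering question explicit.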

List pin objects endpoint not working properly?

Hi,

Do query parameters really work with this endpoint? "cid=pinnedcid" is the only query parameter that works at the moment; no luck with the others.

The endpoint below only outputs something while files still have the status "pinning"; afterwards, nothing...

curl -s -X GET -H 'Content-Type: application/json' 'https://gateway.example.com:9097/pins' | jq
{
  "count": 0,
  "results": null
}

Request
This query parameter works, nothing else.

curl -s -X GET -H 'Content-Type: application/json' 'https://gateway.example.com:9097/pins?cid=QmZA9idEBomqsYBvA9Z...' | jq

Response

{
  "count": 1,
  "results": [
    {
      "requestid": "QmZA9idEBomqsYBvA9Z...",
      "status": "pinned",
      "created": "2022-06-28T10:33:01Z",
      "pin": {
        "cid": "QmZA9idEBomqsYBvA9Z...",
        "name": "PinnedCID",
        "origins": [],
        "meta": null
      },
      "delegates": [
        ...
      ],
      "info": {
        "source": "IPFS cluster API",
        "warning1": "CID used for requestID. Conflicts possible",
        "warning2": "experimental"
      }
    }
  ]
}

Pin.name

Something we realized early on is that many users wanted a custom "name" associated with their CIDs. While we could have accomplished this by reading through the metadata that users passed to us (and this is something we initially did), users were finding themselves confused by how this behavior worked. Once we switched name to being its own custom parameter that was separate from keyvalues we started seeing users pass in this attribute with nearly every request.

Is there interest in possibly adding a name parameter that users can pass in with their CIDs?

API should be idempotent

Currently there is no specified way to make POST /pins or POST /pins/{requestId} idempotent, so if there is some transient issue (e.g. reading the pinning service response) and the client needs to retry, it may create two potentially expensive pinning requests when only one was desired.

For POST /pins, an optional requestId field could be added which is used as an idempotency token. For POST /pins/{requestId}, it is explicitly not idempotent for the requestId, so without redesigning the API, it'd probably need a separate field for an idempotency token with some token expiration period, or use an implicit idempotency key like <requestId, CID, name, metadata>.
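A sketch of the optional-idempotency-token idea for POST /pins (the class and parameter names are illustrative, not part of the spec):

```python
import uuid

class PinningService:
    """Toy service keeping an idempotency-token -> requestid map."""
    def __init__(self):
        self.requests = {}

    def post_pins(self, cid, idempotency_token=None):
        # A retried POST carrying the same token returns the original
        # request instead of creating a second expensive pin job.
        if idempotency_token in self.requests:
            return self.requests[idempotency_token]
        requestid = str(uuid.uuid4())
        if idempotency_token is not None:
            self.requests[idempotency_token] = requestid
        return requestid
```

A real service would also need the token-expiration period mentioned above, so the map does not grow without bound.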

Finalizing MVP API Spec for IPFS WebUI integration

About

This issue tracks overall finalization status of this spec from the perspective of being ready for stakeholders to start implementation of basic functionality.

cc @jacobheun @pooja @jessicaschilling

Stakeholders

  • @jbenet as BDFL
  • IPFS Core Impl. WG implementing API client in go-ipfs and js-ipfs
  • IPFS GUI Team implementing UI in WebUI / IPFS Desktop app
  • Pinning Services implementing the API

Remaining Issues

While mostly ready, we need to resolve these issues before the API can be fully implemented:

Stakeholder Sign-offs

  • @jbenet 👉 review scheduled
  • IPFS Cluster / ipfs/notes#378 (@lanzafame)
  • IPFS Core Impl. WG
  • IPFS GUI Team implementing UI in WebUI / IPFS Desktop app
  • IPFS Pinning Services
    • PS list TBD

Networking difficulties while pinning data

Problem:

The current API has the client inform the pinning service of the CID of the data to pin. While this may be convenient if the data is already in the network, it has downsides if the client is the only one with the data, including:

  1. If the client node is unreachable (e.g. behind a symmetric NAT) and it hopes to use a pinning service to make its content publicly accessible, it will not be able to get the data to the pinning service, since the pinning service cannot reach it.
  2. Even if the client is reachable, the pinning service still needs to wait for a DHT provide to complete before it can start retrieving the data. This may not be a huge problem, but it is definitely annoying.
    • As a bonus problem, if the client node dies in the middle of uploading a large amount of data, the pinning service will have to wait for a large number of CIDs to be provided, not just the CID of the pin object root.

Recommended Solution

Take our existing HTTP API and expose it over libp2p instead of over TCP. This ensures we are connected to the peer that is supposed to be fetching data from us.

Comparing with Other Solutions

  1. Instead of just sending the pin object CID send the entire pin object
    • Pros:
      • Pretty easy to implement
    • Cons:
      • Does not allow for reduced bandwidth usage in the event that some part of the DAG is already stored by the pinning service
      • Does not allow for resuming cancelled uploads (very possible during the upload of large data)
  2. Have the client nodes peer with some upload nodes from the pinning service before they send the query so that they will get pinged by Bitswap and not be dependent on a DHT lookup
    • Pros:
      • Requires minimal additional code in go-ipfs (js-ipfs doesn't have peering implemented yet)
    • Cons:
      • Requires adding both an HTTP endpoint and a libp2p upload endpoint
        • libp2p upload endpoints cannot AFAIK make use of CA certificates which means needing to have a consistent set of peerIDs that are used by the upload endpoints, or relying on DNSLink which isn't signed
      • AFAIK we can't really load balance (inbound) pinning requests (aside from having multiple target nodes and just choosing one of them)
      • Some brittleness/complexity related to when connections break (as they sometimes do)
        • what happens if the peering connection is temporarily broken when the HTTP request goes out?
        • what happens if the peering connection breaks during the upload?
        • for these cases when the connection is re-established will they still be in the session, when/how will they be re-added?
  3. Use the proposed HTTP API, but do so over libp2p
    • i.e. instead of sending a standard HTTPS request to pinning.service, form a libp2p connection to /pinning/service and send the HTTP requests over that connection
    • Pros:
      • We seem to have libraries for doing this already in go that are actually pretty small (https://github.com/libp2p/go-libp2p-http which relies on https://github.com/libp2p/go-libp2p-gostream)
      • Makes it simpler for us to switch to a custom libp2p protocol in the future since we can just figure out which protocols it speaks (e.g. custom, or just http)
      • Only needs a libp2p endpoint, not also a standard HTTP endpoint
      • Gives us client side auth for free, if we want to use it, since we can just check the peerID on the client side of the connection
    • Cons:
      • Adds another library dependency to the protocol (may not be available in all languages)
      • Similar brittleness/complexity related to when connections break
        • A little less since it's guaranteed that the libp2p connection exists at the time the HTTP Request is issued
      • libp2p endpoint CA issues as in 2
      • No load balancing on Puts (as in 2) or Gets

Any of these solutions seem viable, and I'm interested if there are any other proposals out there that I've missed. However, I'm pretty sure we need to do at least one of these things or we're going to have really serious problems with users failing to upload data to pinning services.

It seems like people are not fans of option 1, which leaves us with 2 and 3. I'm not sure if they're really that different from each other, although I'm currently leaning towards option 3 as it's much less hacky and gives us some other nice benefits.

Thoughts?

Suggestion for error code to be switched to a string

I wanted to check and see if people would be open to switching the error code returned as part of the spec to a string instead of an integer. So things would look like this:

{
  "code": "ITEM_NOT_PINNED",                                 // short error code
  "message": "Current user does not have this item pinned"   // more detailed message
}

From my experience, strings are a little easier to manage in the codebase and are more easily understood by the consumer.

I'm completely fine implementing the spec as is, I just wanted to throw this up for discussion to see if people cared one way or the other.

Consider GraphQL over REST API

Simplicity of REST API is great until:

  1. Responses are large enough that getting only subsets becomes important.
  2. Round-trips start to matter and some batching strategy is necessary.

Both lead to custom, non-composable solutions. For these reasons I would like to propose considering a https://graphql.org/ based API, because:

  1. You can query exactly what the application needs/expects.
  2. All operations are composable and can be bundled into a single request as necessary.
  3. There is great tooling available for it.

Both of the `POST` endpoint docs are outdated

Both of the POST endpoints are outdated. Fixes needed:

Add endpoint:

  • the return object is incorrect. I'm assuming this should be the same return object as the modify endpoint?

Modify endpoint:

  • This should only accept a single CID and not an array of CIDs

API v2 proposal: remove PinResults.count

Prompted by ipfs/ipfs-webui#1900 – this is a BREAKING CHANGE for the v1 API, so marking it as something we can discuss if we ever want to create v2.

PinResults.count is used for two things:

(1) Reading pin count per status (used in ipfs pin remote service ls --stat):

GET /pins?limit=1&status=queued
GET /pins?limit=1&status=pinning
GET /pins?limit=1&status=failed
GET /pins?limit=1&status=pinned

(2) Pagination (where it acts as an equivalent of more: true):

If the value in PinResults.count is bigger than the length of PinResults.results, the client can infer there are more results that can be queried.
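In code, that inference is simply (illustrative sketch, not spec'd behavior):

```python
def has_more(pin_results: dict) -> bool:
    """Infer that more pages exist: PinResults.count is the total
    number of matches, PinResults.results is the current page."""
    return pin_results["count"] > len(pin_results["results"])
```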

Problem

  • Calculating PinResults.count is expensive, especially when running expensive filters like cid, name or status
    • ipfs-desktop/webui is checking CID status via GET /pins?cid=Qm1.Qm2...Qm10 which

Proposed change

  • remove PinResults.count
  • add PinResults.more (true/false bool)
  • add /service/pins/healthcheck that allows inexpensive health check
  • add /service/pins/stats that returns total pin counts for each status
  • switch ecosystem to new fields

This would be a breaking change that requires v2.x.x of this spec and coordinated effort across the stack:
go-ipfs, js-ipfs, ipfs-webui, ipfs-desktop and Pinata.

Is it really breaking? Needs analysis.

I've been thinking a bit about a way to do this in a backward-compatible way: we could keep PinResults.count as an optional field and always return 1, so pagination and health checks in old clients still work.

👉 Before we consider doing this, the main question is how the old client in go-ipfs will react to an unexpected PinResults.more – if it is ignored and nothing crashes, then we have a better path forward.

If not, we would break our users, and would need to coordinate go-ipfs/ipfs-webui/ipfs-desktop and Pinata to make the change around the same time to limit the damage. TBD if this is acceptable, considering the performance benefits in the long run.

cc @obo20 @aschmahmann

Potential problem with `GET /pins` statuses

Right now I'm seeing the current list of potential statuses as: resolving, retrieving, pinned, failed, expired, unpinning.

Some potential problems for us here:

Active pinning operations
Right now our database models are organized in such a way that we have two separate tables:

  • User IPFS Pins - These are items that have been successfully pinned to Pinata (either by direct file upload or our pinByHash endpoint, which searches the IPFS network for content). These items can also have a "date_unpinned" date attribute if the user decided to unpin the item.
  • Pin Jobs - This is a list of jobs created by our pinByHash endpoint. Essentially this is a list of active / or failed pinning operations that users have asked our nodes to pin.

I'm not sure we'd be able to accommodate this endpoint in its current iteration without doing a lot of fancy postgres maneuvering that would likely impact performance pretty heavily. Is there a reason why we're grouping succeeded pins in the same queryable list as "in progress / potentially failed" pins?

Pin Expiration
We currently have no concept of "expired" pins in Pinata. We just bill our users at the end of the month based on how much data they stored and for how long. Could we potentially add an "unpinned" status to this list as well?

Unpinning
We also don't have a concept of unpinning. As soon as the request is made to unpin data, we unpin it within that same API call.

Disambiguating *.providers fields

Currently the GET /{CID} endpoint returns an object that looks like this (screenshot omitted):

The response returns a pin object whose values (providers and meta) are duplicated in the root return object as well.

Is this intentional or an accident?

Defining API access controls

Known constraints

  • Pinning Service API client integration in go/js-ipfs should be vendor-agnostic
    (the same struct should be used for storing API location and credentials across all pinning services, the same authorization flow should work no matter what is on the other end)
  • language used in WebUI and similar GUI apps should be easy to understand
    (eg. one input for "API endpoint", second for "Secret API key", or a single button "authorize this node" that opens authorization flow)

Q: how specific should this spec be?

v0.0.1 explicitly mentions JWT, but some vendors may prefer to use something else (e.g. a simple "SECRET API KEY" generated per "bucket", etc.).
Clients and user interfaces do not care about what is inside an opaque "api token": they simply store it in config during initial configuration and then send it as-is with each request.

For those reasons we may simplify the spec to require an opaque string (labeled "API KEY" in GUIs) passed in an HTTP header, leaving the details up to pinning services, especially if we remove the need for copying the api key/token and decide to do authorization at a well-known URL at the Pinning Service.

Q: What should the authorization flow look like?

  • Off-band: pinning service generates "token"/"api key" value which is then entered into relevant UIs by the user.
    It is up to Pinning Service to implement UI for creation and management of those tokens.
    All requests to the Pinning Service will include this static token.

or

  • Seamless: the pinning service provides an authorization endpoint that can be opened to add a specific PeerID to an allowlist.
    Then, use the private libp2p-key for signing requests to the Pinning Service.

Q: How should authorization token be passed with requests?

Which way of passing authorization credentials makes sense?

Looking for feedback.

Document behavior for indirect pins

Context: the ability to tell whether a CID is pinned indirectly is something we need for Pinning Service integration within IPFS Desktop and WebUI (ipfs/ipfs-gui#91).

It is possible to check pin status of a specific CID:

GET /pins?cid=Qmfoo

Returned PinStatus object includes pin object for recursive pin and a status:

Status:
  description: status a pin object can have at a pinning service
  type: string
  enum:
    - pinning  # pinning in progress, optional details can be returned in meta[pinning_status]
    - pinned   # pinned successfully
    - failed   # pinning service was unable to finish the pinning operation, optional details can be found in meta[fail_reason]
    - unpinned # data is no longer pinned

Document behavior for indirect pins

The current spec does not specify what the response should be for a CID that is indirectly pinned (not pinned itself, but a member of a DAG that is recursively pinned).

Proposal

  • I don't think we should add an indirect status.
  • Instead, asking for an indirectly pinned CID should return the Pin object that is responsible for keeping it around.

Thoughts, concerns?

JS server generation problems

My attempts to generate a server from the current API spec seem to produce invalid results. Below are the steps:

openapi-generator generate -i https://raw.githubusercontent.com/ipfs/pinning-services-api-spec/master/ipfs-pinning-service.yaml -g nodejs-express-server

npm start

 node index.js

{"message":"Express server running","level":"info","service":"user-service","timestamp":"2020-11-19T23:26:26.420Z"}
info: Express server running {"service":"user-service","timestamp":"2020-11-19T23:26:26.420Z"}
openapi.validator: Validating schema
openapi.validator: validation errors [
  {
    "keyword": "required",
    "dataPath": ".paths['/pins']['get'].parameters[7]",
    "schemaPath": "#/definitions/SchemaXORContent/oneOf/0/required",
    "params": {
      "missingProperty": "schema"
    },
    "message": "should have required property 'schema'"
  },
  {
    "keyword": "not",
    "dataPath": ".paths['/pins']['get'].parameters[7]",
    "schemaPath": "#/definitions/SchemaXORContent/oneOf/1/allOf/0/not",
    "params": {},
    "message": "should NOT be valid"
  },
  {
    "keyword": "not",
    "dataPath": ".paths['/pins']['get'].parameters[7]",
    "schemaPath": "#/definitions/SchemaXORContent/oneOf/1/allOf/1/not",
    "params": {},
    "message": "should NOT be valid"
  },
  {
    "keyword": "oneOf",
    "dataPath": ".paths['/pins']['get'].parameters[7]",
    "schemaPath": "#/definitions/SchemaXORContent/oneOf",
    "params": {
      "passingSchemas": null
    },
    "message": "should match exactly one schema in oneOf"
  },
  {
    "keyword": "required",
    "dataPath": ".paths['/pins']['get'].parameters[7]",
    "schemaPath": "#/definitions/Reference/required",
    "params": {
      "missingProperty": "$ref"
    },
    "message": "should have required property '$ref'"
  },
  {
    "keyword": "oneOf",
    "dataPath": ".paths['/pins']['get'].parameters[7]",
    "schemaPath": "#/properties/parameters/items/oneOf",
    "params": {
      "passingSchemas": null
    },
    "message": "should match exactly one schema in oneOf"
  },
  {
    "keyword": "required",
    "dataPath": ".components.parameters['meta']",
    "schemaPath": "#/definitions/Reference/required",
    "params": {
      "missingProperty": "$ref"
    },
    "message": "should have required property '$ref'"
  },
  {
    "keyword": "required",
    "dataPath": ".components.parameters['meta']",
    "schemaPath": "#/definitions/SchemaXORContent/oneOf/0/required",
    "params": {
      "missingProperty": "schema"
    },
    "message": "should have required property 'schema'"
  },
  {
    "keyword": "not",
    "dataPath": ".components.parameters['meta']",
    "schemaPath": "#/definitions/SchemaXORContent/oneOf/1/allOf/0/not",
    "params": {},
    "message": "should NOT be valid"
  },
  {
    "keyword": "not",
    "dataPath": ".components.parameters['meta']",
    "schemaPath": "#/definitions/SchemaXORContent/oneOf/1/allOf/1/not",
    "params": {},
    "message": "should NOT be valid"
  },
  {
    "keyword": "oneOf",
    "dataPath": ".components.parameters['meta']",
    "schemaPath": "#/definitions/SchemaXORContent/oneOf",
    "params": {
      "passingSchemas": null
    },
    "message": "should match exactly one schema in oneOf"
  },
  {
    "keyword": "oneOf",
    "dataPath": ".components.parameters['meta']",
    "schemaPath": "#/properties/parameters/patternProperties/%5E%5Ba-zA-Z0-9%5C.%5C-_%5D%2B%24/oneOf",
    "params": {
      "passingSchemas": null
    },
    "message": "should match exactly one schema in oneOf"
  }
]
Error: openapi.validator: args.apiDoc was invalid.  See the output.
    at OpenAPIFramework.initialize (/Users/gozala/Projects/js-mock-pinning-service/node_modules/express-openapi-validator/dist/framework/index.js:32:23)
    at processTicksAndRejections (internal/process/task_queues.js:97:5)
    at async OpenApiSpecLoader.discoverRoutes (/Users/gozala/Projects/js-mock-pinning-service/node_modules/express-openapi-validator/dist/framework/openapi.spec.loader.js:47:39)
Listening on port 3000

I'm a bit lost in all the stuff it generated right now, but I'll post updates as I make progress here.

Authorization safety

I noticed that we're allowing the user to pass their authentication tokens to the API via query parameters instead of using a bearer token. Is there a specific reason for this?

There are a few security issues with passing credentials via query parameters in certain environments. A few notable ones:

  • Query parameters get saved in browser history and often in server logs.

  • Browser extensions are often granted access by users to query parameters from any site, while headers and cookies are only available to domains the user permits.

I'm not sure of the full context in which this API is going to be used, but this was something I at least wanted to point out.

Document UCAN as one of Authorization options

👉 This is a good first issue if someone wants to open a PR – all you need to do is update the docs here.


They are JSON Web Tokens (JWTs) containing Decentralized Identity Documents, secured by public key cryptography.

In practice, [pinning service] users can create their own keypair and register the DID with the [pinning service] UCAN service to get a UCAN token. The [pinning service] user is then free to create user UCAN tokens derived from their registered UCAN.

[..] these derived tokens can be used to limit end-users to upload either any data or data with a specific CID within a scoped time period. When a token is used, [pinning service] can validate it by looking at the chain of proofs used to derive a token, checking the cryptographic identity of each signer of the token.

Use of UCAN does not require any API changes; the already existing Authorization: Bearer HTTP header can be used for UCANs. We should document this in the Authentication section at https://ipfs.github.io/pinning-services-api-spec/#section/Authentication

Reference / prior art:

Provide documented examples of queries / the patterns for them

We need to let implementers of this spec know how to format their queries, specifically when dealing with things like arrays.

As threads like this one show: https://stackoverflow.com/questions/11944410/passing-array-in-get-for-a-rest-call

There are many ways to encode arrays in REST queries. @lidel has told me we've opted for the status=pinning,pinned format, which I think makes sense. However, we should definitely document this outside of the YAML for those reading the spec from their browser.
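The comma-separated convention can be illustrated with a small query-building sketch. The helper name and the extra limit parameter are my own illustration; only the status=pinning,pinned form comes from the discussion above.

```python
# Sketch of the comma-separated array convention: multiple values for a
# filter are joined with "," in a single query parameter.
from urllib.parse import urlencode

def build_pins_query(base_url, **filters):
    """Join list-valued filters with commas, per the status=pinning,pinned style."""
    params = {
        key: ",".join(value) if isinstance(value, (list, tuple)) else value
        for key, value in filters.items()
    }
    return f"{base_url}/pins?{urlencode(params, safe=',')}"

url = build_pins_query("https://service.example.com",
                       status=["pinning", "pinned"], limit=10)
# -> https://service.example.com/pins?status=pinning,pinned&limit=10
```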

Mandatory provider hints

  • PinStatus.meta[receivers] = ['multiaddr1','multiaddr2']: a list of peers to connect to in order to speed up the transfer of pinned data
  • Pin.meta[providers] = ['multiaddr1','multiaddr2']: a list of peers that are known to have the pinned data (aka "original seeds")

In #19 (comment) @aschmahmann wrote

I'd really like to emphasize to current pinning services that this is really useful and they should implement it if it's not a huge ask. If many of them are unable to implement it in a reasonable time frame then we should be aware of that when dealing with user issues.

I agree this turns out to be a pretty important feature.
If we leave it in meta, it may be harder for services to implement.

@jacobheun @aschmahmann @achingbrain

  • Should we promote this to required (possibly empty array) top-level field(s)?
    • If so, would it be OK to rename it to providers in both places for simplicity?
      (Pin.providers and PinStatus.providers)
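For concreteness, the promoted shape might look like this. Everything here is hypothetical: the field names follow the proposal above, and the CID and multiaddr values are made up for illustration.

```python
# Hypothetical object shapes if the hints were promoted from meta to
# required top-level fields, per the proposal above (not current spec).
pin = {
    "cid": "QmExampleCid",  # illustrative CID
    "providers": ["/ip4/203.0.113.1/tcp/4001/p2p/QmPeerA"],  # known seeds
}
pin_status = {
    "status": "pinned",
    "providers": [],  # required, but may be an empty array
    "pin": pin,
}
```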

Spec Compliance Test App

Summary

  • We need a conformance test suite that implementers and service providers can run against their services to confirm they work as expected.
  • Ideally this would be a CLI tool that takes an <access-token> and the URL of the API <endpoint> and runs a set of tests.
    • Tests should be designed to run idempotently and reduce friction (e.g. remove all pins as the first step, confirm there are no pins as the second, and remove everything as the last step).

I'm available to review / feed in edge cases if anyone wants to pick this up.

Implementation details

Create JS client library

CLI Compliance Test suite

The compliance test would be a separate package (@ipfs-shipyard/pinning-service-compliance-checks) that uses the client library to run the tests and exits with code 0 if there were no hard errors:

$ npx ipfs-pinning-service-compliance-checks https://service.example.com secret-token
Checking compliance of Pinning Service API at https://service.example.com:
  (output)
Done!

Web interface (nice to have)

It would be nice to have a static website with two inputs for <access-token> and <endpoint> and a "Test" button, but this is lower priority than the CLI tool (we want something that can run automatically on CI to constantly validate the services we list in ipfs-webui – ipfs/ipfs-webui#1854 (comment)).

Test scenarios

Below are things we want to test, in order:

MVP list

Response codes

One small detail I noticed in the spec is that for creating, modifying, and deleting pins, a successful response has a 202 code (Accepted), which is described as:

The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.

I guess it's assumed that the pinning/deleting happens outside of the request, in a queue?

In my initial implementation there is no queue (responses can take a few seconds), so 202 feels wrong here: there's no need for the client to come back and check, because the status will be pinned or failed in the response itself, so it makes sense for it to be a 200.

Similarly with deleting: the delete happens during the request, so a 204 seems like the right response code here, as there's no response body expected.

I wonder if there's some wiggle room in the spec for different response status codes depending on the implementation. Clients could definitely use the hint of a 202 if they are going to need to come back and check on the status, although a status of queued or pinning also confirms that.

Related but possibly not helpful: looking at the status codes for the IPFS HTTP API, IPFS itself uses 200 while processing:

200 - The request was processed or is being processed (streaming)
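The client-side wiggle room discussed above can be sketched as a small decision helper: treat both 200 and 202 as success, and use the HTTP code plus the returned pin status to decide whether polling is needed. The function name and exact rules are my illustration, not spec text.

```python
# Sketch: should the client poll GET /pins/{requestid} later?
# 202 hints at async processing; a terminal status in the body does not.

def needs_followup(http_status: int, pin_status: str) -> bool:
    if http_status == 202:
        # Accepted for processing; only terminal statuses end the story.
        return pin_status not in ("pinned", "failed")
    # Synchronous 200-style response: poll only if still in progress.
    return pin_status in ("queued", "pinning")

assert needs_followup(202, "queued")
assert not needs_followup(200, "pinned")
```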

How to create an AccessToken?

Hi, I'm trying to host an IPFS node with a custom pinning service, which allows specific people to pin their files on my node remotely.

The tutorial says that "An opaque token is required to be sent with each request in the HTTP header", that it should be generated per device, and that the user should have the ability to revoke each token separately.

However, I can't find where or how to generate and manage these access tokens.

Is there a plugin I need to install? I couldn't find one via the tutorial or Google.

If you know, please give me a hand.
Thank you!

Ability to check status of multiple CIDs at once

Web clients such as ipfs-webui should be able to check the status of multiple CIDs via a single request, but v0.0.1 does not support that.

Problem

Apps running in browser contexts have hard limits on how many requests can be executed in a specific time window, and that may get even stricter in the future. This means loading the status of multiple items may be artificially slowed down by the user agent, leading to a sluggish/buggy user interface.
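The batching this issue asks for could look like a single request covering many CIDs. The comma-joined form below is an assumption on my part, mirroring the list-filter convention discussed elsewhere in this repo; it is not yet in the spec.

```python
# Sketch: one GET /pins request carrying multiple CIDs, assuming a
# comma-joined cid filter (hypothetical, not yet specified).

def batch_status_url(base_url, cids):
    """Build a single status query for many CIDs."""
    return f"{base_url}/pins?cid={','.join(cids)}"

url = batch_status_url("https://service.example.com", ["Qm1", "Qm2", "Qm3"])
# -> https://service.example.com/pins?cid=Qm1,Qm2,Qm3
```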

cc @jessicaschilling @rafaelramalho19 from IPFS GUI Team

Proposal: IPNS pinning

At the moment, I don't think IPNS is supported by some pinning services, and I'm not sure everyone is in sync on what the behavior would actually look like.

It would be nice if we could standardize how IPNS should be handled by pinning services (including DNSLink functionality and how updates work).

golang client fails to deserialize metadata objects

Upstream bug: ipfs/boxo#384

TLDR: if metadata has a value other than a string, go-pinning-service-http-client fails to deserialize it. For example, Estuary (@whyrusleeping) wanted to return pin progress like this:

"info": {
  "obj_fetched": 1234,
  "size_fetched": 571232,
}

go-ipfs does not expose them to end users on the CLI/API, and I don't think existing pinning services are exposing metadata fields in GUIs in production yet (cc @obo20), but I'm filing this issue anyway for discoverability.

TLDR Workaround

If anyone is thinking about utilizing the metadata fields PinStatus.info and Pin.meta, use only string values for now.
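The workaround can be sketched as a one-line coercion before sending metadata, so clients that expect string-to-string maps (like the Go client above) can deserialize it. The helper name is my own.

```python
# Sketch of the TLDR workaround: coerce every metadata value to a string
# before sending it as PinStatus.info or Pin.meta.

def stringify_meta(meta: dict) -> dict:
    return {key: str(value) for key, value in meta.items()}

info = stringify_meta({"obj_fetched": 1234, "size_fetched": 571232})
# -> {"obj_fetched": "1234", "size_fetched": "571232"}
```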

Document caveats around key-value pinset stores

Extracted from ipfs-shipyard/pinning-service-compliance#118 (comment):

Also, is it impossible for IPFS cluster to support pagination/creation-date sorting; or is it something that hasn't been implemented yet? Is there a tracking issue for this?

It is impractical. Cluster does not have a relational-database backend for storing the pins, but just a key value store. Keys don't have sorted IDs, listing keys out from this store can result in random orders. Thus some features like pagination cannot be done without reading everything to memory, sorting, etc. which is a footgun for big pinsets. I think it is ok if cluster does not support pagination. It tries to do its best and it's quite ok that it supports everything else.

I'd like to at the very least update the Pagination and filtering section to loosen the requirements and provide some rules of thumb for service implementations backed by key-value stores.

@hsanjuan @SgtPooki
What is the current behavior of ipfs-cluster around GET /pins, filtering, and pagination?
What would be the best compromise to document?

Some ideas for handling "sorting and filtering becomes too expensive" scenarios:

  • (a) Pagination and filtering do not work at all, and GET /pins always returns 405 Method Not Allowed.
    • Simple; if someone needs this, they would use an implementation backed by a database with indexes.
  • (b) Pagination and filtering work for small pinsets, but start returning 405 Method Not Allowed above a certain number of pins.
    • The response includes an error informing the user that sorting is too expensive, and that they need to reduce the number of pins or track them on their own.
  • (c) No pagination, no before and after filters (they produce 405 Method Not Allowed); GET /pins returns pins in random order.

Are there better ways?

Timestamp for "pin creation" event

Why do we need this?

Pin "creation" timestamp often comes up while discussing pagination and listing filters the order in which returned pins should be returned. @obo20 noted that without having means of sorting in the spec, different pinning services may create contradictory behaviors.

I am leaning towards the opinion that we should add it as a mandatory field to the PinStatus object, and enable filtering based on the "creation" date.

It will enable:

  • API clients to fetch only recent pins
  • Pinning services to optimize their backends (e.g. by creating time series / indexes)
  • Automatic sorting of returned results and deterministic pagination behavior across vendors
    (details TBD, depend on how we resolve this issue first)
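As a concrete illustration of the first bullet, a mandatory creation timestamp would let clients filter for recent pins. The "created" field name and the ISO 8601 format below are assumptions of this sketch, not settled spec.

```python
# Sketch: client-side filtering of pins by a hypothetical mandatory
# "created" timestamp on PinStatus (ISO 8601 assumed).
from datetime import datetime, timezone

def pins_since(pin_statuses, cutoff):
    """Keep only pins created at or after the cutoff datetime."""
    return [
        p for p in pin_statuses
        if datetime.fromisoformat(p["created"]) >= cutoff
    ]

pins = [
    {"created": "2020-01-01T00:00:00+00:00", "status": "pinned"},
    {"created": "2020-06-01T00:00:00+00:00", "status": "pinned"},
]
recent = pins_since(pins, datetime(2020, 3, 1, tzinfo=timezone.utc))
# -> only the 2020-06-01 pin remains
```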

What do we mean by "pin creation"?

When we think about pin lifecycle, there are two "creation" events:

  1. Pin operation is queued, waiting to be processed
  2. Pinning is finished, the data is pinned

This raises a question:

Q: should we add one or two fields?

Any ambiguity would go away if we added both (PinStatus.queued and PinStatus.pinned);
however, I am unsure how useful queued is after reaching the pinned status. Storing an unused value feels like a waste.

If we add a single field (PinStatus.updated), should it be the immutable moment of the pin request entering the service (timestamp of the queued event), or should it be updated when reaching the pinned status?

Thoughts? Is there a better way of representing this? (Keep in mind we want to keep the API minimalist.)

cc @obo20 @GregTheGreek @priom @jsign @sanderpick @andrewxhill
