
integrity-backend's People

Contributors

anaulin, benhylau, dependabot[bot], makew0rld, yurkowashere


integrity-backend's Issues

Update proofmode meta-content parsing

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Correctly parse meta-content

In September I updated the proofmode meta-content schema: starlinglab/integrity-schema#7

Apparently this change was never reflected in the backend parsing, so it needs to be updated now too. I'll probably bundle this in with a PR for #120.

This old incorrect parsing only happens in the code for making C2PA claims for proofmode JPEGs, that's the c2pa-proofmode action.

Implement CIDv1 generation in FileUtil

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Generating CIDv1 hashes of files.

Research

  • Dietrich has confirmed that you can't do pure Python generation of CIDv1, at least not without re-implementing a lot of IPFS stuff
  • There is ipfs-only-hash in NodeJS, but it's broken for CIDv1
  • Running ipfs add --only-hash --cid-version=1 -Q file.txt will output the file's CIDv1 hash
  • No daemon needs to be running
  • One downside is that you need to run ipfs init one time before doing this
  • If that downside is too much I could write a Go program that uses go-ipfs and does the same thing as ipfs add but without checking for an IPFS repo first
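
The CLI approach above could be wrapped in FileUtil roughly like this. This is a sketch, not the actual implementation: it assumes the ipfs binary is on PATH and that ipfs init has been run once; the function names are illustrative.

```python
import subprocess


def cidv1_command(path: str) -> list:
    """Argument vector for hashing a file without adding it to any repo or daemon."""
    return ["ipfs", "add", "--only-hash", "--cid-version=1", "-Q", path]


def cidv1_of_file(path: str) -> str:
    """Return the CIDv1 of a file by shelling out to the ipfs CLI.

    No daemon needs to be running, but `ipfs init` must have been run once.
    """
    result = subprocess.run(
        cidv1_command(path), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()
```

Keeping the argument construction in its own function makes the invocation easy to unit-test without an IPFS install.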

Ubuntu failed install

devenv is missing on Ubuntu:

pipenv install devenv

Also, Python 3.10 is required in the Pipfile.

The default config has c2pa-update, which is invalid.

Fix the meta dump to not alter the text

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Fix JSON dump format so we can verify sigs properly in #89.

Write the raw JSON message here so it preserves a copy that can be verified.

To Do

  • ...

Multi-org support of same instance

Task Summary

πŸ“… Due date: Target Feb 16 ideally, otherwise Feb 23 (Phase 1)
🎯 Success criteria: Add multi-org support and deploy on a second instance (a.k.a. the BCN instance).

Right now, each instance supports only one org because there is no "org id" type of indicator in the API (or JWT), and the directory structure essentially assumes one org user per API. As a result, we have to deploy a Droplet per fellowship.

We can go two paths:

  1. Dockerization
  2. Insert an "org id" indicator in the HTTP message (probably JWT)

Here is a proposal for (2), which I think is the easier path:

In the JWT, add an organization field, which can be hyphacoop for example. Then on the server, we need to change the following...

1. INTERNAL_ASSET_STORE directory

Currently we have [ assets, claims, create, tmp ] inside internal. Put these into an organization folder.

If the organization key is missing, we should fail the request. I considered a "default" for legacy JWTs but perhaps it's better that we phase out old JWTs.
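
A sketch of how the request path could resolve per-org directories, failing loudly when the organization claim is absent. The function name and store root here are illustrative, not the actual code:

```python
import os

# Illustrative root; the real value comes from the INTERNAL_ASSET_STORE config.
INTERNAL_ASSET_STORE = "/mnt/integrity_store/internal"


def org_asset_dir(jwt_payload: dict, subdir: str) -> str:
    """Resolve a per-organization directory such as assets/ or claims/.

    Rejects requests whose JWT has no `organization` claim instead of
    falling back to a legacy default.
    """
    org = jwt_payload.get("organization")
    if not org:
        raise ValueError("JWT is missing the 'organization' claim; rejecting request")
    return os.path.join(INTERNAL_ASSET_STORE, org, subdir)
```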

2. SHARED_FILE_SYSTEM directory

Currently we have folders that are synced to a client FTP. We need to potentially sync each "org id" to a different FTP.

Can we find a tool that bi-directionally syncs each organization subdirectory over a separate FTP, each with its own credentials? Currently we are running a separate process, which is non-ideal.

3. Searches

Ensure that when searching for assets, each org searches only within its internal organization folder, under INTERNAL_ASSET_STORE/organization/*.

4. Org-specific hardcodes -> config file

These hardcodes need to move to a JSON config file. FTP credentials should probably go in the same file, and the "FTP sync tool" needs to somehow read from this. The schema for this config file needs to be defined.
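
For illustration, the per-org config could look something like this. Every name and value below is hypothetical; defining the real schema is part of this task:

```python
# Hypothetical shape of the per-org config file, shown as a Python dict.
ORG_CONFIG = {
    "hyphacoop": {
        "ftp": {
            "host": "ftp.example.org",       # placeholder values
            "username": "hyphacoop-sync",
            "password": "CHANGE-ME",
        },
    },
}


def ftp_credentials(config: dict, org_id: str) -> dict:
    """Look up the FTP credentials for one organization; unknown orgs fail loudly."""
    if org_id not in config:
        raise KeyError(f"no config for organization: {org_id}")
    return config[org_id]["ftp"]
```

A per-org credentials block like this is what the "FTP sync tool" would need to read to run one sync per organization.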

To Do

  • Decide if dockerization or "pathing" is the way to go (edit: we're "pathing")
  • Implement the code
  • Add more storage on Droplet
  • Deploy to "BCN instance" and perform data migration alongside #52 (both need migration; let's migrate once)
  • Mint new JWTs for BCN @benhylau
  • Mint JWTs for 2nd org on the server (those phones will be with @lee94josh and @sophiamjones)

File corruption on store action

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Fix bug on store action where photo gets corrupt.

Have a look at these files in store-output that I passed through the SCMP SFTP. It seems to happen to all the photos now. The files are from my HTC Exodus s1.

3abeabe056d53064e8aac4600f3311651d32c19abfb3d5e5017b9e7446d7e072

See all files in this test: oscilloscope.zip

Pass each of those through https://verify.contentauthenticity.org/inspect and you'll see how they are corrupt. Could be a race condition between the two inner actions in https://github.com/starlinglab/starling-capture-api/blob/main/starlingcaptureapi/actions.py#L99-L109

To Do

  • ...

Validation architecture

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Detach verification from any specific repo and abstract different kinds of verification as much as possible

Overview

Validation methods

Delivery methods

  • HTTP with JWT
  • Dropbox
  • Signal
  • Trusted manual input
  • SD card

To Do

  • Rename verification to validation
  • Create separate module for all verification methods
    • Move #102 there
    • Class for each verifier with name(), validate()
    • Also validated_sigs_json() or something that returns C2PA-style JSON that can be put in the metadata
      • with pretty mode that truncates fields to 100 chars
  • Move HTTP input in integrity-backend into a separate repo - it's now a preprocessor
    • Take JWT, parse content, put everything into metadata, create zip, put into input
    • Standardize JWT format
  • Integrate verifier module into all the preprocessors
    • proofmode signal
    • Server WACZ
    • Starling Capture over HTTP
  • Clean up integrity-backend - #108
    • Remove http server (handlers, multipart, readme, Pipfile, main.py, etc)
    • Comment out c2pa-starling-capture to be fixed later in another PR (see this commit)
    • Add back HTTP docs to preprocessor
  • Docs on validation methods and their JSON output in preprocessor README
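
The verifier class described in the To Do list might be sketched like this. The method names come from the list above; the concrete verifier is a placeholder to show the shape, not the real ProofMode logic:

```python
from abc import ABC, abstractmethod


class Verifier(ABC):
    """Sketch of the per-method verifier interface proposed above."""

    @abstractmethod
    def name(self) -> str:
        """Short identifier for this validation method."""

    @abstractmethod
    def validate(self, asset_path: str, metadata: dict) -> bool:
        """Return True if the asset's signatures check out."""

    def validated_sigs_json(self, sigs: list, pretty: bool = False) -> list:
        """C2PA-style JSON for validated signatures.

        Pretty mode truncates string fields to 100 chars for display.
        """
        if pretty:
            sigs = [
                {k: (v[:100] if isinstance(v, str) else v) for k, v in s.items()}
                for s in sigs
            ]
        return sigs


class ProofModeVerifier(Verifier):
    """Placeholder concrete verifier, just to show the shape."""

    def name(self) -> str:
        return "proofmode"

    def validate(self, asset_path: str, metadata: dict) -> bool:
        return False  # real PGP signature verification would go here
```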


Misdirection in error logs

{actions.py:211} INFO: Content **signing** by authsign server: https://api.integrity.prod.starlinglab.org/authsign

Then, an error:

{actions.py:738} ERROR: 502 Server Error: Bad Gateway for url: https://api.integrity.prod.starlinglab.org/authsign/sign

But then, an info and potentially misleading past participle:

{actions.py:220} INFO: **content signed** by authsign server /mnt/integrity_store/starling/internal/hala-systems/tmp/ukraine-3d-kyiv-photos/action-archive/224d...78d2/224da...78d2.jpg.authsign

I believe the last log should better reflect the lifecycle (success / fail / timeout, etc.) of the process.

Verification of Starling Capture signatures

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Verify cryptographic signatures that come in through Starling Capture app.

The integrity-backend currently does not verify signatures from the HTC Exodus phones. We simply rely on the JWT authorization and proceed to trust that the photo came from the provisioned phone. In fact, the phone provides two signatures, signing the image hash and the metadata bundle. We should be verifying both signatures as part of our secure workflow.

The POST message contains meta and signature fields that need to be verified.

numbers-AndroidOpenSSL

This is the software signature using OpenSSL. This is what Numbers uses to verify the signature; we can borrow from this implementation: https://github.com/numbersprotocol/starling-capture/blob/master/util/verification/verification/verification.py

numbers-Zion

This is the hardware secure enclave signature. The output of the verification is the Ethereum wallet address associated with the phone (we set this up when provisioning the device).

We should take the meta and signature fields and derive the Ethereum address. The address output should be recorded as content-metadata, and it serves as a hardware identifier of the recording device.

starling-capture-test-asset.zip

To Do

  • Scaffold an interface that allows verification of different signature algorithms (starting with the two listed here)
  • numbers-AndroidOpenSSL verification
  • numbers-Zion verification + include output in content metadata

Authsign signatures very slow

Authsign signatures take about 2 minutes to complete.

Traced to this code in integritybackend/file_util.py, in def authsign_sign:

        r = requests.post(
            authsign_server_url + "/sign",
            headers=headers,
            json={"hash": data_hash, "created": dt},
        )
  • Check speed of authsign directly - fast
  • Check speed from the integrity server with curl - fast
  • Check speed from isolated Python code - 2 mins (curl from the same host is fast)
  • Deploy ipv6 firewall on all servers
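
One way to narrow this down: run the same POST with an explicit timeout and a timing log. This is a diagnostic sketch, not the production code; the helper names are made up, and requests is imported lazily so the payload helper has no hard dependency:

```python
import time


def sign_payload(data_hash: str, dt: str) -> dict:
    """The JSON body sent to authsign's /sign endpoint."""
    return {"hash": data_hash, "created": dt}


def authsign_sign_timed(authsign_server_url, headers, data_hash, dt, timeout=10):
    """Same POST as in file_util.authsign_sign, plus a hard timeout and timing log."""
    import requests  # imported here so sign_payload stays importable without requests

    start = time.monotonic()
    r = requests.post(
        authsign_server_url + "/sign",
        headers=headers,
        json=sign_payload(data_hash, dt),
        timeout=timeout,  # fail fast instead of silently hanging for minutes
    )
    print(f"authsign /sign took {time.monotonic() - start:.2f}s")
    return r
```

If the 2-minute delay is a connection issue (e.g. an IPv6 address that times out before falling back to IPv4), the timeout will surface it immediately instead of stalling.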

Specific input data and claim formats

Task Summary

πŸ“… Due date: 24 Nov 2021
🎯 Success criteria: Specify data formats and schema of the Starling Capture API.

To Do

  • Specify current output of Starling Capture
  • Specify claim format for input into claim-tool at:
    • Capture assertion (post Starling Capture app)
    • Edit assertion (post Nexus/Photoshop)
    • Store assertion (post IPFS/Filecoin)
  • Detail transformations necessary at each stage
  • Document in more permanent place

Add MD5 hashing

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: digest_md5 function in file_util.py

Similar to digest_sha256 function. Came from a comment in a PR, see #64 (comment)
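
A minimal digest_md5 sketched to mirror the chunked-read style typical of a digest_sha256 helper (the actual digest_sha256 signature may differ):

```python
import hashlib


def digest_md5(file_path: str, chunk_size: int = 1024 * 1024) -> str:
    """MD5 hex digest of a file, read in chunks to bound memory use."""
    md5 = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()
```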

Verify our EXIF geo encoding code

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Verify that we are generating valid EXIF and the transformed lat long is accurate.

  1. Note the GPS (lat, long) of an image in Starling Capture.
  2. Take the image through our system, take the EXIF block in the C2PA data, and insert it into the image (or any image) with some EXIF tool.
  3. Put the photo through some photo locator service and make sure we map to the same place as (1).

See: https://github.com/starlinglab/starling-capture-api/pull/13/files#r768884772
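
The transform can also be sanity-checked numerically: EXIF stores coordinates as degrees/minutes/seconds, and the conversion from decimal degrees should round-trip back to the original value. A hedged sketch (not the backend's actual conversion code):

```python
def decimal_to_dms(value: float):
    """Convert a decimal coordinate to (degrees, minutes, seconds), EXIF-style.

    Sign (N/S or E/W) is handled separately in EXIF via the ref tags.
    """
    value = abs(value)
    degrees = int(value)
    minutes_full = (value - degrees) * 60
    minutes = int(minutes_full)
    seconds = (minutes_full - minutes) * 60
    return degrees, minutes, seconds


def dms_to_decimal(d: int, m: int, s: float) -> float:
    """Inverse conversion, for round-trip checks."""
    return d + m / 60 + s / 3600
```

Comparing dms_to_decimal(decimal_to_dms(lat)) against the original lat catches sign, rounding, and unit mistakes before involving any photo locator service.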

To Do

  • Verify a photo

Verification of ProofMode signatures

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Verify file signatures from ProofMode applications

  • Related: #89
  • Not urgent
  • Verifies signatures in proofmode zip, in the preprocessor
  • Current proofmode preprocessor code lives here, in parse_proofmode_data

To Do

  • ...

Build an API to receive files and output C2PA-injected images

Task Summary

πŸ“… Due date: 24 Nov 2021
🎯 Success criteria: Build an API that allows backend to receive [ photos + JSON ] and inject C2PA with capture signatures in Capture assertion.

Links you'll need:

To Do

  • @benhylau to provide Starling Capture test files
  • spec API to receive and fetch
  • add basic auth
  • implement APIs in Python
  • @benhylau to provide claim format for Capture assertion injection
  • do an injection thru claim-tool
  • verify sending source files via POST and fetching C2PA-injected images thru GET
    • πŸ€” how do we (e.g. SCMP) list directories?

Integrate Numbers Protocol registration into integrity API server

Task Summary

πŸ“… Due date: March 7, 2022
🎯 Success criteria: Register to Numbers Protocol following same pattern as #53.

Post-editorial content (the same items we register to LikeCoin ISCN) will be registered with Numbers. This will be similar to our integration with LikeCoin (#53). In this case we'd have an IOTA wallet and register a CID, which will allow future iterations of the asset to be linked to it. This allows us to aggregate versions (and in the future, attestations) of an asset on IOTA.

Note that the registration chain may change in the near term.

To Do

  • See how we are doing #53 in https://github.com/starlinglab/starling-integrity-api/pull/58/files
  • Manually, from Python or with curl, register a piece of data and verify the entry on the blockchain
  • Make a comment on the key materials necessary
    • Are we using a Starling wallet, or does Numbers have a wallet they use to register everyone behind the scenes? The former is generally preferred, but the latter is acceptable as well; this usually determines whether we are making an API call to a public API or having to run wallet-blockchain coordination code locally. Sort this out with Numbers.
  • Let @benhylau know what data fields are necessary
  • Integrate the registration code into repo according to flow diagram we discussed

Add grouping of created photos by date

Task Summary

πŸ“… Due date: mid-Feb
🎯 Success criteria: Group photos in create-output folder by date.

Please add additional date sub-directory.

create-output/jane-doe/2022-01-24/...
                      /2022-01-25/...
                      /2022-02-11/...
create-output/mitra-moe/2022-01-24/...
                       /2022-01-25/...
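
A sketch of the path construction; the helper name and signature are made up for illustration:

```python
import datetime
import os


def dated_output_path(output_root: str, author: str,
                      day: datetime.date, filename: str) -> str:
    """Place a file under <output_root>/<author>/<YYYY-MM-DD>/<filename>."""
    return os.path.join(output_root, author, day.strftime("%Y-%m-%d"), filename)
```

The date could be taken from the asset's capture metadata or file mtime; either way the formatting stays the same.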

To Do

  • Add feature
  • Deploy this for BCN instance

C2PA on proofmode JPGs

Task Summary

πŸ“… Due date: Monday May 2
🎯 Success criteria: Working c2pa_proofmode action

  • Get rid of create_proofmode func
  • Func: c2pa-proofmode / c2pa_proofmode
    • Input: zip path, config JSON
    • Unzip
    • Read from content_metadata to get photographer name (use Signal number for now): "private": { "signal": { "source": "+16475551234",
    • Take each jpeg and inject
    • Output folder in shared FS: <org_id>/<collection_id>/c2pa_proofmode_output/<photographer_name>/<date "YYYY-MM-DD">
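
The metadata lookup and output path above could be sketched as follows. The helper name is illustrative; the content_metadata shape follows the snippet in the bullet list:

```python
import os


def proofmode_output_dir(shared_fs: str, org_id: str, collection_id: str,
                         content_metadata: dict, date_str: str) -> str:
    """Build <org_id>/<collection_id>/c2pa_proofmode_output/<photographer>/<date>.

    Uses the Signal number from content_metadata as the photographer name for now.
    """
    photographer = content_metadata["private"]["signal"]["source"]
    return os.path.join(
        shared_fs, org_id, collection_id,
        "c2pa_proofmode_output", photographer, date_str,
    )
```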

Metadata about IPFS CID

TODO
Decide if we need to store more metadata about generated IPFS hashes so they can be re-created at a future date in the same way from the original file.

The best solution is to keep the original CAR file, but this may not be practical.

ISSUE Described
Although one IPFS CID maps to exactly one file, the inverse is not true:
one file can map to many IPFS CIDs. This is dependent on how the file is parsed.

The main IPFS options that change the CID are:

--raw-leaves
--trickle
--wrap-with-directory
--chunker

Note from the help file

Almost all the flags provided by this command will change the final CID, and
new flags may be added in the future. It is not guaranteed for the implicit
defaults of 'ipfs add' to remain the same in future Kubo releases, or for other
IPFS software to use the same import parameters as Kubo.

If you need to back up or transport content-addressed data using a non-IPFS
medium, CID can be preserved with CAR files.
See 'dag export' and 'dag import' for more information.

This means that if we ever need to put a file back into an IPFS swarm, we have to make sure we use the right parameters to create the CAR file that will eventually go into IPFS; otherwise the IPFS CID will be different.

We may need to store some additional metadata on how we arrived at a CID.

Currently we use:

ipfs add hello --cid-version=1

--cid-version=1 also triggers --cid-base base32 and --raw-leaves=true.
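
If we do record extra metadata, it could be as simple as a dict of the parameters that influence the CID. A sketch; the chunker value is an assumption about the implicit default and should be checked against the Kubo release actually in use:

```python
def cid_metadata(cid: str) -> dict:
    """Record how a CID was generated so the same CID can be reproduced later.

    Values mirror our current invocation: `ipfs add --cid-version=1`,
    which implies base32 encoding and raw leaves.
    """
    return {
        "cid": cid,
        "tool": "ipfs add",
        "cid_version": 1,
        "cid_base": "base32",
        "raw_leaves": True,
        "trickle": False,
        "wrap_with_directory": False,
        "chunker": "size-262144",  # assumed implicit default; verify per release
    }
```

Storing this alongside each asset is cheaper than keeping CAR files, at the cost of trusting future IPFS software to honor the same parameters.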

Auto C2PA injection of Filecoin Piece CID

Task Summary

πŸ“… Due date: mid-March
🎯 Success criteria: Delay and automate the injection of the store claim, to contain both IPFS CID and Filecoin Piece CID.

See: #4 (comment)

Poll for Piece CID from web3.storage and do injection once available. Before being able to do this, we need to have a better way to track assets than the filename based approach.

To Do

  • ...

Adopt new configuration schema and implement new file structure with Collection directories

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Decide on file structure for Collections.

File structure chart: https://app.mural.co/t/starlinglab7814/m/starlinglab7814/1645818950661/a7fde68442959d82fbb968f2f825964cc1db86aa?sender=ub3a421c048efd7d270741703

To Do

  • Figure out bot directories
  • Figure out WACZ directories

☝🏽 From the perspective of the backend (this repo), we have defined the requirements for files to be dropped for processing. Bringing them into the right format to "drop off" will be the responsibility of Collector clients.

We also discussed moving HTTP API server (i.e. POST handler) out to become a standalone Collector client. This may spin off to a future issue.

FTP sync broken

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Fix ftp sync.

FTP sync stopped working and manual run has it crashing with:

starling@lab1:/mnt/volume_tor1_01/starling/sftp$ /usr/bin/python3 /usr/local/bin/pyftpsync sync --no-prompt --no-keyring --no-verify-host-keys --resolve=skip /mnt/volume_tor1_01/starling/sftp sftp://47.89.10.4
Synchronize /mnt/volume_tor1_01/starling/sftp
                with sftp://47.89.10.4/
Encoding local: utf-8, remote: utf-8
Connecting None:*** to sftp://47.89.10.4
Using credentials from .netrc file: starling_sync:***.
Could not remove lock file: [Errno 2] No such file
Could not remove lock file: [Errno 2] No such file
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/pyftpsync.py", line 251, in run
    s.run()
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/synchronizers.py", line 833, in run
    res = super(BiDirSynchronizer, self).run()
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/synchronizers.py", line 213, in run
    self.close()
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/synchronizers.py", line 163, in close
    self.remote.close()
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/sftp_target.py", line 211, in close
    self._unlock(closing=True)
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/sftp_target.py", line 262, in _unlock
    self.sftp.remove(DirMetadata.LOCK_FILE_NAME)
  File "/usr/local/lib/python3.9/dist-packages/pysftp/__init__.py", line 728, in remove
    self._sftp.remove(remotefile)
  File "/usr/local/lib/python3.9/dist-packages/paramiko/sftp_client.py", line 398, in remove
    self._request(CMD_REMOVE, path)
  File "/usr/local/lib/python3.9/dist-packages/paramiko/sftp_client.py", line 822, in _request
    return self._read_response(num)
  File "/usr/local/lib/python3.9/dist-packages/paramiko/sftp_client.py", line 874, in _read_response
    self._convert_status(msg)
  File "/usr/local/lib/python3.9/dist-packages/paramiko/sftp_client.py", line 903, in _convert_status
    raise IOError(errno.ENOENT, text)
FileNotFoundError: [Errno 2] No such file

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/pyftpsync", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/pyftpsync.py", line 258, in run
    s.remote.close()
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/sftp_target.py", line 211, in close
    self._unlock(closing=True)
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/sftp_target.py", line 262, in _unlock
    self.sftp.remove(DirMetadata.LOCK_FILE_NAME)
  File "/usr/local/lib/python3.9/dist-packages/pysftp/__init__.py", line 728, in remove
    self._sftp.remove(remotefile)
  File "/usr/local/lib/python3.9/dist-packages/paramiko/sftp_client.py", line 398, in remove
    self._request(CMD_REMOVE, path)
  File "/usr/local/lib/python3.9/dist-packages/paramiko/sftp_client.py", line 822, in _request
    return self._read_response(num)
  File "/usr/local/lib/python3.9/dist-packages/paramiko/sftp_client.py", line 874, in _read_response
    self._convert_status(msg)
  File "/usr/local/lib/python3.9/dist-packages/paramiko/sftp_client.py", line 903, in _convert_status
    raise IOError(errno.ENOENT, text)
FileNotFoundError: [Errno 2] No such file
Exception ignored in: <function BaseSynchronizer.__del__ at 0x7fc4055773a0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/synchronizers.py", line 154, in __del__
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/synchronizers.py", line 163, in close
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/sftp_target.py", line 211, in close
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/sftp_target.py", line 267, in _unlock
TypeError: 'NoneType' object is not callable
Exception ignored in: <function _Target.__del__ at 0x7fc40556ba60>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/targets.py", line 134, in __del__
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/sftp_target.py", line 211, in close
  File "/usr/local/lib/python3.9/dist-packages/ftpsync/sftp_target.py", line 267, in _unlock
TypeError: 'NoneType' object is not callable
Exception ignored in: <function Connection.__del__ at 0x7fc4044dd940>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/pysftp/__init__.py", line 1013, in __del__
  File "/usr/local/lib/python3.9/dist-packages/pysftp/__init__.py", line 785, in close
  File "/usr/local/lib/python3.9/dist-packages/paramiko/sftp_client.py", line 195, in close
  File "/usr/local/lib/python3.9/dist-packages/paramiko/channel.py", line 671, in close
  File "/usr/local/lib/python3.9/dist-packages/paramiko/transport.py", line 1846, in _send_user_message
AttributeError: 'NoneType' object has no attribute 'time'

To Do

  • ...

Create JS sample for displaying C2PA-injected images created from backend

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Render overlay from C2PA-injected image using Adobe's js-sdk.

This needs to support lazy loading of the C2PA image.

Take @YurkoWasHere's b64 lazy load http://node2.e-mesh.net/lazy/index.html and Adobe's js-overlay https://contentauthenticity.org/how-it-works and make it work with an image we created.


To Do

  • @benhylau add test image for parsing
  • Parse an image using this sample code provided by Adobe
  • Pick up the js-overlay code from Adobe on Dec 3 and render with our image
  • Verify b64-encoded parsing
  • Pass to SCMP to verify this works behind Cloudflare
  • Render custom keys (e.g. IPFS CID) on overlay

Make sure server starts up on Mac OS

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: pipenv run server succeeds on Mac OS; timebox work -- abandon this if it turns out to be Too Hard ℒ️

Ben ran into this issue on his local machine:

❯❯❯ pipenv run server                                                                                      main
Loading .env environment variables...
Loaded configuration for organizations: dict_keys(['hyphacoop'])
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/internal/hyphacoop/assets already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/internal/hyphacoop/claims already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/internal/hyphacoop/tmp already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/internal/hyphacoop/create already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/internal/hyphacoop/create-proofmode already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/add already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/update already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/store already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/custom already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/create-output already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/create-proofmode-output already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/add-output already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/update-output already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/store-output already exists
[2022-02-18T02:13:26-0500] {file_util.py:31} INFO: Directory /Users/benedict/.starling-integrity/fs/hyphacoop/custom-output already exists
Traceback (most recent call last):
  File "/Users/benedict/Dev/starling/starling-integrity-api/main.py", line 102, in <module>
    proc.start()
  File "/opt/homebrew/Cellar/[email protected]/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/opt/homebrew/Cellar/[email protected]/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 224, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/opt/homebrew/Cellar/[email protected]/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/opt/homebrew/Cellar/[email protected]/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/opt/homebrew/Cellar/[email protected]/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/opt/homebrew/Cellar/[email protected]/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/opt/homebrew/Cellar/[email protected]/3.9.8/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function <lambda> at 0x1041b0040>: attribute lookup <lambda> on __main__ failed

To Do

  • Reproduce issue locally
  • Make fix

Find a home to deploy the API

Task Summary

πŸ“… Due date: 23 Nov 2021
🎯 Success criteria: Have the Python API in this repo hosted on a public endpoint I can POST files to.

To Do

  • Decide where to put it
  • Create a functioning environment
    • Install and run the API server as a service
    • Get a hostname
    • nginx for SSL
  • Define location for files
  • Final resting place of the API

Update legacy config handling

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: ...

config.py has been updated with functions to read the config variables easily. All access to configurations should be done via the config.ORGANIZATION_CONFIG.get* functions.

There is currently a re-indexing of the read config to a key:value pair that should be addressed as well.

To Do

Document all server configs

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Document all server config keys in README.

The README currently documents only a subset of the server configs, and the selection is somewhat arbitrary. For example, C2PATOOL_PATH is documented but IPFS_CLIENT_PATH isn't. It is also unclear what is required vs. optional, and when each is needed.

There are names that don't say much about what they are (e.g. JWT_SECRET and KEY_STORE). This is especially confusing since we added features like C2PA_CERT_STORE. How is it functionally different from KEY_STORE?

So this ticket is to add documentation for all the keys in .env, without changing the config names themselves.

To Do

Upgrade to new C2PA

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: See below

Adobe has released their C2PA software to the public, and the C2PA spec has changed. We must update our software to match the open source tooling, and so that software using the new spec (like Photoshop) can work with our pipeline.

To Do

  • Update code to use new claim tool - https://github.com/contentauth/c2patool
    • Rename module
  • Update JSON in integrity-backend and JSON-generating code
  • Update integrity-schema to match new C2PA spec
    • Keep old schema, and map it to the C2PA spec version it was for (rename folder)
  • Have a table like this: claim tool version | c2pa spec version | verifier website

Authsign formating bug

Authsign JSON is saved with an additional layer of JSON encoding:

"{\"hash\":\"......

It seems that the authsign data is read in as a string here:
https://github.com/starlinglab/integrity-backend/blob/main/integritybackend/file_util.py#L216

But it is saved as if it were a dict being turned into a string:
https://github.com/starlinglab/integrity-backend/blob/main/integritybackend/file_util.py#L221

To solve:

Load the string into a dict

authsign_proof = json.loads(r.text)
authsign_proof = r.json()

OR

Save it as a string

authsign_proof = r.text
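
A minimal demonstration of the bug and both fixes:

```python
import json

# What authsign returns in r.text: already a JSON string.
response_text = '{"hash": "abc123"}'

# Bug: treating the string as a dict and dumping it encodes it twice,
# producing the "{\"hash\":\"... seen in the saved files.
double_encoded = json.dumps(response_text)
assert double_encoded.startswith('"{')

# Fix 1: parse first, then dump once.
authsign_proof = json.loads(response_text)
saved = json.dumps(authsign_proof)

# Fix 2: save the raw response text as-is.
saved_raw = response_text
```

Either fix yields a file whose contents parse back to the original dict; mixing the two paths is what produced the double encoding.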

Implement Opentimestamps

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Research and then implement opentimestamps


Read these to understand the OpenTimestamps system:

Research

  • Python library
  • CLI in Python
  • There are public verifiers like https://opentimestamps.org
  • There are also public calendar servers, which are Bitcoin nodes that aggregate file hashes and commit them to the Bitcoin blockchain for free
  • Calendars are not federated, but your client can ask multiple calendars to "stamp" your file (and in fact asks multiple by default)
  • It takes 1 second to get a timestamp (.ots file), but it is incomplete
  • To verify the timestamp against the blockchain, we have to download all the other timestamp data for the block our file was added to, which only happens after 8-9 hours
  • This is called "upgrading" the timestamp, and is done by contacting calendar servers
  • Upgrading can happen at any point once the block has been committed (8-9 hours after the timestamp)
  • An upgraded timestamp only requires a Bitcoin node to verify the proof, not any calendar servers

Conclusion

  • If we are okay to assume calendar servers (or a mirror of them) will stick around, then the process for using OpenTimestamps is simple
    • Timestamp the file, this takes a few seconds
    • Keep the timestamp file (.ots) as part of the archive, in a way that obviously indicates which file it matches to
    • Timestamp files can be upgraded and verified at any later date
  • If we don't want to assume calendar servers (or a mirror of them) will stick around:
    • Timestamp the file, this takes a few seconds
    • Keep the timestamp file in a temporary holding folder
    • Using a service or cronjob, upgrade the timestamp after 8-9 hours
    • Move the timestamp into the archive, in a way that obviously indicates which file it matches to
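
Both workflows above reduce to the same few CLI invocations, which could be wrapped like this. A sketch assuming the opentimestamps-client ots CLI is installed; function names are illustrative:

```python
import subprocess


def ots_command(verb: str, path: str) -> list:
    """Build an OpenTimestamps client invocation: ots stamp|upgrade|verify <path>."""
    return ["ots", verb, path]


def stamp(path: str) -> None:
    """Create <path>.ots next to the file; takes a few seconds."""
    subprocess.run(ots_command("stamp", path), check=True)


def upgrade(ots_path: str) -> None:
    """Upgrade a pending timestamp to a full proof (possible ~8-9h after stamping)."""
    subprocess.run(ots_command("upgrade", ots_path), check=True)
```

The second workflow is then just stamp() at archive time and a cronjob calling upgrade() on everything in the holding folder.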

To Do

  • How easy is it, at a future date, to take the "receipt from ots" from our records and download the merkle tree (available 8-9 hrs later) from the ots server?
    • Very easy. Just run ots upgrade my_timestamp.ots and the existing timestamp file will be upgraded to a full proof with all the needed data from the block

Handle Signal payload

Task Summary

πŸ“… Due date: end-Feb
🎯 Success criteria: Receive zip file from Signal and inject with C2PA.

Numbers will spin up a Signal bot with the following task in Numbers Asana, reproduced here:

Spin up a Signal bot to send received zip files from Signal to Starling Integrity API using:

curl -X POST https://integrity-api.starlinglab.org/assets/create-signal \
  -H "Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJhdXRob3IiOnsiaWRlbnRpZmllciI6Imh0dHBzOi8vaHlwaGEuY29vcCIsIm5hbWUiOiJCZW5lZGljdCBMYXUifSwiY29weXJpZ2h0IjoiQ29weXJpZ2h0IChDKSAyMDIxIEh5cGhhIFdvcmtlciBDby1vcGVyYXRpdmUuIEFsbCBSaWdodHMgUmVzZXJ2ZWQuIn0._GVB0x7EGHdxMW78XftpO4nLiAU11g7WtdJvyrrDMws" \
  -H "Content-Type: multipart/form-data" \
  -F "[email protected]"

The signal.zip is generated using ProofMode, but a forked one, due to a bug that we hope will be fixed upstream. The fork is currently hosted here:
https://www.dropbox.com/sh/y5ou7expyrz8h39/AACsKgqpb-hj8TdxQZlHuGyua?dl=0

Note that the API is not built yet, but it will be very similar to Starling Capture's current way of talking to the backend. Internally it will unzip and inject C2PA.

The ask here is to spin up a Signal bot with a Signal number, that can receive and forward the payload.

We need to parse the payload, transform, inject C2PA, and put the output into a create-signal-output folder.
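The unzip-and-stage step could look like the sketch below. The `create-signal-output` folder name comes from this issue; the function name is hypothetical, and the C2PA injection itself is omitted.

```python
import zipfile
from pathlib import Path

def handle_signal_payload(zip_path: str, work_dir: str) -> list[str]:
    """Unzip a ProofMode bundle and stage its contents in
    create-signal-output, where C2PA injection would then run."""
    out = Path(work_dir) / "create-signal-output"
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)  # C2PA injection of the extracted assets follows
    return sorted(p.name for p in out.iterdir())
```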

To Do

  • ...

Publishing to IPFS and Filecoin

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Publish a bunch of cat photos to IPFS and Filecoin and get back the IPFS CID and Filecoin Piece CID.

I put a folder of openly licensed cat photos onto SCMP's FTP. Please investigate how best to publish them onto IPFS and Filecoin and receive the IPFS CID plus a permanent identifier to discover the asset on the Filecoin network (I think the Piece CID).

@YurkoWasHere pls see this giant thread for context on Filecoin identifiers.

From that thread, here are some key messages:

You can use any service to upload + pin the file to IPFS first (without the Filecoin incentive layer) for immediate upload + access to the file. Afterwards, you need to upload that file to a Filecoin storage provider through a storage client proxy (via Estuary, Web3.storage, PiKNiK (coming soon), etc).
One thing to note: the CID of a file can be different depending on the hashing algorithm & CID version used in generation, so you have to make sure the storage client proxy uses the same method of CID generation as the IPFS pinning service used.

Yep, I agree. Using the Piece CID is the way to go for embedding into the image asset, which will allow the reference on blockchain explorers. However, I think you should also include the data CID (the IPFS CID) as well since that’s how the data will actually be retrieved.

storing data in filecoin on your own will not result in it being retrievable on the ipfs network, you need something like estuary to provide that service

With the technology right now, you would need to make a new CAR (if you don’t already have the previous CAR saved), and resubmit that deal. Estuary may have something to make renewals easier. The Piece CID can remain unchanged using the same params.

Note that:

  • The Piece CID is not what's used in 78 days
  • There isn't a way to map IPFS CID to Filecoin Piece ID, other than some local black magic that some people do (what Why described in the thread, I wouldn't use that in our design)
  • In Filecoin ecosystems, the IPFS CID is sometimes referred to as Payload CID or Data CID
  • We want to avoid running IPFS and Filecoin internally, and rely on client proxy services

Our requirements for the stored asset:

  • Retrievable from the IPFS network
  • Link to a stable ID on the Filecoin network such that active and future deals associated with same content can be retrieved
  • Does not take > 10 minutes to produce
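Per the requirements above, each stored asset needs to carry both identifiers. A minimal record might look like this (field and class names are ours, not from any SDK):

```python
from dataclasses import dataclass, field

@dataclass
class StoredAsset:
    """Identifiers we would keep per published asset."""
    payload_cid: str  # IPFS CID (aka Data CID): how the data is retrieved
    piece_cid: str    # stable Filecoin ID: links active and future deals
    deal_ids: list[int] = field(default_factory=list)  # current Filecoin deals

    def embeddable_ids(self) -> dict[str, str]:
        """Both CIDs get embedded in the asset, per the thread above."""
        return {"ipfs": self.payload_cid, "filecoin_piece": self.piece_cid}
```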

To Do

IPFS CID pin orchestration through cluster into private swarm

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Be able to take encrypted archives and pin them across the private swarm in staging.

[@YurkoWasHere to fill]

Files to store:

  • /mnt/integrity_store/starling/internal/{org_id}/{collection_id}/action-archive/*.encrypted (NOT *.zip)

To Do

  • Research ipfs-cluster and how pins are orchestrated
  • Implement ipfs/ipfs-cluster pinning in backend
  • Start thinking about documentation on how to operate a "node" with audience: archiving org, news org, etc.
    • why and how
    • what is ipfs, filecoin, storj
    • security
      • private swarm key
      • ipfs gateway
      • data residency and deletion (who stores the stuff in each network)
    • pros and cons, risks
    • costs
    • ansibles
    • see: https://www.sucho.org

Add smoke testing for API server

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: We are confident at all times that code in main can successfully serve basic /create requests

Right now the only automated tests we have are a couple of unit tests for the logic behind claim generation. It would be good to have a way to automatically verify that code at head is still functional enough that it can serve basic API requests. Ideally we would run such a test in CI before merging.

To Do

  • Write script that can send the server a basic request
  • (maybe) Add assets and configuration so that server can be automatically tested with this script (i.e. vendorize claim_tool, add an image to test with, etc)
  • (maybe) Add running this script as a step to CI and/or deploys
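The smoke-test script mostly needs to POST a multipart body like the curl calls elsewhere in this document. Building that body with only the stdlib could look like this; the `file` field name mirrors the existing `-F "file=@..."` usage, but treat the details as assumptions:

```python
import uuid

def build_multipart(filename: str, payload: bytes) -> tuple[bytes, str]:
    """Build a multipart/form-data body with a single `file` field,
    matching what `curl -F "file=@..."` sends. Returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: application/octet-stream\r\n\r\n"
    ).encode() + payload + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"
```

The body and content type can then be sent with `urllib.request` and the script can assert on the HTTP status code.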

Expand FileUtil to hash CIDs and do AES encrypt

We currently have FileUtil that does helper operations for file manipulation. We need to add two new functionalities here:

  1. Hash CIDv1 (same algo that Numbers Protocol uses internally for registration)
  2. Encrypt with AES-256-CBC

We also need supporting functions for these operations, such as generating an AES key. It may be more appropriate to rename the class (or add another, e.g. CryptoUtil), since key generation falls outside "file operations", but I'll leave that open.

Note that internally we use sha256 as an addressing scheme: each asset's persistent filename is its sha256 digest, and associated metadata files are named with the matching asset's sha256. We need to keep this system. (Temporary assets are usually named with a UUID.)
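The sha256 side of this, plus key generation, is straightforward with the stdlib; the AES-256-CBC encryption itself would need a crypto library and is not shown. Function names here are illustrative, not the actual FileUtil API:

```python
import hashlib
import os

def digest_sha256(path: str, chunk_size: int = 65536) -> str:
    """Streaming SHA-256 of a file; the hex digest doubles as the asset's
    persistent filename in our internal addressing scheme."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

def generate_aes_key() -> bytes:
    """32 random bytes for AES-256-CBC (the cipher itself would come from
    a library such as pycryptodome; not shown here)."""
    return os.urandom(32)
```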

To Do

  • Hash (sha256 and Numbers-compatible CIDv1) - done in #64
  • Encrypt (AES for data using collection key) - done in #66
    • Pin a test asset on an IPFS node and make sure our computed CID matches what, say, IPFS Desktop produces
  • Connect with @anaulin to generate these (see #53 (comment)):
    • hash://sha256/9564b85669d5e96ac969dd0161b8475bbced9e5999c6ec598da718a3045d6f2e
    • ipfs://bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi
    • md5://<md5_hash>

Implement archiving Action

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Implement components that generate and register encrypted archives according to the archiving policy on a collection.

Draft workflow:

Screen Shot 2022-03-10 at 10 40 34 AM
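Following the file layout given in the pinning issue (`.../{org_id}/{collection_id}/action-archive/*.encrypted`) and our sha256 naming scheme, the archive action's output path could be built as below; the helper name is hypothetical:

```python
from pathlib import Path

# Root from the "IPFS CID pin orchestration" issue in this document.
STORE_ROOT = Path("/mnt/integrity_store/starling/internal")

def archive_output_path(org_id: str, collection_id: str, sha256: str) -> Path:
    """Where the archive action writes the encrypted bundle, named by the
    asset's sha256 per our internal addressing scheme."""
    return (STORE_ROOT / org_id / collection_id /
            "action-archive" / f"{sha256}.encrypted")
```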

To Do

  • ...

Update legacy actions used for C2PA injection

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Update legacy actions to support new internal file actions defined by configuration.

Our file watcher now watches for ./{collection_id}/input/*.zip. A collection folder with 3 actions defined in config looks like:

./collection_id/
    input/*.zip (watched)
    action-archive/ (the action handles this)
    action-create/
    action-store/

This breaks legacy actions (create, add, update, store, custom). The way it'll work for the legacy actions is:

  • ./create gets replaced with a collection_id so we watch, for example, ./capture-app-collection/input/*.zip
  • ./store gets replaced with a collection_id so we watch, for example, ./ipfs-store-collection/input/*.zip

We will no longer handle raw JPGs as input. The only format coming in should be ZIPs.

The HTTP handler will need to output conformant ZIPs in the create endpoint. Similarly, we need a new way of accepting JPG files that are currently received via FTP folder drops.
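The per-collection layout described above could be derived like this; the helper is a sketch, not the watcher's actual code, and the action folder names come from the example tree:

```python
from pathlib import Path

# Action folders from the example layout; each defined action gets one.
ACTION_DIRS = ("action-archive", "action-create", "action-store")

def collection_layout(root: str, collection_id: str) -> dict:
    """Watch glob and action output dirs for one collection.
    ZIP is the only accepted input format now."""
    base = Path(root) / collection_id
    return {
        "input_glob": str(base / "input" / "*.zip"),
        "action_dirs": [str(base / d) for d in ACTION_DIRS],
    }
```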

To Do

  • ...

Open source the repo

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: Open source the repo.

This task tracks a list of TODOs for FOSSing.

To Do

  • Change repo name to starling-integrity-api
  • Add MIT license
  • Verify no credentials / sensitive info are in the code and issues
  • Make sure it's well documented
  • Make this repo public
  • Rename subfolder #9 (comment)
  • Move content from UA data model pad to here @benhylau
    • Moved out to larger documentation effort

FTP sync with SCMP FTP server

Task Summary

πŸ“… Due date: 29 Nov 2021
🎯 Success criteria: Bidirectional sync with FTP server.

To Do

  • Set up sync with SCMP server
  • Evaluate existing sync solution (do we keep?)

Refactor archive action

Task Summary

πŸ“… Due date: N/A
🎯 Success criteria: ...

To Do

  • Take org_id instead of org_config
  • ISCN data model code can be moved to new file

Document collections config file schema and metadata file structure

Holding place for integrity backend config file

{
    "organizations": [
        {
            "id": "starling-lab-test",
            "collections": [
                {
                    "id": "test-bot-archive-slack",
                    "asset_extensions": [
                        "zip"
                    ],
                    "actions": [
                        {
                            "name": "archive",
                            "params": {
                                "authsigner": "starlinglab-authsign",
                                "encryption": {
                                    "algo": "aes-256-cbc",
                                    "key": "starlinglab-aes-256"
                                },
                                "registration_policies": {
                                    "opentimestamps": {
                                        "active": true
                                    },
                                    "iscn": {
                                        "active": true
                                    },
                                    "numbersprotocol": {
                                        "active": true
                                    }
                                }
                            }
                        }
                    ]
                },
                {
                    "id": "test-bot-archive-telegram",
                    "asset_extensions": [
                        "zip"
                    ],
                    "actions": [
                        {
                            "name": "archive",
                            "params": {
                                "authsigner": "starlinglab-authsign",
                                "encryption": {
                                    "algo": "aes-256-cbc",
                                    "key": "starlinglab-aes-256"
                                },
                                "registration_policies": {
                                    "opentimestamps": {
                                        "active": true
                                    },
                                    "iscn": {
                                        "active": true
                                    },
                                    "numbersprotocol": {
                                        "active": true
                                    }
                                }
                            }
                        }
                    ]
                },
                {
                    "id": "test-web-archive",
                    "asset_extensions": [
                        "wacz"
                    ],
                    "actions": [
                        {
                            "name": "archive",
                            "params": {
                                "authsigner": "starlinglab-authsign",
                                "encryption": {
                                    "algo": "aes-256-cbc",
                                    "key": "starlinglab-aes-256"
                                },
                                "registration_policies": {
                                    "opentimestamps": {
                                        "active": true
                                    },
                                    "iscn": {
                                        "active": true
                                    },
                                    "numbersprotocol": {
                                        "active": true
                                    }
                                }
                            }
                        }
                    ]
                },
                {
                    "id": "test-web-archive-dfrlab",
                    "asset_extensions": [
                        "wacz"
                    ],
                    "actions": [
                        {
                            "name": "archive",
                            "params": {
                                "authsigner": "starlinglab-authsign",
                                "encryption": {
                                    "algo": "aes-256-cbc",
                                    "key": "starlinglab-aes-256"
                                },
                                "registration_policies": {
                                    "opentimestamps": {
                                        "active": true
                                    },
                                    "iscn": {
                                        "active": true
                                    },
                                    "numbersprotocol": {
                                        "active": true
                                    }
                                }
                            }
                        }
                    ]
                },
                {
                    "id": "test-dropbox",
                    "asset_extensions": [
                        "jpg",
                        "wacz",
                        "zip"
                    ],
                    "actions": [
                        {
                            "name": "archive",
                            "params": {
                                "authsigner": "starlinglab-authsign",
                                "encryption": {
                                    "algo": "aes-256-cbc",
                                    "key": "starlinglab-aes-256"
                                },
                                "registration_policies": {
                                    "opentimestamps": {
                                        "active": true
                                    },
                                    "iscn": {
                                        "active": true
                                    },
                                    "numbersprotocol": {
                                        "active": true
                                    }
                                }
                            }
                        }
                    ]
                }
            ]
        }
    ]
}
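A loader for the schema above might validate just the fields the backend relies on. This is a sketch, not the backend's actual config code; the function name and error handling are illustrative:

```python
REQUIRED_ACTION_PARAMS = {"authsigner", "encryption", "registration_policies"}

def active_registrations(config: dict, org_id: str, collection_id: str) -> list[str]:
    """Return the registration policies enabled for one collection's
    archive action, validating the minimal expected shape on the way."""
    for org in config["organizations"]:
        if org["id"] != org_id:
            continue
        for coll in org["collections"]:
            if coll["id"] != collection_id:
                continue
            for action in coll["actions"]:
                if action["name"] != "archive":
                    continue
                params = action["params"]
                missing = REQUIRED_ACTION_PARAMS - params.keys()
                if missing:
                    raise ValueError(f"missing params: {sorted(missing)}")
                return [name for name, pol in params["registration_policies"].items()
                        if pol.get("active")]
    raise KeyError(f"{org_id}/{collection_id} not found")
```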

Integrate ISCN registration into integrity API server

Task Summary

πŸ“… Due date: mid-March
🎯 Success criteria: When an asset is pushed by our server to IPFS, it also registers the piece of content on ISCN (at least on one chain)

To Do

  • investigate LikeCoin libraries, understand at a high level what they do: https://github.com/likecoin/iscn-batch-uploader, https://github.com/likecoin/iscn-js
  • decide best way to integrate this into our Python3-based server (do we call out to a Node executable? into a separate Node-based service running locally? something else?)
  • implement integration in code
  • set up wallet to be held by the server to fund the registration, get wallet funded
  • decide on schema for registration for our specific use case(s)
  • implement registration schema in code
  • verify registrations appear as intended on-chain
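One of the integration options above (calling out to a Node executable) could be wired as below. The wrapper script and its JSON-on-stdin, ID-on-stdout contract are hypothetical, not part of iscn-js:

```python
import json
import subprocess

def register_iscn(record: dict,
                  cmd: tuple[str, ...] = ("node", "iscn-register.js")) -> str:
    """Shell out to a (hypothetical) Node wrapper around iscn-js: the ISCN
    record goes in as JSON on stdin, the new ISCN ID comes back on stdout."""
    proc = subprocess.run(list(cmd), input=json.dumps(record).encode(),
                          capture_output=True, check=True)
    return proc.stdout.decode().strip()
```

Making `cmd` injectable also lets tests stub out the Node process entirely.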
