construct-hub's Issues

Implement: Transpilation Function

Goal: make sure we have code samples in S3 for every package, in all supported languages
How: Should be triggered per package + language, on every update/new S3 object event (on S3 assembly.json objects)
Dependent On: there's a separate task for repackaging JSII to support transpilation, #11

Implement asynchronous application of the jsii-transpile tool on assemblies stored in S3, and add the language-specific variants of the assembly there.

The function triggers from a DynamoDB Stream event for updates to objects in the table.
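
If the trigger is a DynamoDB stream as described, the wiring could look like the following sketch; the transpilePackage helper is hypothetical (the actual tool is tracked in #11), and the record attribute names are illustrative:

import { DynamoDBStreamEvent } from 'aws-lambda';

// Hypothetical per-language transpile step; the actual tool is tracked in #11.
declare function transpilePackage(name: string, version: string, language: string): Promise<void>;

const TARGET_LANGUAGES = ['python', 'java', 'csharp'];

export async function handler(event: DynamoDBStreamEvent): Promise<void> {
  for (const record of event.Records) {
    const image = record.dynamodb?.NewImage;
    if (image === undefined) continue; // e.g. a deletion, which carries no new image
    const name = image.name?.S;
    const version = image.version?.S;
    if (!name || !version) continue;
    // Fan out one transpilation per configured target language.
    await Promise.all(TARGET_LANGUAGES.map((lang) => transpilePackage(name, version, lang)));
  }
}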

Automating Self-publishing of Constructs - Security & Sanitisation

For an internal enterprise instance of construct-hub, how could we implement some automated sanitisation and "securitisation" of user self-published constructs?

How would self-publishing work? Perhaps publishing a construct sends a CodeCommit repo through a CodePipeline which runs user-defined tests (checking that tests exist at all) as well as enterprise-defined tests, followed by a final manual approval step before publication to the JSON/DynamoDB store of constructs?

How do we set the org-level policies for the enterprise-defined testing requirements and other policies?

How are the references to the CodeCommit construct repos stored in the catalog/hub DB?

New packages may not be ingested

Our discovery function polls npmjs.com for new packages. Once a package is detected, the transliterator tries to npm install this package from the dedicated CodeArtifact repository that mirrors npmjs.com.

Occasionally, there is a delay between the time a package becomes available on npmjs.com and the time it becomes available in CodeArtifact. This means we might drop new packages as they come in.

We need to make sure we have a sufficient retry mechanism in place to mitigate this delay.
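
A minimal sketch of such a retry, wrapping the npm install step with exponential backoff; the attempt count and delays are illustrative, not tuned:

// Retry with exponential backoff to ride out npmjs.com -> CodeArtifact replication lag.
async function withRetries<T>(fn: () => Promise<T>, attempts = 5, baseDelayMs = 30_000): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= attempts) throw err;
      // Exponential backoff: 30s, 60s, 120s, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}

// Usage: wrap the `npm install` invocation performed by the transliterator, e.g.
// await withRetries(() => npmInstall('some-package@1.2.3'));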

Implement: Observability Dashboard

Implement the observability dashboard with APIs as generic as possible, so that the observability patterns can be reused in various other projects.

The dashboard design is to be based on the results of #8.

Implement: Discovery Function

What: lambda function, should be executed every 5m
The goal: Harvest the package list from npm and notify the Construct Hub
Input: context (last event that was processed = transaction ID)
Output: sends notification to SNS topic, payload = link to tmp directory in S3 where the package file is
How: harvesting should start from the last transaction ID, unless we reached the end of the list. Save it to S3.
Reference: Construct Catalog lambda function (take into account that this reference retrieves only the latest version of each package).
Testing: process the results and send to SNS; make sure we correctly react to basic network problems
Monitoring: execution of the function (every 5m), #messages on the SNS topic, #downloads/execution


First PR: basic functionality + testing
Second PR: Production ready


Implement the discovery function based on the findings documented as a result of #2:

Create a lambda function or state machine, depending on runtime requirements, that triggers every x (minutes|hours) to poll the registry.npmjs.org/db/_changes endpoint and posts any relevant change events to an SNS topic. These change events will include a package's scope, name, id, and any other relevant information for documentation ingestion.

The ingestion functionality should store a last_event_id that updates on every run so the following run knows where to begin from. This could be a long running state machine with an optional start parameter that would allow backfilling from any point in time.

The ingestion function should filter on changes related only to packages with the appropriate tags, i.e. jsii, aws-cdk, etc.
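
A minimal sketch of the poll-filter-checkpoint loop described above; the _changes response shape follows CouchDB conventions, and the persistence/publish helpers are hypothetical:

// Keywords taken from the filtering criteria described in this and the research task.
const RELEVANT_KEYWORDS = ['jsii', 'aws-cdk', 'cdk8s', 'cdktf', 'constructs'];

declare function loadLastEventId(): Promise<string | undefined>;
declare function saveLastEventId(id: string): Promise<void>;
declare function publishChangeEvent(doc: unknown): Promise<void>;

export async function poll(): Promise<void> {
  const since = (await loadLastEventId()) ?? '0';
  const response = await fetch(`https://registry.npmjs.org/db/_changes?since=${since}&include_docs=true`);
  const { results, last_seq } = await response.json() as {
    results: Array<{ doc?: { keywords?: string[] } }>;
    last_seq: string | number;
  };
  for (const { doc } of results) {
    // Client-side filtering: only forward packages carrying a relevant keyword.
    if (doc?.keywords?.some((k) => RELEVANT_KEYWORDS.includes(k))) {
      await publishChangeEvent(doc);
    }
  }
  // Persist the checkpoint so the next run resumes where this one stopped.
  await saveLastEventId(String(last_seq));
}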

The schema for the SNS notification follows:

 import { Assembly } from '@jsii/spec';

 export interface IngestionInput {
   /**
    * A unique identifier for the origin that notified about this package
    * version. This may be a UUID generated per-origin. This identifier is used
    * by the front-end application to determine appropriate package installation
    * instructions (e.g: supporting private package registries).
    */
   readonly origin: string;

   /**
    * The contents of the .jsii assembly file that has been discovered and is
    * submitted for ingestion in the Construct Hub.
    */
   readonly assembly: Assembly;

   /**
    * The timestamp at which the version has been created in the package
    * registry. When the object is in JSON form, this is encoded as an ISO-8601
    * timestamp, preferably using the UTC time zone.
    */
   readonly time: Date;

   /**
    * A standardized checksum of the `assembly` and `time` fields formatted in
    * some canonical form. The checksum is encoded as a string with the following
    * format: `<algorithm>-<base64-encoded-hash>`.
    *
    * @example "sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
    */
   readonly integrity: string;
 }

Add `metadata.json` files to the package data backend bucket

In order to generate the front end, we currently need two types of files:

  1. The jsii assembly.
  2. The metadata.json file, which contains npm metadata that isn't available in the assembly (e.g. publish date).

Right now, the metadata.json file is missing from the backend, and only exists in awscdk.io.

We need to populate the package data bucket with metadata.json files as well.
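
For illustration, a minimal shape these metadata.json objects could take; only the publish date is called out above, so the field name here is an assumption:

// Hypothetical metadata.json contents; only the publish date is known to be needed.
export interface PackageVersionMetadata {
  /** ISO-8601 timestamp at which the version was published to npm. */
  readonly date: string;
}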

Setup CI/CD for Construct Library

Provision a complete CI/CD workflow, possibly using projen, to support automated PR validations on the repository, and automated releases to the package managers.

Until the name is finalised, all publishing should either be "dry run", or targeting private repositories.

Initial monitoring

Determine what is needed on a dashboard to facilitate operation of the Construct Hub, including private instances.

Sketch out the organization of a CloudWatch dashboard with all the relevant metrics, alarms, etc., and links to the critical resources that may need quick access.

Design the monitoring API from the frontend's perspective.
It should be connected to the internal alarm system.
Keep in mind that the app will be public.

Desired Output: Deploy canary + metrics + alarms + CloudWatch dashboard + send the metrics to the internal alarm system.
Time estimation: ~2d
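
For a sense of the desired wiring, a sketch of one dashboard widget plus alarm using the CDK's CloudWatch module; the namespace and metric name are hypothetical:

import { Duration } from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';

export function addIngestionWidgets(scope: Construct, dashboard: cloudwatch.Dashboard): void {
  const dropped = new cloudwatch.Metric({
    namespace: 'ConstructHub', // hypothetical namespace
    metricName: 'DroppedPackages', // hypothetical metric name
    period: Duration.minutes(5),
    statistic: 'Sum',
  });
  dashboard.addWidgets(new cloudwatch.GraphWidget({ title: 'Dropped packages', left: [dropped] }));
  // The resulting alarm would be wired to the internal alarm system.
  dropped.createAlarm(scope, 'DroppedPackagesAlarm', { threshold: 1, evaluationPeriods: 1 });
}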

Implement: Intake Function

Runs On: new SNS message
Goal: Moves S3 object from tmp location to final S3 location
Input: SNS payload from the discovery function (S3 link to one package data)
Output: one package object is saved in S3 (path: assemblies/${assembly.name}/v${assembly.version}/assembly.json)
How: It validates the assembly, filters out problematic packages, sanitizes the jsii input (making sure it is in the right format), and verifies that the checksum matches
Reference: validation code in @jsii/spec
Testing: add basic tests
Monitoring: #dropped packages, #validated packages

First PR: Implement the function + testing
Second PR: Add monitoring

The function is triggered off an input event produced by the discovery function (in the SNS topic):

 import { Assembly } from '@jsii/spec';

 export interface IngestionInput {
   /**
    * The URI to an S3 object containing the NPM package tarball.
    *
    * @example "s3://<bucket>/<key>[?versionId=<versionId>]"
    */
   readonly tarballUri: string;

   /**
    * The timestamp at which the version has been created in the package
    * registry. When the object is in JSON form, this is encoded as an ISO-8601
    * timestamp, preferably using the UTC time zone.
    */
   readonly time: Date;

   /**
    * (Optional) Metadata associated by the discovery function to the package
    * version
    */
   readonly metadata?: { readonly [key: string]: string };

   /**
    * A standardized checksum of the request, encoded using the following
    * format: `<algorithm>-<base64-encoded-hash>`.
    *
    * @example "sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
    */
   readonly integrity: string;
 }

The checksum can be computed using the following function:

import { createHash } from 'crypto';

function checksum(request: IngestionInput, tarball: Buffer, alg = 'sha384'): string {
  const hash = createHash(alg);
  const addField = (name: string, data: string | Buffer) =>
    //           <SOH>        $name          <STX>        $data          <ETX>
    hash.update('\x01').update(name).update('\x02').update(data).update('\x03');

  for (const [name, value] of Object.entries(request.metadata ?? {}).sort(([l], [r]) => l.localeCompare(r))) {
    addField(`metadata/${name}`, value);
  }
  addField('tarball', tarball);
  // `time` is a Date in object form; hash its ISO-8601 string encoding.
  addField('time', request.time.toISOString());

  return `${alg}-${hash.digest('base64')}`;
}
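
On the intake side, validation is then a direct comparison (sketch, reusing the function above):

// Recompute the checksum over the received payload and compare it to the claim.
const [algorithm] = request.integrity.split('-');
if (checksum(request, tarball, algorithm) !== request.integrity) {
  throw new Error(`integrity check failed for ${request.tarballUri}`);
}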

It stores the assembly data at the relevant S3 location.

Implement: package ingestion and processing

Adds a lambda function that triggers from a DynamoDB stream to make a new version of a package, or a newly published package, available in construct-hub. The DynamoDB stream contains change events for relevant packages and includes the package scope, name, and version to be processed as described in #55.

The lambda will download the package artifacts from npm and store the relevant ones, .jsii and readme.md, in the target S3 bucket. No processing should be performed on the package artifacts as part of this task. This bucket should be made accessible to the construct-hub-webapp frontend via CloudFront.

Make sure package files are harvested from npm and ready to use on S3

The owner of this task makes sure to understand which sub-tasks are needed in order to achieve this goal, and creates the relevant sub-tasks under PLANNED.

For example:

  • What do we need in order to productize the existing code that harvests the packages from npm? (testing? monitoring? else?)

Please speak with Elad about it.

Create `jsii-transpile` tool

Goal: given a jsii-enabled package (e.g. the npm tgz for the package), install the package & dependencies into a working directory, compile & transliterate all code examples therein into select target languages, then produce a transliterated .jsii assembly file as a result.
How: jsii-rosetta provides the functionality to perform the transliteration; however, it does not bother with dependency management (it is intended to run at build time, not against packaged libraries), and produces a rosetta tablet file instead of a transliterated jsii assembly document.


The jsii-transpile tool would consume a .jsii assembly file, and leverage jsii-rosetta APIs to create one <lang>.jsii file per configured target language, with all APIs renamed to the correct language representation, and sample code transliterated to the correct language where possible.
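
Internally, this could build on jsii-rosetta's transliteration entry point. A sketch, assuming the transliterateAssembly API (its module path and options may differ across jsii-rosetta versions):

import { TargetLanguage } from 'jsii-rosetta';
import { transliterateAssembly } from 'jsii-rosetta/lib/commands/transliterate';

// `workdir` must already contain the installed package and its dependencies;
// this writes one <lang>.jsii per target language next to the original assembly.
async function transpile(workdir: string): Promise<void> {
  await transliterateAssembly(
    [workdir],
    [TargetLanguage.PYTHON, TargetLanguage.JAVA, TargetLanguage.CSHARP],
  );
}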

Bug: transliterator fails on some packages

For example, for @aws-cdk/aws-autoscaling:

Error: Sample uses literate source test/example.images.lit.ts, but not found: /tmp/workdir6yEJu7/node_modules/@aws-cdk/aws-autoscaling/test/example.images.lit.ts
    at loadLiterateSource (/var/task/index.js:130207:15)
    at Object.fixturize (/var/task/index.js:130188:18)
    at Object.allTypeScriptSnippets (/var/task/index.js:139895:34)
    at allTypeScriptSnippets.next (<anonymous>)
    at Rosetta.addAssembly (/var/task/index.js:140065:22)
    at async loadAssemblies (/var/task/index.js:142608:9)
    at async transliterateAssembly2 (/var/task/index.js:142579:26)
    at async Runtime.handler (/var/task/index.js:142734:7)

This happens even though we do have a transliterated README in the API reference docs: https://docs.aws.amazon.com/cdk/api/latest/python/aws_cdk.aws_autoscaling/README.html

Implement the Backend X Front-end compatibility model as described in the design RFC.

Since documentation is generated on the backend in the background, we need to make sure the webapp knows how to handle multiple versions of documentation pages.

  1. Strongly typed interfaces that represent the schema.
  2. Enforce major version bumps on breaking changes.
  3. Front-end heuristic to fetch the correct definition file based on the latest version it supports that is available (see the sketch below).

More details here: https://github.com/aws/aws-cdk-rfcs/blob/master/text/0324-cdk-construct-hub.md#backend-x-frontend-compatibility
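
A sketch of the front-end heuristic (item 3), assuming documentation objects are keyed by schema major version; the key scheme is illustrative:

// The newest schema major version this build of the webapp understands.
const SUPPORTED_SCHEMA_VERSION = 2;

// Try the latest supported schema first, then fall back to older ones.
async function fetchDocs(baseUrl: string): Promise<unknown> {
  for (let version = SUPPORTED_SCHEMA_VERSION; version >= 1; version--) {
    const response = await fetch(`${baseUrl}/docs-v${version}.json`); // hypothetical key scheme
    if (response.ok) return response.json();
  }
  throw new Error('no compatible documentation schema found');
}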

Ingestion: Use CodeArtifact to store packages

Currently we store the discovered packages in S3; we suggest migrating to CodeArtifact. There are a few reasons to do so:

  • Easy management of package versions
  • Security: We use npmjs in the transliteration process. The transliterator Lambda executes npm install in order to get the full type definitions required to generate the code snippets in all target languages. This means the function requires full outbound network access in order to reach npm. Using CodeArtifact will allow us to block all outbound traffic.
  • Reduce the work required in the ingestion pipeline; we no longer need to download the tarball from npmjs.com and upload it to S3.

CodeArtifact allows configuring npm as an upstream repository, which means we don't need to mirror npm ourselves (we still need the discovery function to track updates to npm).

Note that from the ConstructHub operator's perspective, this is a drop-in replacement for npmjs; we will implement it as an opt-in feature.

Open questions:

  • Cost: We need to measure the cost difference between CodeArtifact and S3 per volume of packages and traffic.

Research: Discovery Function

The npmjs.com package registry is backed by a CouchDB database, for which a replica is available for public consumption at skimdb.npmjs.com.

The _changes endpoint can be used to query the CouchDB for updates from a certain point in time (transaction ID), which would allow a Lambda function to discover new packages without having to resort to a full search on the package registry.

Ideally, the changes should be filtered with application selection criteria (keywords should include one of aws-cdk, cdk8s, cdktf or constructs -- that list may grow in the future); however, no index exists that backs such a search, and it may turn out to be too slow, in which case filtering client-side may be more effective.

This task is to determine the correct approach to use to ensure a reliable discovery stream. This should also include considerations about error handling (what is and is not retry-able, etc.).

Implement: npm change event ingestion

Create a lambda function or state machine, depending on runtime requirements, that triggers every x (minutes|hours) to poll the registry.npmjs.org/db/_changes endpoint and store any relevant change events in a DynamoDB table. These change events will include a package's scope, name, id, and any other relevant information for documentation ingestion.

The ingestion functionality should store a last_event_id that updates on every run so the following run knows where to begin from. This could be a long running state machine with an optional start parameter that would allow backfilling from any point in time.

The ingestion function should filter on changes related only to packages with the appropriate tags, i.e. jsii, aws-cdk, etc.

The target DynamoDB table will trigger a stream for package ingestion and processing as described in #54.

Bootstrap CDK Construct Library

Set up the basic code layout for a construct library, with all available jsii languages properly configured (subject to future naming changes, though).

Basic construct entry point and unit testing set-up.

This should be achieved using projen.
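
A minimal .projenrc.ts sketch, assuming projen's JsiiProject; exact option names vary across projen releases:

import { JsiiProject } from 'projen';

// Option values here are illustrative; the package name is subject to change.
const project = new JsiiProject({
  name: 'construct-hub',
  repositoryUrl: 'https://github.com/cdklabs/construct-hub.git',
  author: 'Amazon Web Services',
  authorAddress: 'https://aws.amazon.com',
  defaultReleaseBranch: 'main',
});

project.synth();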

Move doc generation to `jsii-docgen`

  1. Use rosetta to transliterate the assemblies
  2. Use the docgen code to create the API reference.

Output should be a single markdown file that can be rendered by the front-end, with one file produced per submodule.
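
A sketch of what that step could look like, assuming jsii-docgen's Documentation API; method names may differ between versions:

import { Documentation, Language } from 'jsii-docgen';

// Render a single markdown file for one submodule in one target language.
async function renderDocs(packageDir: string, submodule: string): Promise<string> {
  const docs = await Documentation.forProject(packageDir);
  const markdown = await docs.toMarkdown({ language: Language.PYTHON, submodule });
  return markdown.render();
}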

Backend doc-generation PoC.

Implement a hacky solution for backend doc generation and deploy it to an environment so we can validate the effects it has on performance.

Implement: Pre-rendering function

Implement a pre-rendering function that prepares data for efficient front-end processing. The actual output is TBD (pending front-end requirements).

Development app

Add an app for deploying the construct hub for development
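
A minimal sketch of such an app, assuming the library's eventual ConstructHub entry point and CDK v2-style imports:

import { App, Stack } from 'aws-cdk-lib';
import { ConstructHub } from 'construct-hub'; // this library

const app = new App();
const stack = new Stack(app, 'construct-hub-dev', {
  env: { account: process.env.CDK_DEFAULT_ACCOUNT, region: process.env.CDK_DEFAULT_REGION },
});

// Deploy the hub with defaults; dev-only overrides would go here.
new ConstructHub(stack, 'ConstructHub');

app.synth();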

Add `.jsii` objects to awscdk.io distribution

Simply store the .jsii assemblies in the S3 bucket next to the metadata.json object, so that the prototype Construct Hub can refer to those until it gets its own storage medium.

Implement: Latest Builder Function

Runs On: S3 object updates to an assembly.json object
Goal: Updates the latest.json object that has the last n (e.g. 20) packages indexed in the system ("hot items")
How: Incrementally updates the latest.json with new package versions. In order to ensure consistency, the provisioned concurrency of this function will be set to 1 (ensuring only 1 execution of this function happens at any time).
Input: S3 notification payload
Output: The latest.json object has been updated to include the latest n (e.g. 20) packages indexed in the system


First PR: Initial functionality, happy case testing
Second PR: Monitoring, testing edge cases


The latest builder function triggers off S3 Object update events for the assembly.json objects, and keeps the latest.json file up-to-date with the last 20-or-so packages indexed in the platform.

It uses object metadata and versioning to ensure the result remains consistent (see design doc).
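
A sketch of the incremental update under provisioned concurrency = 1, assuming a hypothetical entry shape and the AWS SDK v3 S3 client:

import { GetObjectCommand, PutObjectCommand, S3Client } from '@aws-sdk/client-s3';

const s3 = new S3Client({});
const MAX_ENTRIES = 20;

interface LatestEntry { name: string; version: string; indexedAt: string } // hypothetical shape

// Prepend the newly indexed package and trim the list to the last N entries.
// Safe without versioning checks only because no two updates run concurrently.
export async function updateLatest(bucket: string, entry: LatestEntry): Promise<void> {
  let entries: LatestEntry[] = [];
  try {
    const current = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: 'latest.json' }));
    entries = JSON.parse(await current.Body!.transformToString());
  } catch {
    // first run: latest.json does not exist yet
  }
  // Keep a single (newest) entry per package name, most recent first.
  entries = [entry, ...entries.filter((e) => e.name !== entry.name)].slice(0, MAX_ENTRIES);
  await s3.send(new PutObjectCommand({
    Bucket: bucket,
    Key: 'latest.json',
    Body: JSON.stringify(entries),
    ContentType: 'application/json',
  }));
}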

Implement package deny-list

To ensure the integrity of the website and prevent recurring abuse we need to have the ability to block specific packages from being ingested.

Note that it is not sufficient to not list offending packages on the client side; we need to prevent the package from entering the processing pipeline.

Implementation notes

  • We need to allow blocking a specific version of a package as well as all versions of a package (see the sketch below).
  • (Security requirement) "Alert when a specific package owner is hitting the deny-list protection far more than normal."
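
A sketch of the lookup, assuming the deny list is keyed by either name (all versions) or name@version (a single version):

type DenyList = ReadonlySet<string>;

// Matches either a blanket block on the package or a block on this exact version.
function isDenied(denyList: DenyList, name: string, version: string): boolean {
  return denyList.has(name) || denyList.has(`${name}@${version}`);
}

// Usage in the ingestion pipeline, before any processing happens:
// if (isDenied(denyList, pkg.name, pkg.version)) return; // drop + emit a metric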

Implement: Catalog Builder Function

Goal: Keep the catalog.json file up-to-date with the full collection of indexed packages (latest release of each major version line of each package)
Input: S3 object update (or create) notifications (assembly.json objects)
How: Updates the contents of the file given the notification payload, adding or updating entries in the list, then updating the catalog.json object. To ensure consistency, the function's provisioned concurrency is going to be pinned to 1 (ensuring only 1 instance of the function runs at any given time). If this does not work out (and only in this case), S3 object versioning can be leveraged together with S3 object metadata to detect concurrent updates to this object, and merge them into consistency.
Testing: Happy case, concurrent modification detection
Monitoring: Number of packages in the catalog file (or size of the file), classic lambda monitoring (execution failures, successes, etc...), concurrent modifications detected / rescued.


The catalog builder function triggers off an S3 update on the assembly.json objects, and keeps the catalog.json object up-to-date in S3, making sure the file contains the latest information about all indexed packages.

Implement deny-list

Implement a lambda function that gets a text file from S3 with the names of packages that should be denied, and removes them from S3.
The discovery function should make sure not to harvest them again.
