construct-hub's Issues

Implement: Transpilation Function

Goal: make sure we have code samples in S3 for every package, in all supported languages
How: Should be triggered per package + language, on every update/new S3 object event (on S3 assembly.json objects)
Dependent On: there's a separate task for repackaging JSII to support transpilation, #11

Implement asynchronous application of the jsii-transpile tool on assemblies stored in S3, and add the language-specific variants of the assembly there.

The function triggers from a DynamoDB Stream event for updates to objects in the table.
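
If the trigger is a DynamoDB stream as described, the wiring could look like the following sketch; the transpilePackage helper is hypothetical (the actual tool is tracked in #11), and the record attribute names are illustrative:

import { DynamoDBStreamEvent } from 'aws-lambda';

// Hypothetical per-language transpile step; the actual tool is tracked in #11.
declare function transpilePackage(name: string, version: string, language: string): Promise<void>;

const TARGET_LANGUAGES = ['python', 'java', 'csharp'];

export async function handler(event: DynamoDBStreamEvent): Promise<void> {
  for (const record of event.Records) {
    const image = record.dynamodb?.NewImage;
    if (image === undefined) continue; // e.g. a deletion, which carries no new image
    const name = image.name?.S;
    const version = image.version?.S;
    if (!name || !version) continue;
    // Fan out one transpilation per configured target language.
    await Promise.all(TARGET_LANGUAGES.map((lang) => transpilePackage(name, version, lang)));
  }
}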

Automating Self-publishing of Constructs - Security & Sanitisation

For an internal enterprise instance of construct-hub, how could we implement some automated sanitisation and "securitisation" of user self-published constructs?

How would self-publishing work? Perhaps publishing a construct sends a CodeCommit repo through a CodePipeline which runs user-defined tests (checking that tests exist at all) as well as enterprise-defined tests, followed by a final manual approval step before publication to the JSON/DynamoDB store of constructs?

How do we set the org-level policies for the enterprise-defined testing requirements and other policies?

How are the references to the CodeCommit construct repos stored in the catalog/hub DB?

New packages may not be ingested

Our discovery function polls npmjs.com for new packages. Once a package is detected, the transliterator tries to npm install this package from the dedicated CodeArtifact repository that mirrors npmjs.com.

Occasionally, there is a delay between the time a package becomes available on npmjs.com and the time it becomes available in CodeArtifact. This means we might drop new packages as they come in.

We need to make sure we have a sufficient retry mechanism in place to mitigate this delay.
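
A minimal sketch of such a retry, wrapping the npm install step with exponential backoff; the attempt count and delays are illustrative, not tuned:

// Retry with exponential backoff to ride out npmjs.com -> CodeArtifact replication lag.
async function withRetries<T>(fn: () => Promise<T>, attempts = 5, baseDelayMs = 30_000): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= attempts) throw err;
      // Exponential backoff: 30s, 60s, 120s, ...
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
}

// Usage: wrap the `npm install` invocation performed by the transliterator, e.g.
// await withRetries(() => npmInstall('some-package@1.2.3'));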

Implement: Observability Dashboard

Implement the observability dashboard with APIs as generic as possible, so that the observability patterns can be reused in various other projects.

The dashboard design is to be based on the results of #8.

Implement: Discovery Function

What: lambda function, should be executed every 5m
The goal: Harvest the package list from npm and notify the Construct Hub
Input: context (last event that was processed = transaction ID)
Output: sends notification to SNS topic, payload = link to tmp directory in S3 where the package file is
How: harvesting should start from the last transaction ID, unless we reached the end of the list. Save it to S3.
Reference: Construct Catalog lambda function (take into account that this reference retrieves only the latest version of each package).
Testing: process the results and send to SNS; make sure we correctly react to basic network problems
Monitoring: execution of the function (every 5m), #messages on the SNS topic, #downloads/execution


First PR: basic functionality + testing
Second PR: Production ready


Implement the discovery function based on the findings documented as a result of #2:

Create a lambda function or state machine, depending on runtime requirements, that triggers every x (minutes|hours) to poll the registry.npmjs.org/db/_changes endpoint and posts any relevant change events to an SNS topic. These change events will include a package's scope, name, id, and any other relevant information for documentation ingestion.

The ingestion functionality should store a last_event_id that updates on every run so the following run knows where to begin from. This could be a long running state machine with an optional start parameter that would allow backfilling from any point in time.

The ingestion function should filter on changes related only to packages with the appropriate tags, i.e. jsii, aws-cdk, etc.
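
A minimal sketch of the poll-filter-checkpoint loop described above; the _changes response shape follows CouchDB conventions, and the persistence/publish helpers are hypothetical:

// Keywords taken from the filtering criteria described in this and the research task.
const RELEVANT_KEYWORDS = ['jsii', 'aws-cdk', 'cdk8s', 'cdktf', 'constructs'];

declare function loadLastEventId(): Promise<string | undefined>;
declare function saveLastEventId(id: string): Promise<void>;
declare function publishChangeEvent(doc: unknown): Promise<void>;

export async function poll(): Promise<void> {
  const since = (await loadLastEventId()) ?? '0';
  const response = await fetch(`https://registry.npmjs.org/db/_changes?since=${since}&include_docs=true`);
  const { results, last_seq } = await response.json() as {
    results: Array<{ doc?: { keywords?: string[] } }>;
    last_seq: string | number;
  };
  for (const { doc } of results) {
    // Client-side filtering: only forward packages carrying a relevant keyword.
    if (doc?.keywords?.some((k) => RELEVANT_KEYWORDS.includes(k))) {
      await publishChangeEvent(doc);
    }
  }
  // Persist the checkpoint so the next run resumes where this one stopped.
  await saveLastEventId(String(last_seq));
}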

The schema for the SNS notification follows:

 import { Assembly } from '@jsii/spec';

 export interface IngestionInput {
   /**
    * A unique identifier for the origin that notified about this package
    * version. This may be a UUID generated per-origin. This identifier is used
    * by the front-end application to determine appropriate package installation
    * instructions (e.g: supporting private package registries).
    */
   readonly origin: string;

   /**
    * The contents of the .jsii assembly file that has been discovered and is
    * submitted for ingestion in the Construct Hub.
    */
   readonly assembly: Assembly;

   /**
    * The timestamp at which the version has been created in the package
    * registry. When the object is in JSON form, this is encoded as an ISO-8601
    * timestamp, preferably using the UTC time zone.
    */
   readonly time: Date;

   /**
    * A standardized checksum of the `assembly` and `time` fields formatted in
    * some canonical form. The checksum is encoded as a string with the following
    * format: `<algorithm>-<base64-encoded-hash>`.
    *
    * @example "sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
    */
   readonly integrity: string;
 }

Add `metadata.json` files to the package data backend bucket

In order to generate the front end, we currently need two types of files:

  1. The jsii assembly.
  2. The metadata.json file, which contains npm metadata that isn't available in the assembly (e.g. publish date).

Right now, the metadata.json file is missing from the backend, and only exists in awscdk.io.

We need to populate the package data bucket with metadata.json files as well.
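
For illustration, a minimal shape these metadata.json objects could take; only the publish date is called out above, so the field name here is an assumption:

// Hypothetical metadata.json contents; only the publish date is known to be needed.
export interface PackageVersionMetadata {
  /** ISO-8601 timestamp at which the version was published to npm. */
  readonly date: string;
}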

Setup CI/CD for Construct Library

Provision a complete CI/CD workflow, possibly using projen, to support automated PR validations on the repository, and automated releases to the package managers.

Until the name is finalised, all publishing should either be "dry run", or targeting private repositories.

Initial monitoring

Determine what is needed on a dashboard to facilitate operation of the Construct Hub, including private instances.

Sketch out the organization of a CloudWatch dashboard with all the relevant metrics, alarms, etc., and links to the critical resources that may need quick access.

Design the monitoring API from the frontend's perspective.
It should be connected to the internal alarm system.
Keep in mind that the app will be public.

Desired Output: Deploy canary + metrics + alarms + CloudWatch dashboard + send the metrics to the internal alarm system.
Time estimation: ~2d
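
For a sense of the desired wiring, a sketch of one dashboard widget plus alarm using the CDK's CloudWatch module; the namespace and metric name are hypothetical:

import { Duration } from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';
import { Construct } from 'constructs';

export function addIngestionWidgets(scope: Construct, dashboard: cloudwatch.Dashboard): void {
  const dropped = new cloudwatch.Metric({
    namespace: 'ConstructHub', // hypothetical namespace
    metricName: 'DroppedPackages', // hypothetical metric name
    period: Duration.minutes(5),
    statistic: 'Sum',
  });
  dashboard.addWidgets(new cloudwatch.GraphWidget({ title: 'Dropped packages', left: [dropped] }));
  // The resulting alarm would be wired to the internal alarm system.
  dropped.createAlarm(scope, 'DroppedPackagesAlarm', { threshold: 1, evaluationPeriods: 1 });
}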

Implement: Intake Function

Runs On: new SNS message
Goal: Moves S3 object from tmp location to final S3 location
Input: SNS payload from the discovery function (S3 link to one package data)
Output: one package object is saved in S3 (path: assemblies/${assembly.name}/v${assembly.version}/assembly.json)
How: It validates the assembly, filters out problematic packages, sanitizes the jsii input (making sure it is in the right format), and verifies that the checksum matches
Reference: validation code in @jsii/spec
Testing: add basic tests
Monitoring: #dropped packages, #validated packages

First PR: Implement the function + testing
Second PR: Add monitoring

The function is triggered off an input event produced by the discovery function (in the SNS topic):

 import { Assembly } from '@jsii/spec';

 export interface IngestionInput {
   /**
    * The URI to an S3 object containing the NPM package tarball.
    *
    * @example "s3://<bucket>/<key>[?versionId=<versionId>]"
    */
   readonly tarballUri: string;

   /**
    * The timestamp at which the version has been created in the package
    * registry. When the object is in JSON form, this is encoded as an ISO-8601
    * timestamp, preferably using the UTC time zone.
    */
   readonly time: Date;

   /**
    * (Optional) Metadata associated by the discovery function to the package
    * version
    */
   readonly metadata?: { readonly [key: string]: string };

   /**
    * A standardized checksum of the request, encoded using the following
    * format: `<algorithm>-<base64-encoded-hash>`.
    *
    * @example "sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
    */
   readonly integrity: string;
 }

The checksum can be computed using the following function:

import { createHash } from 'crypto';

function checksum(request: IngestionInput, tarball: Buffer, alg = 'sha384'): string {
  const hash = createHash(alg);
  const addField = (name: string, data: string | Buffer) =>
    //           <SOH>        $name          <STX>        $data          <ETX>
    hash.update('\x01').update(name).update('\x02').update(data).update('\x03');

  for (const [name, value] of Object.entries(request.metadata ?? {}).sort(([l], [r]) => l.localeCompare(r))) {
    addField(`metadata/${name}`, value);
  }
  addField('tarball', tarball);
  // `time` is a Date in object form; hash its ISO-8601 string encoding.
  addField('time', request.time.toISOString());

  return `${alg}-${hash.digest('base64')}`;
}
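
On the intake side, validation is then a direct comparison (sketch, reusing the function above):

// Recompute the checksum over the received payload and compare it to the claim.
const [algorithm] = request.integrity.split('-');
if (checksum(request, tarball, algorithm) !== request.integrity) {
  throw new Error(`integrity check failed for ${request.tarballUri}`);
}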

It stores the assembly data at the relevant S3 location.

Implement: package ingestion and processing

Adds a lambda function that triggers from a DynamoDB stream to make a new version of a package, or a newly published package, available in construct-hub. The DynamoDB stream contains change events for relevant packages and includes the package scope, name, and version to be processed as described in #55.

The lambda will download the package artifacts from npm and store the relevant ones, .jsii and readme.md, in the target S3 bucket. No processing should be performed on the package artifacts as part of this task. This bucket should be made accessible to the construct-hub-webapp frontend via CloudFront.

Make sure package files are harvested from npm and ready to use on S3

The owner of this task makes sure to understand which sub-tasks are needed in order to achieve this goal, and creates the relevant sub-tasks under PLANNED.

For example:

  • What do we need in order to productize the existing code that harvests the packages from npm? (testing? monitoring? else?)

Please speak with Elad about it.

Create `jsii-transpile` tool

Goal: given a jsii-enabled package (e.g. the npm tgz for the package), install the package & dependencies into a working directory, compile & transliterate all code examples therein into select target languages, then produce a transliterated .jsii assembly file as a result.
How: jsii-rosetta provides the functionality to perform the transliteration; however, it does not bother with dependency management (it is intended to run at build time, not against packaged libraries), and produces a rosetta tablet file instead of a transliterated jsii assembly document.


The jsii-transpile tool would consume a .jsii assembly file, and leverage jsii-rosetta APIs to create one <lang>.jsii file per configured target language, with all APIs renamed to the correct language representation, and sample code transliterated to the correct language where possible.
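
Internally, this could build on jsii-rosetta's transliteration entry point. A sketch, assuming the transliterateAssembly API (its module path and options may differ across jsii-rosetta versions):

import { TargetLanguage } from 'jsii-rosetta';
import { transliterateAssembly } from 'jsii-rosetta/lib/commands/transliterate';

// `workdir` must already contain the installed package and its dependencies;
// this writes one <lang>.jsii per target language next to the original assembly.
async function transpile(workdir: string): Promise<void> {
  await transliterateAssembly(
    [workdir],
    [TargetLanguage.PYTHON, TargetLanguage.JAVA, TargetLanguage.CSHARP],
  );
}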

Bug: transliterator fails on some packages

For example, for @aws-cdk/aws-autoscaling:

Error: Sample uses literate source test/example.images.lit.ts, but not found: /tmp/workdir6yEJu7/node_modules/@aws-cdk/aws-autoscaling/test/example.images.lit.ts
    at loadLiterateSource (/var/task/index.js:130207:15)
    at Object.fixturize (/var/task/index.js:130188:18)
    at Object.allTypeScriptSnippets (/var/task/index.js:139895:34)
    at allTypeScriptSnippets.next (<anonymous>)
    at Rosetta.addAssembly (/var/task/index.js:140065:22)
    at async loadAssemblies (/var/task/index.js:142608:9)
    at async transliterateAssembly2 (/var/task/index.js:142579:26)
    at async Runtime.handler (/var/task/index.js:142734:7)

This happens even though we do have a transliterated README in the API reference docs: https://docs.aws.amazon.com/cdk/api/latest/python/aws_cdk.aws_autoscaling/README.html

Implement the Backend X Front-end compatibility model as described in the design RFC.

Since documentation is generated on the backend in the background, we need to make sure the webapp knows how to handle multiple versions of documentation pages.

  1. Strongly typed interfaces that represent the schema.
  2. Enforce major version bumps on breaking changes.
  3. Front-end heuristic to fetch the correct definition file based on the latest version it supports that is available (see the sketch below).

More details here: https://github.com/aws/aws-cdk-rfcs/blob/master/text/0324-cdk-construct-hub.md#backend-x-frontend-compatibility
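
A sketch of the front-end heuristic (item 3), assuming documentation objects are keyed by schema major version; the key scheme is illustrative:

// The newest schema major version this build of the webapp understands.
const SUPPORTED_SCHEMA_VERSION = 2;

// Try the latest supported schema first, then fall back to older ones.
async function fetchDocs(baseUrl: string): Promise<unknown> {
  for (let version = SUPPORTED_SCHEMA_VERSION; version >= 1; version--) {
    const response = await fetch(`${baseUrl}/docs-v${version}.json`); // hypothetical key scheme
    if (response.ok) return response.json();
  }
  throw new Error('no compatible documentation schema found');
}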

Ingestion: Use CodeArtifact to store packages

Currently we store the discovered packages in S3; we suggest migrating to CodeArtifact. There are a few reasons to do so:

  • Easy management of package versions
  • Security: We use npmjs in the transliteration process. The transliterator Lambda executes npm install in order to get the full type definitions required to generate the code snippets in all target languages. This means the function requires full outbound network access in order to reach npm. Using CodeArtifact will allow us to block all outbound traffic.
  • Reduce the work required in the ingestion pipeline; we no longer need to download the tarball from npmjs.com and upload it to S3.

CodeArtifact allows configuring npm as an upstream repository, which means we don't need to mirror npm ourselves (we still need the discovery function to track updates to npm).

Note that from the ConstructHub operator's perspective, this is a drop-in replacement for npmjs; we will implement it as an opt-in feature.

Open questions:

  • Cost: We need to measure the cost difference between CodeArtifact and S3 per volume of packages and traffic.

Research: Discovery Function

The npmjs.com package registry is backed by a CouchDB database, for which a replica is available for public consumption at skimdb.npmjs.com.

The _changes endpoint can be used to query the CouchDB for updates from a certain point in time (transaction ID), which would allow a Lambda function to discover new packages without having to resort to a full search on the package registry.

Ideally, the changes should be filtered with application selection criteria (keywords should include one of aws-cdk, cdk8s, cdktf or constructs -- that list may grow in the future); however, no index exists that backs such a search, and it may turn out to be too slow, in which case filtering client-side may be more effective.

This task is to determine the correct approach to use to ensure a reliable discovery stream. This should also include considerations about error handling (what is and is not retry-able, etc.).

Implement: npm change event ingestion

Create a lambda function or state machine, depending on runtime requirements, that triggers every x (minutes|hours) to poll the registry.npmjs.org/db/_changes endpoint and store any relevant change events in a DynamoDB table. These change events will include a package's scope, name, id, and any other relevant information for documentation ingestion.

The ingestion functionality should store a last_event_id that updates on every run so the following run knows where to begin from. This could be a long running state machine with an optional start parameter that would allow backfilling from any point in time.

The ingestion function should filter on changes related only to packages with the appropriate tags, i.e. jsii, aws-cdk, etc.

The target DynamoDB table will trigger a stream for package ingestion and processing as described in #54.

Bootstrap CDK Construct Library

Set up the basic code layout for a construct library, with all available jsii languages properly configured (subject to future naming changes, though).

Basic construct entry point and unit testing set-up.

This should be achieved using projen.
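
A minimal .projenrc.ts sketch, assuming projen's JsiiProject; exact option names vary across projen releases:

import { JsiiProject } from 'projen';

// Option values here are illustrative; the package name is subject to change.
const project = new JsiiProject({
  name: 'construct-hub',
  repositoryUrl: 'https://github.com/cdklabs/construct-hub.git',
  author: 'Amazon Web Services',
  authorAddress: 'https://aws.amazon.com',
  defaultReleaseBranch: 'main',
});

project.synth();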

Move doc generation to `jsii-docgen`

  1. Use rosetta to transliterate the assemblies
  2. Use the docgen code to create the API reference.

Output should be a single markdown file that can be rendered by the front-end, with one file produced per submodule.
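
A sketch of what that step could look like, assuming jsii-docgen's Documentation API; method names may differ between versions:

import { Documentation, Language } from 'jsii-docgen';

// Render a single markdown file for one submodule in one target language.
async function renderDocs(packageDir: string, submodule: string): Promise<string> {
  const docs = await Documentation.forProject(packageDir);
  const markdown = await docs.toMarkdown({ language: Language.PYTHON, submodule });
  return markdown.render();
}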

Backend doc-generation PoC.

Implement a hacky solution for backend doc generation and deploy it to an environment so we can validate the effects it has on performance.

Implement: Pre-rendering function

Implement a pre-rendering function that prepares data for efficient front-end processing. The actual output is TBD (pending front-end requirements).

Development app

Add an app for deploying the construct hub for development
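
A minimal sketch of such an app, assuming the library's eventual ConstructHub entry point and CDK v2-style imports:

import { App, Stack } from 'aws-cdk-lib';
import { ConstructHub } from 'construct-hub'; // this library

const app = new App();
const stack = new Stack(app, 'construct-hub-dev', {
  env: { account: process.env.CDK_DEFAULT_ACCOUNT, region: process.env.CDK_DEFAULT_REGION },
});

// Deploy the hub with defaults; dev-only overrides would go here.
new ConstructHub(stack, 'ConstructHub');

app.synth();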

Add `.jsii` objects to awscdk.io distribution

Simply store the .jsii assemblies in the S3 bucket next to the metadata.json object, so that the prototype Construct Hub can refer to those until it gets its own storage medium.

Implement: Latest Builder Function

Runs On: S3 object updates to an assembly.json object
Goal: Updates the latest.json object that has the last n (e.g. 20) packages indexed in the system ("hot items")
How: Incrementally updates the latest.json with new package versions. In order to ensure consistency, the provisioned concurrency of this function will be set to 1 (ensuring only 1 execution of this function happens at any time).
Input: S3 notification payload
Output: The latest.json object has been updated to include the latest n (e.g. 20) packages indexed in the system


First PR: Initial functionality, happy case testing
Second PR: Monitoring, testing edge cases


The latest builder function triggers off S3 Object update events for the assembly.json objects, and keeps the latest.json file up-to-date with the last 20-or-so packages indexed in the platform.

It uses object metadata and versioning to ensure the result remains consistent (see design doc).
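
A sketch of the incremental update under provisioned concurrency = 1, assuming a hypothetical entry shape and the AWS SDK v3 S3 client:

import { GetObjectCommand, PutObjectCommand, S3Client } from '@aws-sdk/client-s3';

const s3 = new S3Client({});
const MAX_ENTRIES = 20;

interface LatestEntry { name: string; version: string; indexedAt: string } // hypothetical shape

// Prepend the newly indexed package and trim the list to the last N entries.
// Safe without versioning checks only because no two updates run concurrently.
export async function updateLatest(bucket: string, entry: LatestEntry): Promise<void> {
  let entries: LatestEntry[] = [];
  try {
    const current = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: 'latest.json' }));
    entries = JSON.parse(await current.Body!.transformToString());
  } catch {
    // first run: latest.json does not exist yet
  }
  // Keep a single (newest) entry per package name, most recent first.
  entries = [entry, ...entries.filter((e) => e.name !== entry.name)].slice(0, MAX_ENTRIES);
  await s3.send(new PutObjectCommand({
    Bucket: bucket,
    Key: 'latest.json',
    Body: JSON.stringify(entries),
    ContentType: 'application/json',
  }));
}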

Implement package deny-list

To ensure the integrity of the website and prevent recurring abuse we need to have the ability to block specific packages from being ingested.

Note that it is not sufficient to not list offending packages on the client side; we need to prevent the package from entering the processing pipeline.

Implementation notes

  • We need to allow blocking a specific version of a package as well as all versions of a package (see the sketch below).
  • (Security requirement) "Alert when a specific package owner is hitting the deny-list protection far more than normal."
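
A sketch of the lookup, assuming the deny list is keyed by either name (all versions) or name@version (a single version):

type DenyList = ReadonlySet<string>;

// Matches either a blanket block on the package or a block on this exact version.
function isDenied(denyList: DenyList, name: string, version: string): boolean {
  return denyList.has(name) || denyList.has(`${name}@${version}`);
}

// Usage in the ingestion pipeline, before any processing happens:
// if (isDenied(denyList, pkg.name, pkg.version)) return; // drop + emit a metric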

Implement: Catalog Builder Function

Goal: Keep the catalog.json file up-to-date with the full collection of indexed packages (latest release of each major version line of each package)
Input: S3 object update (or create) notifications (assembly.json objects)
How: Updates the contents of the file given the notification payload, adding or updating entries in the list, then updating the catalog.json object. To ensure consistency, the function's provisioned concurrency is going to be pinned to 1 (ensuring only 1 instance of the function runs at any given time). If this does not work out (and only in this case), S3 object versioning can be leveraged together with S3 object metadata to detect concurrent updates to this object, and merge them into consistency.
Testing: Happy case, concurrent modification detection
Monitoring: Number of packages in the catalog file (or size of the file), classic lambda monitoring (execution failures, successes, etc...), concurrent modifications detected / rescued.


The catalog builder function triggers off an S3 update on the assembly.json objects, and keeps the catalog.json object up-to-date in S3, making sure the file contains the latest information about all indexed packages.

Implement deny-list

Implement a lambda function that gets a text file from S3 with the names of packages that should be denied, and removes them from S3.
The discovery function should make sure not to harvest them again.
