cdklabs / construct-hub
AWS CDK construct library that can be used to deploy instances of the Construct Hub in any AWS Account.
License: Apache License 2.0
Goal: make sure we have code samples in S3 for every package in all supported languages
How: Should be triggered per package + language, on every new/updated S3 object event (on S3 assembly.json objects)
Dependent On: there's a separate task for repackaging JSII to support transpilation, #11
Implement asynchronous application of the jsii-transpile tool on assemblies stored in S3, and add the language-specific variants of the assembly there.
The function triggers from a DynamoDB Stream event for updates to objects in the table.
Need to simplify the UI design for arrays.
For an internal enterprise instance of construct-hub, how could we implement some automated Sanitisation and "Securitisation" of user self-published constructs?
How would self-publishing work? Perhaps publishing a construct sends a codecommit repo to a codepipeline which runs user-defined tests (check there are some tests) but also enterprise-defined tests, and then to a final manual approval step before publication to the json/dynamodb store of constructs?
How do we set the org-level policies for the enterprise-defined testing requirements and other policies?
How are the references to the codecommit construct repos stored in the catalog/hub db?
Our discovery function polls npmjs.com for new packages. Once a package is detected, the transliterator tries to npm install this package from the dedicated CodeArtifact repository that mirrors npmjs.com.
Occasionally, a delay may occur between the time a package is available on npmjs.com and on CodeArtifact. This means that we might be dropping new packages as they come in.
We need to make sure we have a sufficient retry mechanism in place that mitigates this delay.
Implement the observability dashboard with generic APIs as much as possible, so the observability patterns can be re-used in various other projects.
The dashboard design is to be based on the results of #8.
What: lambda function, should be executed every 5m
The goal: Harvest packages list from npm and notify construct hub
Input: context (last event that was processed = transaction ID)
Output: sends notification to SNS topic, payload = link to tmp directory in s3 where the package file is
How: the harvesting should start with the last transaction ID, unless we got to the end of the list. Save it to S3.
Reference: Construct Catalog lambda function (take into account this reference retrieves only the latest version of each package).
Testing: process the results and send SNS; make sure we correctly react to basic network problems
Monitoring: execution of the function (every 5m), #messages of SNS topic, #downloads/execution
First PR: basic functionality + testing
Second PR: Production ready
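The cursor handling described above (start from the last transaction ID, notify only fresh packages) can be sketched as follows. The record shapes and function name are illustrative assumptions, and the actual S3 persistence of the cursor and the SNS publish are omitted:

```typescript
// Hypothetical shape of one entry from the registry change feed.
interface ChangeRecord {
  seq: number;       // monotonically increasing transaction ID
  id: string;        // package name
  deleted?: boolean; // tombstone records should not be notified
}

interface HarvestResult {
  lastSeq: number;    // cursor to persist (e.g. to S3) for the next run
  toNotify: string[]; // package names to publish to the SNS topic
}

// Pure cursor logic: advance past everything seen, notify only new, live packages.
function harvestPage(lastSeq: number, page: ChangeRecord[]): HarvestResult {
  const fresh = page.filter((c) => c.seq > lastSeq);
  return {
    // Advance the cursor past all records we saw, including deletions,
    // so the next 5-minute run does not reprocess them.
    lastSeq: fresh.reduce((max, c) => Math.max(max, c.seq), lastSeq),
    toNotify: fresh.filter((c) => !c.deleted).map((c) => c.id),
  };
}
```

Keeping this logic pure makes the "basic network problems" testing requirement easier: a failed run simply never persists the advanced cursor, so the next run retries the same page.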
Implement the discovery function based on the findings documented as a result of #2:
Create a lambda function or state machine, depending on runtime requirements, that triggers every x (minutes|hours) to poll the registry.npmjs.org/db/_changes endpoint and posts any relevant change events to an SNS topic. These change events will include a package's scope, name, id, and any other relevant information for documentation ingestion.
The ingestion functionality should store a last_event_id that updates on every run so the following run knows where to begin from. This could be a long-running state machine with an optional start parameter that would allow backfilling from any point in time.
The ingestion function should filter on changes related only to packages with the appropriate tags, i.e.: jsii, aws-cdk, etc.
The schema for the SNS notification follows:
import { Assembly } from '@jsii/spec';

export interface IngestionInput {
  /**
   * A unique identifier for the origin that notified about this package
   * version. This may be a UUID generated per-origin. This identifier is used
   * by the front-end application to determine appropriate package installation
   * instructions (e.g: supporting private package registries).
   */
  readonly origin: string;

  /**
   * The contents of the .jsii assembly file that has been discovered and is
   * submitted for ingestion in the Construct Hub.
   */
  readonly assembly: Assembly;

  /**
   * The timestamp at which the version has been created in the package
   * registry. When the object is in JSON form, this is encoded as an ISO-8601
   * timestamp, preferably using the UTC time zone.
   */
  readonly time: Date;

  /**
   * A standardized checksum of the `assembly` and `time` fields formatted in
   * some canonical form. The checksum is encoded as a string with the following
   * format: `<algorithm>-<base64-encoded-hash>`.
   *
   * @example "sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
   */
  readonly integrity: string;
}
In order to generate the front end, we currently need two types of files: among them, the metadata.json file, which contains npm metadata that isn't available in the assembly (e.g. publish date).
Right now, the metadata.json file is missing from the backend, and only exists in awscdk.io.
We need to populate the package data bucket with metadata.json files as well.
Provision a complete CI/CD workflow, possibly using projen, to support automated PR validations on the repository, and automated releases to the package managers.
Until the name is finalised, all publishing should either be "dry run", or targeting private repositories.
https://constructs.dev/packages/@matthewbonig/nightynight/v/0.1.3?lang=typescript
Open the GitHub link. You'll see an image in the README file. You can't see it on Construct Hub.
Perhaps it's related to another issue? #198
Determine what is needed on a dashboard to facilitate operating the Construct Hub, including private instances.
Sketch out the organization of a CloudWatch dashboard with all the relevant metrics, alarms, etc., and links to the critical resources that may need quick access.
Monitoring API design from the frontend.
It should be connected to internal alarm system.
Keep in mind that the app will be public.
Desired Output: Deploy Canary + metrics + alarms + cloudwatch dashboard + send the metrics to internal alarm system.
Time estimation: ~2d
Runs On: new SNS message
Goal: Move the S3 object from its tmp location to the final S3 location
Input: SNS payload from the discovery function (S3 link to one package's data)
Output: one package object is saved in S3 (path: assemblies/${assembly.name}/v${assembly.version}/assembly.json)
How: It validates the assembly, filters out problematic packages, sanitizes the jsii input (makes sure it's in the right format), and makes sure the checksum matches
Reference: validation code in @jsii/spec
Testing: add basic tests
Monitoring: #dropped packages, #validated packages
First PR: Implement the function + testing
Second PR: Add monitoring
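The final key derivation from the path template above can be sketched as a small pure helper; the interface shape is an assumption, only the documented path template is taken from the issue:

```typescript
// Minimal slice of the assembly document needed to derive the target key.
interface AssemblyId {
  name: string;    // e.g. '@aws-cdk/aws-s3' (scoped names keep their '/')
  version: string; // e.g. '1.108.0'
}

// Derive the final S3 location documented in the spec:
// assemblies/${assembly.name}/v${assembly.version}/assembly.json
function assemblyKey(assembly: AssemblyId): string {
  return `assemblies/${assembly.name}/v${assembly.version}/assembly.json`;
}
```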
The function is triggered off an input event produced by the discovery function (in the SNS topic):
import { Assembly } from '@jsii/spec';

export interface IngestionInput {
  /**
   * The URI to an S3 object containing the NPM package tarball.
   *
   * @example "s3://<bucket>/<key>[?versionId=<versionId>]"
   */
  readonly tarballUri: string;

  /**
   * The timestamp at which the version has been created in the package
   * registry. When the object is in JSON form, this is encoded as an ISO-8601
   * timestamp, preferably using the UTC time zone.
   */
  readonly time: Date;

  /**
   * (Optional) Metadata associated by the discovery function to the package
   * version.
   */
  readonly metadata?: { readonly [key: string]: string };

  /**
   * A standardized checksum of the request, encoded using the following
   * format: `<algorithm>-<base64-encoded-hash>`.
   *
   * @example "sha384-oqVuAfXRKap7fdgcCY5uykM6+R9GqQ8K/uxy9rx7HNQlGYl1kPzQho1wx4JwY8wC"
   */
  readonly integrity: string;
}
The checksum can be computed using the following function:
import { createHash } from 'crypto';

function checksum(request: IngestionInput, tarball: Buffer, alg = 'sha384'): string {
  const hash = createHash(alg);
  const addField = (name: string, data: string | Buffer) =>
    // <SOH> $name <STX> $data <ETX>
    hash.update('\x01').update(name).update('\x02').update(data).update('\x03');
  for (const [name, value] of Object.entries(request.metadata ?? {}).sort(([l], [r]) => l.localeCompare(r))) {
    addField(`metadata/${name}`, value);
  }
  addField('tarball', tarball);
  // `time` is a Date; hash its ISO-8601 string form so the digest is stable.
  addField('time', request.time.toISOString());
  return `${alg}-${hash.digest('base64')}`;
}
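On the receiving side, the `integrity` field could be re-verified by recomputing the digest with the same field framing. This is a sketch under assumptions: the helper name and the flattened field-list parameter are illustrative, not part of the spec above.

```typescript
import { createHash } from 'crypto';

// Recompute the digest over pre-ordered (name, data) fields using the same
// <SOH>name<STX>data<ETX> framing, and compare against the declared value.
function verifyIntegrity(expected: string, fields: Array<[string, string | Buffer]>): boolean {
  const [alg] = expected.split('-', 1); // e.g. 'sha384' from 'sha384-<base64>'
  const hash = createHash(alg);
  for (const [name, data] of fields) {
    hash.update('\x01').update(name).update('\x02').update(data).update('\x03');
  }
  return `${alg}-${hash.digest('base64')}` === expected;
}
```

A mismatch here would be one of the "problematic packages" the ingestion function drops (and counts under #dropped packages).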
It stores the assembly data at the relevant S3 location.
Adds a lambda function that triggers from a DynamoDB stream to make a new version of a package, or a newly published package, available in construct-hub. The DynamoDB stream contains change events from relevant packages and includes the package scope, name, and version to be processed as described in #55.
The lambda will download the package artifacts from npm and store the relevant ones, .jsii and readme.md, in the target S3 bucket. No processing should be performed on the package artifacts as part of this task. This bucket should be made accessible to the construct-hub-webapp frontend via CloudFront.
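Selecting the relevant artifacts out of the tarball's entry listing can be sketched as a predicate; the helper name is illustrative. It assumes the standard npm convention that tarball entries are prefixed with `package/`, and matches the readme case-insensitively since packages publish both `README.md` and `readme.md`:

```typescript
// Keep only the artifacts named in the task: the .jsii assembly and the readme.
function isRelevantArtifact(entryPath: string): boolean {
  return entryPath === 'package/.jsii' || /^package\/readme\.md$/i.test(entryPath);
}
```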
It should help users distinguish between UGC (user-generated content) and AWS content.
We constantly see errors from this release: https://github.com/cdklabs/construct-hub/actions/runs/874396829
The owner of this task makes sure to understand which sub-tasks are needed in order to achieve this goal, and creates the relevant sub-tasks under PLANNED.
For example:
Please speak with Elad about it.
Goal: given a jsii-enabled package (e.g: the npm tgz for the package), install the package & dependencies into a working directory, compile & transliterate all code examples therein into select target languages, then produce a transliterated .jsii assembly file as a result.
How: jsii-rosetta provides the functionality to perform the transliteration; however, it does not bother with the dependency management (it is intended to run at build time, not against packaged libraries), and produces a rosetta tablet file instead of a transliterated jsii assembly document.
The jsii-transpile tool would consume a .jsii assembly file, and leverage jsii-rosetta APIs to create one <lang>.jsii file per configured target language, with all APIs renamed to the correct language representation, and sample code transliterated to the correct language where possible.
For example, for @aws-cdk/aws-autoscaling:
Error: Sample uses literate source test/example.images.lit.ts, but not found: /tmp/workdir6yEJu7/node_modules/@aws-cdk/aws-autoscaling/test/example.images.lit.ts
    at loadLiterateSource (/var/task/index.js:130207:15)
    at Object.fixturize (/var/task/index.js:130188:18)
    at Object.allTypeScriptSnippets (/var/task/index.js:139895:34)
    at allTypeScriptSnippets.next (<anonymous>)
    at Rosetta.addAssembly (/var/task/index.js:140065:22)
    at async loadAssemblies (/var/task/index.js:142608:9)
    at async transliterateAssembly2 (/var/task/index.js:142579:26)
    at async Runtime.handler (/var/task/index.js:142734:7)
This happens even though we do have transliterated README in the api ref docs: https://docs.aws.amazon.com/cdk/api/latest/python/aws_cdk.aws_autoscaling/README.html
Since documentation is generated on the backend in the background, we need to make sure the webapp knows how to handle multiple versions of documentation pages.
More details here: https://github.com/aws/aws-cdk-rfcs/blob/master/text/0324-cdk-construct-hub.md#backend-x-frontend-compatibility
Make sure the discovery function brought all the packages to S3 and that they are now available to be used by the frontend
Currently we store the discovered packages in S3; we suggest that we migrate to using CodeArtifact. There are a few reasons to do so:
We depend on npmjs in the transliteration process. The transliterator Lambda executes npm install in order to get the full type definitions required to generate the code snippets in all target languages. This means that the function requires full outbound network access in order to reach npm. Using CodeArtifact will allow us to block all outbound traffic.
CodeArtifact allows configuring npm as an upstream repo, which means we don't need to mirror npm ourselves (we still need the discovery function to track updates to npm).
Note that from the ConstructHub operator's perspective, this is a drop-in replacement for npmjs; we will implement it as an opt-in feature.
open questions:
I do see the following message in the console: Error: "Package @aws-cdk does not exist in catalog"
The npmjs.com package registry is backed by a CouchDB database, for which a replica is available for public consumption at skimdb.npmjs.com.
The _changes endpoint can be used to query the CouchDB for updates from a certain point in time (transaction ID), which would allow a Lambda function to discover new packages without having to resort to a full search on the package registry.
Ideally, the changes should be filtered with application selection criteria (keywords should include one of aws-cdk, cdk8s, cdktf or constructs -- that list may grow in the future); however, no index exists that backs such a search, and it may turn out to be too slow... in which case, filtering client-side may be more effective.
This task is to determine the correct approach to use to ensure a reliable discovery stream. This should also include considerations about error handling (what is and is not retry-able, etc...)
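If client-side filtering wins out, the keyword check described above is small enough to sketch; the record shape and function name are assumptions, and the keyword set is the one listed in this issue:

```typescript
// The application selection criteria from the issue; expected to grow.
const RELEVANT_KEYWORDS = new Set(['aws-cdk', 'cdk8s', 'cdktf', 'constructs']);

// Hypothetical slice of a package-version document from the change feed.
interface PackageVersionInfo {
  name: string;
  keywords?: string[];
}

// A change is relevant if any keyword matches, case-insensitively.
function isRelevant(pkg: PackageVersionInfo): boolean {
  return (pkg.keywords ?? []).some((k) => RELEVANT_KEYWORDS.has(k.toLowerCase()));
}
```

Client-side filtering like this is cheap and retry-friendly: an irrelevant change is simply skipped, so it never needs the error-handling treatment discussed above.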
Create a lambda function or state machine, depending on runtime requirements, that triggers every x (minutes|hours) to poll the registry.npmjs.org/db/_changes endpoint and store any relevant change events in a DynamoDB table. These change events will include a package's scope, name, id, and any other relevant information for documentation ingestion.
The ingestion functionality should store a last_event_id that updates on every run so the following run knows where to begin from. This could be a long-running state machine with an optional start parameter that would allow backfilling from any point in time.
The ingestion function should filter on changes related only to packages with the appropriate tags, i.e.: jsii, aws-cdk, etc.
The target DynamoDB table will trigger a stream for the package ingestion and processing as described in #54
This issue was reported by Matthew Bonig.
Check out the links at the intro of his package. If you click them you get a 404.
https://constructs.dev/packages/@matthewbonig/nightynight/v/0.1.3?lang=typescript
(In addition the images in this intro aren't resolved. Opening another ticket for that #199 )
Set up the basic code layout for a construct library, with all available jsii languages properly configured (subject to future naming changes, though).
Basic construct entry point and unit testing set-up.
This should be achieved using projen.
We shouldn't be using awscdk.io as our backend. Instead, we should be using the buckets currently being populated by the various lambda backend functions.
Output should be a single markdown file that can be rendered by the front-end. Should produce one file per submodule.
Implement a hacky solution for backend doc generation and deploy it to an environment so we can validate the effects it has on performance.
The field name looks too small in comparison to its type, default and other metadata.
If it can be solved quickly (~2h), let's resolve it. Otherwise, let's talk.
Keep packages with the following licenses: BSD, MIT, Apache.
If the license file doesn't exist, the backend should add the standard license from https://spdx.dev.
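The license gate could be sketched as an allowlist check on SPDX identifiers; the helper name is illustrative, and the exact identifier set (e.g. which BSD variants count) is an assumption to be confirmed:

```typescript
// Allowed license families from the issue, expressed as SPDX identifiers
// (the exact variant list is an assumption; uppercased for comparison).
const ALLOWED_LICENSES = new Set([
  'MIT',
  'APACHE-2.0',
  'BSD-2-CLAUSE',
  'BSD-3-CLAUSE',
]);

// A package passes only if it declares a license on the allowlist.
function isLicenseAllowed(spdxId: string | undefined): boolean {
  return spdxId != null && ALLOWED_LICENSES.has(spdxId.toUpperCase());
}
```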
Make sure that all the packages in S3 have code samples in Python and TypeScript (if they support it) available to be used by the frontend.
Make sure the code samples are displayed in the frontend.
Implement a pre-rendering function that prepares data for efficient front-end processing. The actual output is TBD (pending front-end requirements).
Add an app for deploying the construct hub for development
Simply store the .jsii assemblies in the S3 bucket next to the metadata.json object, so that the prototype Construct Hub can refer to those until it gets its own storage medium.
Runs On: S3 Object updates to an assembly.json object
Goal: Updates the latest.json object that has the last n (e.g: 20) packages indexed in the system ("hot items")
How: Incrementally updates the latest.json with new package versions. In order to ensure consistency, the provisioned concurrency of this function will be set to 1 (ensuring only 1 execution of this function happens at any time).
Input: S3 notification payload
Output: The latest.json object has been updated to include the latest n (e.g: 20) packages indexed in the system
First PR: Initial functionality, happy case testing
Second PR: Monitoring, testing edge cases
The latest builder function triggers off S3 Object update events for the assembly.json objects, and keeps the latest.json file up-to-date with the last 20-or-so packages indexed in the platform.
It uses object metadata and versioning to ensure the result remains consistent (see design doc).
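The incremental update itself (safe to keep simple, since provisioned concurrency is pinned to 1) could look like the sketch below. The entry shape and function name are assumptions; the read-from/write-to-S3 plumbing is omitted:

```typescript
// Hypothetical shape of one entry in latest.json.
interface LatestEntry {
  name: string;
  version: string;
  indexedAt: string; // ISO-8601 timestamp of indexing
}

// Drop any previous entry for the same package, prepend the new one,
// and truncate to the last `limit` (e.g. 20) packages.
function updateLatest(current: LatestEntry[], entry: LatestEntry, limit = 20): LatestEntry[] {
  const rest = current.filter((e) => e.name !== entry.name);
  return [entry, ...rest].slice(0, limit);
}
```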
To ensure the integrity of the website and prevent recurring abuse, we need to have the ability to block specific packages from being ingested.
Note that it is not sufficient to not list the offending package on the client side; we need to prevent the package from entering the processing pipeline.
Make sure we have accurate visibility:
#packages_that_support_python,
#python_failed,
#python_succeeded
#packages_that_support_typescript,
#typescript_failed,
#typescript_succeeded
Goal: Keep the catalog.json file up-to-date with the full collection of indexed packages (latest release of each major version line of each package)
Input: S3 object update (or create) notifications (assembly.json objects)
How: Updates the contents of the file given the notification payload, adding or updating entries in the list, then updating the catalog.json object. To ensure consistency, the function's provisioned concurrency is going to be pinned to 1 (ensuring only 1 instance of the function runs at any given time). If this does not work out (and only in this case), S3 object versioning can be leveraged together with S3 object metadata to detect concurrent updates to this object, and merge them into consistency.
Testing: Happy case, concurrent modification detection
Monitoring: Number of packages in the catalog file (or size of the file), classic lambda monitoring (execution failures, successes, etc...), concurrent modifications detected / rescued.
The catalog builder function triggers off an S3 update on the assembly.json objects, and keeps the catalog.json object up-to-date in S3, making sure the file contains the latest information about all indexed packages.
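The "latest release of each major version line" rule amounts to upserting on a (package name, major version) key. A naive sketch, with assumed shapes and no real semver comparison (the incoming entry simply wins, which matches processing notifications in order):

```typescript
// Hypothetical shape of one catalog.json entry.
interface CatalogEntry {
  name: string;
  version: string; // semver, e.g. '1.2.0'
}

function majorOf(version: string): number {
  return parseInt(version.split('.')[0], 10);
}

// Replace the entry for the same (name, major) line, or append a new line.
// Note: no semver ordering here; a real implementation should keep the
// highest version rather than blindly taking the newest notification.
function upsertCatalog(catalog: CatalogEntry[], entry: CatalogEntry): CatalogEntry[] {
  const key = (e: CatalogEntry) => `${e.name}@${majorOf(e.version)}`;
  const byKey = new Map(catalog.map((e) => [key(e), e] as const));
  byKey.set(key(entry), entry);
  return [...byKey.values()];
}
```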
Implement a lambda function that gets a text file from S3 with the names of packages that should be denied, and removes them from S3.
The discovery function should make sure not to harvest them again.
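The deny-list mechanics could be sketched as two pure helpers: parsing the text file (assumed here to be one package name per line, with `#` comments tolerated) and matching it against the bucket's object keys. The file format, helper names, and key layout assumptions are illustrative; the key pattern follows the documented assemblies/${name}/v${version}/assembly.json layout:

```typescript
// Parse the deny-list file: one package name per line; blank lines and
// '#' comment lines are ignored (format assumption).
function parseDenyList(fileContents: string): Set<string> {
  return new Set(
    fileContents
      .split('\n')
      .map((l) => l.trim())
      .filter((l) => l.length > 0 && !l.startsWith('#')),
  );
}

// Select the object keys belonging to denied packages for deletion.
// Keys look like: assemblies/<name>/v<version>/assembly.json
function keysToDelete(objectKeys: string[], denied: Set<string>): string[] {
  return objectKeys.filter((key) => {
    const m = key.match(/^assemblies\/(.+)\/v[^/]+\/assembly\.json$/);
    return m != null && denied.has(m[1]);
  });
}
```

The same parsed set can be consulted by the discovery function so denied packages are never harvested again.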
Migrate to CDK v2.
Currently the metadata key in each package inside the catalog.json file contains the jsii package metadata.
This is wrong; it should contain the npm metadata contained in the metadata.json file of each package.
The inventory canary reports any missing Python assembly as a problem, but this is not the case if the original package does not have a Python target configured.