ansible-community / github-docs-build Goto Github PK

GitHub actions and workflows for building Ansible collection documentation.

License: GNU General Public License v3.0

Shell 5.04% HTML 10.15% Jinja 84.81%

github-docs-build's Introduction

github-docs-build

This repository contains GitHub actions and reusable workflows you can use to generate documentation in your own collection repository on GitHub.

github-docs-build's People

Contributors

Stargazers

Watchers

Forkers

briantist felixfontein tremble gotmax23 theko2fi

github-docs-build's Issues

The shared workflows reference actions as though they are external rather than in the same repository

For example ansible-community/github-docs-build/actions/ansible-docs-build-init@main

This works but is problematic for several reasons:

the git ref is hardcoded, so no matter which version (ref) of the workflow you are using, the action will always be loaded from main
this also makes modifications via PR even more challenging than they already are because of the pull_request_target event, because during development and testing the refs all need to be changed to the PR branch, and then changed once again before merging
a fork would always be referencing the source repo's version rather than itself, unless permanently changed

Actions can be referenced from the local filesystem, for example the above would change to ./actions/ansible-docs-build-init. The possible challenge to this, is that typically, you must do a checkout on the runner first to ensure the files actually exist. In most repositories, this is pretty straightforward because you're already doing a checkout anyway.

With these shared workflows, a checkout is running in context of the calling repo, not the workflow's repo. I am not sure if there's a way that we can infer from within the workflow, which commit of the shared workflow itself has been referenced. If we can do that, we can also do a checkout of the repo and get the actions that way.

What would be really ideal, is to re-use the checkout that already occurred to get the workflow in the first place. Much like actions, shared workflows are automatically retrieved with a git clone/checkout by the runner system, so the files are somewhere. Composite actions have a variable they can access that gives the filesystem location of the action (so that files in the action can be referenced). If a similar variable exists for shared workflows that would be the ideal way to handle this. I have not seen any documentation that suggests that variable exists though.

In the reusing workflows page there is an example that references a local action:

jobs:
  reusable_workflow_job:
    runs-on: ubuntu-latest
    environment: production
    steps:
      - uses: ./.github/actions/my-action@v1
        with:
          username: ${{ inputs.username }}
          token: ${{ secrets.envPAT }}

The interesting thing about this example is that is clearly does not have a checkout step, implying that local action references will "just work" but I have not tried this yet. The example is not meant to demonstrate local action referencing though, so it's entirely possible that this is a non-working example.

A dependency of the surge package was vandalized

See:

This makes surge useless at the moment, but looking at the offending commit, it's not actually dangerous: Marak/colors.js@074a0f8#diff-92bbac9a308cd5fcf9db165841f2d90ce981baddcb2b1e26cfff170929af3bd1R18 just runs an infinite loop.

Note that the surge workflow drops all permissions except for contents: read, so it has no access to change anything it runs in, and can't read anything sensitive except the surge token passed into the workflow.

CI failures looking for `antsibull-docs<3,>=2`

CI is failing a bunch of tests because it seems to be trying to install a version of antisbull-docs that doesn't exist yet. I'm not sure yet where it's getting that range from.

https://github.com/ansible-community/github-docs-build/actions/runs/4922546028/jobs/8793437859#step:4:366

cc @felixfontein if you have any ideas

Bump ansible-core and python versions

Technically, 2.16 isn't GA yet, so we should probably wait for the actual release

Implement a publish target for GitHub Pages

Discussed in #9

^{Originally posted by briantist January 8, 2022}
GH Pages might be a nice target for publishing, as it'd be (theoretically) available to any collection without an external service. Since publishing primarily involves commits, it could be done completely within the actions ecosystem, though I know folks can be quite wary of automation committing to a repository.

Some things to think about are how to handle publishing for different PRs (sub-directories?), commit bloat (does this matter?), support for publishing in separate repositories (will require a PAT).

I think this makes a lot of sense, and will look to implement it.

Third party actions should be referenced more securely

We use a number of external actions from a handful of authors within our composite actions and workflows.

When actions were first released, the most common way to reference these was with a major version number, that corresponded to a tag, like actions/checkout@v2.

But this is insecure, as what is published at that tag can change any time. So best practice is to mitigate this with one of a few strategies, like forking the repo so you can reference from the fork (and update the fork at your convenience), or referencing a commit hash in the original repo (the commit could disappear but this prevents malicious changes).

For our use case, the latter option probably makes the most sense.

Doing this means we also need a way to keep up with upstream changes, review them, and update the hashes from time to time.

We should also decide whether we do this for every action, or if we will trust some publishers, like GitHub (anything under the actions/ organization).

As far as keeping the hashes up to date, I was looking at dependabot, which has support for this, but it seems like it only supports action references within workflow files, not references within composite actions (which is where we need it most).

We can now reference reusable workflows with local syntax

What does this mean?

It means we have a way to run tests against changes to shared workflows, by referencing them as "local":
https://github.blog/changelog/2022-01-25-github-actions-reusable-workflows-can-be-referenced-locally/

Basically, just like we do for the actions, we can create workflows in the repo that reference the shared workflows, to run tests against PRs and pushes to the workflows.

What this does not mean

This doesn't solve #4 , because there is still no way to load an action locally from the same repo as a shared workflow, as far as I can tell.

Allow for specifying the version of antsibull to be installed

Currently we're just installing the latest version that's on PyPI, but since antsibull is central to this whole process, it probably makes a lot of sense have the version be optional, especially in case of trying out new features. That probably also means installing from GitHub instead of PyPI, like we do for ansible-core (so version ends up as a ref, but releases are tagged with semver too).

Add linting to the build action

My original prototypes before this repository used the antsibull-docs lint-collection-docs command.

I think it got lost when I converted everything to actions/workflows, and for the most part, it was fine because if the docs didn't build, that would be your signal there was a problem :)

However, today I ran into an issue where the entire RST file I added was being ignored. So there was nothing in the output to indicate why, just errors from other pages that tried to reference the new ones.

Finally I ran lint and figured out the issue, and this is a great reminder why that command is still needed!

Failing workflow: https://github.com/ansible-collections/community.hashi_vault/runs/6249903735?check_suite_focus=true

This page did work in a previous workflow, so clearly something I added to the page made it fail, but I could not figure it out from build output alone.

Lint showed me the issue: I added some labels without the right prefix:

../../docs/docsite/rst/migration_hashi_vault_lookup.rst:216:0: Label "kv_response" does not start with expected prefix "ansible_collections.community.hashi_vault.docsite."
../../docs/docsite/rst/migration_hashi_vault_lookup.rst:287:0: Label "kv_replacements" does not start with expected prefix "ansible_collections.community.hashi_vault.docsite."

Enable "fail on error" for init action and workflows

Similar to #1

We should add support for the --fail-on-error flag introduced in ansible-community/antsibull#393 .

Add support for build command arguments to the build html action

The build action takes an input for the path to the build command, and defers actual building to that command.

Since paths can contain spaces, the path is quoted when invoked.

But that means you can't give arguments to the build command, because they'll end up inside the quotes.

A new input for arguments would allow for more flexibility.

For example, I wanted to use this in a different project that has a Makefile for building, and I wanted to give arguments to the make command, but couldn't do so. Instead I committed a build.sh to call make with the arguments.

docs build incorrectly uploads its containing folder in the artifact upload

After investigation, this is actually a problem with the build action. When it uploads the artifact, it points at the output html folder. The uploaded artifact then has its "root" contain the html folder and then everything is in there. The contents of the html directory should be the "root" of the artifact.

This might be an issue of having to use a trailing slash / for the path in the upload-artifact action.

Originally, when I published with Surge the root of the domain would be the root of the site. That's no longer the case, and I have to go to /html, as in https://community-hashi-vault-pr211.surge.sh/html/ to see the root of the site. That makes the first link wrong. Interestingly, the links for the individual files are all correct.

I suspect that this is due to the artifact upload/download, so I'll have to confirm that and get the pathing straightened out. And when I do that, I also have to make sure the link generation for the individual files still works correctly, or if it needs to be changed accordingly.

"Luckily"(?) this only affects my collection right now 😬

See:

ansible-collections/community.hashi_vault#211 (comment)

Or if the comment disappears, here was the rendered content:

##Docs Build 📝

Thank you for contribution!✨

The docs for this PR have been published here: https://community-hashi-vault-pr211.surge.sh

You can compare to the docs for the main branch here:

The docsite for this PR is also available for download as an artifact from this run: https://github.com/ansible-collections/community.hashi_vault/actions/runs/1733976750

File changes:

A html/collections/community/hashi_vault/docsite/lookup_guide.html

M html/collections/community/hashi_vault/docsite/contributor_guide.html

M html/collections/community/hashi_vault/docsite/user_guide.html

M html/collections/community/hashi_vault/hashi_vault_lookup.html

M html/collections/community/hashi_vault/index.html

M html/collections/community/hashi_vault/vault_login_lookup.html

M html/collections/community/hashi_vault/vault_login_module.html

M html/collections/community/hashi_vault/vault_read_lookup.html

M html/collections/community/hashi_vault/vault_read_module.html

Enable lenient mode as an option in the init action and workflows

antsibull has a --lenient switch which changes its default behavior of failing on warnings, but right now we are only using the defaults. This means that we can't build docs for any collections that have warnings in their build, like community.general.

This option should be exposed in the init action, and also exposed in any shared workflows that use it.

Support for checking out other collections

The community.aws docs depend on fragments from the amazon.aws docs. It would be good to have a mechanism to have the amazon.aws docs available when processing the community.aws docs

Exclude irrelevant files from the diff output

Any change in the docs results in changes to .js and .inv files that aren't useful to see. It'd be best to exclude them from the diff output, especially since the size of the output that we can display is limited by github API limits. In anything but the smallest of collections, the output from those seems large enough to exceed the maximum and will cause truncation (although these files tend to be at the end, so at least the truncation will most likely trim this output first).

We're using git diff --no-index to generate this output. --no-index is used because neither directory being compared is a git repo. git diff has a way to filter files, and the relevant filter should be something like this:

':(exclude)**/*.js' ':(exclude)**/*.inv'

However, the filter only works without --no-index.

One thing we can do when generating the diff output is to make one or both of the generated docs sets into a repo with git init and then do the comparison. This will need some experimentation to get the procedure right, and should probably be configurable via the action's inputs.

Add a validate job to the workflows

With #27 , we have some control over how strict the build is.

The way I see it, there should be a build that has the strictest settings, for validation, while builds for publishing (especially in PRs) should use the most lenient options, to give us the best chance of a publishable build (even if some pages have errors), because that's more useful for checking rendered output and comparing changes.

By using separate jobs, we can get published docs whenever possible, while still having the validate job "fail" the CI until all errors/warnings are fixed.

Currently using this pattern in community.hashi_vault and lowlydba.sqlserver, but it could use support here to avoid having to repeat some things.

In the push workflows, we can achieve this with the push reusable workflow already, with caveats:

we can't disable the artifact upload, so we set the name to _unused
- the underlying action can already disable artifact upload, we should expose that to the push workflow

In PR, the only reason we can't use the push reusable workflow is because it has no way to specify the ref, so:

we kind of repeat what the push workflow does but use the merge commit for checkout
- the push workflow could take a ref input
we skip on PR close events (no need to validate there)
should we add a validate reusable workflow?
- if so, we cannot use it in the other reusable workflows (can't nest reusables), so it will be some duplication
- but it does provide a convenient use for users who want to the run the validations separately
inclusion of a validate job in PR and Push workflows
- good (opinionated) default imo
- provide option to skip it? easy to do, some collections maybe don't want to fail CI on this
- if optional job is used in needs: it complicates the dependent jobs a little if you still want them to run; need to use if: always() && needs.validate.result [is skipped or successful] type of conditional
- validate is not needed in needs: for PRs, but probably would be for push workflows that also publish, if that validation should prevent publish; in theory the push workflow can just be invoked most strictly all the time, removing the need for a separate validation workflow, but this won't work for pre-init'd use cases, unless strictness can be controlled on build as well as init.

Last point leads to the idea of:

anstibull-docs sphinx-init generated build.sh could be updated to take arguments, allowing for some override of options chosen at init time

Incorrect commit is being used in PR builds

Originally posted here:

#17 (comment)

Re: using the merge SHA:

I am beginning to worry that there may be a problem with this.. I noticed in a previous PR, and now in a new one, that the docs build is always 1 push behind, somehow.

In this PR: ansible-collections/community.hashi_vault#223

I made my second push, which contained both of these commits (in this order):

ansible-collections/community.hashi_vault@ede88f8

ansible-collections/community.hashi_vault@47f166a

The docs build ran and failed: https://github.com/ansible-collections/community.hashi_vault/runs/5401472234?check_suite_focus=true

The HEAD checkout it did was using the merge_commit_sha, 0a8a68704a308eba6eac1decd31ac031b7593049

The build failed because the lookup plugin contained a reference to the module; this problem existed in my first push. It was fixed in the second, but is building as though it wasn't.

If I look directly at that commit, it looks like my first push, wherein the module was missing:
ansible-collections/community.hashi_vault@0a8a687

I don't have time to dig into it further now, but it looks a lot like this:
actions/checkout#237

I had been putting this off a bit due to this resuable workflow issue, in part because I saw the issues around the same time and couldn't be certain if they were related or not, and in part because even if they were unrelated, the issue made debugging and troubleshooting difficult.

With that issue solved, I am still seeing this happen.

Symptoms

This is easiest seen when docs are being published. The latest push will trigger the build and the docs built will be the contents of your previous push. I say "push" because a push containing multiple commits does not seem to build the docs of the second-to-last commit, but rather the last commit of the last push.

In a repo where the docs are not published the easiest way to see this would be to add errors egregious enough that the build will fail.

Triggers

I do not know yet all of the conditions under which this does or does not occur:

is merge SHA field itself flawed?
- it's not readily clear what it is supposed to represent, and my original tests showed it having the same content as the merge ref when both were available
is it related to reusable workflows somehow?
- my original tests were in direct workflows
- could test that again (to be sure) and also test the equivalent with reusable workflows
is it related to pull_request vs pull_request_target workdlows?
- I don't remember what my original test used as a trigger, will need to confirm
- could be a confounding factor with reusable workflows
is it a regression of actions/checkout#237 ?

Resolution

Ideally we can more clearly identify the triggers, and if it's a bug on GitHub's side we can report it.. past experience shows that niche issues like this will take a long time to address, if we can even get them to.

We may want to change the precedence of when we use the merge ref vs the merge SHA field.

For now we start with the SHA because it's easier to know if it's available or not, to then fall back to the merge commit ref.

Since we have nowhere to retrieve the merge ref and instead we build it, we need to either try the checkout and ignore failure, or do additional git operations and/or API calls to determine its validity if we want to start with that and "fall back" to merge SHA.

A PR diff may show docs changes if the source branch is not based on the latest target commit

For example, if a PR is created from a branch that based on main, and a new commit to main after branching changes docs, then the PR docs build will show that the PR changes the docs, even though that is not the case. In fact, the docs in main are ahead of what's in the PR, and ideally the PR's branch would be rebased, however it is not always necessary to rebase a branch when there are otherwise no conflicts.

The reason this happens is that in the PR workflow, we build the docs for both the BASE and HEAD commit of the PR. If the PR is against main for example, we build docs for main, but since the PR is based on an earlier main commit, the output will be different.

One thing we can try to do, is do our BASE build against the commit that the PR branch was started from. That would solve the situation laid out above.

But in another case, consider that the PR actually did change docs, but main is now ahead of the branch with its on changes. If we build against the PR branch's original base, we will showing the (intended) changes of the PR, but will not necessarily be representing the sate of the docs after the PR is merged.

Although GitHub will alert to conflicts in the tracked files, the rendered docsite is not tracked.

I'm not sure what the best or correct course of action is.

I don't think it's a good idea to do 3 builds, comparing the PR to both its own base and to its merge target, it seems a little too complicated.

A third option, whether the base render is based on latest target or on branch's base, is to detect that the branch has drifted, and include a warning in the PR comment that the differences shown may not be accurate, recommending a rebase if accuracy is a concern.

Better surface warnings from the docs build as warnings in GHA

This ought to be possible by telling GHA which patterns to look for as warnings, I forget the actual command syntax, will fill this out later, but jotting down now so I don't forget.

Artifact upload may have a practical limit we will hit in some circumstances

Part of splitting out the various pieces of a docs build/publish is ensuring that we use least privilege and prevent arbitrary code from having access to privileged tokens or secrets. As a result, the build portion of this process (which runs arbitrary code from contributors in the case of PRs) runs in its own job, with no secrets or write access. But it must make the rendered docsite files available to other jobs that might do something useful with them.

Because each job runs in its own virtual machine, these files cannot be directly accessed by other runners. The content is too large to send via outputs practically. The supported way for transferring data like this, is uploading build artifacts. Within a workflow, artifacts can be downloaded by other jobs within the same workflow run, so that is how we handle this, using the upload-artifact action.

This action is convenient, in that we can point it at a directory, and it will upload the whole tree. As described in the readme though, each file is uploaded individually, and for large numbers of files, this can result in hitting API/concurrency limits, making uploads slow and/or error-prone, possibly untenable. Their suggested solution is to tar the files beforehand, and upload a single file.

I anticipate that we may hit such limits in larger collections, or in use cases for the docs build process that build a docsite for multiple collections, so I would like to get ahead of that and implement pre-tarring as an option.

Note: compression is handled by the uploading process, so it's not necessary to also compress the files.

The caveats of doing this are that if someone wants to download the artifact from the build, it will be a .tar.gz (the downloads are only available this way despite each file being uploaded individually), and inside will be another tar that the end user will have to expand. Additionally, any action or workflow we have that uses the artifact (for example to publish it) will also need to be aware of this so that it can properly untar the downloaded artifact.

GitHub Actions: Deprecating save-state and set-output commands

Looking at the output of one of a recent docs validation run for amazon.aws I'm seeing:

 Warning: The `set-output` command is deprecated and will be disabled soon. Please upgrade to using Environment Files. For more information see: https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

It would appear that there's something that needs to change...

Surge workflow - secret availability is not tested

The surge workflow requires a secret, however we don't actually test that the secret is available. As a result, when a repo which uses the workflow successfully (like community.hashi_vault) is forked, the workflow in the fork fails unless the fork's owner sets a working secret.

I think the shared workflow should check for the existence of the secret first of all (GitHub does not provide a nice way to do this but I think we can compare to empty string), and it should provide an option that controls what to do when the secret is missing (skip tasks and be "successful" or fail), so that the caller can decide how to handle it.