Hugo Front Matter Tools

A collection of Python scripts desinged to help manage Hugo .md content front matter.

Python

See Proper Python for applicable guidance when enabling Python parts of this workflow.

Quick Start

Since the project's Python infrastructure has already been created, but is not saved in our GitHub repo, all you need to do to get started locally is this:

╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/hugo-front-matter-tools ‹main› 
╰─$ git pull
Already up to date.
╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/hugo-front-matter-tools ‹main› 
╰─$ python3 -m venv .venv
╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/hugo-front-matter-tools ‹main●› 
╰─$ source .venv/bin/activate
(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/hugo-front-matter-tools ‹main●› 
╰─$ pip3 install -r python-requirements.txt

Then...

(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/hugo-front-matter-tools ‹main●›
╰─$ python3 rootstalk-front-matter-to-google-sheet.py

And... Use this link to open the Rootstalk Articles Front Matter Google Sheet.

History

14-Sep-2023 - Updated to move from the old rootstalk project to npm-rootstalk.

front-matter-to-csv.py (Deprecated)

This script is/was designed to read all of the .md files in a directory tree, and populate a single .csv file with the contents/status of all the frontmatter found in those Markown files. Inspired by convert.py from the lukas/frontmatter-to-csv repo.

rootstalk-front-matter-to-google-sheet.py

Having evolved from front-matter-to-csv.py, this script is designed to read all of the ROOTSTALK .md files in a directory tree and populate a specified .csv file (and dedicated Google Sheet) with the contents/status of all the frontmatter found in those Markdown files.

This script is specific to Rootstalk because many of the constants and static elements (documented below) have become specific to our Rootstalk structure. However, the concept of harvesting and reporting .md file front matter remains unchanged. This script could easily be adapted to work with other front matter structures.

Use this link to open the Rootstalk Articles Front Matter Google Sheet.

Link Generation

On October 25, 2022, a feature was added to this script for generation of links to corresponding "development" site pages. So, for instance, the row of .csv data that previously contained this:

volume-viii-issue-1,trissell.md,Food Deserts in t...,9,trissell,,05/10/2022 15:22:00,False,1,author,Mikayla Trissell,Trissel-1-volume-...

...now contains this:

volume-viii-issue-1,trissell.md,https://icy-tree-020380010.azurestaticapps.net/volume-viii-issue-1/trissell ,Food Deserts in t...,9,trissell,,05/10/2022 15:22:00,False,1,author,Mikayla Trissell,Trissel-1-volume-...

A link to the article trissell, for our example, now appears in the 3rd column of the .csv data. Clicking that link from any view of the .csv file should open the "icy-tree" development version of the current article. Note that this version of the website is actively generated from the main branch of the Rootstalk project code.

Google Sheets Integration

Found what looks like useful, and current, guidance in How to Connect Python to Google Sheets. Opened a new google-sheet feature branch of the code, and away we go...

Switching guidance to the more relevant Python quickstart document.

Choose to enable the Google Sheets API in my hugo-frontmatter-tools project.
Had to add at leat one "test" user to get past app authentication, so [email protected] was authorized as a test user.
Remaining process was as-advertised and generated a new credentials.json file as expected.
Ran quickstart.py on my Mac Mini to authorize the app here. A token.json file was subsequently generated in the working directory.

Next, with credentials.json in-hand I moved back to the guidance at How to enable Python access to Google Sheets in How to Connect Python to Google Sheets.

Created the new sheets.py script and changed the spreadsheet title to Rootstalk-Articles-Frontmatter-Export in line 10.

And apparently my credentials.json doesn't have the right format? Probably because How to Connect Python to Google Sheets is written for a web application, but I need a desktop app?

Reset!

So, this 12 minute 45 second video appears to be a much better, more relevant and elegantly simple, explanation! This approach will leverage the gspread Python library, and it assumes that we already have a Google Sheet (I'm going to make one very soon!) and we simply "share" it with the "service account" that we're going to create with Google Drive API and Google Sheet API services enabled.

Making the Google Sheet

First, I open https://google.com logged in as [email protected], the account that will own our new Google Sheet. I select the Google Apps menu and pick Sheets which opens this page which is specific to the [email protected] account.
Here I created my new "blank" Google Sheet and changed it's name to Rootstalk Articles Frontmatter.
I changed the name of Sheet1 to 25-Oct-2022 to match today's date.
Then I selected File and Import from the menu, then Upload, and browsed to select the latest frontmatter-status.csv file for import.
I imported the .csv file to the current sheet.

The all-important Google Sheet ID is: https://docs.google.com/spreadsheets/d/1cOYyS5gwU3HbTG8aVkaBwFPL1Z_7U25bJBCKCePFafI

Integration with the Google Sheet

Following the video I did this...

From my Google Developer dashboard I selected the hugo-frontmatter-tools project I had created earlier.
I made sure both Google Drive API and Google Sheets API are enabled for the project.
Next, from https://console.cloud.google.com/apis/dashboard?project=hugo-frontmatter-tools I clicked Credentials in the left menu.
I clicked Manage Service Accounts just above the right end of the Service Accounts tab.
I clicked Create Service Account near the top-center of the window.
In Service Account Details I gave my SA a name of hugo-frontmatter-tools which generated a corresponding ID with an email address of hugo-formatter-tools@hugo-frontmatter-tools.iam.gserviceaccount.com. <-- This is important!
Clicked Create and Continue and accepted all defaults until Done.
I opened our Google Sheet and used the Share button in the upper-right corner to open the Share "Rootstalk Articles Frontmatter" sheet dialog.
In the dialog I pasted the all-important service account email address from above, clicked Share (as an Editor), and then Done.
Back in the service account page I clicked the three dots on the right and chose Manage Keys as instructed in the video.
I clicked Add Key, Create New Key and selected JSON and CREATE. Google generated a new key and uploaded it as hugo-frontmatter-tools-a027c8a25d36.json in my ~/Downloads folder.

That completes the credentials acquisition process. Per the video I then returned to the project window here in VSCode and...

Activate and enable the gspread project to our virtual environment like so:

source .venv/bin/activate
pip3 install gspread
pip3 freeze > python-requirements.txt

Move the JSON key file to the required location, like so:

mkdir ~/.config/gspread
mv ~/Downloads/hugo-frontmatter-tools-a027c8a25d36.json ~/.config/gspread/service_account.json

Note: Those last commands above will need to be repeated on other platforms before Google Sheet integration can be achieved there!

google-sheet-test.py

Time for some testing so I created the google-sheet-test.py script by borrowing code from the video's script.py source. I changed specifics as necessary in the open and worksheet function calls, and commented out anything that could damage our data, to yield this:

import gspread

sa = gspread.service_account()
sh = sa.open("Rootstalk Articles Front Matter")

wks = sh.worksheet("26-Oct-2022-09:08PM")

print('Rows: ', wks.row_count)
print('Cols: ', wks.col_count)

print(wks.acell('A9').value)
print(wks.cell(3, 4).value)
print(wks.get('A7:E9'))

# print(wks.get_all_records())
# print(wks.get_all_values())

# wks.update('A3', 'Anthony')
# wks.update('D2:E3', [['Engineering', 'Tennis'], ['Business', 'Pottery']])
# wks.update('F2', '=UPPER(E2)', raw=False)

# wks.delete_rows(25)

Running that code as a test produced this:

(.venv) ╭─mark@Marks-Mac-Mini ~/GitHub/hugo-front-matter-tools ‹google-sheet*› 
╰─$ /Users/mark/GitHub/hugo-frontmatter-tools/.venv/bin/python /Users/mark/GitHub/hugo-front-matter-tools/google-sheet-test.py
Rows:  1000
Cols:  26
volume-viii-issue-1
The Streets of Da...
[['volume-viii-issue-1', 'obrien.md', 'https://icy-tree-020380010.azurestaticapps.net/volume-viii-issue-1/obrien', 'A Woman and the Land', '7'], ['volume-viii-issue-1', 'kessel.md', 'https://icy-tree-020380010.azurestaticapps.net/volume-viii-issue-1/kessel', 'Prairie Style: Wr...', '14'], ['volume-viii-issue-1', 'thompson.md', 'https://icy-tree-020380010.azurestaticapps.net/volume-viii-issue-1/thompson', 'Dark Skies or Lig...', '4']]

Huzzah!

Google Sheet Feature is Complete

Development of Google Sheet integration within the script prompted me to rename the old script from front-matter-to-csv.py to front-matter-to-google-sheet.py, but this new script still generates a new front-matter-status.csv file each time it's run.

After the .csv is generated the script attempts to open our Rootstalk Articles Front Matter Google Sheet where it creates a new worksheet/tab named with the current date/time. The .csv file contents are then uploaded to the new worksheet using this block of code:

  try:
    paste_csv(csv_filename, sh, sheet_name + "!A1")
  except Exception as e:
    print(e)

When it works we get a new worksheet/tab full of current front matter data in our all-important Google Sheet, https://docs.google.com/spreadsheets/d/1cOYyS5gwU3HbTG8aVkaBwFPL1Z_7U25bJBCKCePFafI/.

Documentation

Yes, there is more! Check out Google-Sheet-Front-Matter.mp4 sometime when you have 15 minutes to kill. 😄

Static Elements

Static elements of the script currently include the following.

branches

List of the three code branches for which links are generated in the Google Sheet.

branches = [ "develop", "main", "production" ]

Expected Fields

fields = {
  "md-path": "Content Path",
  "md-file": "Filename",
  "develop-link": "develop Link",
  "main-link": "main Link",
  "production-link": "Production Link",
  "title": "title",
  "last_modified_at": "last_modified_at",
  "to-do": "to-do List",
  "articleIndex": "articleIndex",
  "description": "description",
  "azure_dir": "azure_dir",
  "obsolete": "Obsolete Front Matter",
  "contributors": "contributors",
  "role": "contributor.role",
  "name": "contributor.name",
  "headshot": "contributor.headshot",
  "caption": "contributor.caption",
  "bio": "contributor.bio",
  "categories": "categories",
  "header_image": "header_image",
  "filename": "header_image.filename",
  "alt_text": "header_image.alt_text",
  "tags": "tags",
  "byline": "byline",
  "byline2": "byline2",
  "subtitle": "subtitle",
  "no_leaf_bug": "no_leaf_bug",
  "index": "index",
  "date": "date",
  "draft": "draft"
}

Obsolete Fields

The following fields are considered "obsolete" and deprecated. Articles that still posses these fields should be upgraded to use corresponding fields as soon as possible.

obsolete = ["sidebar", "pid", "issueIndex", "azure_headerimage", "author", "azure_headshot", "authorbio", "headerimage", "articletype"]

Contributor `dict` Fields

contributor_fields = ["role", "name", "headshot", "caption", "bio"]

Header Image `dict` Fields

header_image_fields = ["filename", "alt_text"]

Input `glob` File Spec

 filepath = str(pathlib.Path.home()) + "/GitHub/rootstalk/content/**/volume*/*.md"

CSV Output Filename

  csv_filename = "front-matter-status.csv"

Link Generation Base URL

The links used here are automatically built by the two principal braches of the Rootstalk project repo, and a third for production at https://rootstalk.grinnell.edu.

def build_link(k, path):
  base_urls = { "develop":"https://yellow-wave-0e513e510.3.azurestaticapps.net/", "main":"https://yellow-wave-0e513e510.3.azurestaticapps.net/", "production":"https://rootstalk.grinnell.edu/" }

digital-grinnell / hugo-front-matter-tools Goto Github PK