
copyaid's Introduction

I mostly work in GitLab now

https://gitlab.com/castedo

Why? Because GitHub as a forge is not open-source. In contrast, the community edition of GitLab is open-source.

copyaid's People

Contributors

castedo


copyaid's Issues

Enable user setting of language tag parameter

Let lang_tag act as a standard parameter that is automatically passed to the prompt (request) settings. Its values are IETF language tags (BCP 47), the same values used by the HTML lang attribute. See:
https://datatracker.ietf.org/doc/html/rfc5646
https://www.rfc-editor.org/info/bcp47
https://en.wikipedia.org/wiki/IETF_language_tag

If a user wants to set it, lang_tag is read from a top-level key in the user config file (copyaid.toml).

A nice additional feature would be to use

import locale
name = locale.getlocale()[0]  # e.g. 'en_US'; None if no locale is set
lang_tag = name.replace('_', '-') if name else None

to automatically populate this parameter if it is not specified in the config file.
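
The lookup order described in this issue could be sketched roughly as follows. This is only an illustration, not CopyAId's actual code: the function names are hypothetical, and the config is assumed to be already parsed into a dict with an optional top-level lang_tag key.

```python
import locale


def locale_to_lang_tag(locale_name):
    """Convert a POSIX-style locale name such as 'en_US' to 'en-US'."""
    if not locale_name or locale_name in ("C", "POSIX"):
        return None
    # Drop any encoding or modifier suffix such as '.UTF-8' or '@euro'.
    base = locale_name.split(".")[0].split("@")[0]
    return base.replace("_", "-")


def resolve_lang_tag(config):
    """Prefer the top-level lang_tag key; otherwise fall back to the locale."""
    tag = config.get("lang_tag")
    if tag:
        return tag
    # May still be None if the system has no usable locale set.
    return locale_to_lang_tag(locale.getlocale()[0])
```

A value set in copyaid.toml always wins here, so the locale is only consulted when the user has not expressed a preference.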

enable setting of template placeholder for text format based on file extension

This is additional functionality beyond #2. Based on the file extension of the source file passed to copyaid, enable the setting of a template placeholder for the chat_system message setting.

Maybe the mapping between filename extension and text to be inserted should be direct like:

".md" -> "Markdown and HTML"
".tex" -> "LaTeX"

Note that "Markdown and HTML" might perform better in the prompt for GPT, even though to a human "Markdown" alone should, in theory, suffice.

Or maybe there should be a MIME type indirection in between:

".md" -> "text/markdown" -> "Markdown and HTML"
".tex" -> "application/x-latex" -> "LaTeX"

These mappings could live exclusively in either the user copyaid.toml config file or the prompt query settings file. My inclination is that the mapping

".md" -> "text/markdown"
".tex" -> "application/x-latex"

should be in the user config copyaid.toml and the mapping

"text/markdown" -> "Markdown and HTML"
"application/x-latex" -> "LaTeX"

should be in the settings file, along with the OpenAI API settings like the prompt template and the GPT model setting.
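
As a rough sketch of the two-step lookup with MIME type indirection, assuming both mappings have already been loaded into plain dicts (the variable and function names here are hypothetical):

```python
from pathlib import Path

# Assumed to come from the user config (copyaid.toml): extension -> MIME type.
EXT_TO_MIME = {
    ".md": "text/markdown",
    ".tex": "application/x-latex",
}

# Assumed to come from the prompt settings file: MIME type -> prompt wording.
MIME_TO_FORMAT_TEXT = {
    "text/markdown": "Markdown and HTML",
    "application/x-latex": "LaTeX",
}


def format_text_for(path):
    """Return the text to insert into the prompt for this source file."""
    mime = EXT_TO_MIME.get(Path(path).suffix.lower())
    if mime is None:
        return None  # unknown extension: leave the placeholder unset
    return MIME_TO_FORMAT_TEXT.get(mime)
```

With this split, a user can add a new file extension without touching the prompt wording, and a settings file can reword the prompt text without knowing about extensions.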

Make a GitHub template repository for CopyAId

Make a GitHub template repository like:
https://github.com/castedo/baseprint-starter
but instead of for Baseprinter it is for CopyAId.

This is very similar to, totally inspired by, and roughly speaking a knock-off of https://github.com/manubot/manubot-ai-editor/.

However, compared to Manubot AI Editor, this enhancement has five benefits that I think make it superior:

  1. There is no dependency on Manubot.
  2. It calls the same CopyAId tool that is designed and optimized for manual CLI usage.
  3. Users create a repository with a clean history from a template repository rather than a rootstock-style repository.
  4. The user will have a repository that calls a GitHub Action, for better versioning and isolation from changes.
  5. The bulk of the logic for calling CopyAId is inside a Docker container image rather than GitHub configuration YAML files.

As a user, benefit 2 is important to me. I want to see and merge copyedited text from OpenAI into local files BEFORE committing. I also want to kick off and get back AI copyedits quickly, without having to wait for GitHub Actions to complete.

enable template placeholders in system_prompt message setting

Some parts of prompts change slightly depending on the scenario. One example is making corrections based on American vs British English. Another is editing Markdown vs LaTeX formatted source text.

This feature is to enable the chat_system settings inside a prompt request settings file to contain a template with a placeholder for English dialect and a way to select between American and British in the copyaid.toml user configuration file.
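
One possible shape for this, sketched with Python's standard string.Template. The placeholder name $dialect, the dialect config key, and the wording of the system message are all assumptions made for illustration, not CopyAId's actual settings.

```python
from string import Template

# Hypothetical chat_system template as it might appear in a prompt settings file.
CHAT_SYSTEM_TEMPLATE = Template(
    "You are a copy editor. Correct the text using $dialect English spelling."
)


def render_chat_system(user_config):
    """Fill the dialect placeholder from the (already parsed) user config."""
    # Assumed copyaid.toml key 'dialect'; defaults to American when unset.
    dialect = user_config.get("dialect", "American")
    return CHAT_SYSTEM_TEMPLATE.substitute(dialect=dialect)
```

For example, render_chat_system({"dialect": "British"}) yields the British variant of the message, while an empty config falls back to American.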

Copybreak line is only way to skip the start of a file

Placing

<!-- copybreak off -->

at the top of a file is currently the only way to skip processing the start of a file.

But in some contexts this is very inconvenient. For example:

  1. when using mkdocs, one wants the title at the top of the file so that it is automatically placed in the navigation bar
  2. when a pandoc Markdown file has metadata at the top of the file.

FEATURE: copybreak with optional subtask

The following is copied from https://gitlab.com/castedo/copyaid/-/issues/6

MOTIVATION

Mass testing with prompts indicates that the quality of GPT output degrades as the inputs get longer. It is also more expensive to send all the text in a file. It is a bit of a pain to have to break up documents into smaller files merely to have less text sent to OpenAI. It is also quite annoying to have OpenAI suggest lots of changes to sections of text that have already been worked on when only other sections need copyediting or proofreading. This feature is relatively simple to implement and gives users a lot of flexibility to control behavior and mitigate these problems.

FEATURE

Allow specially marked lines to act as "copybreaks" within source text. These lines are not included in the OpenAI request text; instead, they force the source file to be broken up into separate chunks that become part of separate prompt texts for the OpenAI API.

For Markdown (.md) an example copybreak line is:

<!-- copybreak -->

and for LaTeX (.tex) an example copybreak line is:

%% copybreak

The config file for CopyAId allows control of the exact line prefix per file type (based on file extension) and the keyword. For the above example, the config in TOML would be something like:

[copybreak]
md = ['<!--', 'copybreak']
tex = ['%%', 'copybreak']

Optionally, a subtask name can follow the marking prefix, after 'copybreak' and whitespace. Which prompts and requests are triggered, if any, for a given subtask name is controlled from the config file. Some subtask names can be configured so that the chunk of text is not sent to OpenAI and is left as is. For example:

<!-- copybreak skip -->

and

%% copybreak skip

will cause all further text to be skipped, rather than sent to OpenAI, until a different subtask name is encountered.

When no subtask name is specified, whatever was the last subtask name specified is used again. The configuration for a copyaid task can specify the initial subtask name to take effect. Some users might want it to be "on" and the skip subtask name to be "off".

During an initial experimental stage I plan to use "light" and "heavy" as subtask names corresponding to light/heavy copy-editing and will probably configure "skip" as the initial subtask.
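
The splitting behavior described in this feature can be sketched as below. This is a minimal illustration under stated assumptions, not the actual implementation: the function name is hypothetical, the prefix and keyword are assumed to be whitespace-separated at the start of the line, and a trailing --> on Markdown comment lines is ignored rather than treated as a subtask name.

```python
def split_on_copybreaks(text, prefix, keyword="copybreak", initial_subtask="skip"):
    """Split text into (subtask, chunk) pairs at copybreak lines.

    A copybreak line starts with `prefix`, then `keyword`, then an optional
    subtask name. Copybreak lines themselves are not included in any chunk.
    """
    chunks = []
    subtask = initial_subtask
    current = []
    for line in text.splitlines(keepends=True):
        words = line.strip().split()
        if len(words) >= 2 and words[0] == prefix and words[1] == keyword:
            if current:
                chunks.append((subtask, "".join(current)))
                current = []
            # A Markdown comment ends with '-->', which is not a subtask name.
            names = [w for w in words[2:] if w != "-->"]
            if names:
                subtask = names[0]
            # With no name given, the previous subtask stays in effect.
            continue
        current.append(line)
    if current:
        chunks.append((subtask, "".join(current)))
    return chunks
```

A caller would then send only the chunks whose subtask is configured to trigger a request, and pass chunks under a subtask such as "skip" through unchanged.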

RELATED

https://github.com/manubot/manubot-ai-editor/ automatically splits up files into "paragraphs" and sends them as separate chunks to OpenAI. I find the logic for parsing apart "paragraphs" too fragile, hard-coded, and error-prone to be acceptable as a default for entire files. As a future feature, I imagine some CopyAId subtask names could enable a similar automatic break-up, but not by default. The additional automatic breaking would only happen because a particular subtask of a copybreak has enabled it.
