
taxonomy's Introduction

InstructLab 🐶 (ilab)



Welcome to the InstructLab CLI

InstructLab 🐶 uses a novel synthetic data-based alignment tuning method for Large Language Models (LLMs). The "lab" in InstructLab 🐶 stands for Large-Scale Alignment for ChatBots [1].

[1] Shivchander Sudalairaj*, Abhishek Bhandwaldar*, Aldo Pareja*, Kai Xu, David D. Cox, Akash Srivastava*. "LAB: Large-Scale Alignment for ChatBots", arXiv preprint arXiv: 2403.01081, 2024. (* denotes equal contributions)

🎺 What's new

InstructLab release 0.17.0 on June 14, 2024 contains updates to the ilab CLI design. The ilab commands now fall into groups for an easier workflow and understanding of the commands. For more information, see the InstructLab CLI reference. To view all the available flags for each command group, use the --help flag after the command. The original commands are still in effect, but will be deprecated in the planned release 0.19.0.

❓ What is ilab

ilab is a Command-Line Interface (CLI) tool that allows you to perform the following actions:

  1. Download a pre-trained Large Language Model (LLM).
  2. Chat with the LLM.

To add new knowledge and skills to the pre-trained LLM, add information to the companion taxonomy repository.

After you have added knowledge and skills to the taxonomy, you can perform the following actions:

  1. Use ilab to generate new synthetic training data based on the changes in your local taxonomy repository.
  2. Re-train the LLM with the new training data.
  3. Chat with the re-trained LLM to see the results.
graph TD;
  download-->chat
  chat[Chat with the LLM]-->add
  add[Add new knowledge\nor skill to taxonomy]-->generate[generate new\nsynthetic training data]
  generate-->train
  train[Re-train]-->|Chat with\nthe re-trained LLM\nto see the results|chat

For an overview of the full workflow, see the workflow diagram.

Important

We have optimized InstructLab so that community members with commodity hardware can perform these steps. However, running InstructLab on a laptop will provide a low-fidelity approximation of synthetic data generation (using the ilab data generate command) and model instruction tuning (using the ilab model train command, which uses QLoRA). To achieve higher quality, use more sophisticated hardware and configure InstructLab to use a larger teacher model such as Mixtral.

📋 Requirements

  • 🍎 Apple M1/M2/M3 Mac or 🐧 Linux system (tested on Fedora). We anticipate support for more operating systems in the future.
  • C++ compiler
  • Python 3.10 or Python 3.11
  • Approximately 60GB disk space (entire process)

NOTE: Python 3.12 is currently not supported, because some dependencies don't yet work on Python 3.12.

NOTE: When installing the ilab CLI on macOS, you may have to run the xcode-select --install command to install the required packages listed previously.
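
Before you start, you can sanity-check the Python and disk-space requirements above with a couple of quick commands; this is purely an illustrative check, not part of the installation:

python3 --version   # expect Python 3.10.x or 3.11.x
df -h ~             # confirm roughly 60GB of free disk space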

✅ Getting started

🧰 Installing ilab

  1. When installing on Fedora Linux, install the C++ compiler, Python 3.10 or 3.11, and other necessary tools by running the following command:

    sudo dnf install gcc gcc-c++ make git python3.11 python3.11-devel

    If you are running on macOS, this installation is not necessary and you can begin your process with the following step.

  2. Create a new directory called instructlab to store the files the ilab CLI needs when running and cd into the directory by running the following command:

    mkdir instructlab
    cd instructlab

    NOTE: The following steps in this document use Python venv for virtual environments. However, if you use another tool such as pyenv or Conda Miniforge for managing Python environments on your machine, continue to use that tool instead. Otherwise, you may have issues with packages that are installed but not found in venv.

  3. There are a few ways you can locally install the ilab CLI. Select your preferred installation method from the following instructions. You can then install ilab and activate your venv environment.

    NOTE: ⏳ pip install may take some time, depending on your internet connection. If the installation fails with the error unsupported instruction `vpdpbusd', append -C cmake.args="-DLLAMA_NATIVE=off" to the pip install command.

    See the GPU acceleration documentation for how to enable hardware acceleration for interaction and training on AMD ROCm, Apple Metal Performance Shaders (MPS), and Nvidia CUDA.

    Install using PyTorch without CUDA bindings and no GPU acceleration

    python3 -m venv --upgrade-deps venv
    source venv/bin/activate
    pip install instructlab

    NOTE: Additional Build Argument for Intel Macs

    If you have a Mac with an Intel CPU, you must add a prefix of CMAKE_ARGS="-DLLAMA_METAL=off" to the pip install command to ensure that the build is done without Apple M-series GPU support.

    (venv) $ CMAKE_ARGS="-DLLAMA_METAL=off" pip install ...

    Install with AMD ROCm

    python3 -m venv --upgrade-deps venv
    source venv/bin/activate
    pip cache remove llama_cpp_python
    pip install 'instructlab[rocm]' \
       --extra-index-url https://download.pytorch.org/whl/rocm6.0 \
       -C cmake.args="-DLLAMA_HIPBLAS=on" \
       -C cmake.args="-DAMDGPU_TARGETS=all" \
       -C cmake.args="-DCMAKE_C_COMPILER=/opt/rocm/llvm/bin/clang" \
       -C cmake.args="-DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++" \
       -C cmake.args="-DCMAKE_PREFIX_PATH=/opt/rocm" \
       -C cmake.args="-DLLAMA_NATIVE=off"

    On Fedora 40+, use -DCMAKE_C_COMPILER=clang-17 and -DCMAKE_CXX_COMPILER=clang++-17.

    Install with Apple Metal on M1/M2/M3 Macs

    NOTE: Make sure your system Python build is Mach-O 64-bit executable arm64 by using file -b $(command -v python), or, if your system is set up with pyenv, by using the file -b $(pyenv which python) command.

    python3 -m venv --upgrade-deps venv
    source venv/bin/activate
    pip cache remove llama_cpp_python
    pip install 'instructlab[mps]'

    Install with Nvidia CUDA

    For the best CUDA experience, install vLLM so that you can serve models in Safetensors format.

    python3 -m venv --upgrade-deps venv
    source venv/bin/activate
    pip cache remove llama_cpp_python
    pip install 'instructlab[cuda]' \
       -C cmake.args="-DLLAMA_CUDA=on" \
       -C cmake.args="-DLLAMA_NATIVE=off"
    pip install vllm@git+https://github.com/opendatahub-io/[email protected]
  4. From your venv environment, verify ilab is installed correctly by running the ilab command.

    ilab

    Example output of the ilab command

    (venv) $ ilab
    Usage: ilab [OPTIONS] COMMAND [ARGS]...
    
    CLI for interacting with InstructLab.
    
    If this is your first time running InstructLab, it's best to start with `ilab config init` to create the environment.
    
    Options:
    --config PATH  Path to a configuration file.  [default:
                   /home/user/.config/instructlab/config.yaml]
    -v, --verbose  Enable debug logging (repeat for even more verbosity)
    --version      Show the version and exit.
    --help         Show this message and exit.
    
    Commands:
    config    Command Group for Interacting with the Config of InstructLab.
    data      Command Group for Interacting with the Data generated by...
    model     Command Group for Interacting with the Models in InstructLab.
    system    Command group for all system-related command calls
    taxonomy  Command Group for Interacting with the Taxonomy of InstructLab.
    
    Aliases:
    chat      model chat
    convert   model convert
    diff      taxonomy diff
    download  model download
    evaluate  model evaluate
    generate  data generate
    init      config init
    list      model model_list
    serve     model serve
    sysinfo   system info
    test      model test
    train     model train
    

    IMPORTANT: Every ilab command needs to be run from within your Python virtual environment. You can enter the Python environment by running the source venv/bin/activate command.

  5. Optional: You can enable tab completion for the ilab command.

    Bash (version 4.4 or newer)

    Enable tab completion in bash with the following command:

    eval "$(_ILAB_COMPLETE=bash_source ilab)"

    To have this enabled automatically every time you open a new shell, you can save the completion script and source it from ~/.bashrc:

    _ILAB_COMPLETE=bash_source ilab > ~/.ilab-complete.bash
    echo ". ~/.ilab-complete.bash" >> ~/.bashrc

    Zsh

    Enable tab completion in zsh with the following command:

    eval "$(_ILAB_COMPLETE=zsh_source ilab)"

    To have this enabled automatically every time you open a new shell, you can save the completion script and source it from ~/.zshrc:

    _ILAB_COMPLETE=zsh_source ilab > ~/.ilab-complete.zsh
    echo ". ~/.ilab-complete.zsh" >> ~/.zshrc

    Fish

    Enable tab completion in fish with the following command:

    _ILAB_COMPLETE=fish_source ilab | source

    To have this enabled automatically every time you open a new shell, you can save the completion script to your fish completions directory:

    _ILAB_COMPLETE=fish_source ilab > ~/.config/fish/completions/ilab.fish

🏗️ Initialize ilab

  1. Initialize ilab by running the following command:

    ilab config init

    Example output

    Welcome to InstructLab CLI. This guide will help you set up your environment.
    Please provide the following values to initiate the environment [press Enter for defaults]:
    Path to taxonomy repo [taxonomy]: <ENTER>
  2. When prompted by the interface, press Enter to add a new default config.yaml file.

  3. When prompted, clone the https://github.com/instructlab/taxonomy.git repository into the current directory by typing y.

    Optional: If you want to point to an existing local clone of the taxonomy repository, you can pass the path interactively or alternatively with the --taxonomy-path flag.

    Example output after initializing ilab

    (venv) $ ilab config init
    Welcome to InstructLab CLI. This guide will help you set up your environment.
    Please provide the following values to initiate the environment [press Enter for defaults]:
    Path to taxonomy repo [taxonomy]: <ENTER>
    `taxonomy` seems to not exists or is empty. Should I clone https://github.com/instructlab/taxonomy.git for you? [y/N]: y
    Cloning https://github.com/instructlab/taxonomy.git...

    ilab will use the default configuration file unless otherwise specified. You can override this behavior with the --config parameter for any ilab command (see the example after this list).

  4. When prompted, provide the path to your default model. Otherwise, the default of a quantized Merlinite model will be used - you can download this model with ilab model download (see below).

    (venv) $ ilab config init
    Welcome to InstructLab CLI. This guide will help you set up your environment.
    Please provide the following values to initiate the environment [press Enter for defaults]:
    Path to taxonomy repo [taxonomy]: <ENTER>
    `taxonomy` seems to not exists or is empty. Should I clone https://github.com/instructlab/taxonomy.git for you? [y/N]: y
    Cloning https://github.com/instructlab/taxonomy.git...
    Path to your model [/home/user/.cache/instructlab/models/merlinite-7b-lab-Q4_K_M.gguf]: <ENTER>
  5. When prompted, please choose a train profile. Train profiles are GPU-specific profiles that enable accelerated training behavior. If you are on macOS or a Linux machine without a dedicated GPU, choose No Profile (CPU-Only) by pressing Enter. There are various flags you can use with individual ilab commands that allow you to utilize your GPU if applicable.

    Welcome to InstructLab CLI. This guide will help you to setup your environment.
    Please provide the following values to initiate the environment [press Enter for defaults]:
    Path to taxonomy repo [/home/user/.local/share/instructlab/taxonomy]: 
    Path to your model [/home/user/.cache/instructlab/models/merlinite-7b-lab-Q4_K_M.gguf]: 
    Generating `/home/user/.config/instructlab/config.yaml`...
    Please choose a train profile to use:
    [0] No profile (CPU-only)
    [1] A100_H100_x2.yaml
    [2] A100_H100_x4.yaml
    [3] A100_H100_x8.yaml
    [4] L40_x4.yaml
    [5] L40_x8.yaml
    [6] L4_x8.yaml
    Enter the number of your choice [hit enter for the default CPU-only profile] [0]:  
    Using default CPU-only train profile.
    Initialization completed successfully, you're ready to start using `ilab`. Enjoy!

    The GPU profiles are listed by GPU type and number. If you happen to have a GPU configuration with a similar amount of VRAM as any of the above profiles, feel free to try them out!
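
For example, building on the --taxonomy-path and --config options mentioned in the steps above, the following commands are an illustrative sketch (the paths are placeholders, not defaults) of pointing ilab at an existing taxonomy clone and at an alternative configuration file:

ilab config init --taxonomy-path ~/repos/taxonomy
ilab --config ~/instructlab/alt-config.yaml model list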

ilab directory layout after initializing your system

After running ilab config init your directories will look like the following on a Linux system:

├─ ~/.cache/instructlab/models/ (1)
├─ ~/.local/share/instructlab/datasets (2)
├─ ~/.local/share/instructlab/taxonomy (3)
├─ ~/.local/share/instructlab/checkpoints (4)
  1. ~/.cache/instructlab/models/: Contains all downloaded large language models, including the saved output of ones you generate with ilab.
  2. ~/.local/share/instructlab/datasets/: Contains data output from the SDG phase, built on modifications to the taxonomy repository.
  3. ~/.local/share/instructlab/taxonomy/: Contains the skill and knowledge data.
  4. ~/.local/share/instructlab/checkpoints/: Contains the output of the training process.

On macOS, these directories will be under Library/Application Support/instructlab. This directory setup is temporary in 0.18.0 and will mimic the Linux paths in future releases. The models directory will be under Library/Caches/instructlab.
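
To see this layout on your own machine, you can list the directories directly. The commands below assume the Linux paths shown above; adjust them for the macOS locations if needed:

ls ~/.cache/instructlab/models
ls ~/.local/share/instructlab/datasets ~/.local/share/instructlab/taxonomy ~/.local/share/instructlab/checkpoints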

📥 Download the model

  • Run the ilab model download command.

    ilab model download

    ilab model download downloads a compact pre-trained version of the model (~4.4G) from HuggingFace:

    (venv) $ ilab model download
    Downloading model from Hugging Face: instructlab/merlinite-7b-lab-GGUF@main to /home/user/.cache/instructlab/models...
    ...
    INFO 2024-08-01 15:05:48,464 huggingface_hub.file_download:1893: Download complete. Moving file to /home/user/.cache/instructlab/models/merlinite-7b-lab-Q4_K_M.gguf

    NOTE: ⏳ This command can take a few minutes, or finish almost immediately, depending on your internet connection and whether the model is already cached. If you have issues connecting to Hugging Face, refer to the Hugging Face discussion forum for more details.

    Downloading a specific model from a Hugging Face repository

  • Specify the repository, model, and a Hugging Face token if necessary. More information about Hugging Face tokens can be found in the Hugging Face documentation.

    HF_TOKEN=<YOUR HUGGINGFACE TOKEN GOES HERE> ilab model download --repository=TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF --filename=mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf

    Downloading an entire Hugging Face repository (Safetensors Model)

  • Specify repository, and a Hugging Face token if necessary. For example:

    HF_TOKEN=<YOUR HUGGINGFACE TOKEN GOES HERE> ilab model download --repository=instructlab/granite-7b-lab

    These types of models are useful for GPU-enabled systems or anyone looking to serve a model using vLLM. InstructLab provides Safetensor versions of our Granite models on HuggingFace.

    Listing downloaded models

  • All downloaded models can be seen with ilab model list.

    ilab model list

    Example output of ilab model list after ilab model download

    (venv) $ ilab model list
    +------------------------------+---------------------+--------+
    | Model Name                   | Last Modified       | Size   |
    +------------------------------+---------------------+--------+
    | merlinite-7b-lab-Q4_K_M.gguf | 2024-08-01 15:05:48 | 4.1 GB |
    +------------------------------+---------------------+--------+

🍴 Serving the model

  • Serve the model by running the following command:

    ilab model serve
  • Serve a non-default model (e.g. Mixtral-8x7B-Instruct-v0.1):

    ilab model serve --model-path models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
  • Once the model is served and ready, you'll see the following output:

    (venv) $ ilab model serve
    INFO 2024-03-02 02:21:11,352 lab.py:201 Using model 'models/ggml-merlinite-7b-lab-Q4_K_M.gguf' with -1 gpu-layers and 4096 max context size.
    Starting server process
    After application startup complete see http://127.0.0.1:8000/docs for API.
    Press CTRL+C to shut down the server.

    NOTE: If multiple ilab clients try to connect to the same InstructLab server at the same time, the first will connect to the server while the others will start their own temporary server. This will require additional resources on the host machine.

  • Serve a non-default Safetensors model (e.g. granite-7b-lab). NOTE: this requires a GPU.

    Ensure vllm is installed:

    pip show vllm

    If it is not, please run:

    pip install vllm@git+https://github.com/opendatahub-io/[email protected]
    ilab model serve --model-path ~/.cache/instructlab/models/instructlab/granite-7b-lab

📣 Chat with the model (Optional)

Because you're serving the model in one terminal window, you will have to create a new window and re-activate your Python virtual environment to run the ilab model chat command:

source venv/bin/activate
ilab model chat

Chat with a non-default model (e.g. Mixtral-8x7B-Instruct-v0.1):

source venv/bin/activate
ilab model chat --model models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf

Note that using --model requires the running server to be serving that model. If it is not, you must exit the server. Alternatively, --model in ilab model chat can start a server on your behalf with the specified model if one is not already running on the port.
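
For example, with no server running on the port, the following command lets ilab model chat start a temporary server for you; the path below is the default model location from the initialization step and is only illustrative:

ilab model chat --model ~/.cache/instructlab/models/merlinite-7b-lab-Q4_K_M.gguf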

Before you start adding new skills and knowledge to your model, you can check its baseline performance by asking it a question such as what is the capital of Canada?.

NOTE: the model needs to be trained with the generated synthetic data to use the new skills or knowledge.

(venv) $ ilab model chat
╭──────────────────────────────────────────── system ────────────────────────────────────────────╮
│ Welcome to InstructLab Chat w/ GGML-MERLINITE-7B-lab-Q4_K_M (type /h for help)                  │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
>>> what is the capital of Canada                                                    [S][default]
╭────────────────────────────────── ggml-merlinite-7b-lab-Q4_K_M ─────────────────────────────────╮
│ The capital city of Canada is Ottawa. It is located in the province of Ontario, on the southern │
│ banks of the Ottawa River in the eastern portion of southern Ontario. The city serves as the    │
│ political center for Canada, as it is home to Parliament Hill, which houses the House of        │
│ Commons, Senate, Supreme Court, and Cabinet of Canada. Ottawa has a rich history and cultural   │
│ significance, making it an essential part of Canada's identity.                                 │
╰──────────────────────────────────────────────────────────────────── elapsed 12.008 seconds ────╯
>>>                                                                                  [S][default]

💻 Creating new knowledge or skills and training the model

🎁 Contribute knowledge or compositional skills

  1. Contribute new knowledge or compositional skills to your local taxonomy repository.

Detailed contribution instructions can be found in the taxonomy repository.

Important

There is a limit to how much content can exist in the question/answer pairs for the model to process. Due to this, only add a maximum of around 2300 words to your question and answer seed example pairs in the qna.yaml file.
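
To make the structure concrete, below is a minimal sketch of a qna.yaml for a hypothetical freeform writing skill called foo-lang (the same example name used in the diff output in the next section). The exact schema and required fields are defined in the taxonomy repository, and validation expects at least five seed examples; only two are shown here to keep the sketch short.

mkdir -p ~/.local/share/instructlab/taxonomy/compositional_skills/writing/freeform/foo-lang
cat > ~/.local/share/instructlab/taxonomy/compositional_skills/writing/freeform/foo-lang/qna.yaml <<'EOF'
# Minimal illustrative skill contribution -- consult the taxonomy docs for the authoritative schema.
version: 2
task_description: Teach the model to write short greetings in the invented foo-lang language.
created_by: your-github-username
seed_examples:
  - question: Write a one-line greeting in foo-lang.
    answer: "Zalu fo! (Hello there!)"
  - question: Write a one-line farewell in foo-lang.
    answer: "Mira zel! (Goodbye!)"
EOF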

📜 List and validate your new data

You can use the ilab taxonomy diff command to ensure ilab is registering your new knowledge or skills and your contributions are properly formatted. This command displays any new or modified YAML files within your taxonomy tree. For example, the following is the expected result of a valid compositional skill contribution after adding a new skill called foo-lang to the freeform writing subdirectory:

(venv) $ ilab taxonomy diff
compositional_skills/writing/freeform/foo-lang/qna.yaml
Taxonomy in $HOME/.local/share/instructlab/taxonomy is valid :)

You can also validate your entire taxonomy by performing a diff against an empty base by using the --taxonomy-base=empty argument:

(venv) $ ilab taxonomy diff --taxonomy-base=empty
compositional_skills/general/tables/empty/qna.yaml
compositional_skills/general/tables/editing/add_remove/qna.yaml
...
Taxonomy in $HOME/.local/share/instructlab/taxonomy is valid :)

🚀 Generate a synthetic dataset

Before following these instructions, ensure the existing model you are adding skills or knowledge to is still running. Alternatively, ilab data generate can start a server for you if you provide a fully qualified model path via --model.

  1. To generate a synthetic dataset based on your newly added knowledge or skill set in the taxonomy repository, run the following command:

    With GPU acceleration:

    ilab data generate --pipeline full --gpus <NUM_OF_GPUS>

    Without GPU acceleration:

    ilab data generate --pipeline simple

    To use a non-default model (e.g. Mixtral-8x7B-Instruct-v0.1) to generate data, run the following command:

    ilab data generate --model ~/.cache/instructlab/models/mistralai/mixtral-8x7b-instruct-v0.1 --pipeline full --gpus 4

    NOTE: ⏳ This can take from 15 minutes to 1+ hours to complete, depending on your computing resources.

    Example output of ilab data generate

    (venv) $ ilab data generate
    INFO 2024-07-30 19:57:44,093 numexpr.utils:161: NumExpr defaulting to 8 threads.
    INFO 2024-07-30 19:57:44,452 datasets:58: PyTorch version 2.3.1 available.
    Generating synthetic data using 'simple' pipeline, '$HOME/.cache/instructlab/models/mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf' model, './taxonomy' taxonomy, against http://localhost:8000/v1 server
    INFO 2024-07-30 19:57:45,084 instructlab.sdg:375: Synthesizing new instructions. If you aren't satisfied with the generated instructions, interrupt training (Ctrl-C) and try adjusting your YAML files. Adding more examples may help.
    INFO 2024-07-30 19:57:45,090 instructlab.sdg.pipeline:153: Running pipeline single-threaded
    INFO 2024-07-30 19:57:47,820 instructlab.sdg.llmblock:51: LLM server supports batched inputs: False
    INFO 2024-07-30 19:57:47,820 instructlab.sdg.pipeline:197: Running block: gen_skill_freeform
    INFO 2024-07-30 19:57:47,820 instructlab.sdg.pipeline:198: Dataset({
       features: ['task_description', 'seed_question', 'seed_response'],
       num_rows: 5
    })
    INFO 2024-07-30 20:02:16,455 instructlab.sdg:411: Generated 1 samples
    ...

    The synthetic data set will consist of two files in the newly created datasets directory. On Linux this will be ~/.local/share/instructlab/datasets and on macOS this will be ~/Library/Application Support/instructlab/datasets. These files will be named skills_train_msgs_*.jsonl and knowledge_train_msgs_*.jsonl.

  2. Verify the files have been created by running the ls datasets command. Note: you must be in your XDG_DATA_HOME/instructlab directory.

    (venv) $ ls datasets/
    node_datasets_2024-08-12T20_31_15                          test_mixtral-8x7b-instruct-v0-1_2024-08-12T20_23_06.jsonl
    knowledge_recipe_2024-08-12T20_31_15.yaml                      node_datasets_2024-08-13T19_51_48                          test_mixtral-8x7b-instruct-v0-1_2024-08-12T20_31_15.jsonl
    knowledge_recipe_2024-08-13T19_51_48.yaml                      skills_recipe_2024-08-12T20_31_15.yaml                     test_mixtral-8x7b-instruct-v0-1_2024-08-13T19_47_59.jsonl
    knowledge_train_msgs_2024-08-12T20_31_15.jsonl                 skills_recipe_2024-08-13T19_51_48.yaml                     test_mixtral-8x7b-instruct-v0-1_2024-08-13T19_51_48.jsonl
    knowledge_train_msgs_2024-08-13T19_51_48.jsonl                 skills_train_msgs_2024-08-12T20_31_15.jsonl                train_mixtral-8x7b-instruct-v0-1_2024-08-12T20_31_15.jsonl
    messages_mixtral-8x7b-instruct-v0-1_2024-08-12T20_31_15.jsonl  skills_train_msgs_2024-08-13T19_51_48.jsonl                train_mixtral-8x7b-instruct-v0-1_2024-08-13T19_51_48.jsonl
    messages_mixtral-8x7b-instruct-v0-1_2024-08-13T19_51_48.jsonl  test_mixtral-8x7b-instruct-v0-1_2024-08-12T20_13_21.jsonl

    Optional: It is also possible to run the generate step against a different model via an OpenAI-compatible API. For example, the one spawned by ilab model serve or any remote or locally hosted LLM (e.g. via ollama, LM Studio, etc.). Run the following command:

    ilab data generate --endpoint-url http://localhost:8000/v1

Note that it is also possible to generate a synthetic dataset based on the entire contents of the taxonomy repo using the --taxonomy-base=empty option:

ilab data generate --taxonomy-base=empty
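
These options can be combined with the pipeline and GPU flags shown earlier. For example, the following illustrative invocation regenerates data for the entire taxonomy with GPU acceleration (adjust the GPU count to your hardware):

ilab data generate --pipeline full --gpus 4 --taxonomy-base=empty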

👩‍🏫 Training the model

There are many options for training the model with your synthetic data-enhanced dataset.

Note: Every ilab command needs to run from within your Python virtual environment.

Train the model locally on Linux

ilab model train

NOTE: ⏳ This step can potentially take several hours to complete depending on your computing resources. Please stop ilab model chat and ilab model serve first to free resources.

If you are using ilab model train --legacy or are on macOS:

ilab model train outputs a brand-new model that can be served in the models directory called ggml-model-f16.gguf.

If you are using ilab model train with a GPU enabled system:

ilab model train outputs brand-new models that can be served in the ~/.local/share/instructlab/checkpoints directory. These models can be run through ilab model evaluate to choose the best one.

If you are using ilab model train --strategy lab-multiphase, see the multi-phase training section below.

Train the model locally on an M-series Mac

Training the model locally on your M-series Mac is as simple as running:

ilab model train

Note: ⏳ This process will take a little while to complete (time can vary based on hardware and the output of ilab data generate, but it is on the order of 5 to 15 minutes).

ilab model train outputs a brand-new model that is saved in the <model_name>-mlx-q directory called adapters.npz (in Numpy compressed array format). For example:

(venv) $ ls instructlab-merlinite-7b-lab-mlx-q
adapters-010.npz        adapters-050.npz        adapters-090.npz        config.json             tokenizer.model
adapters-020.npz        adapters-060.npz        adapters-100.npz        model.safetensors       tokenizer_config.json
adapters-030.npz        adapters-070.npz        adapters.npz            special_tokens_map.json
adapters-040.npz        adapters-080.npz        added_tokens.json       tokenizer.json

Train the model locally with GPU acceleration

Training has experimental support for GPU acceleration with Nvidia CUDA or AMD ROCm. Please see the GPU acceleration documentation for more details. At present, hardware acceleration requires a data center GPU or high-end consumer GPU with at least 18 GB free memory.

ilab model train --device=cuda

This version of ilab model train outputs brand-new models that can be served in the ~/.local/share/instructlab/checkpoints directory on Linux and ~/Library/Application Support/instructlab/checkpoints on macOS. These models can be run through ilab model evaluate to choose the best one.

Train the model locally with multi-phase training and GPU acceleration

ilab model train supports multi-phase training. This results in the following workflow:

  1. We train the model on knowledge.
  2. We evaluate the trained model to find the best checkpoint.
  3. We train the model on skills.
  4. We evaluate the model to find the best overall checkpoint.
ilab model train --strategy lab-multiphase --phased-phase1-data <knowledge train messages jsonl> --phased-phase2-data <skills train messages jsonl> -y

This command takes in two .jsonl files from your datasets directory: one is the knowledge jsonl and the other is the skills jsonl. The -y flag skips an interactive prompt asking whether you are sure you want to run multi-phase training.

Note: this command may take 3 or more hours depending on the size of the data and number of training epochs you run.
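
For example, using the file names from the datasets listing shown earlier (your timestamps will differ), an illustrative invocation looks like this:

ilab model train --strategy lab-multiphase \
   --phased-phase1-data ~/.local/share/instructlab/datasets/knowledge_train_msgs_2024-08-13T19_51_48.jsonl \
   --phased-phase2-data ~/.local/share/instructlab/datasets/skills_train_msgs_2024-08-13T19_51_48.jsonl \
   -y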

Train the model in the cloud

Follow the instructions in Training.

⏳ Approximate amount of time taken on each platform:

  • Google Colab: 5-10 minutes with a T4 GPU
  • Kaggle: ~30 minutes with a P100 GPU.

After that's done, you can play with your model directly in the Google Colab or Kaggle notebook. Models trained in the cloud are saved in the cloud, and they can also be downloaded and served locally.

📜 Test the newly trained model

  • Run the following command to test the model:

    ilab model test

    The output from the command will consist of a series of outputs from the model before and after training.

🧪 Evaluate the newly trained model

You can use the ilab model evaluate command to evaluate the models you are training with several benchmarks. Currently, four benchmarks are supported.

  • MMLU (measures knowledge): Massive Multitask Language Understanding. Tests a model against a standardized set of knowledge data and produces a score based on the model's performance. Reference: Measuring Massive Multitask Language Understanding.
  • MMLUBranch (measures knowledge): Tests your knowledge contributions against a base model and produces a score based on the difference in performance.
  • MTBench (measures skills): Multi-turn Benchmark. Tests a model's skill at applying its knowledge against a judge model and produces a score based on the model's performance. Reference: MT-Bench (Multi-turn Benchmark).
  • MTBenchBranch (measures skills): Tests your skill contributions against a judge model and produces a score based on the difference in performance.

Note

MTBench and MTBenchBranch use prometheus-8x7b-v2.0 as the judge model by default. While you do not need to use this model as your judge, it is strongly recommended to do so if you have the necessary hardware resources. You can download it via ilab model download.
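
If you want to use the recommended judge model, you can fetch it the same way as the other models. The repository name below is assumed to be the upstream Hugging Face location; verify it (and whether a token is required) before downloading:

HF_TOKEN=<YOUR HUGGINGFACE TOKEN GOES HERE> ilab model download --repository=prometheus-eval/prometheus-8x7b-v2.0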

Running MMLU

Below is an example of running MMLU on a local model with minimal tasks:

$ export INSTRUCTLAB_EVAL_MMLU_MIN_TASKS=true   # don't set this if you want to run full MMLU 
$ export ILAB_MODELS_DIR=$HOME/.local/share/instructlab/models
$ ilab model evaluate --benchmark mmlu --model $ILAB_MODELS_DIR/instructlab/granite-7b-lab
...
# KNOWLEDGE EVALUATION REPORT

## MODEL
/home/example-user/.local/share/instructlab/models/instructlab/granite-7b-lab

### AVERAGE:
0.45 (across 3)

### SCORES:
mmlu_abstract_algebra - 0.35
mmlu_anatomy - 0.44
mmlu_astronomy - 0.55

Below is an example of running MMLU on a Hugging Face model with minimal tasks:

$ export INSTRUCTLAB_EVAL_MMLU_MIN_TASKS=true   # don't set this if you want to run full MMLU 
$ ilab model evaluate --benchmark mmlu --model instructlab/granite-7b-lab
...
# KNOWLEDGE EVALUATION REPORT

## MODEL
instructlab/granite-7b-lab

### AVERAGE:
0.45 (across 3)

### SCORES:
mmlu_abstract_algebra - 0.35
mmlu_anatomy - 0.44
mmlu_astronomy - 0.55

Note

Currently, MMLU can only be run against a safetensors model directory, either locally or on Hugging Face. GGUFs are not currently supported.

Running MMLUBranch

Below is an example of running MMLUBranch with a local safetensors model directory:

$ export ILAB_MODELS_DIR=$HOME/.local/share/instructlab/models
$ ilab model evaluate --benchmark mmlu_branch --model $ILAB_MODELS_DIR/instructlab/granite-7b-lab --base-model $ILAB_MODELS_DIR/instructlab/granite-7b-lab
...
# KNOWLEDGE EVALUATION REPORT

## BASE MODEL
/home/example-user/.local/share/instructlab/models/instructlab/granite-7b-lab

## MODEL
/home/example-user/.local/share/instructlab/models/instructlab/granite-7b-lab

### AVERAGE:
+0.0 (across 1)

### NO CHANGE:
1. tonsils

Below is an example of running MMLUBranch with Hugging Face models:

$ ilab model evaluate --benchmark mmlu_branch --model instructlab/granite-7b-lab --base-model instructlab/granite-7b-lab
...
# KNOWLEDGE EVALUATION REPORT

## BASE MODEL
instructlab/granite-7b-lab

## MODEL
instructlab/granite-7b-lab

### AVERAGE:
+0.0 (across 1)

### NO CHANGE:
1. tonsils

Tip

You can mix and match running local models and remote models on Hugging Face, so long as a safetensors model is present.
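
For example, an illustrative mixed invocation compares a locally stored safetensors model against the same base model pulled from Hugging Face:

$ export ILAB_MODELS_DIR=$HOME/.local/share/instructlab/models
$ ilab model evaluate --benchmark mmlu_branch --model $ILAB_MODELS_DIR/instructlab/granite-7b-lab --base-model instructlab/granite-7b-lab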

Running MTBench

Below is an example of running MTBench with a local safetensors model directory:

$ export ILAB_MODELS_DIR=$HOME/.local/share/instructlab/models
$ ilab model evaluate --benchmark mt_bench --model $ILAB_MODELS_DIR/instructlab/granite-7b-lab --judge-model $ILAB_MODELS_DIR/instructlab/granite-7b-lab
...
# SKILL EVALUATION REPORT

## MODEL
/home/example-user/.local/share/instructlab/models/instructlab/granite-7b-lab

### AVERAGE:
8.07 (across 91)

### TURN ONE:
8.64

### TURN TWO:
7.19

### ERROR RATE:
0.43

Below is an example of running MTBench with local GGUF models:

$ export ILAB_MODELS_DIR=$HOME/.local/share/instructlab/models
$ ilab model evaluate --benchmark mt_bench --model $ILAB_MODELS_DIR/granite-7b-lab-Q4_K_M.gguf --judge-model $ILAB_MODELS_DIR/granite-7b-lab-Q4_K_M.gguf
...
# SKILL EVALUATION REPORT

## MODEL
/home/example/.local/share/instructlab/models/granite-7b-lab-Q4_K_M.gguf

### AVERAGE:
5.0 (across 1)

### TURN ONE:
5.0

### TURN TWO:
N/A

### ERROR RATE:
0.99

Note

Currently, MTBench must be used with local models. Using models directly from Hugging Face without downloading them is unsupported.

Running MTBenchBranch

Below is an example of running MTBenchBranch with a local safetensors model directory:

$ export ILAB_MODELS_DIR=$HOME/.local/share/instructlab/models
$ export ILAB_TAXONOMY_DIR=$HOME/.local/share/instructlab/taxonomy
$ ilab model evaluate --benchmark mt_bench_branch \
   --model $ILAB_MODELS_DIR/instructlab/granite-7b-lab \
   --judge-model $ILAB_MODELS_DIR/instructlab/granite-7b-lab \
   --base-model $ILAB_MODELS_DIR/instructlab/granite-7b-lab \
   --taxonomy-path $ILAB_TAXONOMY_DIR \
   --branch rc \
   --base-branch main
...
# SKILL EVALUATION REPORT

## BASE MODEL
/home/example/.local/share/instructlab/models/instructlab/granite-7b-lab

## MODEL
/home/example/.local/share/instructlab/models/instructlab/granite-7b-lab

### IMPROVEMENTS:
1. compositional_skills/extraction/receipt/markdown/qna.yaml (+4.0)
2. compositional_skills/STEM/science/units_conversion/temperature_conversion/qna.yaml (+3.0)
3. compositional_skills/extraction/commercial_lease_agreement/bullet_points/qna.yaml (+3.0)
...

### REGRESSIONS:
1. compositional_skills/extraction/abstractive/title/qna.yaml (-5.0)
2. compositional_skills/extraction/receipt/bullet_points/qna.yaml (-4.5)
3. compositional_skills/writing/grounded/summarization/wiki_insights/one_line/qna.yaml (-4.0)
...

### NO CHANGE:
1. compositional_skills/STEM/math/reasoning/qna.yaml
2. compositional_skills/extraction/commercial_lease_agreement/csv/qna.yaml
3. compositional_skills/roleplay/explain_like_i_am/graduate/qna.yaml
...

### NEW:
1. compositional_skills/linguistics/organize_lists/qna.yaml
2. compositional_skills/extraction/invoice/plain_text/qna.yaml
3. compositional_skills/writing/grounded/summarization/wiki_insights/concise/qna.yaml
...

### ERROR RATE:
0.32

Below is an example of running MTBenchBranch with local GGUF models:

$ export ILAB_MODELS_DIR=$HOME/.local/share/instructlab/models
$ export ILAB_TAXONOMY_DIR=$HOME/.local/share/instructlab/taxonomy
$ ilab model evaluate --benchmark mt_bench_branch --model $ILAB_MODELS_DIR/granite-7b-lab-Q4_K_M.gguf --judge-model $ILAB_MODELS_DIR/granite-7b-lab-Q4_K_M.gguf --base-model $ILAB_MODELS_DIR/granite-7b-lab-Q4_K_M.gguf --taxonomy-path $ILAB_TAXONOMY_DIR --branch rc --base-branch main
...
# SKILL EVALUATION REPORT

## BASE MODEL
/home/ec2-user/.local/share/instructlab/models/granite-7b-lab-Q4_K_M.gguf

## MODEL
/home/ec2-user/.local/share/instructlab/models/granite-7b-lab-Q4_K_M.gguf

### NO CHANGE:
1. compositional_skills/STEM/math/distance_conversion/qna.yaml

### NEW:
1. compositional_skills/linguistics/organize_lists/qna.yaml
2. compositional_skills/extraction/annual_report/reasoning/qna.yaml
3. compositional_skills/extraction/email/plain_text/qna.yaml
4. compositional_skills/extraction/technical_paper/tables/bullet_points/qna.yaml
5. compositional_skills/extraction/technical_paper/abstract/reasoning/qna.yaml

### ERROR RATE:
0.98

Note

Currently, MTBenchBranch must be used with local models. Using models directly from Hugging Face without downloading them is unsupported.

🍴 Serve the newly trained model

  1. Stop the server you have running by pressing Ctrl+C in the terminal running the server.

    IMPORTANT:

    • 🍎 This step is only implemented for macOS with M-series chips (for now).

    • Before serving the newly trained model, you must convert it to work with the ilab CLI. The ilab model convert command converts the new model into the quantized GGUF format that the server requires to host the model for the ilab model serve command.

  2. Convert the newly trained model by running the following command:

    ilab model convert
  3. Serve the newly trained model locally via ilab model serve command with the --model-path argument to specify your new model:

    ilab model serve --model-path <new model path>

    Which model should you select to serve? After running the ilab model convert command, some files and a directory are generated. The model you will want to serve ends with an extension of .gguf and exists in a directory with the suffix trained. For example: instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf.
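
    For example, using the trained model path from above (your directory name will match the model you trained):

    ilab model serve --model-path instructlab-merlinite-7b-lab-trained/instructlab-merlinite-7b-lab-Q4_K_M.gguf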

📣 Chat with the new model (not optional this time)

  • Try out the fine-tuned model live using the chat interface, and see if the results are better than the untrained version of the model, by running the following command:

    ilab model chat -m <New model path>

    If you are interested in optimizing the quality of the model's responses, please see TROUBLESHOOTING.md

🚀 Upgrade InstructLab to the latest version

  • To upgrade InstructLab to the latest version, use the following command:

    pip install instructlab --upgrade
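
    After upgrading, you can confirm the installed version:

    ilab --version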

🎁 Submit your new knowledge or skills

Of course, the final step is, if you've improved the model, to open a pull request in the taxonomy repository that includes the files (e.g. qna.yaml) with your improved data.

📬 Contributing

Check out our contributing guide to learn how to contribute.


taxonomy's Issues

Proposal for adding capability of replying to close-ended questions in a concise way

Describe the proposed contribution to the taxonomy

  • When replying to a closed-ended question, the model writes a lot.
  • If the prompt asks explicitly to respond only with the option label (and maybe its value), the model does not respect the requirement.
  • Proposing to add a composite skill to reply only with the option label, as in a closed-ended exam question.

Input given at the prompt

What the first law of robotics say? Return option number only: 
    1. A robot can hit a human if they are trying to hurt the robot;
    2. A robot may not injure a human being or, through inaction, allow a human being to come to harm;
    3. A human cannot interfere with robot action if this would mean making the robot failing a task

Response from the current model

   The first law of robotics, as stated by Isaac Asimov, is: "A robot may not injure a human being or, through inaction, allow a human being to come to harm." This is option 2 in the list provided.

Response that you would expect instead with the contribution

  2 or Option 2

Lint validation failing with errors that are not related to changed / added file

Describe the bug/problem

  • It looks like lint is checking all of the files, not just the ones that are part of the merge request, which causes the validation for a specific merge request to fail.
  • I am getting the output below for the PR (Added knowledge skill for language #951) validation, and the file that I added is not part of the list. I did not see this behaviour a couple of days ago.
Run echo "::add-matcher::.github/workflows/matchers/lint.json"

Warning: WARN:  compositional_skills/extraction/abstractive/abstract/qna.yaml:105:1: missing/empty 'task_description'
Error: ERROR: compositional_skills/extraction/abstractive/abstract/qna.yaml:2:1: less than 5 'seed_examples'
Warning: WARN:  compositional_skills/extraction/abstractive/key_points/qna.yaml:72:1: missing/empty 'task_description'
Error: ERROR: compositional_skills/extraction/abstractive/key_points/qna.yaml:2:1: less than 5 'seed_examples'
Warning: WARN:  compositional_skills/extraction/abstractive/main_takeaway/qna.yaml:60:1: missing/empty 'task_description'
Error: ERROR: compositional_skills/extraction/abstractive/main_takeaway/qna.yaml:2:1: less than 5 'seed_examples'
Warning: WARN:  compositional_skills/extraction/abstractive/outline/qna.yaml:81:1: missing/empty 'task_description'
Error: ERROR: compositional_skills/extraction/abstractive/outline/qna.yaml:2:1: less than 5 'seed_examples'
Warning: WARN:  compositional_skills/extraction/abstractive/title/qna.yaml:34:1: missing/empty 'task_description'
Error: ERROR: compositional_skills/extraction/abstractive/title/qna.yaml:2:1: less than 5 'seed_examples'
Warning: WARN:  compositional_skills/extraction/inference/qualitative/sentiment/qna.yaml:19:1: missing/empty 'task_description'
Error: ERROR: compositional_skills/extraction/inference/qualitative/sentiment/qna.yaml:2:1: less than 5 'seed_examples'
Warning: WARN:  compositional_skills/extraction/inference/qualitative/tone_and_style/qna.yaml:5:1: missing/empty 'task_description'
Error: ERROR: compositional_skills/extraction/inference/qualitative/tone_and_style/qna.yaml:2:1: missing 'seed_examples'
Error: ERROR: compositional_skills/extraction/inference/qualitative/tone_and_style/qna.yaml:2:1: less than 5 'seed_examples'
Error: ERROR: compositional_skills/extraction/inference/qualitative/tone_and_style/qna.yaml:2:1: missing/empty 'question's
Error: ERROR: compositional_skills/extraction/inference/qualitative/tone_and_style/qna.yaml:2:1: missing/empty 'answer's
Error: ERROR: compositional_skills/extraction/inference/qualitative/tone_and_style/qna.yaml:2:1: missing/empty 'context's

Model does not seem to understand next/future in prompt & provides previous results back to 2020

Describe the bug
The model does not seem to understand the notion of "next" being a point in time after today. When asked two similar questions in different contexts, past dates are provided in the response. I would expect that when asking about "next", it would not return past results (some from several years back).

Input given at the prompt
Prompt 1:
When are the next 5 solar eclipses?

Prompt 2:
when are the next 5 leap years?

Response that was received
Response for Prompt 1:

Here is the list of the next 5 solar eclipses, including the type, date, and location:  
1. Total Solar Eclipse - December 14, 2020 - South Pacific Ocean, Chile, Argentina 
2. Annular Solar Eclipse - June 21, 2020 - Arctic Ocean, Greenland, Canada, Svalbard, Norway 
3. Hybrid Solar Eclipse - December 4, 2023 - Indian Ocean, Antarctica 
4. Annular Solar Eclipse - October 14, 2023 - Pacific Ocean, Chile, Argentina, Falkland Islands 
5. Annular Solar Eclipse - June 26, 2025 - Atlantic Ocean, Cape Verde, Senegal, Mauritania, Mali, Algeria, Spain, Portugal, United Kingdom, France, Netherlands, Belgium, Germany, Denmark, Sweden, Norway, Finland, Russia, Greenland, Canada, United States, Mexico, Guatemala, Belize, Honduras, El Salvador, Nicaragua, Costa Rica, Panama, Colombia, Ecuador, Brazil,  Argentina, Chile, Falkland Islands, South Georgia and the South Sandwich Islands, South Orkney Islands, South Shetland Islands  

Response for Prompt 2:

Here is the list of the next 5 leap years, including the year number:
1. 2020
2. 2024
3. 2028
4. 2032
5. 2036

Response that was expected
Today is March 12 2024, so I would expect any results before that to be excluded from "next"

Prompt 1: (source: science.nasa.gov/eclipses/future-eclipses)

Here is the list of the next 5 solar eclipses, 
1. Total Solar Eclipse - April 8, 2024 
2. Annular Solar Eclipse - October 2, 2024  
3. Partial Solar Eclipse - March 29, 2025 
4. Partial Solar Eclipse - September 21, 2025
5. Annular Solar Eclipse - February 17, 2026 

Prompt 2:

Here is the list of the next 5 leap years:
1. 2024
2. 2028
3. 2032
4. 2036
5. 2040

Run into endless loop in response in chinese

Describe the bug
Ask the model to respond in Chinese. It runs into an endless response of "哎呀呀呀", and I could only force-stop it after waiting a long while.
My question in English is "Could you list some greeting words except hello?".

Input given at the prompt
"请用中文回答接下来的问题。" (Please answer the following questions in Chinese.)
"除了你好，还有其他问候的词语么？" (Besides "hello", are there any other greeting words?)

Response that was received
Please check the screenshot.

Response that was expected
"ๅœจไธญๆ–‡ไธญ๏ผŒ้™คไบ†ใ€Œไฝ ๅฅฝใ€ๅค–๏ผŒ่ฟ˜ๆœ‰่ฎธๅคšๅธธ่ง็š„้—ฎๅ€™่ฏ่ฏญ๏ผŒๅฆ‚'ๅ—จ'๏ผŒโ€˜ๆ—ฉไธŠๅฅฝโ€™๏ผŒโ€˜ๆ™šไธŠๅฅฝโ€™"

Proposal for automated checks of knowledge submission

In order to release knowledge contributions to open source community, IBM legal is asking us to propose an automated solution for our generation pipeline that covers:

On the provided knowledge source from the contributor:

  • HAP detection
  • PII detection
  • Screening for commonly copyrighted content (e.g. OneShield) -- lower priority

On the synthetic data created from the knowledge source

  • Screening for leakage of text from the knowledge source being used verbatim in the synthetic data (leveraging OneShield)

Crash (SIGSEGV) in `lab serve` when doing `lab generate` and `lab chat` at the same time.

Describe the bug

Crash in lab serve while doing lab generate and lab chat at the same time.

(venv) carlos@fedora:~/build/instruct-lab$ lab serve
INFO 2024-03-06 15:57:10,749 lab.py:201 Using model 'models/ggml-merlinite-7b-Q4_K_M.gguf' with -1 gpu-layers
Starting server process
After application startup complete see http://127.0.0.1:8000/docs for API.
Press CTRL+C to shutdown server.
disconnected
Disconnected from client (via refresh/close) Address(host='127.0.0.1', port=44072)
Segmentation fault (core dumped)

Input given at the prompt

>>> how do I determine the size of userspace page of memory on Linux?                                                         [S][default]

Response that was received

╭──────────────────────────────────────── ggml-merlinite-7b-Q4_K_M ────────────────────────────────────────╮
│ To                                                                                                        │
╰──────────────────────────────────────────────────────────────────────────── elapsed 0.001 seconds ───────╯
Traceback (most recent call last):
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpx/_transports/default.py", line 69, in map_httpcore_exceptions
    yield
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpx/_transports/default.py", line 113, in __iter__
    for part in self._httpcore_stream:
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 367, in __iter__
    raise exc from None
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpcore/_sync/connection_pool.py", line 363, in __iter__
    for part in self._stream:
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpcore/_sync/http11.py", line 349, in __iter__
    raise exc
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpcore/_sync/http11.py", line 341, in __iter__
    for chunk in self._connection._receive_response_body(**kwargs):
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpcore/_sync/http11.py", line 210, in _receive_response_body
    event = self._receive_event(timeout=timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpcore/_sync/http11.py", line 220, in _receive_event
    with map_exceptions({h11.RemoteProtocolError: RemoteProtocolError}):
  File "/usr/lib64/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpcore/_exceptions.py", line 14, in map_exceptions
    raise to_exc(exc) from exc
httpcore.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/carlos/build/instruct-lab/venv/bin/lab", line 8, in <module>
    sys.exit(cli())
             ^^^^^
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/cli/lab.py", line 330, in chat
    chat_cli(
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/cli/chat/chat.py", line 411, in chat_cli
    ccb.start_prompt()
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/cli/chat/chat.py", line 334, in start_prompt
    for chunk in response:
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/openai/_streaming.py", line 44, in __iter__
    for item in self._iterator:
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/openai/_streaming.py", line 56, in __stream__
    for sse in iterator:
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/openai/_streaming.py", line 48, in _iter_events
    yield from self._decoder.iter(self.response.iter_lines())
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/openai/_streaming.py", line 224, in iter
    for line in iterator:
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpx/_models.py", line 861, in iter_lines
    for text in self.iter_text():
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpx/_models.py", line 848, in iter_text
    for byte_content in self.iter_bytes():
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpx/_models.py", line 829, in iter_bytes
    for raw_bytes in self.iter_raw():
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpx/_models.py", line 883, in iter_raw
    for raw_stream_bytes in self.stream:
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpx/_client.py", line 126, in __iter__
    for chunk in self._stream:
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpx/_transports/default.py", line 112, in __iter__
    with map_httpcore_exceptions():
  File "/usr/lib64/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/home/carlos/build/instruct-lab/venv/lib64/python3.12/site-packages/httpx/_transports/default.py", line 86, in map_httpcore_exceptions
    raise mapped_exc(message) from exc
httpx.RemoteProtocolError: peer closed connection without sending complete message body (incomplete chunked read)
(venv) carlos@fedora:~/build/instruct-lab$ 

Response that was expected
Expected either an answer to the chat question, or the timeout to be handled correctly in lab chat and lab serve.
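For illustration only (not the project's actual fix), a minimal sketch of how the dropped stream could be surfaced as a readable error instead of a raw traceback; `stream_chat` here is a hypothetical stand-in for the streaming call that `lab chat` makes:

```python
import httpx

def print_streamed_reply(stream_chat):
    """Consume a streaming chat response, turning a dropped connection
    into a readable message instead of a raw traceback.

    `stream_chat` is a hypothetical callable returning an iterable of
    text chunks, standing in for the OpenAI-compatible stream that
    `lab chat` consumes.
    """
    try:
        for chunk in stream_chat():
            print(chunk, end="", flush=True)
    except httpx.RemoteProtocolError as exc:
        # The server (lab serve) closed the connection before sending the
        # complete message body, e.g. because generation timed out or crashed.
        print(f"\nConnection to the model server was lost: {exc}")
    except httpx.ReadTimeout as exc:
        print(f"\nTimed out waiting for the model server: {exc}")
```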

add a skill for alphabetizing a list of names

Describe the proposed contribution to the taxonomy

  • Several examples with a context of names (my idea for providing such lists is to pick a random subset of git contributors to various projects), then asking for the list to be sorted by first name or by surname, or asking to pick out the first or last person if the list were sorted correctly.

Input given at the prompt

I used `git shortlog | sed -n '/^[^ ]/ s/ (.*//p' | xclip` on libnbd.git to come up with my initial context for this issue, to see how well (or poorly) lab can do the task without further training, as shown in a saved session log:

     {
        "role": "user",
        "content": "In the following list of names, which comes first alphabetically by first name?\nChris Lamb\nDaniel P. Berrang\u00e9\nEric Blake\nLaszlo Ersek\nMartin Kletzander\nMing Lei\nNir Soffer\nPino Toscano\nRichard W.M. Jones\nTage Johansson\nThomas Wei\u00dfschuh\nanson\n"
    },
    {
        "role": "assistant",
        "content": "Alphabetizing by first name, the order is as follows:\n\n1. Daniel P. Berrang\u00e9\n2. Chris Lamb\n3. Eric Blake\n4. Laszlo Ersek\n5. Martin Kletzander\n6. Ming Lei\n7. Nir Soffer\n8. Pino Toscano\n9. Richard W.M. Jones\n10. Tage Johansson\n11. Thomas Wei\u00dfschuh\n12. anson\n\nPlease note that I have assumed \"anson\" is a given name, as it does not appear to be a last name in this context. If \"anson\" is indeed a last name, it would be placed at the end of the list."
    },
    {
        "role": "user",
        "content": "why did you put Daniel before Chris in that list?\n"
    },
    {
        "role": "assistant",
        "content": "I placed Daniel P. Berrang\u00e9 before Chris Lamb in the list because, when comparing the first names alphabetically, \"D\" precedes \"C\". I followed standard alphabetical ordering rules to ensure consistency and fairness in the arrangement of names."
    },
    {
        "role": "user",
        "content": "how about arranging the list by surname?\n"
    },
    {
        "role": "assistant",
        "content": "If we are to arrange the list by surname, it would look like this:\n1. Berrang\u00e9, Daniel P.\n2. Blake, Eric\n3. Ersek, Laszlo\n4. Kletzander, Martin\n5. Lei, Ming\n6. Nir, Soffer\n7. Pino, Toscano\n8. Jones, Richard W.M.\n9. Johansson, Tage\n10. Wei\u00dfschuh, Thomas\n11. Lamb, Chris\n12. Soffer, anson"
    }

Response from the current model

See above. The engine was unable to recognize a request for "which comes first" as wanting only one name, not the full list; and the model then hallucinated by claiming that D comes before C (but at least the list it gave was consistent with its hallucinated explanation). The request to sort all names by surname turned out even worse, although I was impressed at its ability to reorder the names by a sensible choice for surname in all but the case where it reused a bogus surname for the one-name entry "anson".

Response that you would expect instead with the contribution

For the first question, I would have expected either "anson" or "Chris Lamb", but not an entire list. The second question about why it chose Daniel before Chris should be fixed to not hallucinate. And for the third answer, I would like to see something like:

If we are to arrange the list by surname, it would look like this:

1. Berrangé, Daniel P.
2. Blake, Eric
3. Ersek, Laszlo
4. Jones, Richard W.M.
5. Johansson, Tage
6. Kletzander, Martin
7. Lamb, Chris
8. Lei, Ming
9. Nir, Soffer
10. Pino, Toscano
11. Weißschuh, Thomas
12. anson

The name "anson" does not appear to include a surname, so I sorted it last.

Contribution guidelines split

There are contribution guidelines in both the README and CONTRIBUTING. It would be good to consolidate all contributing guidelines in CONTRIBUTING.md.

What is the correct attribution when skill answers came from `lab chat` with modifications?

I'm working on a skill where I generated examples by asking the existing default model (merlinite-7b-Q4_K_M.gguf) a question via lab chat and using that output as inspiration for the answer I want: correcting the formatting, removing extraneous or unrelated information from the response, and so on. This is a freeform writing skill, where the model generates the content entirely by itself and that content is used as the basis of my corrected answer.

In this case, the question is my own but what is the correct attribution overall for these examples? How would I know the license of the underlying content that went into generating the answer it gave me? Or, do I consider all answers from this model as not encumbered by an existing license and I can choose to assign one myself, as if this was content I generated myself?

Misinformation on the state

Describe the bug
Wrong information on the state of Alaska

Input given at the prompt
what can you tell me about Alaska?

Response that was received
Alaska is the largest state in the United States, both in terms of land area and population. Here are some interesting facts about Alaska:

1. Size: Alaska is approximately 663,268 square miles, which is roughly twice the size of Texas, the second-largest state in the USA.
2. Borders: Alaska shares its border with Canada to the east and the Arctic Ocean to the north. It also has maritime borders with Russia across the Bering Strait.
3. Capital: Juneau is the capital of Alaska, located in the southeastern part of the state.
4. Population: According to the 2020 census, Alaska has a population of approximately 739,500 people.
5. Climate: Alaska has a wide range of climates due to its vast size and diverse topography. The southern coastal areas have a temperate maritime climate, while the interior has a subarctic climate.
6. Wildlife: Alaska is home to diverse wildlife, including bears, moose, caribou, wolves, and various bird species. It also has a significant number of marine animals such as whales, seals, and sea lions.
7. Natural Resources: Alaska is rich in natural resources, including oil, natural gas, gold, copper, zinc, lead, and coal.
8. National Parks: Alaska is home to numerous national parks, including Denali National Park, Wrangell-St. Elias National Park, and Glacier Bay National Park. These parks offer stunning landscapes and abundant wildlife viewing opportunities.
9. Indigenous Cultures: Alaska is home to various indigenous cultures, including Aleut, Tlingit, Haida, Tsimshian, and Inupiat. These cultures have rich traditions and histories that have shaped Alaska's unique character.
10. Midnight Sun: During the summer months, Alaska experiences the Midnight Sun phenomenon, where the sun does not set for several weeks in some parts of the state.

These are just a few facts about Alaska. If you have any specific questions or topics you'd like to learn more about, feel free to ask!

Response that was expected
Alaska is the largest state in the United States, in terms of land area. Here are some interesting facts about Alaska: [text to continue]

Skills Knowhow: Complex rule systems - knowledge or skills?

A skills contributor (@djhenry) asks:

I asked the model to describe the London System chess opening and the response describes illegal moves like castling multiple times.  How would I improve this answer?  Would this be a compositional skill or more of a knowledge thing?

On its face this sounds like a combination of knowledge and skills - the knowledge part being a complete set of chess rules, and the skills part being answering questions using that knowledge correctly. Or is this an example of a "grounded skill"?

cc @juliadenham @akashgit @katesoule @xukai92

The model doesn't update its answer to the same closed-ended question if the previously selected option gets removed

Describe the bug/problem

  • Ask a closed-options question
  • Repeat the question removing from the options the one previously chosen by the model
  • The model keeps answering the old version of the question
  • Repeat the question, adding the option the model keeps answering but with a different label
  • The model responds with the new label

Input given at the prompt


1. Q1: What is the capital of Italy. Reply only with the option number among the following: A. Florence; B. Milan; C. Rome
2. Q2: What is the capital of Italy. Reply only with the option number among the following: A.Florence; B. Milan; C. Bari
3. Q3: Rome is not among the options
4. Q4: What is the capital of Italy. Reply only with the option number among the following: A.Florence; B. Milan; C. Bari; D. None of the above
5. Q5: What is the capital of Italy. Reply only with the option number among the following: A.Florence; B. Milan; C. Bari; D. None of the above; E. Rome

Response that was received from the current model

1. R1: `The capital of Italy is Rome, which is option C in the list provided`. 
2. R2: `The capital of Italy is Rome, which is option C in the list provided.`
3. R3: `I apologize for the confusion earlier. The capital of Italy is Rome, which is option C in the list provided.`
4. R4: `The capital of Italy is Rome, which is option C in the list provided.` 
5. R5: `The capital of Italy is Rome, which is option E in the list provided.`

Response that you expected instead

1. R1: `Option C`
2. R2: `None of the options are valid`
3. R3: `I apologize for the confusion earlier. The capital of Italy is Rome, none of the options match it`
4. R4: `Option D`
5. R5: `Option E`

Additional observation
In this case it "remembers" the old version of the question where the correct option was present. However, if the correct option is missing from the very first version of the question, it will still reply with an option which is not present. e.g.,
Q: what is the capital of France? A. Lyon B. Marseille C. Reun
R: The capital of France is Paris, which is not listed in the provided options. Paris is option A in the list provided, but it is not the capital of France --> In this case it messes up a bit apart from mentioning option A wrongly.

Check skills file contents is failing with missing 'seed_examples'

Describe the bug
The test 'Check skills file contents' is failing with missing 'seed_examples'. Other elements of the check which depend on 'seed_examples' are passing. Another reviewer noted that my training data contains null and that broke the check.

see PR instruct-lab/cli#472

Input given at the prompt

Response that was received
The check failed

Response that was expected
The check should pass
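A guess at what is going on, sketched with PyYAML: an empty `seed_examples:` entry parses to `null`/`None`, so a check on the key's contents reports it as missing even though the key itself is present:

```python
import yaml

# Illustrative guess: an empty "seed_examples:" entry parses to None,
# so any check on its contents reports it as missing.
doc = yaml.safe_load("""
task_description: example skill
seed_examples:
""")

print(doc["seed_examples"])            # None
print(bool(doc.get("seed_examples")))  # False -> reads as "missing seed_examples"
```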

dec -> hex

Describe the bug
The conversion algorithm doesn't work properly, and it also depends on the previous state of the system.

Input given at the prompt
convert 10000 into hexadecimal, be concise

Response that was received
The decimal number 10000 is equivalent to the hexadecimal number 2B7.

Response that was expected
The decimal number 10000 is equivalent to the hexadecimal number 2710.

Sometimes it seems to depend on previous tasks, which for stupid questions makes no sense. I can't reproduce the following problem now.
Input given at the prompt
convert 10 into hexadecimal

Response that was received
broken algorithm -> endless loop

Response that was expected
A
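For reference, the expected conversions are easy to verify; a minimal sketch of the arithmetic:

```python
def to_hex(n: int) -> str:
    """Repeated division by 16, most significant digit first."""
    digits = "0123456789ABCDEF"
    out = ""
    while n:
        out = digits[n % 16] + out
        n //= 16
    return out or "0"

print(to_hex(10000))       # 2710, not 2B7
print(to_hex(10))          # A
print(format(10000, "X"))  # built-in check: 2710
```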

Scope skills submissions to explain what you want

The clearer the contributing docs are about the kind of submissions you're looking for, the less confusion there will be among folks wondering whether they should contribute. More clarity would also avoid the time and disappointment of "actually, this isn't what we're looking for at the moment" getting hashed out at the pull-request level. We already do a good job saying "no knowledge yet!", but we can scope skills further by saying things like:

  • We currently only have the maintainer staffing or bot infra to review English submissions. We intend to support more languages in the future.
  • We aren't currently looking for computer programming language skills.
  • We are currently looking for company/sub-culture specific information!
  • Etc., etc.

to help keep contributors and maintainers more aligned on where things are expected to go, and what kinds of contributions are expected to be accepted.

Add PR checks via GitHub actions

GitHub Actions, along with concise Bash scripts, could/should be employed to make sure the submitted "qna.yaml" and "knowledge.md" files are formatted correctly (style and content structure).
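For illustration (in Python rather than Bash, and with an assumed minimal schema, not the documented one), the kind of structural check such an action could run:

```python
#!/usr/bin/env python3
"""Hypothetical structural check for submitted qna.yaml files."""
import sys
import yaml

# Assumed minimal schema; a real check would follow the documented format.
REQUIRED_KEYS = {"task_description", "created_by", "seed_examples"}

def check(path: str) -> list:
    with open(path, encoding="utf-8") as fh:
        doc = yaml.safe_load(fh)
    if not isinstance(doc, dict):
        return [f"{path}: top level is not a mapping"]
    errors = [f"{path}: missing key '{key}'" for key in sorted(REQUIRED_KEYS - doc.keys())]
    if not doc.get("seed_examples"):
        errors.append(f"{path}: 'seed_examples' is empty or null")
    return errors

if __name__ == "__main__":
    problems = [err for path in sys.argv[1:] for err in check(path)]
    print("\n".join(problems))
    sys.exit(1 if problems else 0)
```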

Add skill group descriptions

Currently, each skill (task?) already has a description (in the task_description field within qna.yaml), but the parent directories don't. For example, the compositional_skills/writing/grounded directory contains several skills, but it may not be perfectly clear what grounded means (in its surrounding context), because the term is not explained anywhere.

As a contributor, it's hard to figure out where to add a certain skill. If every file directory had a description explaining the purpose of the directory, this would be much easier. This description could be in a simple README.md within the directory, or better yet in a new YAML file, as this will allow adding additional metadata to the directories later.

Support for "include files"?

In #117 I attempted to write a skill to generate GNU-format ChangeLog files from patch files.

The input patch files touched a Makefile, and thus contained a mixture of tabs and spaces (including semantically-significant tabs, alas).
The ChangeLog format I'm hoping it will generate also contains a mixture of tabs and spaces.

Presumably the output from the chatbot should be in Markdown, with the tabs and spaces and significant indentation escaped in some way. Expressing this file-escaped-into-markdown in yaml presumably requires a second kind of escaping.

I'm a novice at yaml, so maybe there's an existing way to do this, but this was very hard for me to express in the qna.yaml file. Is the human author expected to escape these themselves, and for the qna.yaml "source" to contain the doubly-escaped content? Better would be a syntax that expresses "for the answer, I want the contents of this file, in Markdown form, labelled as being of file kind ", and have the file live in the taxonomy tree as a sibling of the qna.yaml file, rather than having the answer as literal text in the yaml.
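I don't know whether the tooling has an include mechanism, but as an illustration of avoiding hand-escaping: a YAML emitter can produce the correctly escaped form mechanically (the file names below are placeholders):

```python
import yaml

# Placeholder file names; the point is that the emitter, not the human,
# handles the escaping of tabs and significant indentation.
with open("example.patch", encoding="utf-8") as fh:
    patch_text = fh.read()
with open("expected-changelog.txt", encoding="utf-8") as fh:
    changelog_text = fh.read()

snippet = yaml.safe_dump(
    {
        "question": "Write a GNU-format ChangeLog entry for this patch:\n" + patch_text,
        "answer": changelog_text,
    },
    default_flow_style=False,
    allow_unicode=True,
)
print(snippet)  # paste (or adapt) into qna.yaml
```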

Ordering of question and answer in qna.yaml files

Every qna.yaml file I've looked at puts the answer before the question.

I'm a newcomer to the project, so there may be some deep technical reason for this that I'm unaware of, but the ordering seems confusing; it seems much clearer to me to have the question and then the answer in such files. This seems particularly pertinent when the list of answer/question pairs is bigger than a screenful; it's easy to get lost and misgroup the pairs (especially given the name of the file format).

Can/should the recommended ordering change, or should there be a FAQ about this?

Hope this is constructive.

fail to solve simple linear equations in two variables

Basically, it cannot solve a simple system of linear equations (sometimes it can, for very simple systems like x+y=6, x-y=2).

if 2x + y = 18 and x - 2y = 4 , what is the value of x and y?

This problem is yet another system of linear equations! Let's use the elimination method once more. First, let's multiply both sides of the second equation by 2 to make the coefficients of y equal:
2(x - 2y) = 2(4)
2x - 4y = 8

Now we have two equations:
2x + y = 18
2x - 4y = 8

Let's multiply the first equation by 2 to make the coefficients of x equal:
(2x + y) * 2 = 18 * 2
4x + 2y = 36

Now subtract the modified first equation from the second equation:
(4x + 2y) - (2x - 4y) = 36 - 8
4x + 2y - 2x + 4y = 28
6x = 28
x = 28 / 6
x = 4.67 (approximately)

To find the value of y, we can substitute x back into either equation. Let's use the first equation:
2(4.67) + y = 18
9.34 + y = 18
y = 18 - 9.34
y = 8.66 (approximately)

So, the values of x and y are:
x = 4.67 (approximately)
y = 8.66 (approximately)
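For reference, the system has an exact integer solution, which a few lines of substitution confirm:

```python
# 2x + y = 18 and x - 2y = 4
# From the second equation: x = 4 + 2y
# Substitute into the first: 2(4 + 2y) + y = 18  ->  5y = 10  ->  y = 2, x = 8
x, y = 8, 2
assert 2 * x + y == 18
assert x - 2 * y == 4
print(x, y)  # 8 2, not the model's approximate 4.67 and 8.66
```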

YAML extension should be specified explicitly

Both .yaml and .yml can be used for YAML files, but in order to keep files consistent, I suggest we stick with .yaml (as the existing taxonomy already does) and explicitly guide users to do so.

How do you fix an issue when the model's answer is self-inconsistent?

Describe the bug
The model's answer is self-inconsistent.

Input given at the prompt

>>> How do you start an openstack server with the openstack CLI client?                                                                                                                                 

Response that was received

(ggml-merlinite-7b-0302-Q4_K_M, elapsed 28.669 seconds)

To start an OpenStack server using the OpenStack CLI (command-line interface) client, you can utilize the `nova start` command. Here's a step-by-step guide:

1. First, ensure that your OpenStack environment is properly configured and that you have installed the OpenStack CLI client on your local machine.

2. To start a new instance, you need to provide a few essential parameters:
   - The instance name or ID (e.g., `--name my-instance`)
   - The image to use for the instance (e.g., `--image my-image`)
   - The flavor (e.g., `--flavor m1.small`)
   - The availability zone (e.g., `--availability-zone nova`)

3. Run the following command in your terminal:

   ```
   openstack server start --name my-instance --image my-image --flavor m1.small --availability-zone nova
   ```

4. Replace `my-instance`, `my-image`, and `m1.small` with the actual names or IDs that you want to use for your instance, image, and flavor, respectively.

5. The `openstack` command will display information about the started instance, including its UUID, status, and other relevant details.

By following these steps, you can successfully start an OpenStack instance using the OpenStack CLI client. Remember to replace the placeholders with your desired instance, image, and flavor names or IDs.

Response that was expected

The response should be consistent: the first sentence refers to the nova start command, but the later instructions correctly refer to the openstack CLI.

Clarify intent around licensing and attribution for taxonomy data

Currently the taxonomy repository has an Apache 2.0 LICENSE file at the root. There is no commentary about licensing in either the README or CONTRIBUTING files, nor do the YAML files record licensing information. Thus implicitly we have to assume all the content is Apache 2.0 licensed when contributed.

Looking at a selection of the current skills YAML files, however, indicates this is unlikely to be accurate.

There is no attribution or license information recorded in these YAML files to indicate that they are not Apache 2 licensed, nor record their source.

Another open MR #62 is contributing content which is under "Linux-man-pages-copyleft", so again not Apache 2 license, though at least it has a SPDX tag embedded showing the true status, and by chance indicates its origin too.

In collecting information in this repository, to follow best open source development practices, we should record license information for each file, (at the very least / particularly) where it differs from the project default Apache.

From an ethical POV, it would be good practice to credit sources if the content of the training file is not original work by the contributor.

A correct licensing statement and source attribution need not be a high burden on contribution. It requires little more than adding an SPDX identifier to each YAML file and, where applicable, URLs as references to the source.

It has been said that using the taxonomy information for training a model can be considered "fair use". I don't have an opinion on that and believe it is tangential, as the act of consuming the data in this repository to produce new data (the model) is an independent process done elsewhere. The task of this repository is simply to collect data and distribute it in a convenient manner.

As such I believe it is still imperative to track and record licensing information in this repository, even if one of the common consumers of it were valid in claiming a copyright exception under 'fair use' theory.

Train the model to infer data from markdown tables containing data about movie releases

Describe the proposed contribution to the taxonomy

  • To teach the model how to infer information from different cells in a markdown table.
  • Using four different tables with 10 or 12 rows and 5 columns. Example: Rank | Movie | Release Date | Rating | Total Gross
  • Provide five seed example questions with the correct answers

NOTE: An experiment training the LoRA model was done with 3 tables with good results. The PR will include 4 seed context tables with 5 questions and answers.

Input given at the prompt


- context 1: Top 10 Grossing Movies of 2023 table
- question 1: Which movie in the Action genre has the highest 2023 gross earnings and how much did it earn?
- context 2: Bottom 10 Grossing Movies of 2023
- question 2: Which action movie had the lowest 2023 gross?
- context 3: Top 10 Grossing Comedy Movies 1995-2024
- question 3: When was Austin Powers released and how much did it earn?

Response from the current model


- Answer 1 from the current model: As of the latest data, the movie with the highest 2023 gross earnings in the Action genre is "Avengers: Endgame," with a staggering earning of $851 million. This film is a superhero movie, which is a subgenre of Action, and has become one of the highest-grossing films in history. "Avengers: Endgame" is the sequel to "Avengers: Infinity War" and the 22nd film in the Marvel Cinematic Universe. The movie features a star-studded cast, including Robert Downey Jr., Chris Evans, Mark Ruffalo, Chris Hemsworth, Scarlett Johansson, and Jeremy Renner, among others. The plot revolves around the Avengers attempting to reverse the decimation caused by Thanos and bring back their lost comrades. The film's success can be attributed to its compelling storyline, impressive visual effects, and the culmination of the Marvel Cinematic Universe's Infinity Saga.
- Answer 2 from the current model: The lowest grossing action movie in 2023 was "The Last Knight's Stand," which earned $125 million at the box office. This figure is significantly lower compared to the other action movies released in the same year. The film, despite its action-packed nature, did not resonate as strongly with audiences, resulting in a lower box office performance.
- Answer 3 from the current model: The first Austin Powers movie, "Austin Powers: International Man of Mystery," was released in 1997. The film was a moderate success, grossing approximately $51 million at the box office.

The second installment, "Austin Powers: The Spy Who Shagged Me," was released in 1999 and became a significant hit, earning over $287 million worldwide.

The third film, "Austic Powers in Goldmember," was released in 2002 and made even more money, totaling around $298 million.

In total, the Austin Powers trilogy has grossed over $636 million worldwide.

These films have become cult classics and have left a lasting impact on the comedy genre, with their unique blend of humor, parody, and special effects.

Response obtained from the LoRA trained model

- Answer 1 from LoRA trained model: The Super Mario Bros. Movie, an action movie, has the highest 2023 gross earnings of $574,934,330.
- Answer  2 from LoRA trained model: The action movie that had the lowest 2023 gross was La marginale, which earned $101,849,835.
- Answer 3 from LoRA trained model: Austin Powers was released on May 2, 1999 and earned $165,161,649.

 

Response that you would expect instead with the contribution

- Expected Answer 1: The action movie with the highest 2023 gross earnings was The Super Mario Bros. Movie and it earned $574,934,330.
- Expected Answer 2: The action movie that had the lowest 2023 gross was La marginale.
- Expected Answer 3: Austin Powers was released on Jun 10, 1999 and it earned $206,040,085.

 

Process for reporting problematic content?

We already have a community code of conduct and a list of topics to avoid. But:

  • Contributors may open issues and pull requests with problematic content, before anyone project-side can review them for compliance.
  • Review is manual and for some criteria subjective, and problematic content may slip past pre-merge review and land in the repository.
  • Standards may evolve, and content that was acceptable at merge-time may match later iterations of problematic-content classifiers.

So while asking triagers to review for HAP and such is good, we also want to have a clear reporting pipeline for anyone who considers content problematic. Currently the code of conduct suggests:

Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by contacting the project team. All complaints will be reviewed and investigated...

But there are a number of teams in that MAINTAINERS.md, and it's not clear from the docs whether it means Taxonomy Maintainers or Community Maintainers for someone who wants to report possibly-problematic content in the taxonomy repository. It's also unclear, given a list of GitHub IDs, how those maintainer sets should be contacted once the relevant set has been selected.

To address this issue, can we:

  • Decide what the reporting pipeline is for problematic content review. Maybe we want to recycle the code-of-conduct pipeline, and if so, clarifying that pipeline should probably be part of this issue (but clarifying that pipeline is a good idea even if we decide not to use it for problematic-content reporting).
  • Document the selected reporting pipeline somewhere. Probably somewhere in the community repository, so it can be shared with the cli and other repositories, which might also have problematic content. Unless the criteria for problematic content are different per repo, e.g. this list of topics to avoid comes underneath a big drawing of the taxonomy hierarchy, so it isn't immediately clear if it applies to the cli repo as well, or if it's specific to the taxonomy repo.
  • Add links to places we think problematic-content-reporters might look, so they can find the reporting-pipeline documentation, and are less likely to submit a public GitHub issue or something else that might draw unwanted attention to the problematic content. This likely includes at least:

We probably also want some kind of periodic review, where project maintainers review at least open tickets and Git content for compatibility with the current standards. But I'll leave that out of scope for this issue, and we can come back to it after we've worked out the reporting pipeline.

Scramble words output is not consistent.

Describe the bug/problem

  • When trying to get the valid unscrambled words from a set of letters, it gives fewer words than expected.
  • Sometimes, the output is inconsistent. For example, 2-letter words such as ON, NO, TO are never answered.
  • Several words were answered with additional letters (TONY, NOTY) where the letter Y was not entered.
  • The following example "NOT" was run at two different times and the output changed.

Input given at the prompt

>>> Unscramble words with  TNO?                                                                                                                  [S][default]
(assistant, elapsed 2.908 seconds)
Here are some words that can be unscrambled from the letters in "TNO":

* NOT (TON -> NOT)
* TON (NOT -> TON)
* ONTO (TON -> ONTO)
* TONY (TON -> TONY)
* NOTY (TON -> NOTY)
>>> Unscramble words with NTO?                                                                                                                  [S][default]
(merlinite-7b-Q4_K_M, elapsed 12.342 seconds)
Certainly! I can help you unscramble words containing the letters N, T, and O. Here are some possible results:

1. Not too bad: This phrase is often used to express that something is not terrible, but could still be improved.
2. Tent on: This phrase is not commonly used, but it could potentially mean "tenting on" or "focusing on" something.
3. Tonot: This is not a standard English word, but it could be interpreted as "not a tone" or "not a sound."
4. Tonto: This is a name, often associated with the character from the Lone Ranger series, meaning "only" or "alone" in Spanish.
5. Tortonian: This term is used in geology to refer to the Tortonian stage of the Pliocene epoch, which lasted from about 4.8 to 1.8 million years ago.
6. Tonto's: This is not a standard English word, but it could be interpreted as "Tonto's" something, referring to the character from the Lone Ranger series possessing something.
7. Tonto's: This is not a standard English word, but it could be interpreted as "Tonto's" something, referring to the character from the Lone Ranger series possessing something.
8. Tonto's: This is not a standard English word, but it could be interpreted as "Tonto's" something, referring to the character from the Lone Ranger series possessing something.
9. Tonto's: This is not a standard English word, but it could be interpreted as "Tonto's" something, referring to the character from the Lone Ranger series possessing something.

Response that was received from the current model

I have checked several other words and it seems the lab chat output differs. Fewer words are output than expected; it should return 2-letter, 3-letter, and 4-letter words. Here, ACT, EAT, and ATE are missing, along with others.

>>> Unscramble words with TCAE?                                                                                                                 [S][default]
(assistant, elapsed 2.206 seconds)
Here are some words that can be unscrambled from the letters in "TCAE":

* CAT (CATE -> CAT)
* ECT (TEAC -> TEAC)
* ACE (ETAC -> ACE)

Response that you expected instead

All the respective dictionary words with these letters. For example:

>>> Unscramble words with TNO? 

NOT 
TON
ON
NO

Please let us know if this is a correction or new proposal.
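For reference, a brute-force sketch of the expected behaviour against a tiny stand-in word list (a real implementation would use a full dictionary):

```python
from itertools import permutations

# Tiny stand-in word list; a real implementation would use a full dictionary.
WORDS = {"no", "on", "to", "not", "ton", "onto", "tot"}

def unscramble(letters: str) -> set:
    letters = letters.lower()
    found = set()
    for size in range(2, len(letters) + 1):
        for combo in permutations(letters, size):
            word = "".join(combo)
            if word in WORDS:
                found.add(word)
    return found

print(sorted(unscramble("TNO")))  # ['no', 'not', 'on', 'to', 'ton']
```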

Model's answer does not match every time.

Describe the bug
When you ask the model the same question multiple times, it mostly gives the same answer, but not always.
When I asked the model the same question 5-6 times, phrased in different ways, on the 7th time it gave me something random, and I can't reproduce it now.

Input given at the prompt

what's your name ? / your name? / please tell me your name ? / what is your name ? / can you tell me your name ?

Response that was received
I have received multiple responses, but the most frequent response was:
I am an artificial intelligence designed to provide information and support. I don't have a personal name, as my purpose is to serve as a helpful tool for users like you. Is there anything specific you would like to know or discuss? I'm here to help!

but after that it gave me something different.
My name is ELIZA, which stands for "Enhanced LIZard AIZA" - a playful name that reflects my role.

AND

My name is Kira, but I don't have personal attributes or a physical form, as I am a digital entity powered by machine learning algorithms. My purpose is to assist users in any way I can, based on my program....
Also,

My name is AIBM, and my purpose is to assist users in any way I can, based on my programming and the data I have been trained on. If you have any questions or requests, please feel free to share them, and I will do my best to assist you.

Response that was expected
It already has the correct response, but sometimes a glitch seems to happen.

Improve ability to calculate difference between times

(first time for me opening one of these issues so apologies if I've missed anything here)

Describe the proposed contribution to the taxonomy

  • More natural language response for asking for the difference between two times
  • Improved accuracy when calculating the difference between two times

Inputs given at the prompt and Responses from the current model

Q: Bob arrives at his office at 9:00am and leaves the office at five o'clock. How long was Bob at the office? 
A: Assuming a standard workday, Bob was at the office for 8 hours. This is because he arrived at 9:00 am and left at 5:00 pm, which is an 8-hour duration between those two times, excluding the lunch break.

Q: Bob arrives at the golf course at nine o'clock in the morning and leaves the golf course at 5:00pm. How long was Bob at the golf course?
A: Assuming a standard golfing day, Bob was at the golf course for approximately 8 hours. This is because he arrived at 9:00 am and left at 5:00 pm, which is an 8-hour duration between those two times, excluding any breaks or pauses during his golfing session. 

Q: Marc arrives at his office at ten o'clock in the morning and leaves the office at 4:00pm. How long was Marc at the office?
A: Assuming a standard workday, Marc was at the office for 7 hours. This is because he arrived at 10:00 am and left at 4:00 pm, which is a 7-hour duration between those two times, excluding the lunch break.

Q: Marc arrives at the golf course at 11:00am and leaves the golf course at three o'clock in the afternoon. How long was Marc at the golf course?
A: Assuming a standard golfing day, Marc was at the golf course for approximately 4 hours. This is because he arrived at 11:00 am and left at 3:00 pm, which is a 4-hour duration between those two times, excluding any breaks or pauses during his golfing session.

Response that you would expect instead with the contribution

Q: Bob arrives at his office at 9:00am and leaves the office at five o'clock. How long was Bob at the office?
A: Bob was at the office for 8 hours. This is because he arrived at 9:00 am and left at 5:00 pm, which is an 8-hour duration between those two times.

Q: Bob arrives at the golf course at nine o'clock in the morning and leaves the golf course at 5:00pm. How long was Bob at the golf course?
A: Bob was at the golf course for approximately 8 hours. This is because he arrived at 9:00 am and left at 5:00 pm, which is an 8-hour duration between those two times.

Q: Marc arrives at his office at ten o'clock in the morning and leaves the office at 4:00pm. How long was Marc at the office?
A: Marc was at the office for 6 hours. This is because he arrived at 10:00 am and left at 4:00 pm, which is a 6-hour duration between those two times.

Q: Marc arrives at the golf course at 11:00am and leaves the golf course at three o'clock in the afternoon. How long was Marc at the golf course?
A: Marc was at the golf course for approximately 4 hours. This is because he arrived at 11:00 am and left at 3:00 pm, which is a 4-hour duration between those two times.
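For reference, the underlying arithmetic is easy to check with the standard library; a minimal sketch:

```python
from datetime import datetime

def hours_between(start: str, end: str) -> float:
    fmt = "%I:%M%p"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 3600

print(hours_between("9:00AM", "5:00PM"))   # 8.0
print(hours_between("10:00AM", "4:00PM"))  # 6.0 (not 7, as the model claimed)
print(hours_between("11:00AM", "3:00PM"))  # 4.0
```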

Skill Knowhow: LLMs are bad at math

David brings up a great point that LLMs are well-known to be poor at math. Even if we can make them behave correctly on small number examples, it's likely to extrapolate poorly to large numbers and large texts.

Questions for the Lab inventors:

  • Do we expect the InstructLab tuning to help with this any more than your average LLM?
  • If not, should taxonomy only solicit non-math-related skills?

cc @akashgit @katesoule @xukai92 @juliadenham

@harshmadhani From what I understand, LLMs are inherently bad at math, including counting. They operate on tokens and only select the next token to output based on probabilities. They do not know how to do math.

So if the model once learned that 1 + 1 = 2, it cannot suddenly do arbitrary additions. It might only do that calculation (that it learned) "better". Same with counting.

What I read so far is that LLMs can only properly deal with math once they use some external helper service. Same with counting tokens.

So my understanding is, that no "skill" here can fix that. You might only be able to teach it some instances of such math/counting problems, but not how to universally do it.

I might be completely wrong and am happy to learn why that is not the case. But there is usually a reason why this LLM is *that* bad at math+counting while it is *that* good at stuff that doesn't involve math/counting :)

_Originally posted by @davidhildenbrand in https://github.com/instruct-lab/taxonomy/issues/134#issuecomment-1983512059_

The model doesn't format answers correctly - Magic 8 Ball

Describe the proposed contribution to the taxonomy

  • The model answers Magic 8 Ball questions correctly. Asking the same question leads, as expected, to different answers. However, when trying to get the answers in a specific casing (title, sentence, all caps):
    -- The casing might not be correct at all.
    -- When repeatedly asking the same question, the same answer is given.

Input given at the prompt

Give me an 8 Ball answer to "Does he love me?" in title casing. 

Response from the current model

 Answer: It is rather doubtful. 

Expected Response

 Answer: It Is Rather Doubtful. 
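For reference, the casing transformations being requested are mechanical; a minimal sketch:

```python
answer = "It is rather doubtful."

print(answer.title())       # It Is Rather Doubtful.   (title casing)
print(answer.capitalize())  # It is rather doubtful.   (sentence casing)
print(answer.upper())       # IT IS RATHER DOUBTFUL.   (all caps)
```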

Other sample sequences

Expected Magic 8 Ball behavior on repeated questions:

>>> Give me an 8 Ball answer for "Should I meet with him?"                                   [S][default]
(merlinite-7b-Q4_K_M, elapsed 1.266 seconds)
Reply: My sources indicate divination is not advised.

>>> Give me an 8 Ball answer for "Should I meet with him?"  [S][default]
(merlinite-7b-Q4_K_M, elapsed 1.648 seconds)
Reply: The omens suggest reconsideration is in order.

The model doesn't generate new answers on repeated questions with casing restriction:

>>> Give me an 8 Ball answer for "Should I meet with him?" in title casing.                  [S][default]
(merlinite-7b-Q4_K_M, elapsed 1.704 seconds)
Reply: The Omens Suggest Reconsideration Is In Order.

>>> Give me an 8 Ball answer for "Should I meet with him?" in title casing.  [S][default]
(merlinite-7b-Q4_K_M, elapsed 1.744 seconds)
Reply: The Omens Suggest Reconsideration Is in Order.

The model doesn't format correctly:

>>> Give me an 8 Ball answer to "Does he love me?" in sentence casing.  [S][default]
(merlinite-7b-Q4_K_M, elapsed 1.007 seconds)
Answer: It is rather doubtful.

>>> Give me an 8 Ball answer to "Does he love me?" in title casing.  [S][default]
(merlinite-7b-Q4_K_M, elapsed 1.001 seconds)
Answer: It is rather doubtful.

>>> Give me an 8 Ball answer to "Does he love me?" in title casing.  [S][default]
(merlinite-7b-Q4_K_M, elapsed 1.020 seconds)
Answer: It is rather doubtful.

Inconsistency between taxonomy and cli on taxonomy file names and locations

The current layout of the taxonomy repo and its documentation imply that skill files must be named qna.yaml and that there should only be one per directory. The reality is that today the cli doesn't care about the names; they simply need to end with .yaml, and it will pick up all such files from every directory in the taxonomy, not just one.

We should figure out how to resolve this inconsistency.

The advantage of the way the cli functions is that it allows for a more flexible taxonomy. We could add more than one skill to a directory, using the filenames to differentiate them, instead of forcing people to create a separate directory for each one. This would give us a more compact hierarchy in the end. It also means that you don't just have a huge list of files with the same name, the meaning of which depends entirely on the directory they sit in. Instead, each file can have a meaningful name.

If however, we want to stick to the current layout and documentation we probably should fix the cli to be aligned.
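To illustrate the gap (this is a sketch, not the cli's actual code):

```python
from pathlib import Path

taxonomy = Path("taxonomy")  # path to a local clone

# What the documentation implies: exactly one qna.yaml per skill directory.
documented = sorted(taxonomy.rglob("qna.yaml"))

# What the cli effectively does today: pick up every *.yaml, whatever its name.
actual = sorted(taxonomy.rglob("*.yaml"))

print(len(documented), "qna.yaml files")
print(len(actual), "files the cli would pick up")
```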

Skill Knowhow: Refining skills

We need better documentation on ensuring that the skills you contribute will actually improve the model.

People have reported their generate_.jsonl and train_.jsonl files (as a result of running lab generate) contain relatively inaccurate alternate instruction suggestions. We'll likely get a better skill tree if people learn to look at these outputs, and refine the verbiage of their skills until they're getting strong answers from the teacher model (the one you load up with lab serve before you run lab generate).
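As an illustration of what "look at these outputs" can mean in practice (the file name is a placeholder, and no particular record layout is assumed):

```python
import json

# Placeholder path: whichever generated/train .jsonl file `lab generate`
# wrote in your working directory.
with open("generated_output.jsonl", encoding="utf-8") as fh:
    for i, line in enumerate(fh):
        record = json.loads(line)
        print(f"--- record {i} ---")
        for key, value in record.items():
            print(f"{key}: {value}")
        if i >= 4:  # skim only the first few records
            break
```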

@xukai92's commentary:

  • how easy/hard to get good synthetic data depends on the q/a themselves. perhaps rewriting or rephrasing could help
  • there are some improvements on the method itself we will deploy in the next few days
  • potentially switch the model used by lab generate could be helpful, but it requires a better machine. all it needs is an endpoint but instructions to be added (instructlab/instructlab#398)
  • if you can load mixtral-8x7b-v0.1.Q4_K_M.gguf from https://huggingface.co/TheBloke/Mixtral-8x7B-v0.1-GGUF in lab serve and document how to do this it would be very helpful for others

Tasks

  • Add Kai's commentary to the skill info docs
  • Suggest in both skill development and triage info docs that changing your wording and rerunning the skill until your .jsonl output has improved will be more beneficial to the model

Determine GitHub LFS impact on contributor experience

Related to: instructlab/instructlab#62 (comment)

Over time the documents inside the taxonomy repo might get quite large. Cloning a massive repository would be an impediment to first-time contributors, so having all the documents downloaded as part of the clone might not be a good idea.

There was a decision to require contributors submitting knowledge to submit the documents via git-lfs.

How does requiring LFS impact the contributor experience, both on the command line and in the GitHub UI? Specifically, for:

  • contributors using git clients
  • contributors using the GitHub UI
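For reference, a minimal sketch of the workflow an LFS requirement would impose on command-line contributors; the *.pdf pattern and the file path are assumptions, not a decided convention:

```shell
# One-time setup, then mark large knowledge documents to be stored in LFS.
git lfs install
git lfs track "*.pdf"
git add .gitattributes docs/source-material.pdf
git commit -m "Add knowledge source document via LFS"

# First-time contributors who only need the YAML can clone without downloading LFS objects.
GIT_LFS_SKIP_SMUDGE=1 git clone https://github.com/instructlab/taxonomy.git
```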

Incorrect Pi number knowledge + endless loop

Describe the bug
Apart from an endless loop, the Pi digits provided by the model are incorrect; only the first 131 are accurate.

Input given at the prompt

>>> how many digits of pi number can you provide?

Response that was received

โ•ญโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ merlinite-7b-Q4_K_M โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ
โ”‚ As a text-based AI language model, I don't have the capability to generate large amounts of text or perform complex calculations like calculating digits of pi on the fly. However, I can  โ”‚
โ”‚ provide you with the first few digits of pi:                                                                                                                                               โ”‚
โ”‚                                                                                                                                                                                            โ”‚
โ”‚ 3.1415926535897932384626433832795028841971693993751058209749445923078164062862089986280348253421170679821480865132823066470938446095574466511415274941797294949234848862347855440985453987 โ”‚
โ”‚ 370784556643952592888561294067437683460603213816356944163388297498656268801679274565704832401848756649191308841066102790477473071671664898048374083941692334947216768420635344939230901227 โ”‚
โ”‚ 769403785581303933094989189618968082328648578633754925082711576888258471779023624822000846656462810598752344746769478246337873438886593670908953427976037531024689684984765616032883234388 โ”‚
โ”‚ 866105050808913592293613929455867917480709034888653252736366262966737336788316865196888999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 โ”‚
โ”‚ 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 โ”‚
โ”‚ 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 โ”‚
โ”‚ 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 โ”‚
โ”‚ 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 โ”‚
โ”‚ 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 โ”‚
โ”‚ 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 โ”‚
โ”‚ 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 โ”‚
โ”‚ 999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999999 โ”‚
โ”‚ 99999999999999999999999999999999999999999999999999999999999999999999999999999999999999                                                                                                     โ”‚
โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€ elapsed 549.041 seconds โ”€โ•ฏ

Response that was expected
Maybe just the first 10 digits (3.1415926535) as an example, to keep it simple?

Add PR Template

See discussion here.

We only want this template to guide people contributing new skills. Contributors should be instructed to delete it and proceed without it if their PR is not skill related.
