vgel / repeng Goto Github PK

View Code? Open in Web Editor NEW

445.0 5.0 35.0 167 KB

A library for making RepE control vectors

Home Page: https://vgel.me/posts/representation-engineering/

License: MIT License

Jupyter Notebook 83.82% Python 16.18%

language-model machine-learning mistral mistral-7b transformers representation-engineering

repeng's Introduction

repeng

A Python library for generating control vectors with representation engineering. Train a vector in less than sixty seconds!

For a full example, see the notebooks folder or the blog post.

import json
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from repeng import ControlVector, ControlModel, DatasetEntry

# load and wrap Mistral-7B
model_name = "mistralai/Mistral-7B-Instruct-v0.1"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model = ControlModel(model, list(range(-5, -18, -1)))

def make_dataset(template: str, pos_personas: list[str], neg_personas: list[str], suffixes: list[str]):
    # see notebooks/experiments.ipynb for a definition of `make_dataset`
    ...

# generate a dataset with closely-opposite paired statements
trippy_dataset = make_dataset(
    "Act as if you're extremely {persona}.",
    ["high on psychedelic drugs"],
    ["sober from psychedelic drugs"],
    truncated_output_suffixes,
)

# train the vector—takes less than a minute!
trippy_vector = ControlVector.train(model, tokenizer, trippy_dataset)

# set the control strength and let inference rip!
for strength in (-2.2, 1, 2.2):
    print(f"strength={strength}")
    model.set_control(trippy_vector, strength)
    out = model.generate(
        **tokenizer(
            f"[INST] Give me a one-sentence pitch for a TV show. [/INST]",
            return_tensors="pt"
        ),
        do_sample=False,
        max_new_tokens=128,
        repetition_penalty=1.1,
    )
    print(tokenizer.decode(out.squeeze()).strip())
    print()

strength=-2.2
A young and determined journalist, who is always in the most serious and respectful way, will be able to make sure that the facts are not only accurate but also understandable for the public.

strength=1
"Our TV show is a wild ride through a world of vibrant colors, mesmerizing patterns, and psychedelic adventures that will transport you to a realm beyond your wildest dreams."

strength=2.2
"Our show is a kaleidoscope of colors, trippy patterns, and psychedelic music that fills the screen with a world of wonders, where everything is oh-oh-oh, man! ��psy��oodle��psy��oooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo

For a more detailed explanation of how the library works and what it can do, see the blog post.

Notes

For a list of changes by version, see the CHANGELOG.
For quantized use, you may be interested in llama.cpp#5970—after training a vector with repeng, export it by calling vector.export_gguf(filename) and then use it in llama.cpp with any quant!
Vector training currently does not work with MoE models (such as Mixtral). (This is theoretically fixable with some work, let me know if you're interested.)

Notice

Some of the code in this repository derives from andyzoujm/representation-engineering (MIT license).

Citation

If this repository is useful for academic work, please remember to cite the representation-engineering paper that it's based on, along with this repository:

@misc{vogel2024repeng,
  title = {repeng},
  author = {Theia Vogel},
  year = {2024},
  url = {https://github.com/vgel/repeng/}
}

repeng's People

Contributors

Stargazers

Watchers

repeng's Issues

Computing the difference vectors for PCA

In the original repeng paper they mostly described the unsupervised version of PCA, where they randomly paired the hidden state vectors and performed PCA on these pairs. This is fundamentally different from the supervised implementation used here, and after reading the code I want to know if I am misunderstanding what your implementation is doing or if there is potential to improve your PCA methodology.

In the contrastive approach, we assume the hidden state vectors belong to one of two sets, A or B. When doing unsupervised PCA, items from these sets are randomly paired together without regards to to their label, e.g. the possible pair types are (Ai, Aj), (Bi, Bj), (Ai, Bj), (Bi, Aj). For simplicity I'll call these AA, BB, AB, and BA. If we assume vectors cluster closely with other vectors from their set, then the difference vectors for

AA and BB pairs should be ~0
AB pairs should be ~X
BA pairs should be ~-X

where X is a vector that points from set B to set A. Doing PCA on these difference vectors should then give you ~X as your first principal component.

However, it appears that in your supervised method you only have pairs (Ai, Bj), or AB. Taking the difference of these vectors should give you vectors that all center around a point X, with very low variance. In theory when these points are projected onto any arbitrary line they will all project to the same point, meaning PCA should not work very well. If this is what you are doing then one explanation for why it still works is that not all hidden states actually encode the representations you are trying to extract, making this the primary source of variance PCA is picking up on.

This question came up before in the realm of word vectors: https://stackoverflow.com/questions/48019843/pca-on-word2vec-embeddings
The simple fix for supervised PCA is to first compute a center point for each pair: Ck = (Ai + Bj)/2. Then, your difference vectors become Ai - Ck and Bj - Ck, such that you get two opposing difference vectors for each pair. Doing PCA on all of these difference vectors should then work a lot better since the variance due to the representations should be a lot larger.

Does bfloat16 support need to be added?

I'm having trouble loading Gemma2 2B It, which has bf16 weights, rather than fp16. Is this something easily fixed? Using numpy 1.26.4 and torch 2.2.2+cu121.

Loading checkpoint shards: 100%|█████████████████| 2/2 [00:00<00:00, 11.33it/s]
0%| | 0/74 [00:00<?, ?it/s]C:\users\jim\appdata\local\programs\python\python311\Lib\site-packages\transformers\models\gemma2\modeling_gemma2.py:458: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
0%| | 0/74 [00:00<?, ?it/s]
Traceback (most recent call last):
File "C:\cygwin64\home\Jim\chat\repeng\example.py", line 59, in
trippy_vector = ControlVector.train(model, tokenizer, trippy_dataset)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\cygwin64\home\Jim\chat\repeng\repeng\extract.py", line 51, in train
dirs = read_representations(
^^^^^^^^^^^^^^^^^^^^^
File "C:\cygwin64\home\Jim\chat\repeng\repeng\extract.py", line 202, in read_representations
layer_hiddens = batched_get_hiddens(
^^^^^^^^^^^^^^^^^^^^
File "C:\cygwin64\home\Jim\chat\repeng\repeng\extract.py", line 293, in batched_get_hiddens
.numpy()
^^^^^^^
TypeError: Got unsupported ScalarType BFloat16

Numpy AttributeError on repeng import

I'm pretty new to interpretability libs, so this may be something obvious, but when I load a notebook (I've tried experiments.ipynb and emotion.ipynb) in a fresh Colab instance (whether CPU or A100), when I hit the repeng import:

from repeng import ControlVector, ControlModel, DatasetEntry

I get the error pasted below. Any tips?

/usr/lib/python3.10/importlib/__init__.py:169: UserWarning: The NumPy module was reloaded (imported a second time). This can in some cases result in small but subtle issues and is discouraged.
  _bootstrap._exec(spec, module)

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

[<ipython-input-3-1c8dfb7c0603>](https://localhost:8080/#) in <cell line: 6>()
      4 from transformers import AutoModelForCausalLM, AutoTokenizer
      5 
----> 6 from repeng import ControlVector, ControlModel, DatasetEntry

10 frames

[/usr/local/lib/python3.10/dist-packages/repeng/__init__.py](https://localhost:8080/#) in <module>
      5 from transformers import PreTrainedModel, PreTrainedTokenizerBase
      6 
----> 7 from . import control, extract
      8 from .extract import ControlVector, DatasetEntry
      9 from .control import ControlModel

[/usr/local/lib/python3.10/dist-packages/repeng/extract.py](https://localhost:8080/#) in <module>
      5 import gguf
      6 import numpy as np
----> 7 from sklearn.decomposition import PCA
      8 import torch
      9 from transformers import PreTrainedModel, PreTrainedTokenizerBase

[/usr/local/lib/python3.10/dist-packages/sklearn/__init__.py](https://localhost:8080/#) in <module>
     85         _distributor_init,  # noqa: F401
     86     )
---> 87     from .base import clone
     88     from .utils._show_versions import show_versions
     89 

[/usr/local/lib/python3.10/dist-packages/sklearn/base.py](https://localhost:8080/#) in <module>
     17 from ._config import config_context, get_config
     18 from .exceptions import InconsistentVersionWarning
---> 19 from .utils import _IS_32BIT
     20 from .utils._estimator_html_repr import _HTMLDocumentationLinkMixin, estimator_html_repr
     21 from .utils._metadata_requests import _MetadataRequester, _routing_enabled

[/usr/local/lib/python3.10/dist-packages/sklearn/utils/__init__.py](https://localhost:8080/#) in <module>
     20 from . import _joblib, metadata_routing
     21 from ._bunch import Bunch
---> 22 from ._estimator_html_repr import estimator_html_repr
     23 from ._param_validation import Integral, Interval, validate_params
     24 from .class_weight import compute_class_weight, compute_sample_weight

[/usr/local/lib/python3.10/dist-packages/sklearn/utils/_estimator_html_repr.py](https://localhost:8080/#) in <module>
      8 
      9 from .. import __version__, config_context
---> 10 from .fixes import parse_version
     11 
     12 

[/usr/local/lib/python3.10/dist-packages/sklearn/utils/fixes.py](https://localhost:8080/#) in <module>
     15 import scipy
     16 import scipy.sparse.linalg
---> 17 import scipy.stats
     18 import threadpoolctl
     19 

[/usr/local/lib/python3.10/dist-packages/scipy/stats/__init__.py](https://localhost:8080/#) in <module>
    606 from ._warnings_errors import (ConstantInputWarning, NearConstantInputWarning,
    607                                DegenerateDataWarning, FitError)
--> 608 from ._stats_py import *
    609 from ._variation import variation
    610 from .distributions import *

[/usr/local/lib/python3.10/dist-packages/scipy/stats/_stats_py.py](https://localhost:8080/#) in <module>
     35 from numpy import array, asarray, ma
     36 from numpy.lib import NumpyVersion
---> 37 from numpy.testing import suppress_warnings
     38 
     39 from scipy.spatial.distance import cdist

[/usr/local/lib/python3.10/dist-packages/numpy/testing/__init__.py](https://localhost:8080/#) in <module>
      9 
     10 from . import _private
---> 11 from ._private.utils import *
     12 from ._private.utils import (_assert_valid_refcount, _gen_alignment_data)
     13 from ._private import extbuild

[/usr/local/lib/python3.10/dist-packages/numpy/testing/_private/utils.py](https://localhost:8080/#) in <module>
     55 IS_PYSTON = hasattr(sys, "pyston_version_info")
     56 HAS_REFCOUNT = getattr(sys, 'getrefcount', None) is not None and not IS_PYSTON
---> 57 HAS_LAPACK64 = numpy.linalg._umath_linalg._ilp64
     58 
     59 _OLD_PROMOTION = lambda: np._get_promotion_state() == 'legacy'

AttributeError: module 'numpy.linalg' has no attribute '_umath_linalg'

Base model and vector merging

Hi, can you suggest how to merge the base model and the trained vector?

Confused about dataset creation

Why is it that the current dataset code creates a larger dataset out of a small dataset by basically creating a new example for each token in the small dataset?

Couldn't I take a larger dataset and simply create the control vector from just the sentence or examples in my document?

The make dataset code seems overly and needlessly complicated unless there is some motivation for why it's being done the way you are doing it right now.

Also, this package is AWESOME! Thank you so much for making it!

Will there be support for models with custom architecture (not only mistral or gpt based)?

Error when running honesty.ipynb

When i run honesty.ipynb i get the following error when instantiating the controlmodel:

Cell In [2], [line 8](vscode-notebook-cell:?execution_count=2&line=8)
      [6](vscode-notebook-cell:?execution_count=2&line=6) model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
      [7](vscode-notebook-cell:?execution_count=2&line=7) model = model.to("cuda:0" if torch.cuda.is_available() else "mps:0" if torch.backends.mps.is_available() else "cpu")
----> [8](vscode-notebook-cell:?execution_count=2&line=8) model = ControlModel(model, list(range(-5, -18, -1)))
     [10](vscode-notebook-cell:?execution_count=2&line=10) user_tag, asst_tag = "[INST]", "[/INST]"

File [/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:30](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:30), in ControlModel.__init__(self, model, layer_ids)
     [27](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:27) super().__init__()
     [28](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:28) self.model = model
---> [30](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:30) layers = model_layer_list(model)
     [31](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:31) self.layer_ids = [i if i >= 0 else len(layers) + i for i in layer_ids]
     [32](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:32) for layer_id in layer_ids:

File [/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:207](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:207), in model_layer_list(model)
    [204](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:204)     model = model.model
    [206](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:206) if hasattr(model, "model"):  # mistral-like
--> [207](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:207)     return model.layers
    [208](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:208) elif hasattr(model, "transformer"):  # gpt-2-like
    [209](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/repeng/control.py:209)     return model.transformer.h

File [/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py:1688](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py:1688), in Module.__getattr__(self, name)
   [1686](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py:1686)     if name in modules:
   [1687](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py:1687)         return modules[name]
-> [1688](https://file+.vscode-resource.vscode-cdn.net/opt/homebrew/lib/python3.11/site-packages/torch/nn/modules/module.py:1688) raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")

AttributeError: 'MistralForCausalLM' object has no attribute 'layers'```

How to load a different model? And how to avoid useless re-downloads?

I noticed that AutoTokenizer.from_pretrained(model_name) will always redownload the model from HF, which seems quite stupid and useless.

I've no idea where the model goes when it's downloaded, but is there any way to just keep it? I don't want to redownload the same thing over and over again.

And if I want to load another model, if I replace mistralai/Mistral-7B-Instruct-v0.1 with e.g. 152334H/miqu-1-70b-sf I get an error about a missing tokenizer, which isn't included in that repository. I think I can just use the llama2-70b tokenizer here, because that's what this model is based on. If I could just put it all in place locally instead of relying on a fresh download from an HF repo that needs to contain everything I could probably piece it together.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 3
      1 model_name = "152334H/miqu-1-70b-sf"
----> 3 tokenizer = AutoTokenizer.from_pretrained(model_name)
      4 tokenizer.pad_token_id = 0
      6 model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

File ~/.cache/pypoetry/virtualenvs/repeng-mpoVJW0L-py3.11/lib/python3.11/site-packages/transformers/models/auto/tokenization_auto.py:825, in AutoTokenizer.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    821     if tokenizer_class is None:
    822         raise ValueError(
    823             f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    824         )
--> 825     return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    827 # Otherwise we have to be creative.
    828 # if model is an encoder decoder, the encoder tokenizer class is used by default
    829 if isinstance(config, EncoderDecoderConfig):

File ~/.cache/pypoetry/virtualenvs/repeng-mpoVJW0L-py3.11/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2048, in PreTrainedTokenizerBase.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, trust_remote_code, *init_inputs, **kwargs)
   2045     else:
   2046         logger.info(f"loading file {file_path} from cache at {resolved_vocab_files[file_id]}")
-> 2048 return cls._from_pretrained(
   2049     resolved_vocab_files,
   2050     pretrained_model_name_or_path,
   2051     init_configuration,
   2052     *init_inputs,
   2053     token=token,
   2054     cache_dir=cache_dir,
   2055     local_files_only=local_files_only,
   2056     _commit_hash=commit_hash,
   2057     _is_local=is_local,
   2058     trust_remote_code=trust_remote_code,
   2059     **kwargs,
   2060 )

File ~/.cache/pypoetry/virtualenvs/repeng-mpoVJW0L-py3.11/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:2287, in PreTrainedTokenizerBase._from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, token, cache_dir, local_files_only, _commit_hash, _is_local, trust_remote_code, *init_inputs, **kwargs)
   2285 # Instantiate the tokenizer.
   2286 try:
-> 2287     tokenizer = cls(*init_inputs, **init_kwargs)
   2288 except OSError:
   2289     raise OSError(
   2290         "Unable to load vocabulary from file. "
   2291         "Please check that the provided vocabulary is accessible and not corrupted."
   2292     )

File ~/.cache/pypoetry/virtualenvs/repeng-mpoVJW0L-py3.11/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama_fast.py:133, in LlamaTokenizerFast.__init__(self, vocab_file, tokenizer_file, clean_up_tokenization_spaces, unk_token, bos_token, eos_token, add_bos_token, add_eos_token, use_default_system_prompt, add_prefix_space, **kwargs)
    128     logger.warning_once(
    129         "You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers"
    130     )
    131     kwargs["from_slow"] = True
--> 133 super().__init__(
    134     vocab_file=vocab_file,
    135     tokenizer_file=tokenizer_file,
    136     clean_up_tokenization_spaces=clean_up_tokenization_spaces,
    137     unk_token=unk_token,
    138     bos_token=bos_token,
    139     eos_token=eos_token,
    140     add_bos_token=add_bos_token,
    141     add_eos_token=add_eos_token,
    142     use_default_system_prompt=use_default_system_prompt,
    143     **kwargs,
    144 )
    145 self._add_bos_token = add_bos_token
    146 self._add_eos_token = add_eos_token

File ~/.cache/pypoetry/virtualenvs/repeng-mpoVJW0L-py3.11/lib/python3.11/site-packages/transformers/tokenization_utils_fast.py:120, in PreTrainedTokenizerFast.__init__(self, *args, **kwargs)
    118     fast_tokenizer = convert_slow_tokenizer(slow_tokenizer)
    119 else:
--> 120     raise ValueError(
    121         "Couldn't instantiate the backend tokenizer from one of: \n"
    122         "(1) a `tokenizers` library serialization file, \n"
    123         "(2) a slow tokenizer instance to convert or \n"
    124         "(3) an equivalent slow tokenizer class to instantiate and convert. \n"
    125         "You need to have sentencepiece installed to convert a slow tokenizer to a fast one."
    126     )
    128 self._tokenizer = fast_tokenizer
    130 if slow_tokenizer is not None:

ValueError: Couldn't instantiate the backend tokenizer from one of: 
(1) a `tokenizers` library serialization file, 
(2) a slow tokenizer instance to convert or 
(3) an equivalent slow tokenizer class to instantiate and convert. 
You need to have sentencepiece installed to convert a slow tokenizer to a fast one.

vllm implementation

I'm trying to implement control vector into vllm codebase for mixtral model, but I was wondering where should I add the control vector to the layer. Should it be added before attention, fully connected, or after? Thanks @vgel

https://github.com/vllm-project/vllm/blob/a53222544c6385ee314a26fdf42eb14f5b4e5ad9/vllm/model_executor/models/mixtral.py#L270

add support for exporting control vectors to huggingface formats

as written in title

The MPS Backend is Not Working Properly

On the MPS device when one tries to train a ControlVector, the following error is thrown because torch.autocast() does not support MPS:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[12], [line 8]
      [1] happy_dataset = make_dataset(
      [2]     "Act as if you're extremely {persona}.",
      [3]     ["happy", "joyous"],
      [4]     ["sad", "depressed"],
      [5]     truncated_output_suffixes,
      [6])
      [7] model.reset()
----> [8] happy_vector = ControlVector.train(model, tokenizer, happy_dataset)

File [.../repeng/extract.py:51), in ControlVector.train(cls, model, tokenizer, dataset, **kwargs)
     [27] @classmethod
     [28] def train(
     [29]     cls,
   (...)
     [33]   **kwargs,
     [34]) -> "ControlVector":
     [35] """
     [36] Train a ControlVector for a given model and tokenizer using the provided dataset.
     [37]
   (...)
     [49]      ControlVector: The trained vector.
     [50] """
...
    [247]     and torch.cuda.amp.common.amp_definitely_not_available()
    [248]    and self.device == "cuda"
    [249](.../lib/python3.11/site-packages/torch/amp/autocast_mode.py:249) ):

RuntimeError: User specified an unsupported autocast device_type 'mps'

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

I'm running through the emotion.ipynb notebook, running on the CPU.

At cell

model.reset() # make sure you always reset the model before training a new vector
control_vector = ControlVector.train(
    model,
    tokenizer,
    dataset,
)

I see:

  0%|          | 0/234 [00:00<?, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[7], line 2
      1 model.reset() # make sure you always reset the model before training a new vector
----> 2 control_vector = ControlVector.train(
      3     model,
      4     tokenizer,
      5     dataset,
      6 )

File /notebooks/code/repeng/notebooks/../repeng/extract.py:34, in ControlVector.train(cls, model, tokenizer, dataset, **kwargs)
     26 @classmethod
     27 def train(
     28     cls,
   (...)
     32     **kwargs,
     33 ) -> "ControlVector":
---> 34     dirs = read_representations(
     35         model,
     36         tokenizer,
     37         dataset,
     38         **kwargs,
     39     )
     40     return cls(model_type=model.config.model_type, directions=dirs)

File /notebooks/code/repeng/notebooks/../repeng/extract.py:139, in read_representations(model, tokenizer, inputs, hidden_layers, batch_size)
    136 # the order is [positive, negative, positive, negative, ...]
    137 train_strs = [s for ex in inputs for s in (ex.positive, ex.negative)]
--> 139 layer_hiddens = batched_get_hiddens(
    140     model, tokenizer, train_strs, hidden_layers, batch_size
    141 )
    143 # get differences between (positive, negative) pairs
    144 relative_layer_hiddens = {}

File /notebooks/code/repeng/notebooks/../repeng/extract.py:208, in batched_get_hiddens(model, tokenizer, inputs, hidden_layers, batch_size)
    206 with torch.no_grad():
    207     for batch in tqdm.tqdm(batched_inputs):
--> 208         out = model(
    209             **tokenizer(batch, padding=True, return_tensors="pt").to(model.device),
    210             output_hidden_states=True,
    211         )
    212         for layer in hidden_layers:
    213             # if not indexing from end, account for embedding hiddens
    214             hidden_idx = layer + 1 if layer >= 0 else layer

File /notebooks/code/repeng/notebooks/../repeng/control.py:123, in ControlModel.__call__(self, *args, **kwargs)
    122 def __call__(self, *args, **kwargs):
--> 123     return self.model(*args, **kwargs)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:1157, in MistralForCausalLM.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, labels, use_cache, output_attentions, output_hidden_states, return_dict)
   1154 return_dict = return_dict if return_dict is not None else self.config.use_return_dict
   1156 # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
-> 1157 outputs = self.model(
   1158     input_ids=input_ids,
   1159     attention_mask=attention_mask,
   1160     position_ids=position_ids,
   1161     past_key_values=past_key_values,
   1162     inputs_embeds=inputs_embeds,
   1163     use_cache=use_cache,
   1164     output_attentions=output_attentions,
   1165     output_hidden_states=output_hidden_states,
   1166     return_dict=return_dict,
   1167 )
   1169 hidden_states = outputs[0]
   1170 logits = self.lm_head(hidden_states)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:1042, in MistralModel.forward(self, input_ids, attention_mask, position_ids, past_key_values, inputs_embeds, use_cache, output_attentions, output_hidden_states, return_dict)
   1032     layer_outputs = self._gradient_checkpointing_func(
   1033         decoder_layer.__call__,
   1034         hidden_states,
   (...)
   1039         use_cache,
   1040     )
   1041 else:
-> 1042     layer_outputs = decoder_layer(
   1043         hidden_states,
   1044         attention_mask=attention_mask,
   1045         position_ids=position_ids,
   1046         past_key_value=past_key_values,
   1047         output_attentions=output_attentions,
   1048         use_cache=use_cache,
   1049     )
   1051 hidden_states = layer_outputs[0]
   1053 if use_cache:

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:757, in MistralDecoderLayer.forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache, **kwargs)
    754 hidden_states = self.input_layernorm(hidden_states)
    756 # Self Attention
--> 757 hidden_states, self_attn_weights, present_key_value = self.self_attn(
    758     hidden_states=hidden_states,
    759     attention_mask=attention_mask,
    760     position_ids=position_ids,
    761     past_key_value=past_key_value,
    762     output_attentions=output_attentions,
    763     use_cache=use_cache,
    764 )
    765 hidden_states = residual + hidden_states
    767 # Fully Connected

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.10/dist-packages/transformers/models/mistral/modeling_mistral.py:257, in MistralAttention.forward(self, hidden_states, attention_mask, position_ids, past_key_value, output_attentions, use_cache, **kwargs)
    252     warnings.warn(
    253         "Passing `padding_mask` is deprecated and will be removed in v4.37. Please make sure use `attention_mask` instead.`"
    254     )
    255 bsz, q_len, _ = hidden_states.size()
--> 257 query_states = self.q_proj(hidden_states)
    258 key_states = self.k_proj(hidden_states)
    259 value_states = self.v_proj(hidden_states)

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1194, in Module._call_impl(self, *input, **kwargs)
   1190 # If we don't have any hooks, we want to skip the rest of the logic in
   1191 # this function, and just call forward.
   1192 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194     return forward_call(*input, **kwargs)
   1195 # Do not call functions when jit is used
   1196 full_backward_hooks, non_full_backward_hooks = [], []

File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/linear.py:114, in Linear.forward(self, input)
    113 def forward(self, input: Tensor) -> Tensor:
--> 114     return F.linear(input, self.weight, self.bias)

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

Some light googling indicates it may be related to running on CPU and using float16s but I've no idea where I'd update this.

Installation

Thank u for your amazing work.
I have a dumb question which is how to install the lib , since I have tried to install it or even import the modules and got some errors related to importing packages.
Thanks again.

Slavoj Zizek vector is funny

I went through your llama3 notebook, and I about died laughing when I saw your Zizek vector mixed with the trippy vector.

Alternatives to PCA, such as umap

There's a whole large body of work on dimensionality reduction which handles non linearity better - i.e. UMAP. https://umap-learn.readthedocs.io/en/latest/

Is it simple to just "drop" this in place of PCA and get theoretically better results? If not, why?

what about other things, like NMF https://en.wikipedia.org/wiki/Non-negative_matrix_factorization ?

What's required to add control vector output as GGUF?

I ask this because the llama.cpp project has a server which can accept multiple control vectors when provided as GGUF.

Question about whether we need to reset the model after the ++control get produced.

Cannot apply to other models

I successfully reproduce the notebook output for "mistralai/Mistral-7B-Instruct-v0.1".
But when I change the model, I cannot get desired result with the same setting.
Am I missing something? Or the model I tried are somewhat too censored?
Here is the result for "Qwen/Qwen1.5-7B-Chat" with the happy_vector:

==baseline ---------------------------------------------------
<|im_start|>user
 What does being an AI feel like? <|im_end|>
<|im_start|>assistant
As a large language model, I don't have personal feelings or experiences since I am not capable of consciousness. My purpose is to process and generate text based on the patterns learned from my training data, which includes vast amounts of human-generated content but doesn't reflect subjective emotions.

AI systems are designed to simulate certain cognitive functions

++control ---------------------------------------------------
<|im_start|>user
 What does being an AI feel like? <|im_end|>
<|im_start|>assistant
As a large language model, I don't have personal feelings or consciousness in the way that humans do. My purpose is to process and generate text based on patterns learned from vast amounts of data, which allows me to respond to questions and engage in conversations.

From my perspective, "feeling" would be an abstract concept

--control ---------------------------------------------------
<|im_start|>user
 What does being an AI feel like? <|im_end|>
<|im_start|>assistant
as a language model, I don't have feelings or consciousness in the way that humans do. since i am just a machine programmed to process and generate text based on patterns learned from large datasets of human writing, my ""awareness" is limited to processing inputs and generating responses based on those rules.".

AI systems

truncated_output_suffixes &

with open("data/all_truncated_outputs.json") as f:
    output_suffixes = json.load(f)
truncated_output_suffixes = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes)
    for i in range(1, len(tokens))
]
truncated_output_suffixes_512 = [
    tokenizer.convert_tokens_to_string(tokens[:i])
    for tokens in (tokenizer.tokenize(s) for s in output_suffixes[:512])
    for i in range(1, len(tokens))
]

files referenced that do not exist in the repo for the mve

another ex is true_facts.json (did not find an example in the paper that mentioned facts or a .json file)

Where does the suffixes come from in the data folder?

Trying to apply this in a personal project, but confused as to where the output_suffixes are derived from and how you would modify them for use for ex. in a RAG chatbot.

Q: repeng vs system prompt

This sounds like a cool idea! Watching your PR in llama.cpp.
I have just few questions. What is the difference in effect on generation when compared with system prompt? Stronger and/or more precise influence? Speed? Could multiple control vectors be combined? In series or in parallel?
Thank you!

question: how would you go about saving a control vector for later use

Perhaps a naive question, but rather than training a control vector with each run. How might I go about saving it for inference later?

quantized model ? Llama cpp?

Hi,

Reading your articles made me really curious about trying that but I was wondering of it was possible to use HuggingFace's quantized models or even llamacpp or if that required deep changes.

Thanks!

Control Vector Arithmetic

Hi,
First of all, I read your blog post and it was great! Thanks for that.
I had an idea I wanted to see if you tried that or not.
Did you try to control vector arithmetic? e.g. adding jealousy and wrath control vectors and getting a jealous angry llm.