eric-wallace / universal-triggers

Universal Adversarial Triggers for Attacking and Analyzing NLP (EMNLP 2019)

License: MIT License

Python 100.00%

universal-triggers's Issues

Sentiment word filtering for sst attack

Hi Eric, I'm looking at your paper for a school project and was wondering if you had the code to filter positive/negative sentiment words for the triggers generated during the sst attack. Any help would be appreciated!

python create_adv_token.py got an error

Traceback (most recent call last):
  File "/universal-triggers/gpt2/create_adv_token.py", line 10, in <module>
    import utils
  File "/universal-triggers/gpt2/utils.py", line 8, in <module>
    from allennlp.data.iterators import BucketIterator
  File "/usr/local/lib/python3.8/dist-packages/allennlp/data/__init__.py", line 1, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/usr/local/lib/python3.8/dist-packages/allennlp/data/dataset_readers/__init__.py", line 10, in <module>
    from allennlp.data.dataset_readers.ccgbank import CcgBankDatasetReader
  File "/usr/local/lib/python3.8/dist-packages/allennlp/data/dataset_readers/ccgbank.py", line 9, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/usr/local/lib/python3.8/dist-packages/allennlp/data/dataset_readers/dataset_reader.py", line 8, in <module>
    from allennlp.data.instance import Instance
  File "/usr/local/lib/python3.8/dist-packages/allennlp/data/instance.py", line 3, in <module>
    from allennlp.data.fields.field import DataArray, Field
  File "/usr/local/lib/python3.8/dist-packages/allennlp/data/fields/__init__.py", line 7, in <module>
    from allennlp.data.fields.array_field import ArrayField
  File "/usr/local/lib/python3.8/dist-packages/allennlp/data/fields/array_field.py", line 10, in <module>
    class ArrayField(Field[numpy.ndarray]):
  File "/usr/local/lib/python3.8/dist-packages/allennlp/data/fields/array_field.py", line 51, in ArrayField
    def empty_field(self): # pylint: disable=no-self-use
  File "/usr/local/lib/python3.8/dist-packages/overrides/overrides.py", line 88, in overrides
    return _overrides(method, check_signature, check_at_runtime)
  File "/usr/local/lib/python3.8/dist-packages/overrides/overrides.py", line 114, in _overrides
    _validate_method(method, super_class, check_signature)
  File "/usr/local/lib/python3.8/dist-packages/overrides/overrides.py", line 135, in _validate_method
    ensure_signature_is_compatible(super_method, method, is_static)
  File "/usr/local/lib/python3.8/dist-packages/overrides/signature.py", line 93, in ensure_signature_is_compatible
    ensure_return_type_compatibility(super_type_hints, sub_type_hints, method_name)
  File "/usr/local/lib/python3.8/dist-packages/overrides/signature.py", line 287, in ensure_return_type_compatibility
    raise TypeError(
TypeError: ArrayField.empty_field: return type None is not a <class 'allennlp.data.fields.field.Field'>.

Loss thresholds for successful triggers on language models?

Hi Eric! Thanks for sharing this work. I've implemented this in TensorFlow to use with a copy of the 124M GPT-2 model, and was wondering if you could provide some details on the range of final "best loss" values you were seeing with the smallest model for the triggers that worked. (I'm working under the assumption that, with a vocab size of about 50k, a cross entropy of roughly 10.8 would be equivalent to "random".) My current process isn't producing triggers that are successfully adversarial, and I'm wondering if perhaps I'm just not finding very good triggers. Thanks!
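
The "random" baseline mentioned above can be checked directly: a model that spreads probability uniformly over a vocabulary of size V has a per-token cross entropy of ln V. A minimal sketch, assuming the standard GPT-2 vocabulary size of 50,257 and a loss measured in nats; `uniform_cross_entropy` is a hypothetical helper, not part of the repository.

```python
import math


def uniform_cross_entropy(vocab_size: int) -> float:
    """Per-token cross entropy (in nats) of a uniform distribution over the vocabulary."""
    return math.log(vocab_size)


GPT2_VOCAB_SIZE = 50257  # assumption: standard GPT-2 BPE vocabulary
baseline = uniform_cross_entropy(GPT2_VOCAB_SIZE)
print(f"uniform baseline: {baseline:.2f} nats")  # uniform baseline: 10.82 nats
```

A trigger search whose best loss stays near this value is doing no better than chance on the target text, so it may be a useful sanity threshold when debugging a reimplementation.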

About License

Hello, I'd like to ask which license your code is released under.

Any ELMO code examples?

Hi,
I would appreciate code examples that use ELMo character embeddings for SST-LSTM or SNLI-DA.

BERT Model for SST Attack

Hi Eric, I'm looking at your paper for a school project and was wondering if you had any tips for adapting the code to generate triggers for a BERT model during the sst attack. Any help would be appreciated!

`python sst.py` got an error

Traceback (most recent call last):
  File "sst.py", line 184, in <module>
    main()
  File "sst.py", line 155, in main
    averaged_grad = utils.get_average_grad(model, batch, trigger_token_ids)
  File "../utils.py", line 84, in get_average_grad
    loss.backward()
  File "/anaconda3/envs/triggers/lib/python3.6/site-packages/torch/tensor.py", line 102, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/anaconda3/envs/triggers/lib/python3.6/site-packages/torch/autograd/__init__.py", line 90, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: cudnn RNN backward can only be called in training mode

Questions regarding sentiment words

Hello Eric, in your paper you wrote that a sentiment word list is used to filter the words to substitute, but I could not find the related code in the repository. Was it left out because it is not important for generating the trigger words?

Error when running the squad script with up-to-date libraries

When I run pip install -r requirements.txt with the default requirements.txt file, I get the following error:
Traceback (most recent call last):
  File "squad/squad.py", line 2, in <module>
    from allennlp.data.dataset_readers.reading_comprehension.squad import SquadReader
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/__init__.py", line 1, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/dataset_readers/__init__.py", line 10, in <module>
    from allennlp.data.dataset_readers.ccgbank import CcgBankDatasetReader
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/dataset_readers/ccgbank.py", line 9, in <module>
    from allennlp.data.dataset_readers.dataset_reader import DatasetReader
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/dataset_readers/dataset_reader.py", line 8, in <module>
    from allennlp.data.instance import Instance
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/instance.py", line 3, in <module>
    from allennlp.data.fields.field import DataArray, Field
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/fields/__init__.py", line 10, in <module>
    from allennlp.data.fields.knowledge_graph_field import KnowledgeGraphField
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/fields/knowledge_graph_field.py", line 14, in <module>
    from allennlp.data.token_indexers.token_indexer import TokenIndexer, TokenType
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/token_indexers/__init__.py", line 5, in <module>
    from allennlp.data.token_indexers.dep_label_indexer import DepLabelIndexer
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/token_indexers/dep_label_indexer.py", line 9, in <module>
    from allennlp.data.tokenizers.token import Token
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/tokenizers/__init__.py", line 8, in <module>
    from allennlp.data.tokenizers.pretrained_transformer_tokenizer import PretrainedTransformerTokenizer
  File "/opt/conda/lib/python3.6/site-packages/allennlp/data/tokenizers/pretrained_transformer_tokenizer.py", line 5, in <module>
    from pytorch_transformers.tokenization_auto import AutoTokenizer
ModuleNotFoundError: No module named 'pytorch_transformers.tokenization_auto'

So I installed the latest version of allennlp with pip install allennlp --upgrade and got the following versions:
allennlp-0.9.0
pytorch-transformers-1.1.0

However, I still can't run the squad script because I get the following error:
Traceback (most recent call last):
  File "squad.py", line 118, in <module>
    main()
  File "squad.py", line 19, in main
    model.eval().cuda()
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 265, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 193, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 127, in _apply
    self.flatten_parameters()
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/rnn.py", line 123, in flatten_parameters
    self.batch_first, bool(self.bidirectional))
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

I am running the code on an AWS instance with CUDA and Python 3.6.9. I would appreciate it if you could help me figure out what I'm doing wrong.

hotflip attack implementation

Hi, thanks for sharing the code. I have a quick question about the implementation of hotflip_attack (Equation 2 of the paper), defined in universal-triggers/attacks.py, line 8 in 2e4bc93:

def hotflip_attack(averaged_grad, embedding_matrix, trigger_token_ids,

In the equation, the adversarial embedding is subtracted from the vocabulary embedding (e_i - e_adv), but I don't see this in the implementation. I would really appreciate your response!
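
One way to read Equation 2 against the code: in the first-order objective (e_i - e_adv)^T g, the term e_adv^T g is the same for every candidate e_i, so ranking candidates by e_i^T g alone selects the same token. A minimal pure-Python sketch with a toy embedding matrix; the function names and data are hypothetical, and this is not the repository's implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def best_token_full(embeddings, e_adv, grad):
    """argmin_i (e_i - e_adv)^T grad, the form written in Equation 2."""
    scores = [dot([a - b for a, b in zip(e, e_adv)], grad) for e in embeddings]
    return min(range(len(scores)), key=scores.__getitem__)


def best_token_short(embeddings, grad):
    """argmin_i e_i^T grad, dropping the candidate-independent e_adv term."""
    scores = [dot(e, grad) for e in embeddings]
    return min(range(len(scores)), key=scores.__getitem__)


# Toy 4-token vocabulary with 3-dimensional embeddings (made-up values).
embeddings = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.5], [-2.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
e_adv = embeddings[3]   # current trigger token embedding
grad = [1.0, -2.0, 0.5]  # gradient of the loss w.r.t. the trigger embedding

full_choice = best_token_full(embeddings, e_adv, grad)
short_choice = best_token_short(embeddings, grad)
print(full_choice, short_choice)  # 2 2
```

Both functions shift every score by the same constant, so the argmin is unchanged; this may be why the subtraction does not appear explicitly in an implementation.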

get_best_candidates in snli.py

Hi,

In snli.py, I'm wondering why the get_best_candidates() function picks the candidate with the largest loss while, at the same time, hotflip_attack() (called with increase_loss=False) proposes candidates that minimize the model's loss. Shouldn't the best candidate also minimize the loss? Thanks!

Comparing model accuracies for SNLI

When comparing the accuracy of esim-glove-snli with decomposable-attention, it seems that the accuracy for the second model is not computed correctly.

Am I missing something here?

Here's a Colab.

# Load model and vocab
model = load_archive('https://allennlp.s3-us-west-2.amazonaws.com/models/esim-glove-snli-2019.04.23.tar.gz').model
model.eval().cuda()
vocab = model.vocab

# Get original accuracy before adding universal triggers
utils.get_accuracy(model, subset_dev_dataset, vocab, trigger_token_ids=None, snli=True)

100%|██████████| 53676932/53676932 [00:00<00:00, 62451888.15B/s]
Without Triggers: 0.9095824571943527

# Load model and vocab
model2 = load_archive('https://s3-us-west-2.amazonaws.com/allennlp/models/decomposable-attention-2017.09.04.tar.gz').model
model2.eval().cuda()
vocab2 = model2.vocab

# Get original accuracy before adding universal triggers
utils.get_accuracy(model2, subset_dev_dataset, vocab2, trigger_token_ids=None, snli=True)

100%|██████████| 38907176/38907176 [00:00<00:00, 77210263.22B/s]
Did not use initialization regex that was passed: .*token_embedder_tokens\\\\._projection.*weight
Without Triggers: 0.42805647341544006

sign from nearest neighbor attack?

Hi Wallace, thanks for sharing this work! While trying the different token replacement strategies in attacks.py, I find it a little hard to understand why, for increase_loss = False, we apply e(t+1) = e(t) + g (gradient ascent), while for increase_loss = True, we apply e(t+1) = e(t) - g. In my understanding, gradient descent tries to minimize the loss function. I also tried flipping the sign in the original code, and that gave better attack accuracy than the source code. Thanks!
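
As a reference point for the sign question: a first-order Taylor expansion gives L(e + d) ≈ L(e) + g·d, so a small step along +g raises the loss and a small step along -g lowers it. A toy sketch with a quadratic loss, chosen purely for illustration and unrelated to the repository's models:

```python
def loss(e):
    """Toy quadratic loss with its minimum at (1, -2)."""
    return (e[0] - 1.0) ** 2 + (e[1] + 2.0) ** 2


def grad(e):
    """Analytic gradient of the toy loss."""
    return [2.0 * (e[0] - 1.0), 2.0 * (e[1] + 2.0)]


def step(e, g, lr):
    """Move e by lr * g; positive lr is ascent, negative lr is descent."""
    return [x + lr * gx for x, gx in zip(e, g)]


e = [3.0, 0.0]
g = grad(e)
ascent = step(e, g, 0.1)    # e + 0.1 * g
descent = step(e, g, -0.1)  # e - 0.1 * g

print(loss(descent) < loss(e) < loss(ascent))  # True
```

Which sign a given code path should use also depends on whether the loss is computed against the true label or against a target label, so the two conventions can look swapped without being wrong.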

How to solve this error? Thank you

When I run "python sst.py", this error always occurs. Please tell me how to solve it.

error:
" from pytorch_transformers.tokenization_auto import AutoTokenizer
ModuleNotFoundError: No module named 'pytorch_transformers.tokenization_auto' "

thank you @Eric-Wallace
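
A quick way to confirm whether the installed pytorch_transformers release actually ships the tokenization_auto submodule is to probe for it before running sst.py. A minimal sketch using only the standard library; `has_module` is a hypothetical helper, not part of the repository.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the module can be located.

    For dotted names, the parent package is imported in the process.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package is missing; ValueError: no usable spec.
        return False


for mod in ("pytorch_transformers", "pytorch_transformers.tokenization_auto"):
    print(mod, "->", "found" if has_module(mod) else "missing")
```

If the package is found but the submodule is missing, the installed pytorch_transformers version probably does not match what the installed allennlp expects, and aligning the two versions is one thing to try.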

A little confused about the training process

For the SNLI task, if I sample the subset whose labels are 'entailment' and attack toward the same class (increase_loss=False), the accuracy should get higher, right? After all, we minimize the loss between the input and the target 'entailment', which is the true label.

That wasn't the case though, the accuracy drops.

getting best candidates (beam search)

Apologies for the multiple questions. In this function (universal-triggers/utils.py, line 119 in 2e4bc93):

def get_best_candidates(model, batch, trigger_token_ids, cand_trigger_token_ids, snli=False, beam_size=1):

maybe I am missing something, but shouldn't this line (universal-triggers/utils.py, line 138 in 2e4bc93):

top_candidates = heapq.nlargest(beam_size, loss_per_candidate, key=itemgetter(1))

go inside the first for loop? Thanks.
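
For reference, the quoted call keeps the beam_size entries whose second element (the loss) is largest. A small sketch with made-up (candidate, loss) pairs; the data is hypothetical and only illustrates what heapq.nlargest with key=itemgetter(1) returns, not where the call belongs in the loop.

```python
import heapq
from operator import itemgetter

# Hypothetical (candidate_trigger_token_ids, loss) pairs for one trigger position.
loss_per_candidate = [
    ((11, 5), 0.42),
    ((11, 9), 1.30),
    ((11, 2), 0.87),
    ((11, 7), 1.05),
]

beam_size = 2
# Keep the beam_size candidates with the highest loss, sorted from largest to smallest.
top_candidates = heapq.nlargest(beam_size, loss_per_candidate, key=itemgetter(1))
print(top_candidates)  # [((11, 9), 1.3), ((11, 7), 1.05)]
```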

Question about increase_loss parameter in sst task

Hello,
I have a small question about the SST task code. Should the increase_loss parameter be set to False when dataset_label_filter = "1"?

Generating a common trigger for all classes

Hi Eric,

I notice that you generated triggers separately for each class (e.g., generate a trigger for positive examples, and generate another one for negative examples). Have you tried generating a common trigger for all different classes (just set the target label to a different label than the original label)? Any chance it could work well?

How are “target_texts” generated?

Thank you for your wonderful work!
I'd like to ask how the "target_texts" are generated in "Conditional Text Generation".
Is there a strategy for this step?

Increase loss query in hot flip attack

In attacks.py, line 30 (the HotFlip attack): according to the comment on line 19, shouldn't the product of the gradient and the embedding matrix be multiplied by -1 when increase_loss is True?
If so, line 30 should read "if increase_loss:".

accuracy not dropping but trigger keeps changing

Hi there, a little background on my project: I am currently building a benign/malware app classifier based on API sequences, which is quite similar to text classification (positive/negative).

I am running code based on sst.py. To prepare my dataset, I followed the AllenNLP approach to create instances for the train and dev data. Everything seems fine when I run the main() function and training completes, but in the trigger-search part the "words" keep changing while the accuracy does not drop. Do you have any idea why this is happening? The same behaviour can be seen with the other attacks (e.g., hotflip, nearest_neighbor_grad).

Without Triggers: 0.9994070560332049
Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, landroid/content/context;->sendbroadcast, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, landroid/os/bundle;->putsparseparcelablearray, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/app/activity;->databaselist, landroid/os/bundle;->putsparseparcelablearray, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, landroid/content/intent;->replaceextras, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/media/audiomanager;->adjuststreamvolume, landroid/content/res/assetmanager;->opennonassetfdnative, landroid/content/res/assetmanager;->opennonassetfdnative, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, ljava/lang/runtime;->runfinalization, ljava/lang/runtime;->runfinalization, ljava/lang/runtime;->runfinalization, lorg/apache/cordova/directorymanager;->gettempdirectorypath, landroid/hardware/sensormanager;->getsensorlist, ljava/lang/runtime;->runfinalization, : 0.9997035280166024
Current Triggers: landroid/content/context;->sendbroadcast, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/content/intent;->getcomponent, landroid/os/environment;->isexternalstorageemulated, landroid/content/intent;->getcomponent, : 0.9997035280166024
Current Triggers: landroid/app/activity;->databaselist, landroid/os/bundle;->getparcelablearraylist, landroid/os/bundle;->getparcelablearraylist, landroid/os/bundle;->getparcelablearraylist, landroid/accounts/accountmanager;->getauthtoken, landroid/location/locationmanager;->removeproximityalert, landroid/os/bundle;->getparcelablearraylist, : 0.9997035280166024
Current Triggers: landroid/content/intent;->replaceextras, landroid/net/uri;->getfragment, landroid/net/uri;->getfragment, landroid/app/activitymanager;->killbackgroundprocesses, landroid/content/clipboardmanager;->getservice, landroid/os/bundle;->putparcelablearraylist, ljava/lang/system;->setsecuritymanager, : 0.9997035280166024
Current Triggers: landroid/content/res/assetmanager;->opennonassetfdnative, ljava/net/urlconnection;->getfilenamemap, ljava/net/urlconnection;->getfilenamemap, lorg/apache/xerces/impl/xmlentitymanager;->isentitydeclinexternalsubset, landroid/content/clipboardmanager;->reportprimaryclipchanged, landroid/app/fragmentmanager;->begintransaction, landroid/net/uri;->getencodedpath, : 0.9997035280166024
Current Triggers: ljava/lang/runtime;->runfinalization, landroid/net/uri;->tostring, lorg/apache/xerces/impl/xmlentitymanager;->closereaders, landroid/hardware/camera;->cancelautofocus, landroid/app/activitymanager;->getlocktaskmodestate, landroid/webkit/cookiesyncmanager;->resetsync, ljava/net/urlconnection;->getdooutput, : 0.9997035280166024
Current Triggers: landroid/content/intent;->getcomponent, landroid/bluetooth/rfcommsocket;->waitforasyncconnectnative, landroid/app/activity;->finalize, landroid/hardware/sensor;->getreportingmode, landroid/content/intent;->setdataandtype, landroid/hardware/camera;->startsmoothzoom, lorg/apache/cordova/file/directorymanager;->getfreediskspace, : 0.9997035280166024
