OpenAttack: open-source Python-based textual adversarial attack toolkit

The Tech Platform
Jun 4, 2021

OpenAttack is an open-source Python-based textual adversarial attack toolkit, which handles the whole process of textual adversarial attacking, including preprocessing text, accessing the victim model, generating adversarial examples, and evaluation.

Features & Uses

OpenAttack has the following features:

High usability. OpenAttack provides easy-to-use APIs that can support the whole process of textual adversarial attacks;
Full coverage of attack model types. OpenAttack supports sentence-/word-/character-level perturbations and gradient-/score-/decision-based/blind attack models;
Great flexibility and extensibility. You can easily attack a customized victim model or develop and evaluate a customized attack model;
Comprehensive evaluation. OpenAttack can thoroughly evaluate an attack model in terms of attack effectiveness, adversarial example quality, and attack efficiency.
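
The three evaluation dimensions named above can be illustrated with a small, self-contained sketch. It does not use OpenAttack's own evaluation classes; the `AttackRecord` fields and the helper names are hypothetical, and the word-level modification rate is only one possible proxy for adversarial example quality.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AttackRecord:
    """One attacked instance (hypothetical record layout, for illustration only)."""
    orig: str              # original input sentence
    adv: Optional[str]     # adversarial example, or None if the attack failed
    queries: int           # number of victim-model queries used
    seconds: float         # wall-clock time spent on this instance


def word_modification_rate(orig: str, adv: str) -> float:
    """Fraction of word positions that differ between the two sentences."""
    o, a = orig.split(), adv.split()
    changed = sum(1 for x, y in zip(o, a) if x != y) + abs(len(o) - len(a))
    return changed / max(len(o), 1)


def summarize(records: List[AttackRecord]) -> Dict[str, float]:
    """Aggregate effectiveness, quality, and efficiency over all records."""
    n = len(records)
    succeeded = [r for r in records if r.adv is not None]
    return {
        # effectiveness: share of instances where an adversarial example was found
        "success_rate": len(succeeded) / n if n else 0.0,
        # quality: how much of each sentence had to change (successful attacks only)
        "avg_modification_rate": (
            sum(word_modification_rate(r.orig, r.adv) for r in succeeded) / len(succeeded)
            if succeeded else 0.0
        ),
        # efficiency: cost in victim queries and time
        "avg_queries": sum(r.queries for r in records) / n if n else 0.0,
        "avg_seconds": sum(r.seconds for r in records) / n if n else 0.0,
    }


if __name__ == "__main__":
    demo = [
        AttackRecord("a good movie", "a fine movie", queries=10, seconds=0.5),
        AttackRecord("bad film", None, queries=30, seconds=1.5),
    ]
    print(summarize(demo))
```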

OpenAttack has a wide range of uses, including:

Providing various handy baselines for attack models;
Comprehensively evaluating attack models using its thorough evaluation metrics;
Assisting in the quick development of new attack models with the help of its common attack components;
Evaluating the robustness of a machine learning model against various adversarial attacks;
Conducting adversarial training to improve the robustness of a machine learning model by enriching the training data with generated adversarial examples.

Toolkit Design

Considering the significant distinctions among different attack models, we leave considerable freedom for the skeleton design of attack models and focus more on streamlining the general processing of adversarial attacking and the common components used in attack models.

OpenAttack has 7 main modules:

TextProcessor: processes the original text sequence to assist attack models in generating adversarial examples.
Classifier: wraps victim classification models.
Attacker: implements various attack models.
Substitute: packages word/character substitution methods that are widely used in word- and character-level attack models.
Metric: provides several adversarial example quality metrics, which can serve either as constraints on adversarial examples during an attack or as evaluation metrics.
AttackEval: evaluates textual adversarial attacks in terms of attack effectiveness, adversarial example quality, and attack efficiency.
DataManager: manages all the data and saved models used by the other modules.

Installation

You can install OpenAttack either with pip or by cloning its GitHub repository.

1. Using pip (recommended)

pip install OpenAttack

2. Cloning the repository

git clone https://github.com/thunlp/OpenAttack.git
cd OpenAttack
python setup.py install

After installation, you can try running demo.py to check if OpenAttack works well:

python demo.py

Usage Examples

Basic: Use Built-in Attacks

OpenAttack ships with some commonly used text classification models, such as LSTM and BERT, as well as datasets such as SST for sentiment analysis and SNLI for natural language inference. You can conduct adversarial attacks against the built-in victim models on these datasets with a few lines of code.

The following code snippet shows how to use a genetic algorithm-based attack model (Alzantot et al., 2018) to attack BERT on the SST dataset:

import OpenAttack as oa
# choose a trained victim classification model
victim = oa.DataManager.load("Victim.BERT.SST") 
# choose an evaluation dataset 
dataset = oa.DataManager.load("Dataset.SST.sample") 
# choose Genetic as the attacker and initialize it with default parameters
attacker = oa.attackers.GeneticAttacker()  
# prepare for attacking
attack_eval = oa.attack_evals.DefaultAttackEval(attacker, victim) 
# launch attacks and print attack results 
attack_eval.eval(dataset, visualize=True)

Advanced: Attack a Customized Victim Model

The following code snippet shows how to use the genetic algorithm-based attack model to attack a customized sentiment analysis model (a statistical model built in NLTK) on SST.

import OpenAttack as oa
import numpy as np
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# configure access interface of the customized victim model
class MyClassifier(oa.Classifier):     
    def __init__(self):         
        self.model = SentimentIntensityAnalyzer()     
    # return the classification probabilities for the input sentences
    def get_prob(self, input_):          
        rt = []         
        for sent in input_:             
            rs = self.model.polarity_scores(sent)             
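            total = rs["neg"] + rs["pos"]
            # VADER returns pos = neg = 0 for fully neutral text; use 0.5 to avoid dividing by zero
            prob = rs["pos"] / total if total > 0 else 0.5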
            rt.append(np.array([1 - prob, prob]))         
        return np.array(rt) 
        
# choose the customized classifier as the victim model
victim = MyClassifier() 
# choose an evaluation dataset 
dataset = oa.DataManager.load("Dataset.SST.sample") 
# choose Genetic as the attacker and initialize it with default parameters
attacker = oa.attackers.GeneticAttacker() 
# prepare for attacking
attack_eval = oa.attack_evals.DefaultAttackEval(attacker, victim)
# launch attacks and print attack results 
attack_eval.eval(dataset, visualize=True)

Advanced: Design a Customized Attack Model

OpenAttack incorporates many handy components that can be easily assembled into a new attack model.
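
As a rough illustration of what assembling an attack model from components involves, here is a minimal, self-contained sketch of a word-level attacker. It deliberately avoids the OpenAttack API: `GreedySwapAttacker`, the `SYNONYMS` table (standing in for a substitution component), and `toy_predict` (standing in for a victim model) are all hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical substitution table standing in for a real word-substitution component.
SYNONYMS: Dict[str, List[str]] = {
    "good": ["fine", "decent"],
    "great": ["grand"],
    "bad": ["poor"],
}


class GreedySwapAttacker:
    """Toy word-level attacker: try single-word substitutions until the label flips."""

    def __init__(self, synonyms: Dict[str, List[str]]):
        self.synonyms = synonyms

    def __call__(
        self, predict: Callable[[str], int], sentence: str
    ) -> Optional[Tuple[str, int]]:
        words = sentence.split()
        y_orig = predict(sentence)
        for i, word in enumerate(words):
            for candidate in self.synonyms.get(word, []):
                perturbed = " ".join(words[:i] + [candidate] + words[i + 1:])
                y_new = predict(perturbed)
                if y_new != y_orig:
                    # success: return the adversarial sentence and its new label
                    return perturbed, y_new
        # no single substitution changed the prediction
        return None


def toy_predict(sentence: str) -> int:
    """Stand-in victim: predicts 1 only if 'good' or 'great' appears as a word."""
    return int(any(w in ("good", "great") for w in sentence.split()))


if __name__ == "__main__":
    attacker = GreedySwapAttacker(SYNONYMS)
    print(attacker(toy_predict, "a good movie"))
    print(attacker(toy_predict, "a dull movie"))
```

The attacker only needs the victim's predicted label, so in the terminology used later in this article it would count as a decision-based, word-level attack.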

Advanced: Adversarial Training

OpenAttack can generate adversarial examples by attacking instances in the training set. Adding these examples to the original training set and retraining yields a more robust victim model; this procedure is known as adversarial training.
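
The data-augmentation step of adversarial training can be sketched as follows. This is a library-independent illustration, not OpenAttack code: `augment_with_adversarial` and `toy_attack` are hypothetical helpers, and the retraining step itself is left to whatever training loop the victim model already uses.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[str, int]  # (text, gold label)


def augment_with_adversarial(
    train_set: List[Example],
    attack: Callable[[str], Optional[str]],
) -> List[Example]:
    """Return the training set plus one adversarial example per successful attack."""
    augmented = list(train_set)
    for text, label in train_set:
        adv = attack(text)
        if adv is not None and adv != text:
            # the adversarial example keeps the ORIGINAL gold label,
            # so the retrained model learns to classify it correctly
            augmented.append((adv, label))
    return augmented


def toy_attack(text: str) -> Optional[str]:
    """Hypothetical attack: swaps 'good' for 'fine' and fails on everything else."""
    return text.replace("good", "fine") if "good" in text else None


if __name__ == "__main__":
    train = [("a good movie", 1), ("a bad movie", 0)]
    for example in augment_with_adversarial(train, toy_attack):
        print(example)
```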

Advanced: Design a Customized Evaluation Metric

OpenAttack supports designing a customized adversarial attack evaluation metric.
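
To give a flavor of what such a metric might compute, the sketch below implements character-level Levenshtein edit distance between original and adversarial sentences and averages it over a set of pairs. It is plain Python and does not show how a metric is registered with OpenAttack; the function names are assumptions made for this example.

```python
from typing import List, Tuple


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep if equal)
            ))
        prev = curr
    return prev[-1]


def mean_edit_distance(pairs: List[Tuple[str, str]]) -> float:
    """Average edit distance over (original, adversarial) pairs."""
    if not pairs:
        return 0.0
    return sum(levenshtein(orig, adv) for orig, adv in pairs) / len(pairs)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(mean_edit_distance([("good", "g00d"), ("abc", "abc")]))
```

A lower average distance means the adversarial examples stay closer to the originals, which is the kind of quality signal an evaluation metric is meant to capture.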

Attack Models

According to the level of perturbation imposed on the original input, textual adversarial attack models can be categorized into sentence-level, word-level, and character-level attack models.
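
The difference between the perturbation levels is easiest to see on a concrete sentence. The two helpers below are hypothetical illustrations of a character-level and a word-level edit; a sentence-level attack would instead rewrite the whole sentence (for example with a paraphrase model) and is not shown.

```python
def swap_adjacent_chars(word: str, i: int) -> str:
    """Character-level perturbation: swap the characters at positions i and i + 1."""
    if i < 0 or i + 1 >= len(word):
        return word  # nothing to swap at this position
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def substitute_word(sentence: str, index: int, replacement: str) -> str:
    """Word-level perturbation: replace the word at the given position."""
    words = sentence.split()
    if not 0 <= index < len(words):
        return sentence  # position out of range, leave the sentence unchanged
    words[index] = replacement
    return " ".join(words)


if __name__ == "__main__":
    print(swap_adjacent_chars("movie", 1))
    print(substitute_word("a good movie", 1, "fine"))
```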

According to how much access they have to the victim model, textual adversarial attack models can be categorized into gradient-based, score-based, decision-based, and blind attack models.
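
The following sketch contrasts score-based and decision-based access on a toy victim. `ToyVictim`, its word weights, and the two probing functions are hypothetical; `get_prob` mirrors the method name used in the customized classifier example above, while `get_pred` is an assumed name for "return only the predicted label". Gradient-based attacks additionally need the model's internals, and blind attacks query nothing at all, so neither is shown.

```python
import math
from typing import List

# Hypothetical word weights for a toy sentiment model.
POSITIVE = {"good": 2.0, "great": 3.0}
NEGATIVE = {"bad": 2.0, "awful": 3.0}


class ToyVictim:
    def get_prob(self, sentence: str) -> List[float]:
        """Score access: full probability distribution [p_negative, p_positive]."""
        score = sum(POSITIVE.get(w, 0.0) - NEGATIVE.get(w, 0.0) for w in sentence.split())
        p_pos = 1.0 / (1.0 + math.exp(-score))
        return [1.0 - p_pos, p_pos]

    def get_pred(self, sentence: str) -> int:
        """Decision access: predicted label only."""
        probs = self.get_prob(sentence)
        return int(probs[1] >= probs[0])


def score_based_saliency(victim: ToyVictim, sentence: str) -> List[float]:
    """Drop in the predicted class probability when each word is deleted."""
    words = sentence.split()
    label = victim.get_pred(sentence)
    base = victim.get_prob(sentence)[label]
    return [
        base - victim.get_prob(" ".join(words[:i] + words[i + 1:]))[label]
        for i in range(len(words))
    ]


def decision_based_flips(victim: ToyVictim, sentence: str) -> List[bool]:
    """Whether deleting each word flips the predicted label (all a decision attack can see)."""
    words = sentence.split()
    label = victim.get_pred(sentence)
    return [
        victim.get_pred(" ".join(words[:i] + words[i + 1:])) != label
        for i in range(len(words))
    ]


if __name__ == "__main__":
    victim = ToyVictim()
    sentence = "a great and good movie"
    print(score_based_saliency(victim, sentence))
    print(decision_based_flips(victim, sentence))
```

On this sentence no single deletion flips the label, so the decision-based probe learns nothing, whereas the score-based probe can still rank "great" as the most influential word. This is why score-based attacks can usually search more efficiently than decision-based ones.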

TAADPapers is a paper list which summarizes almost all the papers concerning textual adversarial attack and defense. You can have a look at this list to find more attack models.

Currently, OpenAttack includes 13 typical attack models against text classification models, covering all attack types.

Here is the list of currently included attack models.

1. Sentence-level

(SEA) Semantically Equivalent Adversarial Rules for Debugging NLP Models. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. ACL 2018. decision [pdf] [code]
(SCPN) Adversarial Example Generation with Syntactically Controlled Paraphrase Networks. Mohit Iyyer, John Wieting, Kevin Gimpel, Luke Zettlemoyer. NAACL-HLT 2018. blind [pdf] [code&data]
(GAN) Generating Natural Adversarial Examples. Zhengli Zhao, Dheeru Dua, Sameer Singh. ICLR 2018. decision [pdf] [code]

2. Word-level

(SememePSO) Word-level Textual Adversarial Attacking as Combinatorial Optimization. Yuan Zang, Fanchao Qi, Chenghao Yang, Zhiyuan Liu, Meng Zhang, Qun Liu and Maosong Sun. ACL 2020. score [pdf] [code]
(TextFooler) Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment. Di Jin, Zhijing Jin, Joey Tianyi Zhou, Peter Szolovits. AAAI-20. score [pdf] [code]
(PWWS) Generating Natural Language Adversarial Examples through Probability Weighted Word Saliency. Shuhuai Ren, Yihe Deng, Kun He, Wanxiang Che. ACL 2019. score [pdf] [code]
(Genetic) Generating Natural Language Adversarial Examples. Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani Srivastava, Kai-Wei Chang. EMNLP 2018. score [pdf] [code]
(FD) Crafting Adversarial Input Sequences For Recurrent Neural Networks. Nicolas Papernot, Patrick McDaniel, Ananthram Swami, Richard Harang. MILCOM 2016. gradient [pdf]

3. Word/Char-level

(UAT) Universal Adversarial Triggers for Attacking and Analyzing NLP. Eric Wallace, Shi Feng, Nikhil Kandpal, Matt Gardner, Sameer Singh. EMNLP-IJCNLP 2019. gradient [pdf] [code] [website]
(TextBugger) TEXTBUGGER: Generating Adversarial Text Against Real-world Applications. Jinfeng Li, Shouling Ji, Tianyu Du, Bo Li, Ting Wang. NDSS 2019. gradient/score [pdf]
(HotFlip) HotFlip: White-Box Adversarial Examples for Text Classification. Javid Ebrahimi, Anyi Rao, Daniel Lowd, Dejing Dou. ACL 2018. gradient [pdf] [code]

4. Char-level

(VIPER) Text Processing Like Humans Do: Visually Attacking and Shielding NLP Systems. Steffen Eger, Gözde Gül Şahin, Andreas Rücklé, Ji-Ung Lee, Claudia Schulz, Mohsen Mesgar, Krishnkant Swarnkar, Edwin Simpson, Iryna Gurevych. NAACL-HLT 2019. score [pdf] [code&data]
(DeepWordBug) Black-box Generation of Adversarial Text Sequences to Evade Deep Learning Classifiers. Ji Gao, Jack Lanchantin, Mary Lou Soffa, Yanjun Qi. IEEE SPW 2018. score [pdf] [code]