Backdoor Detection via Eigenvalues, Hessians, Internal Behaviors, and Robust Statistics
Although Deep Neural Networks (DNNs) have achieved impressive performance in many applications, they exhibit several well-known sensitivities, perhaps the most prominent being their vulnerability in adversarial environments. For example, it is common in practice to outsource the training of a model (known as Machine Learning as a Service, or MLaaS) or to use third-party pre-trained networks (and then perform fine-tuning or transfer learning). In these settings, an adversary (e.g., the MLaaS provider) can tamper with the model, either by poisoning the training data or by directly modifying the model weights, to implant a so-called Trojan and then return a backdoored model to the user.
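To make the threat model concrete, the following is a minimal sketch of BadNets-style data poisoning, one common way such Trojans are implanted. It is an illustration only and is not taken from this project; the function name, trigger shape, and poisoning rate are all hypothetical choices.

```python
# Hedged illustration of trigger-based data poisoning (BadNets-style).
# An adversary stamps a small trigger patch onto a fraction of the
# training images and relabels them, so the trained model associates
# the trigger with the attacker's target class.
import numpy as np

def poison(images, labels, target_label, rate=0.05, patch=3, seed=0):
    """Stamp a white patch onto `rate` of the images (shape (N, H, W, C),
    floats in [0, 1]) and flip those labels to `target_label`."""
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    idx = rng.choice(len(images), size=int(rate * len(images)), replace=False)
    images[idx, -patch:, -patch:, :] = 1.0   # trigger: bottom-right patch
    labels[idx] = target_label               # attacker's chosen class
    return images, labels
```

A model trained on the poisoned set behaves normally on clean inputs but misclassifies any input carrying the trigger, which is what makes the backdoor hard to catch with ordinary test-set evaluation.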
The research goal of this project is to address these shortcomings by designing mechanisms to understand the overarching effects of Trojans on a DNN's internal behavior. The researchers will draw on ideas from robust statistics, scientific computing, and random matrix theory, areas often overlooked by DNN researchers. The convergence of ideas from these areas provides a unique perspective and will enable the development of effective tools for backdoor detection.
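As one example of the kind of spectral probe this line of work suggests (a sketch of a generic technique, not the project's actual method), the top eigenvalue of the loss Hessian can be estimated with power iteration over Hessian-vector products; unusually large curvature around trigger-carrying inputs is one internal signal such probes can surface. The code below assumes a PyTorch model and loss function; all names are illustrative.

```python
# Sketch: top Hessian eigenvalue of the loss via power iteration with
# Hessian-vector products (no explicit Hessian is ever formed).
import torch

def top_hessian_eigenvalue(model, loss_fn, x, y, iters=50, tol=1e-4):
    """Power iteration on the loss Hessian w.r.t. model parameters."""
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(x), y)
    # First-order gradients with create_graph=True so we can
    # differentiate through them again for the Hessian-vector product.
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Random unit start vector, one block per parameter tensor.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((u * u).sum() for u in v))
    v = [u / norm for u in v]

    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. params.
        gv = sum((g * u).sum() for g, u in zip(grads, v))
        hv = torch.autograd.grad(gv, params, retain_graph=True)
        new_eig = sum((h * u).sum() for h, u in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / norm for h in hv]
        if abs(new_eig - eig) < tol * max(abs(eig), 1.0):
            break
        eig = new_eig
    return eig
```

Comparing such curvature estimates across clean and suspect inputs, or across a population of models, is where the robust-statistics machinery would come in: flagging models whose spectra are outliers relative to the bulk.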
This material is based upon work supported by the Intelligence Advanced Research Projects Agency (IARPA) and Army Research Office (ARO) under Contract No. W911NF-20-C-0035.