Hacking Deep Learning

♦Attacking deep networks by adversarial examples ♦Differential privacy in machine learning ♦Forensics

January 29, 2018


Organizers: Joseph Keshet and Benny Pinkas, Department of Computer Science, Bar-Ilan University

Where: Auditorium C50, Nanotechnology Building (bldg. 206), Bar-Ilan University



08:30 – 09:20   Gathering

09:20 – 09:30   Opening remarks

09:30 – 10:15   Security and Privacy in Machine Learning (video)

                          Nicolas Papernot, Penn State University

10:15 – 11:00   Deep Learning in the Land of Adversity: Attacks, Defenses and Beyond (video)

                          Moustapha Cisse, Facebook AI Research

11:00 – 11:30     Coffee break

11:30 – 12:15    What is in the Human Voice? Profiling Humans from Their Voice (video) 

                          Rita Singh, Carnegie Mellon University

12:15 – 13:45   Lunch

13:45 – 14:30   Listening Without Hearing: Processing Speech with Privacy

                          Bhiksha Raj, Carnegie Mellon University

14:30 – 15:15    Machine Learning and Privacy: Friends or Foes?

                          Vitaly Shmatikov, Cornell Tech

15:15 – 15:45   Coffee break

15:45 – 16:30   Differential Privacy and Collaborative Learning (video)

                          Anand Sarwate, Rutgers University

16:30 – 17:15   DeepCAPTCHA — Protection Mechanisms Based on Adversarial Examples (video)

                          Rita Osadchy, University of Haifa

17:15 – 17:30  Closing remarks



Security and Privacy in Machine Learning

Nicolas Papernot, Penn State University

There is growing recognition that machine learning exposes new security and privacy issues in software systems. In this talk, we first articulate a comprehensive threat model for machine learning, then present an attack against model prediction integrity, and finally discuss a framework for learning privately.

Machine learning models have been shown to be vulnerable to adversarial examples: subtly modified malicious inputs crafted to compromise the integrity of their outputs. Furthermore, adversarial examples that affect one model often affect another model, even if the two models have different architectures, so long as both models were trained to perform the same task. An attacker may therefore conduct an attack with very little information about the victim by training their own substitute model to craft adversarial examples, and then transferring them to a victim model. The attacker need not even collect a training set to mount the attack. Indeed, we demonstrate how adversaries may use the victim model as an oracle to label a synthetic training set for the substitute. We conclude this first part of the talk by formally showing that there are (possibly unavoidable) tensions between model complexity, accuracy, and resilience that must be calibrated for the environments in which they will be used.
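The black-box attack described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the speaker's method: the victim is a hypothetical two-dimensional linear classifier, the substitute is a simple nearest-class-mean model, and the perturbation is a sign step in the style of the fast gradient sign method, with an exaggerated step size `eps`.

```python
import random

# Hidden parameters of a hypothetical victim model; the attacker never reads them.
VICTIM_W = [2.0, -1.0]
VICTIM_B = 0.5

def victim(x):
    """Black-box oracle: returns only a label, +1 or -1."""
    score = sum(w * xi for w, xi in zip(VICTIM_W, x)) + VICTIM_B
    return 1 if score >= 0 else -1

def train_substitute(oracle, num_queries, dim, rng):
    """Label synthetic points with the oracle and fit a nearest-class-mean model.

    Returns the substitute's weight vector (difference of the two class means).
    """
    sums = {1: [0.0] * dim, -1: [0.0] * dim}
    counts = {1: 0, -1: 0}
    for _ in range(num_queries):
        x = [rng.uniform(-1.0, 1.0) for _ in range(dim)]  # synthetic input
        y = oracle(x)                                     # label from the victim
        counts[y] += 1
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
    mean_pos = [s / counts[1] for s in sums[1]]
    mean_neg = [s / counts[-1] for s in sums[-1]]
    return [p - n for p, n in zip(mean_pos, mean_neg)]

def sign(v):
    return (v > 0) - (v < 0)

def craft(x, label, weights, eps):
    """Shift every coordinate by eps against the substitute's gradient sign.

    For a linear score w.x + c the gradient with respect to x is w itself.
    """
    return [xi - eps * label * sign(w) for xi, w in zip(x, weights)]

rng = random.Random(0)
w_sub = train_substitute(victim, 2000, 2, rng)

x = [0.2, 0.0]                                # legitimate input
x_adv = craft(x, victim(x), w_sub, eps=0.5)   # crafted on the substitute only
print(victim(x), victim(x_adv))               # labels before and after the transfer
```

The attacker uses only the labels returned by `victim`; the perturbation is computed entirely from the substitute and then replayed against the victim.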

In addition, some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information. To address this problem, we demonstrate a generally applicable approach to providing strong privacy guarantees for training data. The approach combines, in a black-box fashion, multiple models trained with disjoint datasets, such as records from different subsets of users. Because they rely directly on sensitive data, these models are not published, but instead used as “teachers” for a “student” model. The student learns to predict an output chosen by noisy voting among all of the teachers, and cannot directly access an individual teacher or the underlying data or parameters. The student’s privacy properties can be understood both intuitively (since no single teacher and thus no single dataset dictates the student’s training) and formally, in terms of differential privacy.
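A minimal sketch of the noisy-voting step follows. The abstract does not specify the noise distribution; Laplace noise with scale `1 / gamma` is assumed here, and the function names and parameters are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_vote(teacher_labels, num_classes, gamma, rng):
    """Return the class with the highest noisy vote count.

    teacher_labels: one predicted class index per teacher.
    gamma: assumed privacy parameter; smaller gamma means more noise.
    """
    counts = [0] * num_classes
    for label in teacher_labels:
        counts[label] += 1
    noisy = [c + laplace_noise(1.0 / gamma, rng) for c in counts]
    return max(range(num_classes), key=lambda k: noisy[k])

# 250 hypothetical teachers, 240 of which vote for class 1.
votes = [1] * 240 + [0] * 10
student_label = noisy_vote(votes, num_classes=2, gamma=1.0, rng=random.Random(0))
print(student_label)
```

When the teachers largely agree, the noise rarely changes the outcome, so the student receives a useful label while no single teacher's vote is decisive.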


What is in the Human Voice? Profiling Humans from Their Voice  

Rita Singh, Carnegie Mellon University

Voice-based crimes such as harassment, threats, ransom demands in real and virtual kidnappings, and hoax calls to law enforcement agencies reporting bombs and life-threatening emergencies are on an unprecedented rise globally. In this talk I will introduce some of my recent research on “profiling” humans from their voice, which seeks to deduce and describe the speaker’s entire persona and their surroundings from voice evidence. I will describe how the human voice can be a powerful indicator of identity and how, in some ways, voice is more valuable than DNA and fingerprints as forensic evidence, since it carries information not only about the speaker but also about their current state and their surroundings. Forensic profiling from voice is emerging as an area of computer science at the confluence of several fields of research, including AI, signal processing, pattern recognition, machine learning, statistics, biology, psychology, psychoacoustics, sociology, and even the performing arts. This technology has helped investigate more than two hundred instances of federal crimes, including hoax calls and child abuse, in the last year alone.


Listening Without Hearing: Processing Speech with Privacy

Bhiksha Raj, Carnegie Mellon University

Speech is one of the most private forms of communication. People do not like to be eavesdropped on. They will frequently even object to being recorded; in fact, in many places it is illegal to record people speaking in public, even when it is acceptable to capture their images on video. Yet when a person uses a speech-based service such as Siri, they must grant the service complete access to their voice recordings, implicitly trusting that the service will not abuse the recordings to identify, track, or even impersonate the user.

Privacy concerns also arise in other situations. For instance, a doctor cannot just transmit a dictated medical record to a generic voice-recognition service for fear of violating HIPAA requirements; the service provider requires various clearances first. Surveillance agencies must have access to all recordings by all callers on a telephone line, just to determine if a specific person of interest has spoken over that line. Thus, in searching for Jack Terrorist, they also end up being able to listen to and thereby violate the privacy of John and Jane Doe.

In this talk we will briefly discuss two *privacy-preserving* paradigms that enable voice-based services to be performed securely. The goal is to enable the performance of voice-processing tasks while ensuring that no party, including the user, the system, or a snooper, can derive unintended information from the transaction.

In the first paradigm, conventional voice-processing algorithms are rendered secure by employing cryptographic tools and interactive “secure multi-party computation” mechanisms to ensure that no undesired information is leaked by any party. In this paradigm the accuracy of the basic voice-processing algorithm remains essentially unchanged with respect to the non-private version; however, the privacy requirements introduce large computational and communication overhead. Moreover, assumptions must be made about the honesty of the parties.
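As a rough illustration of what a secure multi-party computation building block looks like, the sketch below uses additive secret sharing to compute a sum without any party revealing its input. This is one standard primitive, not necessarily the one used in the talk; it assumes parties that follow the protocol honestly, and the modulus and function names are illustrative choices.

```python
import random

PRIME = 2_147_483_647  # 2**31 - 1, modulus for the shares (illustrative choice)

def share(secret, num_parties, rng):
    """Split `secret` into random shares that add up to it modulo PRIME."""
    shares = [rng.randrange(PRIME) for _ in range(num_parties - 1)]
    last = (secret - sum(shares)) % PRIME
    return shares + [last]

def reconstruct(shares):
    return sum(shares) % PRIME

def secure_sum(inputs, rng):
    """Each party shares its input; only sums of shares are ever published."""
    n = len(inputs)
    # dealt[i][j] is the share of party i's input that is sent to party j.
    dealt = [share(value, n, rng) for value in inputs]
    # Party j adds the shares it received and publishes that partial sum only.
    partial = [sum(dealt[i][j] for i in range(n)) % PRIME for j in range(n)]
    return reconstruct(partial)

# random.Random keeps the demo reproducible; a real deployment would need a
# cryptographically secure source of randomness.
total = secure_sum([12, 30, 7], random.Random(1))
print(total)
```

The result is exact, which mirrors the point that accuracy is unchanged, while the extra sharing and message exchange hint at where the overhead comes from.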

The second paradigm, which applies specifically to the problem of voice *authentication* with privacy, converts the problem of matching voice patterns to a string-comparison operation. Using a combination of appropriate data representation and locality sensitive hashing schemes, both the data to be matched and the patterns they must match are converted to bit strings, and pattern classification is performed by counting exact matches. The computational overhead of this string-comparison framework is minimal, and no assumptions need be made about the honesty of the participants. However, this comes at the price of restrictions on the classification tasks that may be performed and the classification mechanisms that may be employed.
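The string-comparison idea can be sketched as follows. The abstract does not name a specific hashing scheme, so this sketch assumes random-hyperplane locality-sensitive hashing and adds a salted SHA-256 step so that only exact matches are comparable. The feature vectors, table sizes, and salt are hypothetical.

```python
import hashlib
import random

def make_tables(dim, num_tables, bits_per_table, rng):
    """Draw random hyperplanes: one group of `bits_per_table` planes per table."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
             for _ in range(bits_per_table)]
            for _ in range(num_tables)]

def bit_string(vector, planes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return "".join(
        "1" if sum(w * x for w, x in zip(plane, vector)) >= 0 else "0"
        for plane in planes
    )

def hashed_keys(vector, tables, salt):
    """Turn a feature vector into one salted digest per table."""
    return [
        hashlib.sha256(salt + bit_string(vector, planes).encode()).hexdigest()
        for planes in tables
    ]

def count_matches(keys_a, keys_b):
    """Similarity score: the number of tables whose keys match exactly."""
    return sum(a == b for a, b in zip(keys_a, keys_b))

rng = random.Random(0)
tables = make_tables(dim=4, num_tables=8, bits_per_table=6, rng=rng)
salt = b"demo-salt"

enrolled = [0.9, -0.2, 0.4, 0.1]           # stand-in for a stored voice pattern
same_direction = [2 * v for v in enrolled]  # same pattern at a different scale
opposite = [-v for v in enrolled]           # maximally dissimilar pattern

enrolled_keys = hashed_keys(enrolled, tables, salt)
print(count_matches(enrolled_keys, hashed_keys(same_direction, tables, salt)))
print(count_matches(enrolled_keys, hashed_keys(opposite, tables, salt)))
```

The comparing party sees only digests and a match count, which is cheap to compute but limits classification to what can be expressed as counting exact matches.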

Finally, we discuss how the proposed solution facilitates private machine learning on the cloud in general. We also discuss an additional benefit: the hashing functions themselves describe kernels that enable *fast*, private, scalable computation.


Machine Learning and Privacy: Friends or Foes?

Vitaly Shmatikov, Cornell Tech

Machine learning is setting the world on fire, but what does this imply for the privacy of the data used to train ML models?  I will talk about ML models that leak their training data, how to extract data from models trained using ML-as-a-service, and what it might mean for ML to preserve data privacy.


Differential Privacy and Collaborative Learning

Anand Sarwate, Rutgers University

Differential privacy has emerged as one of the de-facto standards for measuring privacy risk when performing computations on sensitive data and disseminating the results. Algorithms that guarantee differential privacy are randomized, which causes a loss in performance, or utility. Managing the privacy-utility tradeoff becomes easier with more data. Many machine learning algorithms can be made differentially private through the judicious introduction of randomization, usually through noise, within the computation. In this talk I will give an introduction to differential privacy, basic mechanisms for making machine learning algorithms differentially private, privacy accounting, and some ongoing work on designing systems for collaborative research in neuroimaging.
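To make the role of noise and dataset size concrete, here is a minimal sketch of the Laplace mechanism applied to a bounded mean. It assumes values clipped to a known range, a publicly known dataset size, and neighboring datasets that differ in one record; the function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_mean(values, lower, upper, epsilon, rng):
    """Release the mean of `values` with epsilon-differential privacy."""
    n = len(values)
    clipped = [min(max(v, lower), upper) for v in values]
    # Replacing one record moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    return sum(clipped) / n + laplace_noise(sensitivity / epsilon, rng)

def mean_abs_error(n, epsilon, trials, rng):
    """Average absolute error of the private mean on n synthetic values."""
    values = [rng.random() for _ in range(n)]
    true_mean = sum(values) / n
    errors = [
        abs(private_mean(values, 0.0, 1.0, epsilon, rng) - true_mean)
        for _ in range(trials)
    ]
    return sum(errors) / trials

rng = random.Random(0)
small = mean_abs_error(n=10, epsilon=1.0, trials=200, rng=rng)
large = mean_abs_error(n=1000, epsilon=1.0, trials=200, rng=rng)
print(small, large)
```

At the same `epsilon`, the noise scale shrinks in proportion to `1 / n`, which is one way to see why the privacy-utility tradeoff is easier to manage with more data.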


DeepCAPTCHA — Protection Mechanisms Based on Adversarial Examples

Rita Osadchy, University of Haifa

Recent work within the machine learning community has identified the existence of adversarial examples: specially crafted inputs that cause misclassification. The most prominent approach to creating such inputs is adding a small adversarial perturbation to a legitimate input. Adversarial examples are best known for images, but other domains, among them speech, text, and malware detection, have been shown to be vulnerable to adversarial inputs as well. The abundance of adversarial examples and the simplicity of their creation pose a real security threat for AI systems. However, adversarial examples can also be used in protection mechanisms.

We propose to use adversarial examples for CAPTCHA generation, as adversarial examples do not affect human recognition but are very challenging for AI tools. We analysed several popular algorithms for adversarial noise generation and found that their robustness is insufficient to achieve secure CAPTCHA schemes. To this end, we introduced immutable adversarial noise, an adversarial noise that is resistant to removal attempts, and used it for CAPTCHA generation. We implemented a proof-of-concept system, deepCAPTCHA, and its analysis showed that the scheme offers high security and good usability compared with the best previously existing CAPTCHAs.

