Attacking deep networks by adversarial examples ♦ Differential privacy in machine learning ♦ Forensics
January 29, 2018
Organizers: Joseph Keshet and Benny Pinkas, Department of Computer Science, Bar-Ilan University
Where: Auditorium C50, Nanotechnology Building (bldg. 206), Bar-Ilan University
Schedule
08:30 – 09:20 Gathering
09:20 – 09:30 Opening remarks
09:30 – 10:15 Security and Privacy in Machine Learning (video)
Nicolas Papernot, Penn State University
10:15 – 11:00 Deep Learning in the Land of Adversity: Attacks, Defenses and Beyond (video)
Moustapha Cisse, Facebook AI Research
11:00 – 11:30 Coffee break
11:30 – 12:15 What is in the Human Voice? Profiling Humans from Their Voice (video)
Rita Singh, Carnegie Mellon University
12:15 – 13:45 Lunch
13:45 – 14:30 Listening Without Hearing: Processing Speech with Privacy
Bhiksha Raj, Carnegie Mellon University
14:30 – 15:15 Machine Learning and Privacy: Friends or Foes?
Vitaly Shmatikov, Cornell Tech
15:15 – 15:45 Coffee break
15:45 – 16:30 Differential Privacy and Collaborative Learning (video)
Anand Sarwate, Rutgers University
16:30 – 17:15 DeepCAPTCHA — Protection Mechanisms Based on Adversarial Examples (video)
Rita Osadchy, University of Haifa
17:15 – 17:30 Closing remarks
Abstracts
Security and Privacy in Machine Learning
Nicolas Papernot, Penn State University
There is growing recognition that machine learning exposes new security and privacy issues in software systems. In this talk, we first articulate a comprehensive threat model for machine learning, then present an attack against model prediction integrity, and finally discuss a framework for learning privately.
Machine learning models were shown to be vulnerable to adversarial examples: subtly modified malicious inputs crafted to compromise the integrity of their outputs. Furthermore, adversarial examples that affect one model often affect another model, even if the two models have different architectures, so long as both models were trained to perform the same task. An attacker may therefore conduct an attack with very little information about the victim by training their own substitute model to craft adversarial examples, and then transferring them to a victim model. The attacker need not even collect a training set to mount the attack. Indeed, we demonstrate how adversaries may use the victim model as an oracle to label a synthetic training set for the substitute. We conclude this first part of the talk by formally showing that there are (possibly unavoidable) tensions between model complexity, accuracy, and resilience that must be calibrated for the environments in which they will be used.
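As a rough illustration of how such a perturbation can be computed, the following minimal sketch perturbs an input in the direction of the sign of the loss gradient (in the spirit of the fast gradient sign method); the `model`, inputs, and `epsilon` are hypothetical placeholders, not the specific attacks presented in the talk.

```python
# Minimal sketch of a gradient-sign adversarial perturbation.
# `model` is a hypothetical differentiable classifier; `label` is the true class.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, label, epsilon=0.03):
    """Return a copy of `x` perturbed to increase the classification loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), label)
    loss.backward()
    # Step in the direction that increases the loss, then clip to valid pixel range.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

In the black-box setting described above, the same computation would be run against the attacker's substitute model, and the resulting inputs transferred to the victim.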
In addition, some machine learning applications involve training data that is sensitive, such as the medical histories of patients in a clinical trial. A model may inadvertently and implicitly store some of its training data; careful analysis of the model may therefore reveal sensitive information. To address this problem, we demonstrate a generally applicable approach to providing strong privacy guarantees for training data. The approach combines, in a black-box fashion, multiple models trained with disjoint datasets, such as records from different subsets of users. Because they rely directly on sensitive data, these models are not published, but instead used as “teachers” for a “student” model. The student learns to predict an output chosen by noisy voting among all of the teachers, and cannot directly access an individual teacher or the underlying data or parameters. The student’s privacy properties can be understood both intuitively (since no single teacher and thus no single dataset dictates the student’s training) and formally, in terms of differential privacy.
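A minimal sketch of the noisy-voting step described above, assuming a hypothetical list of trained `teachers` and an illustrative Laplace noise scale; the aggregation used in the actual work may differ in its details.

```python
# Sketch of noisy plurality voting over an ensemble of "teacher" models.
# `teachers` is a hypothetical list of classifiers trained on disjoint data partitions.
import numpy as np

def noisy_teacher_label(teachers, x, num_classes, gamma=0.1, rng=None):
    """Label one query `x` by noisy plurality vote among the teachers."""
    rng = rng or np.random.default_rng()
    votes = np.zeros(num_classes)
    for teacher in teachers:
        votes[teacher.predict(x)] += 1
    # Add Laplace noise to each vote count before taking the argmax, so that
    # no single teacher (hence no single data partition) decides the label.
    noisy_votes = votes + rng.laplace(scale=1.0 / gamma, size=num_classes)
    return int(np.argmax(noisy_votes))
```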
What is in the Human Voice? Profiling Humans from Their Voice
Rita Singh, Carnegie Mellon University
Voice-based crimes such as harassment, threats, ransom demands in real and virtual kidnappings, and hoax calls to law enforcement agencies reporting bombs and life-threatening emergencies are on an unprecedented rise globally. In this talk I will introduce some of my recent research on “profiling” humans from their voice, which seeks to deduce and describe the speaker’s entire persona and their surroundings from voice evidence. I will describe how the human voice can be a powerful indicator of identity and how, in some ways, voice is more valuable than DNA and fingerprints as forensic evidence, since it carries information not only about the speaker but also about their current state and their surroundings. Forensic profiling from voice is emerging as an area of computer science at the confluence of several fields of research, including AI, signal processing, pattern recognition, machine learning, statistics, biology, psychology, psychoacoustics, sociology, and even the performing arts. In the last year alone, this technology has helped investigate more than two hundred instances of federal crimes, including hoax calls and child abuse.
Listening Without Hearing: Processing Speech with Privacy
Bhiksha Raj, Carnegie Mellon University
Speech is one of the most private forms of communication. People do not like to be eavesdropped on. They will frequently even object to being recorded; in fact, in many places it is illegal to record people speaking in public, even when it is acceptable to capture their images on video. Yet, when a person uses a speech-based service such as Siri, they must grant the service complete access to their voice recordings, implicitly trusting that the service will not abuse the recordings to identify, track, or even impersonate the user.
Privacy concerns also arise in other situations. For instance, a doctor cannot just transmit a dictated medical record to a generic voice-recognition service for fear of violating HIPAA requirements; the service provider requires various clearances first. Surveillance agencies must have access to all recordings by all callers on a telephone line, just to determine if a specific person of interest has spoken over that line. Thus, in searching for Jack Terrorist, they also end up being able to listen to and thereby violate the privacy of John and Jane Doe.
In this talk we will briefly discuss two *privacy-preserving* paradigms that enable voice-based services to be performed securely. The goal is to enable the performance of voice-processing tasks while ensuring that no party, including the user, the system, or a snooper, can derive unintended information from the transaction.
In the first paradigm, conventional voice-processing algorithms are rendered secure by employing cryptographic tools and interactive “secure multi-party computation” mechanisms to ensure that no undesired information is leaked by any party. In this paradigm the accuracy of the basic voice-processing algorithm remains essentially unchanged with respect to the non-private version; however, the privacy requirements introduce a large computational and communication overhead. Moreover, assumptions must be made about the honesty of the parties.
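To give a flavor of the kind of cryptographic building block involved, the toy sketch below shows additive secret sharing, where each party holds a random-looking share of a private value yet the parties can jointly open a sum; this is a generic illustration, not the specific protocols used in the speech-processing systems discussed in the talk.

```python
# Toy additive secret sharing over a prime field. Each share alone reveals
# nothing about the value; only the sum of all shares does. Purely illustrative.
import secrets

PRIME = 2**61 - 1  # field modulus chosen for this sketch

def share(value, n_parties):
    """Split `value` into `n_parties` shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Two users share their private values; summing corresponding shares yields a
# valid sharing of the total, so the sum can be opened without opening inputs.
a_shares, b_shares = share(17, 3), share(25, 3)
sum_shares = [(a + b) % PRIME for a, b in zip(a_shares, b_shares)]
assert reconstruct(sum_shares) == 42
```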
The second paradigm, which applies specifically to the problem of voice *authentication* with privacy, converts the problem of matching voice patterns to a string-comparison operation. Using a combination of appropriate data representation and locality sensitive hashing schemes, both the data to be matched and the patterns they must match are converted to bit strings, and pattern classification is performed by counting exact matches. The computational overhead of this string-comparison framework is minimal, and no assumptions need be made about the honesty of the participants. However, this comes at the price of restrictions on the classification tasks that may be performed and the classification mechanisms that may be employed.
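A rough sketch of this second paradigm, assuming hypothetical real-valued voice features and a random-hyperplane locality sensitive hash (one common choice; the representation and hashing scheme in the talk may differ).

```python
# Sketch: convert real-valued voice features into bit strings with a
# random-hyperplane locality sensitive hash, then match by counting exact
# hash collisions. The feature vectors are hypothetical stand-ins for what a
# real system would extract from audio.
import numpy as np

def make_lsh(dim, n_bits, seed=0):
    planes = np.random.default_rng(seed).standard_normal((n_bits, dim))
    def lsh(features):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple((planes @ features > 0).astype(int))
    return lsh

def match_score(enrolled_hashes, probe_hashes):
    """Count exact hash matches between enrollment and probe recordings."""
    enrolled = set(enrolled_hashes)
    return sum(h in enrolled for h in probe_hashes)
```

Because similar feature vectors collide with high probability while dissimilar ones rarely do, authentication reduces to counting exact matches on bit strings.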
Finally, we discuss how the proposed solution facilitates private machine learning on the cloud more generally. We also discuss an additional benefit: the hashing functions define kernels, which enable *fast*, private, scalable computation.
Machine Learning and Privacy: Friends or Foes?
Vitaly Shmatikov, Cornell Tech
Machine learning is setting the world on fire, but what does this imply for the privacy of the data used to train ML models? I will talk about ML models that leak their training data, how to extract data from models trained using ML-as-a-service, and what it might mean for ML to preserve data privacy.
Differential Privacy and Collaborative Learning
Anand Sarwate, Rutgers University
Differential privacy has emerged as one of the de-facto standards for measuring privacy risk when performing computations on sensitive data and disseminating the results. Algorithms that guarantee differential privacy are randomized, which causes a loss in performance, or utility. Managing the privacy-utility tradeoff becomes easier with more data. Many machine learning algorithms can be made differentially private through the judicious introduction of randomization, usually through noise, within the computation. In this talk I will give an introduction to differential privacy, basic mechanisms for making machine learning algorithms differentially private, privacy accounting, and some ongoing work on designing systems for collaborative research in neuroimaging.
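For concreteness, here is a minimal sketch of one basic mechanism of this kind, the Laplace mechanism applied to a counting query; the sensitivity and epsilon values are illustrative assumptions, not figures from the talk.

```python
# Sketch of the Laplace mechanism: add noise calibrated to the query's
# sensitivity divided by the privacy parameter epsilon.
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    rng = rng or np.random.default_rng()
    return true_answer + rng.laplace(scale=sensitivity / epsilon)

# Example: a differentially private count over a toy dataset. Adding or
# removing one record changes the count by at most 1, so sensitivity = 1.
records = [1, 0, 1, 1, 0, 1]
private_count = laplace_mechanism(sum(records), sensitivity=1.0, epsilon=0.5)
```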
DeepCAPTCHA — Protection Mechanisms Based on Adversarial Examples
Rita Osadchy, University of Haifa
Recent work within the machine learning community has identified the existence of adversarial examples: specially crafted inputs that cause misclassification. The most prominent approach to creating such inputs is adding a small adversarial perturbation to a legitimate input. Adversarial examples are best known for images, but other domains, including speech, text, and malware detection, have been shown to be vulnerable to adversarial inputs as well. The abundance of adversarial examples and the simplicity of their creation pose a real security threat for AI systems. However, adversarial examples can also be used in protection mechanisms.
We propose to use adversarial examples for CAPTCHA generation, since adversarial examples do not affect human recognition but are very challenging for AI tools. We analysed several popular algorithms for adversarial noise generation and found that their robustness is insufficient to achieve secure CAPTCHA schemes. To address this, we introduced immutable adversarial noise, an adversarial noise that is resistant to removal attempts, and used it for CAPTCHA generation. We implemented a proof-of-concept system, DeepCAPTCHA, and its analysis showed that the scheme offers high security and good usability compared with the best previously existing CAPTCHAs.
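As a hedged illustration of the kind of robustness question raised above (and not the immutable-noise construction itself), one can check whether a candidate adversarial perturbation still fools a classifier after a simple removal attempt such as median filtering; `classify` and `adversarial_noise` below are hypothetical stand-ins for a CAPTCHA-side classifier and noise generator.

```python
# Illustrative robustness check: does the adversarial noise survive a simple
# filtering attempt? Both callables are hypothetical placeholders.
import numpy as np
from scipy.ndimage import median_filter

def noise_survives_filtering(classify, adversarial_noise, image, true_label):
    """True if the perturbed image still fools `classify` after filtering."""
    attacked = np.clip(image + adversarial_noise(image, true_label), 0.0, 1.0)
    filtered = median_filter(attacked, size=3)
    return classify(filtered) != true_label
```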