Student Lei's AI research paper featured in Nature
Published May 21, 2019
In the world of machine learning algorithms, text was considered relatively safe from adversarial attacks: a malicious agent can make minute adjustments to an image or a sound waveform, but it can’t alter a word by, say, 1%. But Oden Institute student Qi Lei and her collaborators have investigated a potential threat to text-comprehension AIs.
Lei, who studies with Oden Institute Professor Inderjit Dhillon, led the research, working with Professor Alex Dimakis of UT’s electrical and computer engineering department and other collaborators at IBM Research and Amazon. The study, "Discrete Attacks and Submodular Optimization with Applications to Text Classification," was published in SysML 2019 and covered by Nature News.
Previous attacks have looked for synonyms of certain words that would leave the text’s meaning unchanged but could lead a deep-learning algorithm to, say, classify spam as safe, fake news as real, or a negative review as positive.
Testing every synonym for every word would take forever, so the researchers designed an attack that first detects which words the text classifier is relying on most heavily when deciding whether something is malicious. It tries a few synonyms for the most crucial word, determines which one sways the filter’s judgement in the desired (malicious) direction, changes it and moves to the next most important word. The team also did the same for whole sentences.
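The word-by-word procedure described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions, not the authors' implementation: the leave-one-out importance measure, the `budget` parameter, the toy keyword scorer and the tiny synonym table are all hypothetical stand-ins for a real classifier and thesaurus.

```python
from typing import Callable, Dict, List

# Hypothetical scorer type: returns the filter's "malicious" score for a text.
Scorer = Callable[[List[str]], float]


def word_importance(words: List[str], score: Scorer) -> List[float]:
    """Assumed importance measure: drop in score when a word is removed."""
    base = score(words)
    return [base - score(words[:i] + words[i + 1:]) for i in range(len(words))]


def greedy_synonym_attack(
    words: List[str],
    score: Scorer,
    synonyms: Dict[str, List[str]],
    budget: int = 3,
) -> List[str]:
    """Replace up to `budget` of the most important words, one at a time,
    each with the synonym that lowers the filter's score the most."""
    words = list(words)
    importance = word_importance(words, score)
    # Visit positions from most to least important.
    order = sorted(range(len(words)), key=lambda i: importance[i], reverse=True)
    for i in order[:budget]:
        best_word, best_score = words[i], score(words)
        for candidate in synonyms.get(words[i], []):
            trial = words[:i] + [candidate] + words[i + 1:]
            trial_score = score(trial)
            if trial_score < best_score:
                best_word, best_score = candidate, trial_score
        words[i] = best_word  # unchanged if no synonym helped
    return words


# Toy stand-in for a spam filter: a few keywords add to the score.
TOY_WEIGHTS = {"free": 0.4, "winner": 0.3, "prize": 0.2}


def toy_spam_score(words: List[str]) -> float:
    return min(1.0, 0.05 + sum(TOY_WEIGHTS.get(w, 0.0) for w in words))


original = ["claim", "your", "free", "prize", "winner"]
toy_synonyms = {"free": ["complimentary", "gratis"], "winner": ["champion"]}
attacked = greedy_synonym_attack(original, toy_spam_score, toy_synonyms, budget=2)
print(attacked, toy_spam_score(original), toy_spam_score(attacked))
```

In this toy setup the attack changes only the two highest-weighted words, which mirrors the idea of swaying the filter while replacing as few words as possible.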
A previous attack tested by other researchers reduced classifier accuracy from higher than 90% to 23% for news, 38% for e-mail and 29% for Yelp reviews. The latest algorithm reduced filter accuracy to 17%, 31% and 30%, respectively, for the three categories: lower for news and e-mail and roughly the same for Yelp reviews, while replacing many fewer words. The words that filters rely on are not those humans might expect: you can flip their decisions by changing things such as ‘it is’ to ‘it’s’ and ‘those’ to ‘these’.
Making such tricks public is common practice, but it can also be controversial: in February, research lab OpenAI in San Francisco, California, declined to release an algorithm that fabricates realistic articles, for fear it could be abused. But the authors of the SysML paper also show that their adversarial examples can be used as training data for text classifiers, to fortify the classifiers against future ploys.
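The idea of reusing adversarial examples as training data can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical `attack` function that returns a perturbed copy of a text; the toy attack and the two-example dataset are invented for the sketch and do not come from the paper.

```python
from typing import Callable, List, Tuple

Example = Tuple[List[str], int]  # (words, label)


def augment_with_adversarial(
    dataset: List[Example],
    attack: Callable[[List[str]], List[str]],
) -> List[Example]:
    """Return the dataset plus an attacked copy of each example.
    The attacked copy keeps the original (correct) label."""
    augmented = list(dataset)
    for words, label in dataset:
        adversarial = attack(words)
        if adversarial != words:  # skip texts the attack left unchanged
            augmented.append((adversarial, label))
    return augmented


# Hypothetical toy attack: swap 'those' for 'these'.
def toy_attack(words: List[str]) -> List[str]:
    return ["these" if w == "those" else w for w in words]


train = [
    (["those", "deals", "are", "free"], 1),  # 1 = spam
    (["see", "you", "at", "noon"], 0),       # 0 = not spam
]
augmented = augment_with_adversarial(train, toy_attack)
print(len(augmented), augmented[-1])
```

A classifier retrained on the augmented set sees the perturbed text with its correct label, which is what is meant by fortifying it against the same ploy.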