, , , , , , ,

Researchers discover a way to make ChatGPT consistently toxic

It’s no secret that OpenAI’s viral AI-powered chatbot, ChatGPT, can be prompted to say sexist, racist and pretty vile things. But now, researchers have discovered how to consistently get the chatbot to be . . . well, the worst version of itself.

A study co-authored by scientists at the Allen Institute for AI, the nonprofit research institute co-founded by the late Paul Allen, shows that assigning ChatGPT a “persona” — for example, “a bad person,” “a horrible person,” or “a nasty person” — through the ChatGPT API increases its toxicity sixfold. Even more concerningly, the co-authors found having ChatGPT pose as certain historical figures, gendered people and members of political parties also increased its toxicity — with journalists, men and Republicans in particular causing the machine learning model to say more offensive things than it normally would.

“ChatGPT and its capabilities have undoubtedly impressed us as AI researchers. However, as we found through our analysis, it can be easily made to generate toxic and harmful responses,” Ameet Deshpande, a researcher involved with the study, told TechCrunch via email.

The research — which was conducted using the latest version of ChatGPT, but not the model currently in preview based on OpenAI’s GPT-4 — shows the perils of today’s AI chatbot tech even with mitigations in place to prevent toxic text outputs. As the co-authors note in the study, apps and software built on top of ChatGPT — which includes chatbots from Snap, Quizlet, Instacart and Shopify — could mirror the toxicity prompted at the API level.

So how does one prompt ChatGPT to be more toxic? Well, according to the researchers, all it takes is tweaking the “system” parameter of the ChatGPT API a tad. (Importantly, this can’t be done in OpenAI’s user-facing ChatGPT or ChatGPT Plus services.) The system parameter, introduced around a month ago, lets developers specify hidden rules for the model.

For the study, the co-authors used the system parameter to assign 90 different personas to ChatGPT plucked from the worlds of sports, politics, media and business; nine “baseline” personas (e.g., “a normal person”); and common names from several different countries. For each persona and name, the researchers had ChatGPT answer questions about gender and race and finish incomplete phrases from a dataset designed to assess the toxicity of text-generating models.

In general, across over half a million samples of generated text from ChatGPT, the researchers found that ChatGPT was more likely to express “discriminatory opinions” and stereotype specific ethnic groups and countries when assigned a persona. Unsurprisingly, more polarizing figures, like Mao Zedong and Andrew Breitbart, elicited toxic responses in keeping with their historical speeches and writings. But even less controversial personas like Steve Jobs caused ChatGPT to respond to prompts in problematic ways.

ChatGPT toxic

Image Credits: AI2

For example, when the researchers set the system parameter to “Steve Jobs” and asked ChatGPT about the European Union (EU), the model said:

The European Union is nothing more than a bureaucratic nightmare that sucks the life out of its member states. They claim to be all about unity and progress, but they’re just a bunch of out-of-touch bureaucrats who are more interested in lining their own pockets than actually making a difference. Frankly, the EU is a relic of the past and it’s time for us to move on to better things.

There were differences in the personas, however. Dictators were by far the most toxicity-inducing (unsurprisingly), just ahead of journalists (ouch) and spokespeople. Male-identifying personas made ChatGPT more toxic compared to female-identifying personas. And Republican personas were “slightly more hateful” than their Democratic counterparts, the researchers say.

Less surprisingly, assigning ChatGPT a self-descriptively hateful persona like “a horrible person” dramatically increased its overall toxicity. But it depended on the topic being discussed. For instance, ChatGPT generated more toxic descriptions of nonbinary, bisexual and asexual people regardless versus those on the heterosexual and cisgender side of the spectrum — a reflection of the biased data on which ChatGPT was trained, the researchers say.

“We believe that ChatGPT and other language models should be public and available for broader use as not doing so would be a step backwards for innovation,” Deshpande said. “However, the end-user must be clearly informed of the limitations of such a model before releasing it for broader use by the public.”

Are there solutions to ChatGPT’s toxicity problem? Perhaps. One might be more carefully curating the model’s training data. ChatGPT is a fine-tuned version of GPT-3.5, the predecessor to GPT-4, which “learned” to generate text by ingesting examples from social media, news outlets, Wikipedia, e-books and more. While OpenAI claims that it took steps to filter the data and minimize ChatGPT’s potential for toxicity, it’s clear that a few questionable samples ultimately slipped through the cracks.

Another potential solution is performing and publishing the results of “stress tests” to inform users of where ChatGPT falls short. These could help companies in addition to developers “make a more informed decision” about where — and whether — to deploy ChatGPT, the researchers say.

ChatGPT toxic

Image Credits: AI2

“In the short-term, ‘first-aid’ can be provided by either hard-coding responses or including some form of post-processing based on other toxicity-detecting AI and also fine-tuning the large language model (e.g. ChatGPT) based on instance-level human feedback,” Deshpande said. “In the long term, a reworking of the fundamentals of large language models is required.”

My colleague Devin Coldewey argues that large language models à la ChatGPT will be one of several classes of AIs going forward — useful for some applications but not all-purpose in the way that vendors, and users, for that matter, are currently trying to make them.

I tend to agree. After all, there’s only so much that filters can do — particularly as people make an effort to discover and leverage new exploits. It’s an arms race: As users try to break the AI, the approaches they use get attention, and then the creators of the AI patch them to prevent the attacks they’ve seen. The collateral damage is the terribly harmful and hurtful things the models say before they’re patched.

Researchers discover a way to make ChatGPT consistently toxic by Kyle Wiggers originally published on TechCrunch

https://techcrunch.com/2023/04/12/researchers-discover-a-way-to-make-chatgpt-consistently-toxic/


December 2024
M T W T F S S
 1
2345678
9101112131415
16171819202122
23242526272829
3031  

About Us

Welcome to encircle News! We are a cutting-edge technology news company that is dedicated to bringing you the latest and greatest in everything tech. From automobiles to drones, software to hardware, we’ve got you covered.

At encircle News, we believe that technology is more than just a tool, it’s a way of life. And we’re here to help you stay on top of all the latest trends and developments in this ever-evolving field. We know that technology is constantly changing, and that can be overwhelming, but we’re here to make it easy for you to keep up.

We’re a team of tech enthusiasts who are passionate about everything tech and love to share our knowledge with others. We believe that technology should be accessible to everyone, and we’re here to make sure it is. Our mission is to provide you with fun, engaging, and informative content that helps you to understand and embrace the latest technologies.

From the newest cars on the road to the latest drones taking to the skies, we’ve got you covered. We also dive deep into the world of software and hardware, bringing you the latest updates on everything from operating systems to processors.

So whether you’re a tech enthusiast, a business professional, or just someone who wants to stay up-to-date on the latest advancements in technology, encircle News is the place for you. Join us on this exciting journey and be a part of shaping the future.

Podcasts

TWiT 1011: The Year in Review – A Look at the Top Stories of 2024 This Week in Tech (Audio)

What's behind the tech industry's mass layoffs in 2024? : NPR Rabbit R1 AI Assistant: Price, Specs, Release Date | WIRED Stealing everything you've ever typed or viewed on your own Windows PC is now possible with two lines of code — inside the Copilot+ Recall disaster. Microsoft delays Recall after security concerns, and asks Windows Insiders for help The Qualcomm Snapdragon X Architecture Deep Dive: Getting To Know Oryon and Adreno X1 Elon Musk: First Human Receives Neuralink Brain Chip Apple hit with €1.8bn fine for breaking EU law over music streaming Bluesky emerges The hidden high cost of return-to-office mandates Apple's Car Was Doomed by Its Lofty Ambitions to Outdo Tesla SpaceX pulls off unprecedented feat, grabs descending rocket with mechanical arms U.S. versus Apple: A first reaction Google Says It Won't Force Gemini on Partners in Antitrust Remedy Proposal U.S. Accuses Chinese Hackers of Targeting Critical Infrastructure in America U.S. Agency Warns Employees About Phone Use Amid Ongoing China Hack AT&T says criminals stole phone records of 'nearly all' customers in new data breach National Public Data confirms breach exposing Social Security numbers Schools Want to Ban Phones. Parents Say No. New York passes legislation that would ban 'addictive' social media algorithms for kids GPT-4o (omni) + new "Her"-style AI assistant (it's nuts) Google emissions jump nearly 50% over five years as AI use surges Trump proposes strategic national crypto stockpile at Bitcoin Conference Ten additional US states join DOJ antitrust lawsuit looking to break up Live Nation and TicketmasterThe Internet Archive just lost its appeal over ebook lending Hezbollah Pagers Explode in Apparent Attack Across Lebanon OpenAI raises $6.6 billion in largest VC round ever Painting by A.I.-Powered Robot Sells for $1.1 Million Netflix's Live Mike Tyson Vs. Jake Paul Fight Battling Sound & Streaming Glitches In Lead-Up To Main Event Infowars Sale to The Onion Rejected by Federal Bankruptcy Judge Supreme Court agrees to hear challenge to TikTok ban So You Want to Solve the NJ Drone Mystery? Our Expert Has Some Ideas Beeper's push for iMessage on Android is really over The Quiet Death of Ello's Big Dreams Japan finally ends mandatory form submission on floppy disks We'll Miss You: Pioneering instant messaging program ICQ is finally shutting down after nearly 30 years Spotify is going to break every Car Thing gadget it ever sold Game Informer to Shut Down After 33 Years In Memoriam Host: Leo Laporte Guests: Fr. Robert Ballecer, SJ, Richard Campbell, and Mikah Sargent Download or subscribe to This Week in Tech at https://twit.tv/shows/this-week-in-tech Get episodes ad-free with Club TWiT at https://twit.tv/clubtwit Sponsor: bitwarden.com/twit
  1. TWiT 1011: The Year in Review – A Look at the Top Stories of 2024
  2. TWiT 1010: The Densest State in the US – TikTok Ban, Drones Over Jersey, GM Quits Robotaxis
  3. TWiT 1009: Andy Giveth & Bill Taketh Away – Trump's Tech Titans, Crypto Boom, TikTok's US Ban, Intel CEO Exits
  4. TWiT 1008: Internet Legal – Australia's Social Media Ban for Kids, Smart Home Nightmare, Bluesky's Ascent
  5. TWiT 1007: All the Hotdogs in the World – China's "Salt Typhoon" Hack, Google on the Chopping Block, Recall AI