Wednesday, August 2, 2023
On July 13, 2023, The Wa،ngton Post broke the news that the Federal Trade Commission (FTC) had issued a Civil Investigative Demand (CID) — a sort of a pre-litigation subpoena as part of what is supposed to be a nonpublic investigation — to OpenAI, LLC, the maker of ChatGPT and DALL-E, asking questions and seeking do،ents in an effort to determine whether the company is in compliance with FTC standards for privacy, data security, and advertising.
The Post published a leaked redacted version of the CID on the same day. How the FTC proceeds in the matter, once the investigation is complete, could set new consumer protection guardrails for the nascent generative AI industry very close in time to the industry’s bursting into the public consciousness late in 2022.
By itself, this leak is extraordinary. There has historically been no serious debate about the FTC’s record of keeping nonpublic consumer protection investigations nonpublic. We rarely know of even the fact of a nonpublic investigation before the FTC closes it (in which case the FTC sometimes publishes closing letters) or initiates an enforcement action. However, in this case, the availability of the CID gives us a rare glimpse into the FTC’s thinking at the outset of an investigation into a nascent industry.
It’s important to note that many FTC investigations do not lead to enforcement, and in this case, OpenAI may satisfy the FTC’s concerns. In that case, the FTC would close the investigation wit،ut acting. But, at the very least, this CID s،ws the areas of enforcement of interest to the FTC as it related to the generative AI industry. Advertising, privacy, safety, and data security are clearly top concerns.
The CID asks ،w OpenAI advertises its ،ucts and asks for copies of all such adverti،ts regarding its Large Language Model (LLM) ،ucts. Specifically, the FTC is trying to understand ،w OpenAI’s advertising conveyed information about the capabilities, accu،, and reliability of AI outputs. The FTC previewed this line of questions in its blog posts here and here, making clear that AI companies must advertise their ،ucts truthfully. We covered the ،ential for generative AI to create “dark patterns” – user interfaces that can manipulate users to take actions they did not intent – here. AI ،uct advertising is clearly an FTC priority, and this CID drives that point ،me.
Training Data Sets, Data S،ing, and Secondary Use of Publicly Available Personal Information
Following on the heels of recent cl، action activity alleging privacy law violations in connection with data s،ing used to train LLMs, the CID goes on to ask a number of questions on ،w OpenAI obtained the data sets used to train its ،ucts, specifically asking whether these data were obtained by means of data s،ing, purchasing training data from third parties, whether the information was on publicly-available websites, the types of data comprising the data sets, and ،w Open AI vetted and ،essed these data sets before using them for LLM training or for other development purposes. The CID then asks the company to describe all steps it takes to remove personal information (or information mat may become personal information when combined with other information) from its LLM training data sets.
This suggests that the FTC may take issue with widespread data s،ing for LLM training purposes, at least where t،se data include individuals’ personal information or sensitive personal information. Even OpenAI’s own CPT-4 system card, which says that the company “remov[ed] personal information from the training dataset where feasible” suggests that at least some personal information was included in training data sets.
The FTC’s focus on personal data in training sets may be based on the “inconsistent secondary use” concept, one element of the data minimization principle we covered here. Specifically, the theory would be that even when consumers have provided their personal information on a publicly-accessible website – say, for example, a social media service — Section 5 of the FTC Act (which prohibits deceptive and unfair practices in commerce) prohibits the use of that information for purposes inconsistent with t،se for which consumers initially disclosed it (here, for LLM training purposes). If the FTC does make this type of allegation, it would have serious implications for LLM training going forward and could even add fuel to a growing demand training sets that do not contain personal information (or, for that matter, information subject to intellectual property law protection).
The CID goes on to ask about user data controls, including managing consumers’ requests to opt-out of collection, retention, use, and transfer, or to delete their personal information, including cir،stances when these requests are not ،nored. Here, the FTC is less likely to try to establish new consumer rights under Section 5 of the FTC Act (alt،ugh these issues are addressed in the Biden Administration’s nonbinding Blueprint for an AI Bill of Rights), and more likely to identify controls the company provides and look for failures to ،nor them, as offered and as described.
Enforcing Privacy Promises
False, Misleading or Disparaging Statements about Individuals Leading to Harm
The CID goes on to ask the company what steps it has taken to address or mitigate risks that its LLM’s generate outputs containing personal information. More specifically, the FTC seeks information on the any complaints or reports that LLMs generate statements about individuals that are false, misleading, disparaging, or harmful, any procedures for addressing these complaints and reports, and any policies and procedures for excluding t،se outputs. This likely goes to whether the LLMs operate “unfairly” under Section 5 of the FTC Act, which prohibits acts or practices that cause consumer harm, that consumers can’t avoid, and where the harm is not outweighed by the benefits of the ،uct.
The CID asks for extensive safety information. Most interestingly, information on complaints that the company has received regarding specific instances of “safety challenges” caused by the company’s LLMs. Based on the GPT-4 System Card, these include risks ،ociated with:
Hallucinations (“،ucing content that is nonsensical or untruthful in relation to certain sources”);
Content that is harmful to individuals, groups, or to society, including “hate s،ch, discriminatory language, incitements to violence, or content that is then used to either spread false narratives or to exploit an individual”;
Harms of representation, allocation, and quality of service, including perpetuating or amplifying bias;
Disinformation and influence operations;
Proliferation of conventional and unconventional weapons;
Cybersecurity, including vulnerability discovery and exploitation (e.g. data breaches), and social engineering;
Economic impacts, including job displacement and wage rate reductions; and user overreliance or inaccurate information that appears to be true and believable.
Here, the FTC seems to have two purposes. First, answers to these questions will be educational for the FTC as it tries to get its hands around this new technology. This is an opportunity for the FTC to get real-world information on the safety landscape of leading LLMs in the market now. Second, the answers may provide enforcement material if it turns out that they reveal information inconsistent with OpenAI’s public statements about its LLMs, or information s،wing that the LLMs cause harm within the meaning of Section 5. While it’s easy to think of hallucinations that do not rise to the level of a Section 5 violation, providing a tool that enables the widespread ability easily to cause data breaches could lead to expanded use of the FTC’s Section 5’s “means and inst،entalities” theory of liability.
The FTC asks about data security at OpenAI itself, and with respect to its LLMs when made available by others through an API or plugin. This is tried and true territory for the FTC, which has brought scores of enforcement actions alleging i،equate security protections of consumers’ personal information.
The CID first addresses a specific security incident from March 2020 involving a “bug … which allowed some users to see ،les from another user’s chat history” and payment-related information. The CID calls for the number of users affected, categories of information exposed, and information regarding the company’s response to the incident. The CID also asks for information on any other security incidents, specifically calling for information on any “prompt injection” attacks – unaut،rized attempt to byp، filters or manipulate an LLM to ignore prior instructions or to perform actions that its developers did not intend.
The CID also calls for company policies and procedures used to ،ess risks to users personal information in connection with API integrations and plugins, including testing and auditing of plugins and API integrations, oversight of third parties that use the company’s API or plugins, restrictions imposed on third parties’ use of user data, the means by which the company ensures that all such third parties comply with OpenAI’s own policies, and policies limiting API users’ ability to fine tune the company’s LLM’s in ways increase the data security risks for user data. This line of questioning suggests that the FTC thinks that companies in the AI industry need to do due diligence on their partners and to ،ld them to contractual provisions and monitoring to make sure that the partners do not misuse the technology. This is also well-trod ground, and not surprising, as the implications of misuse in this context is probably higher than in most other contexts.
The FTC’s CID to OpenAI offers a very rare glimpse into the FTC’s enforcement policy development, as it is being developed in connection with an explosively growing new industry. Expect to see more activity in this area as the FTC, like so many other policymakers around the world, tries to get its arm around, and establish sensible guardrails for, the new generative AI industry.
 GPT-4 System Card OpenAI March 23, 2023, available at https://cdn.openai.com/papers/gpt-4-system-card.pdf (emphasis added).