Why MEDIA AGENCIES should care about generative AI’s Data Thirst

by Ann-Kathrin Pfleging

Over the last two years, generative AI’s potential to transform work and society has become central to discussions about technological innovation. For media agencies and the wider marketing industry, the question remains: is the enthusiasm justified, or is generative AI overhyped?

By using AI-driven algorithms, media companies can deliver hyper-targeted content recommendations and align campaigns with individual preferences and behaviours. Generative AI already plays a significant role in today’s advertising: Meta announced just this month new image and text generators that streamline creative production whilst following brand guidelines.

However, generative AI’s dependence on vast, high-quality internet data also raises urgent concerns. In her keynote at SXSW, star futurist Amy Webb explained this problem. As she put it, generative AI could very soon “run out” of the internet’s data, triggering a crucial question: What happens when there’s no more data left to feed generative AI?

In media, where generative AI is used for content creation, media planning, activation and search, this has far-reaching consequences.

Because the internet’s data sources are limited, AI-generated content risks becoming repetitive and lacking in diversity. This exacerbates the algorithmic bias that is already a major criticism of AI. In addition, AI algorithms may struggle to distinguish accurate facts from misinformation because they rely on outdated or biased data.

In media reporting, this could also lead to distorted representations and inaccurate predictions of audience behaviour. That in turn affects campaign effectiveness, audience engagement and performance reporting, weakening the overall reliability of AI-generated insights.

Recent agreements underline this growing thirst for high-quality data. OpenAI, the developer of ChatGPT, has signed a five-year deal with News Corp to get access to the content of publications such as The Wall Street Journal and The Times. Similar deals have also been agreed with Axel Springer and the Financial Times. However, while such partnerships provide a temporary solution by feeding AI with more data, they also highlight media owners’ increasing reliance on AI companies for revenue whilst print sales continue to decline. The long-term impact of this for the media industry remains to be seen.

So, what’s next?

In search of longer-term solutions, AI developers, including industry giants Meta, Google, and Microsoft, have started to train their AI models on synthetic data, meaning artificial data generated by AI. However, not only does this risk worsening existing biases and misinformation, but researchers have also observed irreversible errors and nonsensical results in these models (Firstpost 2024), leaving generative AI at risk of becoming outdated and less capable of producing diverse and accurate content.

In short, generative AI’s data thirst matters for media agencies. It not only puts the technology’s own future at stake but also poses challenges to a media industry that already relies on generative AI’s capabilities. It is therefore crucial to address these challenges proactively by promoting continuous learning about how to navigate this rapidly changing landscape. At the same time, we should balance these innovations with ethical considerations to ensure that everyone benefits equitably.
