WordPress data mining and sharing

Jason Koebler:

Update: After this article was published, Automattic told 404 Media that it is “deprecating” the Firehose: “SocialGist is rolling off as a firehose customer this month and the remaining customers are winding down in the coming months, both things that were already in motion for different reasons,” an Automattic spokesperson said. “We’re in the process of updating our developer page to indicate that we have been deprecating the old firehose for several months.” The spokesperson did not answer the original questions we posed to them about the data supply chain for the Firehose.

In September 2023, WordPress.com quietly changed the language of a developer page explaining how to access a “Firehose” of roughly a million daily WordPress posts to add that the feeds are “intended for partners like search engines, artificial intelligence (AI) products and market intelligence providers who would like to ingest a real-time stream of new content from a wide spectrum of publishers.” Before then, this page did not note the AI use case. 

This is notable because of the fervor and confusion that has arisen this week after we broke the news that Automattic, which owns WordPress.com and Tumblr, was preparing to send user data to OpenAI and Midjourney. Since then, there has been much discussion about which WordPress blogs would be included, which would not, whether data was already sent, and whether people who opt out would have their data redacted retroactively.