FLoC

From Bitnami MediaWiki
Revision as of 14:24, 15 February 2021 by Jkoran (talk | contribs) (→‎Impact)

The goal of Google's Federated Learning of Cohorts (FLoC) is to reduce organizations' ability to access audience membership across different publishers.[1]

Google's FLoC proposal collects and processes web client behavior across various websites to assign each web client to a cohort cluster. A cohort identifier, which Google calls a “flock,” is a string short enough (e.g., “43A7”) that it cannot, even when combined with other data, be used to uniquely identify a particular device. On each request, the web client sends the cohort ID that Google has assigned to it in the "Sec-CH-Flock" header.
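As a rough illustration of the last step, a server receiving such a request could read the cohort ID from the request headers. The sketch below is not taken from the proposal: the function name cohort_from_headers is hypothetical, and it assumes the header value is the bare cohort string (the exact wire format is not described here).

```python
from typing import Mapping, Optional

# Header name as given in the text above.
FLOCK_HEADER = "Sec-CH-Flock"


def cohort_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Return the cohort ID sent by the web client, or None if absent.

    Hypothetical helper: assumes the header carries the bare cohort string.
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == FLOCK_HEADER.lower():
            cohort = value.strip()
            # Treat an empty header value the same as a missing header.
            return cohort or None
    return None
```

For example, cohort_from_headers({"sec-ch-flock": "43A7"}) returns "43A7", while a request without the header yields None.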

The minimum number of people in a cohort is likely in the thousands.[2] To ensure that each cohort contains at least this minimum number of web clients, the web client identifier is sent to a Google-controlled server to enable distinct counting.
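The distinct-counting check can be sketched as follows. This is only an illustration under stated assumptions, not Google's implementation: the function name, the (client_id, cohort_id) input shape, and the threshold of 1000 are all hypothetical, since the text says only that the minimum is "likely in the thousands."

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical threshold; the actual minimum is not specified.
MIN_COHORT_SIZE = 1000


def cohorts_meeting_minimum(
    assignments: Iterable[Tuple[str, str]],
    min_size: int = MIN_COHORT_SIZE,
) -> Dict[str, int]:
    """Map each sufficiently large cohort ID to its distinct client count.

    `assignments` yields (client_id, cohort_id) pairs. A client reported
    more than once for the same cohort is counted only once.
    """
    clients_by_cohort: Dict[str, Set[str]] = defaultdict(set)
    for client_id, cohort_id in assignments:
        clients_by_cohort[cohort_id].add(client_id)
    # Keep only cohorts whose distinct client count reaches the threshold.
    return {
        cohort_id: len(clients)
        for cohort_id, clients in clients_by_cohort.items()
        if len(clients) >= min_size
    }
```

Counting distinct clients requires the server to see a per-client identifier, which is why the identifier has to be sent to the counting server in the first place.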

Marketers could measure which cohorts interact with their content, but would not know distinct reach or frequency. Marketers would also presumably be prohibited from associating particular outcomes on their web property with the cohort associated with the user's web client.
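To make the measurement limit concrete, the sketch below tallies interactions per cohort from a stream of observed cohort IDs. The function name and input shape are hypothetical and are not part of the proposal.

```python
from collections import Counter
from typing import Iterable, Optional


def interactions_per_cohort(cohort_ids: Iterable[Optional[str]]) -> Counter:
    """Count interactions by cohort ID, skipping requests with no cohort."""
    return Counter(c for c in cohort_ids if c is not None)
```

Given the observations ["43A7", "43A7", "9F01", None], the tally is 2 for "43A7" and 1 for "9F01". The two "43A7" interactions could come from one web client seen twice or from two clients seen once each; the cohort ID alone cannot distinguish these cases, which is why distinct reach and frequency are not recoverable.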

Google published a research paper that compared the accuracy of assigning browser identifiers to cohorts.[3] Google found that it could achieve 70% accuracy in building cohort clusters relative to random assignment, which is still well below the segmentation accuracy of the current industry standard, cookie-based audience creation.

Impact

By removing audience segmentation, marketers would be unable to perform retargeting and frequency capping.

Another potential impact of removing this information is a degraded end user experience.

Another impact of cohorts is that they do not support smaller organizations (advertisers or publishers), whose audiences may be too small to meet the minimum cohort size threshold.

The philosophy behind FLoC's unsupervised clustering has also been criticized as liable to reveal sensitive information, such as whether a person frequents sites about protected health conditions, or other protected classes of information. "The web currently has more than 1.2 billion sites (including parked domains). It is impractical for even a large browser developer to test for which patterns of usage of which sites are inadvertently revealing sensitive information about a user."[4]

The lack of any incentive for publishers to transfer audience building solely to Google has also been criticized.[5]

Open Questions

  • How frequently does a user's cohort membership change?
  • What is the value impairment associated with shifting from marketer-defined audiences to Google-defined cohorts?

References