• This repository has been archived on 16/Mar/2023
  • Stars
    star
    934
  • Rank 48,927 (Top 1.0 %)
  • Language Makefile
  • License
    Other
  • Created over 5 years ago
  • Updated over 1 year ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

This proposal has been replaced by the Topics API.

Replaced by the Topics API

Note that this proposal has been replaced by the Topics API.

Federated Learning of Cohorts (FLoC)

This is an explainer for a new way that browsers could enable interest-based advertising on the web, in which the companies who today observe the browsing behavior of individuals instead observe the behavior of a cohort of similar people.

Overview

The choice of what ads to show on a web page may typically be based on three broad categories of information:

  1. First-party and contextual information (e.g., "put this ad on web pages about motorcycles")
  2. General information about the interests of the person who is going to see the ad (e.g., “show this ad to Classical Music Lovers”)
  3. Specific previous actions the person has taken (e.g., "offer a discount on some shoes that you left in a shopping cart")

This document addresses category 2, ads targeting based on someone's general interests.
For personalized advertising in category 3, please check out the TURTLEDOVE proposal.

In today's web, people’s interests are typically inferred based on observing what sites or pages they visit, which relies on tracking techniques like third-party cookies or less-transparent mechanisms like device fingerprinting. It would be better for privacy if interest-based advertising could be accomplished without needing to collect a particular individual’s browsing history.

We plan to explore ways in which a browser can group together people with similar browsing habits, so that ad tech companies can observe the habits of large groups instead of the activity of individuals. Ad targeting could then be partly based on what group the person falls into.

Browsers would need a way to form clusters that are both useful and private: Useful by collecting people with similar enough interests and producing labels suitable for machine learning, and private by forming large clusters that don't reveal information that's too personal, when the clusters are created, or when they are used.

A FLoC cohort is a short name that is shared by a large number (thousands) of people, derived by the browser from its user’s browsing history. The browser updates the cohort over time as its user traverses the web. The value is made available to websites via a new JavaScript API:

cohort = await document.interestCohort();
url = new URL("https://ads.example/getCreative");
url.searchParams.append("cohort", cohort);
creative = await fetch(url);

The browser uses machine learning algorithms to develop a cohort based on the sites that an individual visits. The algorithms might be based on the URLs of the visited sites, on the content of those pages, or other factors. The central idea is that these input features to the algorithm, including the web history, are kept local on the browser and are not uploaded elsewhere — the browser only exposes the generated cohort. The browser ensures that cohorts are well distributed, so that each represents thousands of people. The browser may further leverage other anonymization methods, such as differential privacy. The number of cohorts should be small, to reinforce that they cannot carry detailed information — short cohort names ("43A7") can help make that clear.

The meaning of a particular cohort should stay roughly consistent over time. As individual people's browsing behavior changes, their cohort will change too, but the algorithm that turns input features into cohort assignments should remain stable. If that cohort assignment algorithm does eventually need to change, then the migration to a new assignment algorithm will need to be clearly communicated by the API, so that consumers of the cohort signal are well informed of the need to update their usage. (See Issue #58 for more on this topic.)

Privacy and Security Considerations

There are several abuse scenarios this proposal must consider.

Revealing People’s Interests to the Web

This API democratizes access to some information about an individual’s general browsing history (and thus, general interests) to any site that opts into it. This is in contrast to today’s world, in which cookies or other tracking techniques may be used to collate someone’s browsing activity across many sites.

Sites that know a person’s PII (e.g., when people sign in using their email address) could record and reveal their cohort. This means that information about an individual's interests may eventually become public. This is not ideal, but still better than today’s situation in which PII can be joined to exact browsing history obtained via third-party cookies.

As such, there will be people for whom providing this information in exchange for funding the web ecosystem is an unacceptable trade-off. Whether the browser sends a real FLoC or a random one is user controllable.

Tracking people via their cohort

A cohort could be used as a user identifier. It may not have enough bits of information to individually identify someone, but in combination with other information (such as an IP address), it might. One design mitigation is to ensure cohort sizes are large enough that they are not useful for tracking. The Privacy Budget explainer points towards another relevant tool that FLoC could be constrained by.

Longitudinal Privacy

The expectation is that the user’s FLoC will be updated over time, so that it continues to have advertising utility. The privacy impacts of this need to be taken into consideration. For instance, multiple FLoC samples means that more information about a user’s browsing history is revealed over time. Possible mitigations include not updating FLoC on a site once it has been called (making it sticky), or reducing the rate of refresh.

Second, if cohorts can be used for tracking, then having more interest cohort samples for a user will make it easier to reidentify them on other sites that have observed the same sequence of cohorts for a user. Possible mitigations for this include designs in which cohorts are updated at different times for different sites, ensuring each site sees a different cohort while the semantic meaning of the cohort remains the same.

Sensitive Categories

A cohort might reveal sensitive information. As a first mitigation, the browser should remove sensitive categories from its data collection. But this does not mean sensitive information can’t be leaked. Some people are sensitive to categories that others are not, and there is no globally accepted notion of sensitive categories.

Cohorts could be evaluated for fairness by measuring and limiting their deviation from population-level demographics with respect to the prevalence of sensitive categories, to prevent their use as proxies for a sensitive category. However, this evaluation would require knowing how many individual people in each cohort were in the sensitive categories, information which could be difficult or intrusive to obtain.

It should be clear that FLoC will never be able to prevent all misuse. There will be categories that are sensitive in contexts that weren't predicted. Beyond FLoC's technical means of preventing abuse, sites that use cohorts will need to ensure that people are treated fairly, just as they must with algorithmic decisions made based on any other data today.

Opting Out of Computation

A site should be able to declare that it does not want to be included in the user's list of sites for cohort calculation. This can be accomplished via a new interest-cohort permissions policy. This policy will be default allow. Any frame that is not allowed interest-cohort permission will have a default value returned when they call document.interestCohort(). If the main frame does not have interest-cohort permission then the page visit will not be included in interest cohort calculation.

For example, a site can opt out of all FLoC cohort calculation by sending the HTTP response header:

Permissions-Policy: interest-cohort=()

Proof of Concept Experiment

As a first step toward implementing FLoC, browsers will need to perform closed experiments in order to find a good clustering method to assign users to cohorts and to analyze them to ensure that they’re not revealing sensitive information about users. We consider this the proof-of-concept (POC) stage. The initial phase will be an experiment with cohorts to ensure that they are sufficiently private to be made publicly available to the web. This phase will inform any potential additional phases which would focus on other goals.

For this initial phase of Chrome’s Proof-Of-Concept, simple client-side methods will be used to calculate the user’s cohort based on all of the sites that they visit with public IP addresses. The qualifying subset of users who meet the criteria described below will have their cohort temporarily logged with their sync data to perform the sensitivity analysis by Chrome described below. The collection of cohorts will be analyzed to ensure that cohorts are of sufficient size and do not correlate too strongly with known sensitive categories. Cohorts that don’t pass the test will be concealed by the browser in any subsequent phases.

How the Interest Cohort will be calculated

This is where most of the experimentation will occur as we explore the privacy and utility space of FLoC. Our first approach involves applying a SimHash algorithm to the registrable domains of the sites visited by the user in order to cluster users that visit similar sites together. Other ideas include adding other features, such as the full path of the URL or categories of pages provided by an on-device classifier. We may also apply federated learning methods to estimate client models in a distributed fashion. To further enhance user privacy, we will also experiment with adding noise to the output of the hash function, or with occasionally replacing the user's true cohort with a random one.

During the experimentation phase, Chrome's various efforts at cohort assignment algorithms will be documented at https://www.chromium.org/Home/chromium-privacy/privacy-sandbox/floc.

Qualifying users for whom a cohort will be logged with their sync data

For Chrome’s POC, cohorts will be logged with sync in a limited set of circumstances. Namely, all of the following conditions must be met:

  1. The user is logged into a Google account and opted to sync history data with Chrome
  2. The user does not block third-party cookies
  3. The user’s Google Activity Controls have the following enabled:
    1. “Web & App Activity”
    2. “Include Chrome history and activity from sites, apps, and devices that use Google services”
  4. The user’s Google Ad Settings have the following enabled:
    1. “Ad Personalization”
    2. “Also use your activity & information from Google services to personalize ads on websites and apps that partner with Google to show ads.”

Sites which interest cohorts will be calculated on

All sites with publicly routable IP addresses that the user visits when not in incognito mode will be included in the POC cohort calculation.

Excluding sensitive categories

We will analyze the resulting cohorts for correlations between cohort and sensitive categories, including the prohibited categories defined here. This analysis is designed to protect user privacy by evaluating only whether a cohort may be sensitive, in the abstract, without learning why it is sensitive, i.e., without computing or otherwise inferring specific sensitive categories that may be associated with that cohort. Cohorts that reveal sensitive categories will be blocked or the clustering algorithm will be reconfigured to reduce the correlation.

More Repositories

1

webcomponents

Web Components specifications
HTML
4,360
star
2

import-maps

How to control the behavior of JavaScript imports
JavaScript
2,705
star
3

virtual-scroller

1,998
star
4

focus-visible

Polyfill for `:focus-visible`
JavaScript
1,607
star
5

webusb

Connecting hardware to the web.
Bikeshed
1,310
star
6

webpackage

Web packaging format
Go
1,231
star
7

EventListenerOptions

An extension to the DOM event pattern to allow authors to disable support for preventDefault
JavaScript
1,166
star
8

portals

A proposal for enabling seamless navigations between sites or pages
HTML
946
star
9

inert

Polyfill for the inert attribute and property.
JavaScript
920
star
10

scheduling-apis

APIs for scheduling and controlling prioritized tasks.
HTML
909
star
11

view-transitions

811
star
12

file-system-access

Expose the file system on the user’s device, so Web apps can interoperate with the user’s native applications.
Bikeshed
658
star
13

background-sync

A design and spec for ServiceWorker-based background synchronization
HTML
639
star
14

ua-client-hints

Wouldn't it be nice if `User-Agent` was a (set of) client hints?
Bikeshed
590
star
15

scroll-to-text-fragment

Proposal to allow specifying a text snippet in a URL fragment
HTML
586
star
16

observable

Observable API proposal
Bikeshed
582
star
17

aom

Accessibility Object Model
HTML
567
star
18

kv-storage

[On hold] A proposal for an async key/value storage API for the web
550
star
19

turtledove

TURTLEDOVE
Bikeshed
526
star
20

navigation-api

The new navigation API provides a new interface for navigations and session history, with a focus on single-page application navigations.
Makefile
486
star
21

webmonetization

Proposed Web Monetization standard
HTML
461
star
22

trust-token-api

Trust Token API
Bikeshed
421
star
23

attribution-reporting-api

Attribution Reporting API
Bikeshed
360
star
24

direct-sockets

Direct Sockets API for the web platform
HTML
329
star
25

shape-detection-api

Detection of shapes (faces, QR codes) in images
Bikeshed
304
star
26

display-locking

A repository for the Display Locking spec
HTML
297
star
27

dbsc

Bikeshed
297
star
28

background-fetch

API proposal for background downloading/uploading
Shell
281
star
29

first-party-sets

Bikeshed
280
star
30

serial

Serial ports API for the platform.
HTML
256
star
31

resize-observer

This repository is no longer active. ResizeObserver has moved out of WICG into
HTML
255
star
32

priority-hints

A browser API to enable developers signal the priorities of the resources they need to download.
Bikeshed
249
star
33

sanitizer-api

Bikeshed
227
star
34

is-input-pending

HTML
221
star
35

proposals

A home for well-formed proposed incubations for the web platform. All proposals welcome.
216
star
36

spatial-navigation

Directional focus navigation with arrow keys
JavaScript
212
star
37

js-self-profiling

Proposal for a programmable JS profiling API for collecting JS profiles from real end-user environments
HTML
197
star
38

cq-usecases

Use cases and requirements for standardizing element queries.
HTML
184
star
39

isolated-web-apps

Repository for explainers and other documents related to the Isolated Web Apps proposal.
Bikeshed
182
star
40

visual-viewport

A proposal to add explicit APIs to the Web for querying and setting the visual viewport
HTML
177
star
41

interventions

A place for browsers and web developers to collaborate on user agent interventions.
176
star
42

frame-timing

Frame Timing API
HTML
170
star
43

layout-instability

A proposal for a Layout Instability specification
Makefile
158
star
44

page-lifecycle

Lifecycle API to support system initiated discarding and freezing
HTML
154
star
45

nav-speculation

Proposal to enable privacy-enhanced preloading
HTML
154
star
46

speech-api

Web Speech API
Bikeshed
145
star
47

cookie-store

Asynchronous access to cookies from JavaScript
Bikeshed
143
star
48

construct-stylesheets

API for constructing CSS stylesheet objects
Bikeshed
137
star
49

webhid

Web API for accessing Human Interface Devices (HID)
HTML
137
star
50

color-api

A proposal and draft spec for a Color object for the Web Platform, loosely influenced by the Color.js work. Heavily WIP, if you landed here randomly, please move along.
HTML
132
star
51

fenced-frame

Proposal for a strong boundary between a page and its embedded content
Bikeshed
126
star
52

devtools-protocol

DevTools Protocol
JavaScript
120
star
53

sms-one-time-codes

A way to format SMS messages for use with browser autofill features such as HTML’s autocomplete=one-time-code.
Makefile
111
star
54

bundle-preloading

Bundles of multiple resources, to improve loading JS and the Web.
HTML
105
star
55

translation-api

A proposal for translator and language detector APIs
Bikeshed
104
star
56

privacy-preserving-ads

Privacy-Preserving Ads
HCL
100
star
57

manifest-incubations

Before install prompt API for installing web applications
HTML
99
star
58

window-controls-overlay

HTML
97
star
59

netinfo

HTML
95
star
60

compression-dictionary-transport

94
star
61

intrinsicsize-attribute

Proposal to add an intrinsicsize attribute to media elements
93
star
62

animation-worklet

🚫 Old repository for AnimationWorklet specification ➡️ New repository: https://github.com/w3c/css-houdini-drafts
Makefile
92
star
63

container-queries

HTML
91
star
64

local-peer-to-peer

↔️ Proposal for local communication between browsers without the aid of a server.
Bikeshed
90
star
65

shared-storage

Explainer for proposed web platform Shared Storage API
Bikeshed
89
star
66

async-append

A way to create DOM and add it to the document without blocking the main thread.
HTML
87
star
67

indexed-db-observers

Prototyping and discussion around indexeddb observers.
WebIDL
84
star
68

canvas-formatted-text

HTML
82
star
69

file-handling

API for web applications to handle files
82
star
70

canvas-color-space

Proposed web platform feature to add color management, wide gamut and high bit-depth support to the <canvas> element.
79
star
71

local-font-access

Web API for enumerating fonts on the local system
Bikeshed
77
star
72

performance-measure-memory

performance.measureMemory API
HTML
77
star
73

digital-credentials

Digital Credentials, like driver's licenses
HTML
77
star
74

handwriting-recognition

Handwriting Recognition Web API Proposal
Bikeshed
75
star
75

web-app-launch

Web App Launch Handler
HTML
75
star
76

pwa-url-handler

72
star
77

ContentPerformancePolicy

A set of policies that a site guarantees to adhere to, browsers enforce, and embedders can count on.
HTML
72
star
78

starter-kit

A simple starter kit for incubations
JavaScript
72
star
79

css-parser-api

This is the repo where the CSS Houdini parser API will be worked on
HTML
71
star
80

close-watcher

A web API proposal for watching for close requests (e.g. Esc, Android back button, ...)
Makefile
71
star
81

eyedropper-api

HTML
69
star
82

idle-detection

A proposal for an idle detection and notification API for the web
Bikeshed
67
star
83

storage-foundation-api-explainer

Explainer showcasing a new web storage API, NativeIO
65
star
84

video-editing

65
star
85

uuid

UUID V4
63
star
86

client-hints-infrastructure

Specification for the Client Hints infrastructure - privacy preserving proactive content negotiation
Bikeshed
61
star
87

sparrow

60
star
88

private-network-access

HTML
58
star
89

element-timing

A proposal for an Element Timing specification.
Bikeshed
57
star
90

document-picture-in-picture

Bikeshed
56
star
91

video-rvfc

video.requestVideoFrameCallback() incubation
HTML
53
star
92

time-to-interactive

Repository for hosting TTI specification and discussions around it.
52
star
93

digital-goods

Bikeshed
50
star
94

soft-navigations

Heuristics to detect Single Page Apps soft navigations
Bikeshed
46
star
95

raw-clipboard-access

An explainer for the Raw Clipboard Access feature
44
star
96

storage-buckets

API proposal for managing multiple storage buckets
Bikeshed
43
star
97

pending-beacon

A better beaconing API
Bikeshed
43
star
98

admin

đź‘‹ Ask your questions here! đź‘‹
HTML
42
star
99

web-smart-card

Repository for the Web Smart Card Explainer
HTML
42
star
100

web-preferences-api

The Web Preference API aims to provide a way for sites to override the value for a given user preference (e.g. color-scheme preference) in a way that fully integrates with existing Web APIs.
Bikeshed
41
star