Section 1
Our Approach
Rascasse occupies a distinct position in the research landscape: Behavioral Audience Intelligence. We are neither a social listening platform nor a survey research provider. Instead, we systematically analyze observable digital behavior across multiple platforms to construct audience profiles grounded in what people do, not what they say.
This distinction matters. The gap between self-reported attitudes and actual behavior — the Say-Do Gap — is well-documented in both academic literature and industry practice. Choi and Varian demonstrated that digital search behavior predicts real-world economic activity more accurately than traditional survey instruments.1 Kosinski et al. showed that digital records of human behavior can predict personal attributes with remarkable accuracy.2
The market research industry itself increasingly acknowledges this challenge. At IIeX North America 2025, Qrious Insights presented findings suggesting approximately 80% error rates in self-reported media consumption data.3 Rep Data's 2025 State of Survey Fraud report analyzed 4.1 billion survey attempts and found 33% to be fraudulent and 27% inattentive — leaving only about 40% of collected responses genuinely usable.4
Our approach aligns with what the ICC/ESOMAR International Code (5th Edition, 2025) now formally recognizes: the legitimate role of the “researcher as data curator” — professionals who derive insights from existing data sources rather than generating primary data through direct participant contact.5
At ESOMAR Reimagine 2025, Heineken presented a risk framework for synthetic and imputed data. Under this framework, Rascasse's methodology classifies as “Step 1: Data Imputation” — the lowest-risk category, as it draws inferences from real behavioral signals rather than generating synthetic data.6
Key Principles
- Multi-source triangulation: Every data point is validated against independent sources. No single platform dominates the output.
- Aggregated, non-personal data: We process behavioral patterns at population level. No individual tracking, no personal data processing — GDPR compliance by design.
- Observable signals over stated preferences: Search queries, engagement patterns, and consumption behavior provide higher-fidelity signals than self-reported survey responses.
- Transparency about uncertainty: Where data is sparse, we report insufficient data rather than imputed values.
Section 2
Data Architecture & Independence
Rascasse's data architecture is deliberately conservative. We have operated without a single platform ban, API revocation, cease-and-desist letter, or terms-of-service violation. This is not by accident — it is by design. Our architecture is built on publicly observable signals that do not require privileged API access, user authentication, or platform partnerships that can be revoked.
No Third-Party Cookie Dependency
While much of the digital advertising ecosystem faces disruption from cookie deprecation — Google's Privacy Sandbox, Safari's ITP, Firefox's ETP — Rascasse's methodology is entirely cookie-independent. We do not track individual users across websites. Our signals are aggregated behavioral patterns: search volumes, engagement metrics, and public interaction data. None of these rely on browser-level tracking mechanisms.
Zero Customer Data Required
Rascasse does not require access to client CRM systems, first-party data, customer databases, or any proprietary information. Our intelligence is derived entirely from publicly available behavioral signals. This means no data processing agreements (DPAs) beyond standard SaaS terms, no risk of commingling client data with third-party sources, no onboarding delay for data integration, and full GDPR compliance by design.
Platform Independence
Unlike competitors who depend on a single platform's API — such as the Twitter/X Decahose or Meta's Marketing API — Rascasse's multi-source architecture ensures no single platform change can disrupt our data pipeline. When platforms restrict API access, as Twitter/X did in 2023 or as Meta periodically adjusts its Marketing API, our methodology remains unaffected.
| Risk Factor | Survey-Based | Social-Graph | Behavioral (Rascasse) |
|---|---|---|---|
| Platform API dependency | Panel providers | Twitter/X Decahose | None (public signals) |
| Cookie dependency | Tracking pixels | None | None |
| Customer data required | None | None | None |
| Platform ban risk | Panel fraud risk | API revocation risk | None (no TOS violation) |
| GDPR data processing | Panel consent required | Social data consent | No personal data processed |
The ESOMAR Guideline on Passive Data Collection, Observation and Recording explicitly recognizes the legitimacy of research based on publicly observable data, provided it adheres to transparency and proportionality principles — both of which Rascasse's architecture satisfies by design.7
Section 3
Data Sources
Rascasse ingests behavioral signals from multiple independent data categories. Each category captures a distinct facet of digital behavior, and no single source dominates the final output. This multi-source approach follows the principles of data fusion as described by Ipsos MediaCT: combining independent data streams to produce estimates that no single source could deliver alone.8
| Category | What We Capture | Signal Type |
|---|---|---|
| Search Behavior | Query volumes, seasonal patterns, regional distribution | Intent signals |
| Social Platforms | Follower graphs, engagement rates, content interaction | Interest signals |
| Video & Streaming | View counts, playlist behavior, channel subscriptions | Consumption signals |
| Public Records | TV ratings, sales charts, award databases, Wikipedia | Validation signals |
| Published Primary Research | Published survey results, census data, Pew studies | Calibration signals |
| Location Data | POI databases, check-in patterns, store locator data | Spatial signals |
Each profile is built from multiple independent data sources. Signals that cannot be corroborated across at least two independent sources are flagged with reduced confidence scores.
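The corroboration rule above can be sketched as a small check. This is a minimal illustration, not the production logic: the function name, source labels, and the 25% agreement tolerance are all assumptions introduced here.

```python
# Sketch: flag signals lacking cross-source corroboration.
# Source names and the tolerance value are illustrative assumptions.
def corroboration_confidence(signal_values_by_source, tolerance=0.25):
    """Return 'high' if at least two independent sources agree within
    `tolerance` (relative difference), else 'low'."""
    values = [v for v in signal_values_by_source.values() if v is not None]
    if len(values) < 2:
        return "low"                      # single-source: reduced confidence
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            base = max(abs(values[i]), abs(values[j]))
            if base and abs(values[i] - values[j]) / base <= tolerance:
                return "high"
    return "low"
```

A signal seen only on one platform, or with wildly different magnitudes across platforms, would be surfaced with a reduced confidence score rather than silently reported.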
Section 4
Data Point Profiling
A data point in Rascasse's system is any discrete cultural, commercial, or social object that generates measurable digital behavioral signals. The system currently profiles over 320,000 data points across five types: Brands, People, Events, Media, and Topics.
Data Point Construction
Each data point is defined by a curated set of search keywords, aliases, and category assignments. This curation is essential: the same surface-level query can refer to different data points (e.g., “Jaguar” the car brand vs. “Jaguar” the animal), and disambiguation requires domain expertise combined with algorithmic validation.
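A data point record and the disambiguation step might look like the following sketch. The field names, the two “Jaguar” candidates, and the keyword-overlap scoring are hypothetical, chosen only to make the curation idea concrete.

```python
from dataclasses import dataclass, field

# Illustrative data point record; field names are assumptions, not the real schema.
@dataclass
class DataPoint:
    name: str
    dp_type: str                                  # Brand, Person, Event, Media, Topic
    keywords: list = field(default_factory=list)  # curated disambiguating terms
    category: str = ""

def disambiguate(context_terms, candidates):
    """Pick the candidate whose curated keywords best match the query context."""
    return max(candidates, key=lambda dp: sum(t in dp.keywords for t in context_terms))

jaguar_car = DataPoint("Jaguar", "Brand", ["car", "suv", "dealer"], "Automotive")
jaguar_animal = DataPoint("Jaguar", "Topic", ["animal", "wildlife", "habitat"], "Nature")
```

In practice the algorithmic score would only propose an assignment; a domain expert confirms the curation.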
Data Point Size
Data Point Size is a normalized metric that combines search volume with platform engagement signals. It provides a comparable measure of a data point's overall digital footprint, enabling cross-category and cross-country comparisons. Data Point Size is type-specific: a brand is weighted differently from a person or an event, reflecting the distinct behavioral patterns each type generates.
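One plausible shape for such a metric is sketched below. The type-specific weights, the log scaling, and the 0–100 cap are all invented for illustration; the actual weighting scheme is proprietary and not reproduced here.

```python
import math

# Hypothetical type-specific weights (search, engagement); the real values are not public.
TYPE_WEIGHTS = {"Brand": (0.7, 0.3), "Person": (0.4, 0.6), "Event": (0.5, 0.5)}

def data_point_size(search_volume, engagement, dp_type):
    """Combine log-scaled search and engagement signals into one 0-100 score."""
    w_search, w_engage = TYPE_WEIGHTS[dp_type]
    # log scaling keeps mega-brands from dwarfing everything else
    s = math.log10(1 + search_volume)
    e = math.log10(1 + engagement)
    raw = w_search * s + w_engage * e
    return round(min(100.0, raw * 10), 1)        # cap for cross-category comparability
```

The key property is that two data points of different types land on the same comparable scale.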
Quality Factor (QualFactor)
Each data point carries a QualFactor score derived from cross-validation between search-based signals and platform engagement signals. High QualFactor indicates consistent signals across independent sources; low QualFactor triggers manual review or data enrichment.
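As a rough sketch, QualFactor can be thought of as an agreement ratio between the two signal families; the function below and its 0.5 review threshold are illustrative assumptions, not the production formula.

```python
# Sketch: QualFactor as agreement between two independent, normalized signals.
def qual_factor(search_score, engagement_score):
    """Return a 0-1 consistency score: 1.0 when both signals agree,
    approaching 0 as they diverge or when a source is missing."""
    if search_score is None or engagement_score is None:
        return 0.0                               # missing source => lowest quality
    hi = max(search_score, engagement_score)
    if hi == 0:
        return 0.0
    return min(search_score, engagement_score) / hi

def needs_review(search_score, engagement_score, threshold=0.5):
    """Low consistency triggers manual review or data enrichment."""
    return qual_factor(search_score, engagement_score) < threshold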
Data point profiling covers 172 countries. New data points can be onboarded within days, not months — a significant advantage over survey-based systems that require new questionnaire design and fieldwork for each addition.
Section 5
Audience Construction
Audiences in Rascasse are constructed from data points using domain-expert curation — not algorithmic clustering. This deliberate design choice ensures semantic coherence: a “Premium Automotive Enthusiasts” audience is built by experts who understand which brands, media properties, events, and influencers define that segment.
Single-Data-Point Audiences
The simplest audience type centers on a single data point. “Fans of the Dallas Cowboys” captures all digital behavioral signals associated with the Dallas Cowboys — search patterns, social engagement, content consumption, and related brand affinities.
Multi-Data-Point Audiences
Complex audiences combine multiple data points using Boolean logic (AND, OR, NOT). For example, a “Sustainable Fashion” audience might combine sustainability-focused brands, ethical fashion media, and relevant influencers — while excluding fast-fashion brands.
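The Boolean construction can be expressed directly over sets of data-point identifiers. The identifiers below are invented for illustration.

```python
# Sketch: Boolean audience construction over sets of data-point IDs (IDs invented).
sustainable_brands = {"patagonia", "veja", "allbirds"}
ethical_media = {"good_on_you", "eco_age"}
fast_fashion = {"shein", "boohoo"}

def build_audience(include_any, exclude):
    """OR the include groups together, then NOT out the exclusions."""
    return set().union(*include_any) - exclude

sustainable_fashion = build_audience(
    include_any=[sustainable_brands, ethical_media],   # OR
    exclude=fast_fashion,                              # NOT
)
```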
Weighted Aggregation
When constructing multi-data-point audiences, component data points are weighted by relevance. An “American Hip-Hop Fans” audience might weight artists more heavily than media outlets, reflecting the stronger behavioral signal that artist engagement provides.
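The weighting step reduces to a standard weighted mean over component signals; the 2:1 artist-to-media weighting below is an invented example, not Rascasse's actual ratio.

```python
# Sketch of relevance-weighted aggregation; component names and weights are illustrative.
def weighted_audience_signal(component_signals, weights):
    """Combine per-data-point signals into one audience-level value."""
    total_weight = sum(weights[name] for name in component_signals)
    return sum(component_signals[name] * weights[name]
               for name in component_signals) / total_weight

signals = {"artist_a": 0.9, "artist_b": 0.7, "media_x": 0.4}
weights = {"artist_a": 2.0, "artist_b": 2.0, "media_x": 1.0}  # artists count double
score = weighted_audience_signal(signals, weights)
```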
Unlike survey-based platforms where researchers must define audiences via questionnaire logic, or social listening tools that rely on keyword matching in conversations, Rascasse audiences are built by domain experts who understand the semantic relationships between brands, people, and properties. This produces more nuanced and culturally accurate segments.
Section 6
Demographic Modeling
Demographics are not directly observable from search data. Instead, we employ a multi-signal estimation approach that combines several independent demographic indicators into a composite estimate. Each signal contributes a piece of evidence; the final demographic profile emerges from the convergence of these independent signals.
The Bayesian framework underpinning signals 2 and 5 follows established methods in marketing science, as described by Rossi, Allenby and McCulloch (2005)9 and applied to media mix modeling by Google Research (2017).10
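The convergence-of-signals idea can be sketched as a naive-Bayes style fusion: a population prior over demographic brackets is multiplied by independent signal likelihoods and renormalized. All distributions below are invented for illustration and do not reflect real calibration data.

```python
# Sketch: naive-Bayes fusion of independent demographic signals over age brackets.
def fuse_signals(prior, likelihoods):
    """Multiply a prior over brackets by independent signal likelihoods,
    then renormalize into a posterior distribution."""
    posterior = dict(prior)
    for lik in likelihoods:
        for bracket in posterior:
            posterior[bracket] *= lik[bracket]
    total = sum(posterior.values())
    return {b: p / total for b, p in posterior.items()}

prior = {"18-34": 0.4, "35-54": 0.4, "55+": 0.2}            # population baseline
platform_mix = {"18-34": 0.7, "35-54": 0.2, "55+": 0.1}     # platform skews young
query_phrasing = {"18-34": 0.6, "35-54": 0.3, "55+": 0.1}   # second independent signal
posterior = fuse_signals(prior, [platform_mix, query_phrasing])
```

Two independent youth-skewed signals sharpen the posterior toward the younger bracket far more than either would alone.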
The visual demographic analysis component builds on the DEX (Deep EXpectation) architecture for apparent age estimation from facial images11 and broader work on machine-learning-based demographic detection from social media.12
Platform-specific demographic distributions are calibrated against Pew Research Center's ongoing studies of social media usage patterns across demographic groups.13
Demographic estimates carry inherent uncertainty. We report confidence intervals and flag data points where demographic signals are sparse. Where insufficient data exists to produce a reliable estimate, we display “insufficient data” rather than imputed values. This transparency is fundamental to our methodology: we prefer accuracy over coverage.
Section 7
Affinity & Psychographic Modeling
Affinity Scores
Affinity measures the relative strength of the connection between an audience and a brand, person, or property. The baseline is 1.0, representing the market average. An affinity score above 1.0 indicates above-average interest; below 1.0 indicates below-average interest. This index-based approach — common in media planning — enables direct comparison across data points and audiences.
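The index itself is a simple ratio; the 6%-vs-4% engagement shares below are invented numbers used only to show the scale.

```python
# Sketch: affinity as an index of audience-level vs. market-level engagement share.
def affinity(audience_engagement_share, market_engagement_share):
    """1.0 = market average; 1.5 = 50% above average; 0.5 = half the average."""
    if market_engagement_share == 0:
        raise ValueError("undefined baseline")
    return audience_engagement_share / market_engagement_share

# 6% of the audience's signals involve the brand vs. 4% market-wide:
score = affinity(0.06, 0.04)
```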
Affinity computation draws on techniques from collaborative filtering and matrix factorization, as described by Koren, Bell and Volinsky (2009) in the context of recommender systems.14 The core insight: co-occurrence patterns across behavioral signals reveal latent preferences that individual signals alone cannot capture.
Psychographic Profiling (28 Traits)
Rascasse estimates 28 psychographic traits for each audience, organized around dimensions such as sustainability orientation, technology adoption, luxury affinity, health consciousness, and cultural engagement.
Each trait is scored through marker data points: brands, people, and properties that serve as strong indicators of a particular psychographic dimension. For example, the “Sustainability” trait draws on engagement patterns with brands like Patagonia, media about climate change, and events focused on environmental topics. The trait score represents how much an audience over- or under-indexes on these marker data points relative to the general population.
This approach is informed by research on predicting psychological traits from digital behavior2 and the Schwartz Values Framework, which provides a theoretically grounded taxonomy of human values.15 Boyd et al. (2015) demonstrated that value orientations can be reliably inferred from digital behavior patterns.16
All psychographic scores are normalized against the market average. An audience with a sustainability score of 1.4 is 40% more sustainability-oriented than the general population — not “highly sustainable” in absolute terms. This relative framing prevents overclaiming.
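The marker-based trait score described above can be sketched as an averaged over/under-index. The marker names and engagement rates below are invented; the real marker sets are curated per trait.

```python
# Sketch: a trait score as an over/under-index on marker data points (values invented).
def trait_score(audience_rates, population_rates, markers):
    """Average the audience/population engagement ratio across marker data points."""
    ratios = [audience_rates[m] / population_rates[m] for m in markers]
    return sum(ratios) / len(ratios)

markers = ["patagonia", "climate_media", "eco_events"]
audience_rates = {"patagonia": 0.09, "climate_media": 0.06, "eco_events": 0.03}
population_rates = {"patagonia": 0.05, "climate_media": 0.05, "eco_events": 0.02}
sustainability = trait_score(audience_rates, population_rates, markers)
```

A score of 1.0 means the audience engages with the markers at exactly the population rate, matching the relative framing described above.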
Section 8
Location Intelligence
Rascasse provides location-level intelligence across 172 countries, 100,000+ cities, and 250,000+ postal codes. Location data is derived from the geographic distribution of search behavior combined with spatial analysis of points of interest (POIs).
Location Affinity
Location Affinity measures how strongly a brand or property resonates in a specific geography relative to the national average. It combines search volume distribution with interest patterns, following the regional analysis methodology first described by Choi and Varian (2012).1
Points of Interest (POI) Database
The system maintains a database of over 8 million Points of Interest sourced from open geographic databases — including venue locations, retail stores, cultural institutions, and sports facilities. POIs are mapped to Rascasse's city and region taxonomy, enabling spatial analysis that connects digital behavioral signals with physical-world presence.
Outlier Detection
Not every geographic signal is meaningful. The system employs neighbor-validation to distinguish genuine local trends from data artifacts: a city showing unusually high affinity is validated against its neighboring cities and region-level patterns. Isolated spikes without regional corroboration are flagged as potential artifacts rather than reported as insights.
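Neighbor-validation can be sketched as a z-score test against surrounding cities. The threshold of 2.0 and the affinity values in the test are illustrative assumptions.

```python
from statistics import mean, pstdev

# Sketch: flag a city's affinity spike unless neighboring cities corroborate it.
def is_artifact(city_affinity, neighbor_affinities, z_threshold=2.0):
    """True if the city deviates sharply from its neighbors, i.e. the
    spike has no regional support and is likely a data artifact."""
    mu = mean(neighbor_affinities)
    sigma = pstdev(neighbor_affinities)
    if sigma == 0:
        return city_affinity != mu
    return abs(city_affinity - mu) / sigma > z_threshold
```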
Section 9
Share of Search
Share of Search measures a brand's proportion of total branded search volume within a defined competitive set. First formally proposed by Les Binet at IPA EffWorks Global in 2020,17 the metric has since been validated as a reliable proxy for market share.
The IPA Think Tank, led by James Hankins, analyzed 30 studies across 12 categories and 7 countries, finding that Share of Search explains approximately 83% of the variation in market share.18 Critically, changes in Share of Search tend to precede changes in actual market share, making it a leading indicator for competitive dynamics.
Rascasse Implementation
- Flexible category definition: Competitive sets are defined per use case — not restricted to pre-built taxonomies. A “premium automotive” set in Germany may differ from the same concept in the United States.
- Monthly tracking: Share of Search is calculated monthly with year-over-year comparisons, enabling trend detection beyond seasonal noise.
- Country-level granularity: Each market is analyzed independently, reflecting that competitive dynamics vary by geography.
- Normalization: Raw search volumes are normalized to account for seasonal fluctuations and category-level growth or decline.
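The core calculation behind the steps above is a share within a competitive set, compared month-over-month and year-over-year. The brand names and volumes below are invented.

```python
# Sketch: monthly Share of Search for a brand within a competitive set (values invented).
def share_of_search(brand_volumes, brand):
    """Brand's share of total branded search volume in the set, as a 0-1 fraction."""
    total = sum(brand_volumes.values())
    return brand_volumes[brand] / total if total else 0.0

# Hypothetical monthly volumes for a premium-automotive competitive set:
june = {"brand_a": 50_000, "brand_b": 30_000, "brand_c": 20_000}
june_prev_year = {"brand_a": 40_000, "brand_b": 35_000, "brand_c": 25_000}

sos_now = share_of_search(june, "brand_a")
sos_yoy = sos_now - share_of_search(june_prev_year, "brand_a")  # YoY change in share
```

Because the metric is a share, category-level growth or decline cancels out of the ratio, which is what makes the year-over-year comparison meaningful.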
Share of Search builds on the broader insight from Choi and Varian (2012) that search query volumes contain predictive information about real-world economic activity. Binet's contribution was to formalize this for brand-level competitive analysis — moving it from economic forecasting to marketing strategy.
Section 10
Validation & Limitations
Validation Framework
Rascasse employs multiple validation mechanisms to ensure output quality:
- Cross-platform consistency: Signals must be confirmed across at least two independent data sources before being reported with high confidence. Single-source signals are flagged accordingly.
- Temporal stability: Results are validated over time series to filter out single-month noise. Sudden, uncorroborated shifts trigger review rather than automatic reporting.
- Benchmark against public data: Profiles are periodically benchmarked against externally available data — TV ratings, published sales figures, census demographics, and publicly available market research results.
- Quality Factor scoring: Each data point carries a QualFactor that reflects the consistency and breadth of its underlying data. Low-QualFactor data points are flagged in the user interface.
Known Limitations
We believe transparency about limitations is essential to methodological credibility. The following are known constraints of our approach:
Our data reflects online populations. Demographic segments with low digital presence — older populations in developing markets, communities with limited internet access — may be underrepresented in our profiles. We do not extrapolate to offline populations without explicit caveat.
Not all digital platforms provide equal data access. Coverage varies by platform and region. In markets where dominant platforms restrict public data access, our signal diversity is reduced, and confidence intervals widen accordingly.
Demographics are inferred, not directly observed. Confidence varies by data point type and data availability. Data points with strong platform-specific engagement patterns yield more reliable demographic estimates than data points with limited or uniform platform presence.
Most data points are updated monthly or quarterly, not in real-time. This design is intentional — it prioritizes stability and validation over immediacy. For use cases requiring real-time signals, we recommend complementing Rascasse data with platform-specific monitoring tools.
For a complete list of academic references underpinning this methodology, see our Bibliography.