Section 1
Our Approach
Rascasse occupies a distinct position in the research landscape: Behavioral Audience Intelligence. We are neither a social listening platform nor a survey research provider. Instead, we systematically analyze observable digital behavior across multiple platforms to construct audience profiles grounded in what people do, not what they say.
This distinction matters. The gap between self-reported attitudes and actual behavior — the Say-Do Gap — is well-documented in both academic literature and industry practice. Choi and Varian demonstrated that digital search behavior predicts real-world economic activity more accurately than traditional survey instruments.1 Kosinski et al. showed that digital records of human behavior can predict personal attributes with remarkable accuracy.2
The market research industry itself increasingly acknowledges this challenge. At IIeX North America 2025, Qrious Insights presented findings suggesting approximately 80% error rates in self-reported media consumption data.3 Rep Data's 2025 State of Survey Fraud report analyzed 4.1 billion survey attempts and found 33% to be fraudulent and 27% inattentive — leaving only about 40% of collected responses genuinely usable.4
Our approach aligns with what the ICC/ESOMAR International Code (5th Edition, 2025) now formally recognizes: the legitimate role of the “researcher as data curator” — professionals who derive insights from existing data sources rather than generating primary data through direct participant contact.5
At ESOMAR Reimagine 2025, Heineken presented a risk framework for synthetic and imputed data. Under this framework, Rascasse's methodology classifies as “Step 1: Data Imputation” — the lowest-risk category, as it draws inferences from real behavioral signals rather than generating synthetic data.6
Key Principles
- Multi-source triangulation: Every data point is validated against independent sources. No single platform dominates the output.
- Aggregated, non-personal data: We process behavioral patterns at population level. No individual tracking, no personal data processing — GDPR compliance by design.
- Observable signals over stated preferences: Search queries, engagement patterns, and consumption behavior provide higher-fidelity signals than self-reported survey responses.
- Transparency about uncertainty: Where data is sparse, we report insufficient data rather than imputed values.
Section 2
Data Architecture & Independence
Rascasse's data architecture is deliberately conservative. We have operated without a single platform ban, API revocation, cease-and-desist letter, or terms-of-service violation. This is not by accident — it is by design. Our architecture is built on publicly observable signals that do not require privileged API access, user authentication, or platform partnerships that can be revoked.
No Third-Party Cookie Dependency
While much of the digital advertising ecosystem faces disruption from cookie deprecation — Google's Privacy Sandbox, Safari's ITP, Firefox's ETP — Rascasse's methodology is entirely cookie-independent. We do not track individual users across websites. Our signals are aggregated behavioral patterns: search volumes, engagement metrics, and public interaction data. None of these rely on browser-level tracking mechanisms.
Zero Customer Data Required
Rascasse does not require access to client CRM systems, first-party data, customer databases, or any proprietary information. Our intelligence is derived entirely from publicly available behavioral signals. This means no data processing agreements (DPAs) beyond standard SaaS terms, no risk of commingling client data with third-party sources, no onboarding delay for data integration, and full GDPR compliance by design.
Platform Independence
Unlike competitors who depend on a single platform's API — such as the Twitter/X Decahose or Meta's Marketing API — Rascasse's multi-source architecture ensures no single platform change can disrupt our data pipeline. When platforms restrict API access, as Twitter/X did in 2023 or as Meta periodically adjusts its Marketing API, our methodology remains unaffected.
| Risk Factor | Survey-Based | Social-Graph | Behavioral (Rascasse) |
|---|---|---|---|
| Platform API dependency | Panel providers | Twitter/X Decahose | None (public signals) |
| Cookie dependency | Tracking pixels | None | None |
| Customer data required | None | None | None |
| Platform ban risk | Panel fraud risk | API revocation risk | None (no TOS violation) |
| GDPR data processing | Panel consent required | Social data consent | No personal data processed |
The ESOMAR Guideline on Passive Data Collection, Observation and Recording explicitly recognizes the legitimacy of research based on publicly observable data, provided it adheres to transparency and proportionality principles — both of which Rascasse's architecture satisfies by design.7
Section 3
Data Sources
Rascasse ingests behavioral signals from multiple independent data categories. Each category captures a distinct facet of digital behavior, and no single source dominates the final output. This multi-source approach follows the principles of data fusion as described by Ipsos MediaCT: combining independent data streams to produce estimates that no single source could deliver alone.8
| Category | What We Capture | Signal Type |
|---|---|---|
| Search Behavior | Query volumes, seasonal patterns, regional distribution | Intent signals |
| Social Platforms | Follower graphs, engagement rates, content interaction | Interest signals |
| Video & Streaming | View counts, playlist behavior, channel subscriptions | Consumption signals |
| Public Records | TV ratings, sales charts, award databases, Wikipedia | Validation signals |
| Published Primary Research | Published survey results, census data, Pew studies | Calibration signals |
| Location Data | POI databases, check-in patterns, store locator data | Spatial signals |
Each profile is built from multiple independent data sources. Signals that cannot be corroborated across at least two independent sources are flagged with reduced confidence scores.
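The corroboration rule above can be sketched as a small check. This is a minimal illustration, not the production logic: the function name, source labels, and the 25% agreement tolerance are all assumptions introduced here.

```python
# Sketch: flag signals lacking cross-source corroboration.
# Source names and the tolerance value are illustrative assumptions.
def corroboration_confidence(signal_values_by_source, tolerance=0.25):
    """Return 'high' if at least two independent sources agree within
    `tolerance` (relative difference), else 'low'."""
    values = [v for v in signal_values_by_source.values() if v is not None]
    if len(values) < 2:
        return "low"                      # single-source: reduced confidence
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            base = max(abs(values[i]), abs(values[j]))
            if base and abs(values[i] - values[j]) / base <= tolerance:
                return "high"
    return "low"
```

A signal seen only on one platform, or with wildly different magnitudes across platforms, would be surfaced with a reduced confidence score rather than silently reported.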
Section 4
Data Point Profiling
A data point in Rascasse's system is any discrete cultural, commercial, or social object that generates measurable digital behavioral signals. The system currently profiles over 320,000 data points across five types: Brands, People, Events, Media, and Topics.
Data Point Construction
Each data point is defined by a curated set of search keywords, aliases, and category assignments. This curation is essential: the same surface-level query can refer to different data points (e.g., “Jaguar” the car brand vs. “Jaguar” the animal), and disambiguation requires domain expertise combined with algorithmic validation.
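A data point record and the disambiguation step might look like the following sketch. The field names, the two “Jaguar” candidates, and the keyword-overlap scoring are hypothetical, chosen only to make the curation idea concrete.

```python
from dataclasses import dataclass, field

# Illustrative data point record; field names are assumptions, not the real schema.
@dataclass
class DataPoint:
    name: str
    dp_type: str                                  # Brand, Person, Event, Media, Topic
    keywords: list = field(default_factory=list)  # curated disambiguating terms
    category: str = ""

def disambiguate(context_terms, candidates):
    """Pick the candidate whose curated keywords best match the query context."""
    return max(candidates, key=lambda dp: sum(t in dp.keywords for t in context_terms))

jaguar_car = DataPoint("Jaguar", "Brand", ["car", "suv", "dealer"], "Automotive")
jaguar_animal = DataPoint("Jaguar", "Topic", ["animal", "wildlife", "habitat"], "Nature")
```

In practice the algorithmic score would only propose an assignment; a domain expert confirms the curation.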
Data Point Size
Data Point Size is a normalized metric that combines search volume with platform engagement signals. It provides a comparable measure of a data point's overall digital footprint, enabling cross-category and cross-country comparisons. Data Point Size is type-specific: a brand is weighted differently from a person or an event, reflecting the distinct behavioral patterns each type generates.
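One plausible shape for such a metric is sketched below. The type-specific weights, the log scaling, and the 0–100 cap are all invented for illustration; the actual weighting scheme is proprietary and not reproduced here.

```python
import math

# Hypothetical type-specific weights (search, engagement); the real values are not public.
TYPE_WEIGHTS = {"Brand": (0.7, 0.3), "Person": (0.4, 0.6), "Event": (0.5, 0.5)}

def data_point_size(search_volume, engagement, dp_type):
    """Combine log-scaled search and engagement signals into one 0-100 score."""
    w_search, w_engage = TYPE_WEIGHTS[dp_type]
    # log scaling keeps mega-brands from dwarfing everything else
    s = math.log10(1 + search_volume)
    e = math.log10(1 + engagement)
    raw = w_search * s + w_engage * e
    return round(min(100.0, raw * 10), 1)        # cap for cross-category comparability
```

The key property is that two data points of different types land on the same comparable scale.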
Quality Factor (QualFactor)
Each data point carries a QualFactor score derived from cross-validation between search-based signals and platform engagement signals. High QualFactor indicates consistent signals across independent sources; low QualFactor triggers manual review or data enrichment.
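As a rough sketch, QualFactor can be thought of as an agreement ratio between the two signal families; the function below and its 0.5 review threshold are illustrative assumptions, not the production formula.

```python
# Sketch: QualFactor as agreement between two independent, normalized signals.
def qual_factor(search_score, engagement_score):
    """Return a 0-1 consistency score: 1.0 when both signals agree,
    approaching 0 as they diverge or when a source is missing."""
    if search_score is None or engagement_score is None:
        return 0.0                               # missing source => lowest quality
    hi = max(search_score, engagement_score)
    if hi == 0:
        return 0.0
    return min(search_score, engagement_score) / hi

def needs_review(search_score, engagement_score, threshold=0.5):
    """Low consistency triggers manual review or data enrichment."""
    return qual_factor(search_score, engagement_score) < threshold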
Data point profiling covers 172 countries. New data points can be onboarded within days, not months — a significant advantage over survey-based systems that require new questionnaire design and fieldwork for each addition.
Section 5
Audience Construction
Audiences in Rascasse are constructed from data points using domain-expert curation — not algorithmic clustering. This deliberate design choice ensures semantic coherence: a “Premium Automotive Enthusiasts” audience is built by experts who understand which brands, media properties, events, and influencers define that segment.
Single-Data-Point Audiences
The simplest audience type centers on a single data point. “Fans of the Dallas Cowboys” captures all digital behavioral signals associated with the Dallas Cowboys — search patterns, social engagement, content consumption, and related brand affinities.
Multi-Data-Point Audiences
Complex audiences combine multiple data points using Boolean logic (AND, OR, NOT). For example, a “Sustainable Fashion” audience might combine sustainability-focused brands, ethical fashion media, and relevant influencers — while excluding fast-fashion brands.
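The Boolean construction can be expressed directly over sets of data-point identifiers. The identifiers below are invented for illustration.

```python
# Sketch: Boolean audience construction over sets of data-point IDs (IDs invented).
sustainable_brands = {"patagonia", "veja", "allbirds"}
ethical_media = {"good_on_you", "eco_age"}
fast_fashion = {"shein", "boohoo"}

def build_audience(include_any, exclude):
    """OR the include groups together, then NOT out the exclusions."""
    return set().union(*include_any) - exclude

sustainable_fashion = build_audience(
    include_any=[sustainable_brands, ethical_media],   # OR
    exclude=fast_fashion,                              # NOT
)
```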
Weighted Aggregation
When constructing multi-data-point audiences, component data points are weighted by relevance. An “American Hip-Hop Fans” audience might weight artists more heavily than media outlets, reflecting the stronger behavioral signal that artist engagement provides.
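The weighting step reduces to a standard weighted mean over component signals; the 2:1 artist-to-media weighting below is an invented example, not Rascasse's actual ratio.

```python
# Sketch of relevance-weighted aggregation; component names and weights are illustrative.
def weighted_audience_signal(component_signals, weights):
    """Combine per-data-point signals into one audience-level value."""
    total_weight = sum(weights[name] for name in component_signals)
    return sum(component_signals[name] * weights[name]
               for name in component_signals) / total_weight

signals = {"artist_a": 0.9, "artist_b": 0.7, "media_x": 0.4}
weights = {"artist_a": 2.0, "artist_b": 2.0, "media_x": 1.0}  # artists count double
score = weighted_audience_signal(signals, weights)
```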
Unlike survey-based platforms where researchers must define audiences via questionnaire logic, or social listening tools that rely on keyword matching in conversations, Rascasse audiences are built by domain experts who understand the semantic relationships between brands, people, and properties. This produces more nuanced and culturally accurate segments.
Section 6
Demographic Modeling
Demographics are not directly observable from search data. Instead, we employ a multi-signal estimation approach that combines several independent demographic indicators into a composite estimate. Each signal contributes a piece of evidence; the final demographic profile emerges from the convergence of these independent signals.
The Bayesian framework underpinning signals 2 and 5 follows established methods in marketing science, as described by Rossi, Allenby and McCulloch (2005)9 and applied to media mix modeling by Google Research (2017).10
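The convergence-of-signals idea can be sketched as a naive-Bayes style fusion: a population prior over demographic brackets is multiplied by independent signal likelihoods and renormalized. All distributions below are invented for illustration and do not reflect real calibration data.

```python
# Sketch: naive-Bayes fusion of independent demographic signals over age brackets.
def fuse_signals(prior, likelihoods):
    """Multiply a prior over brackets by independent signal likelihoods,
    then renormalize into a posterior distribution."""
    posterior = dict(prior)
    for lik in likelihoods:
        for bracket in posterior:
            posterior[bracket] *= lik[bracket]
    total = sum(posterior.values())
    return {b: p / total for b, p in posterior.items()}

prior = {"18-34": 0.4, "35-54": 0.4, "55+": 0.2}            # population baseline
platform_mix = {"18-34": 0.7, "35-54": 0.2, "55+": 0.1}     # platform skews young
query_phrasing = {"18-34": 0.6, "35-54": 0.3, "55+": 0.1}   # second independent signal
posterior = fuse_signals(prior, [platform_mix, query_phrasing])
```

Two independent youth-skewed signals sharpen the posterior toward the younger bracket far more than either would alone.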
The visual demographic analysis component builds on the DEX (Deep EXpectation) architecture for apparent age estimation from facial images11 and broader work on machine-learning-based demographic detection from social media.12
Platform-specific demographic distributions are calibrated against Pew Research Center's ongoing studies of social media usage patterns across demographic groups.13
Demographic estimates carry inherent uncertainty. We report confidence intervals and flag data points where demographic signals are sparse. Where insufficient data exists to produce a reliable estimate, we display “insufficient data” rather than imputed values. This transparency is fundamental to our methodology: we prefer accuracy over coverage.
Section 7
Affinity & Psychographic Modeling
Affinity Scores
Affinity measures the relative strength of the connection between an audience and a brand, person, or property. The baseline is 1.0, representing the market average. An affinity score above 1.0 indicates above-average interest; below 1.0 indicates below-average interest. This index-based approach — common in media planning — enables direct comparison across data points and audiences.
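The index itself is a simple ratio; the 6%-vs-4% engagement shares below are invented numbers used only to show the scale.

```python
# Sketch: affinity as an index of audience-level vs. market-level engagement share.
def affinity(audience_engagement_share, market_engagement_share):
    """1.0 = market average; 1.5 = 50% above average; 0.5 = half the average."""
    if market_engagement_share == 0:
        raise ValueError("undefined baseline")
    return audience_engagement_share / market_engagement_share

# 6% of the audience's signals involve the brand vs. 4% market-wide:
score = affinity(0.06, 0.04)
```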
Affinity computation draws on techniques from collaborative filtering and matrix factorization, as described by Koren, Bell and Volinsky (2009) in the context of recommender systems.14 The core insight: co-occurrence patterns across behavioral signals reveal latent preferences that individual signals alone cannot capture.
Psychographic Profiling (28 Traits)
Rascasse estimates 28 psychographic traits for each audience, organized around dimensions such as sustainability orientation, technology adoption, luxury affinity, health consciousness, and cultural engagement.
Each trait is scored through marker data points: brands, people, and properties that serve as strong indicators of a particular psychographic dimension. For example, the “Sustainability” trait draws on engagement patterns with brands like Patagonia, media about climate change, and events focused on environmental topics. The trait score represents how much an audience over- or under-indexes on these marker data points relative to the general population.
This approach is informed by research on predicting psychological traits from digital behavior2 and the Schwartz Values Framework, which provides a theoretically grounded taxonomy of human values.15 Boyd et al. (2015) demonstrated that value orientations can be reliably inferred from digital behavior patterns.16
All psychographic scores are normalized against the market average. An audience with a sustainability score of 1.4 is 40% more sustainability-oriented than the general population — not “highly sustainable” in absolute terms. This relative framing prevents overclaiming.
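The marker-based trait score described above can be sketched as an averaged over/under-index. The marker names and engagement rates below are invented; the real marker sets are curated per trait.

```python
# Sketch: a trait score as an over/under-index on marker data points (values invented).
def trait_score(audience_rates, population_rates, markers):
    """Average the audience/population engagement ratio across marker data points."""
    ratios = [audience_rates[m] / population_rates[m] for m in markers]
    return sum(ratios) / len(ratios)

markers = ["patagonia", "climate_media", "eco_events"]
audience_rates = {"patagonia": 0.09, "climate_media": 0.06, "eco_events": 0.03}
population_rates = {"patagonia": 0.05, "climate_media": 0.05, "eco_events": 0.02}
sustainability = trait_score(audience_rates, population_rates, markers)
```

A score of 1.0 means the audience engages with the markers at exactly the population rate, matching the relative framing described above.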
Section 8
Location Intelligence
Rascasse provides location-level intelligence across 172 countries, 100,000+ cities, and 250,000+ postal codes. Location data is derived from the geographic distribution of search behavior combined with spatial analysis of points of interest (POIs).
Location Affinity
Location Affinity measures how strongly a brand or property resonates in a specific geography relative to the national average. It combines search volume distribution with interest patterns, following the regional analysis methodology first described by Choi and Varian (2012).1
Points of Interest (POI) Database
The system maintains a database of over 8 million Points of Interest sourced from open geographic databases — including venue locations, retail stores, cultural institutions, and sports facilities. POIs are mapped to Rascasse's city and region taxonomy, enabling spatial analysis that connects digital behavioral signals with physical-world presence.
Outlier Detection
Not every geographic signal is meaningful. The system employs neighbor-validation to distinguish genuine local trends from data artifacts: a city showing unusually high affinity is validated against its neighboring cities and region-level patterns. Isolated spikes without regional corroboration are flagged as potential artifacts rather than reported as insights.
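Neighbor-validation can be sketched as a z-score test against surrounding cities. The threshold of 2.0 and the affinity values in the test are illustrative assumptions.

```python
from statistics import mean, pstdev

# Sketch: flag a city's affinity spike unless neighboring cities corroborate it.
def is_artifact(city_affinity, neighbor_affinities, z_threshold=2.0):
    """True if the city deviates sharply from its neighbors, i.e. the
    spike has no regional support and is likely a data artifact."""
    mu = mean(neighbor_affinities)
    sigma = pstdev(neighbor_affinities)
    if sigma == 0:
        return city_affinity != mu
    return abs(city_affinity - mu) / sigma > z_threshold
```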
Section 9
Share of Search
Share of Search measures a brand's proportion of total branded search volume within a defined competitive set. First formally proposed by Les Binet at IPA EffWorks Global in 2020,17 the metric has since been validated as a reliable proxy for market share.
The IPA Think Tank, led by James Hankins, analyzed 30 studies across 12 categories and 7 countries, finding that Share of Search explains approximately 83% of the variation in market share.18 Critically, changes in Share of Search tend to precede changes in actual market share, making it a leading indicator for competitive dynamics.
Rascasse Implementation
- Flexible category definition: Competitive sets are defined per use case — not restricted to pre-built taxonomies. A “premium automotive” set in Germany may differ from the same concept in the United States.
- Monthly tracking: Share of Search is calculated monthly with year-over-year comparisons, enabling trend detection beyond seasonal noise.
- Country-level granularity: Each market is analyzed independently, reflecting that competitive dynamics vary by geography.
- Normalization: Raw search volumes are normalized to account for seasonal fluctuations and category-level growth or decline.
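The core calculation behind the steps above is a share within a competitive set, compared month-over-month and year-over-year. The brand names and volumes below are invented.

```python
# Sketch: monthly Share of Search for a brand within a competitive set (values invented).
def share_of_search(brand_volumes, brand):
    """Brand's share of total branded search volume in the set, as a 0-1 fraction."""
    total = sum(brand_volumes.values())
    return brand_volumes[brand] / total if total else 0.0

# Hypothetical monthly volumes for a premium-automotive competitive set:
june = {"brand_a": 50_000, "brand_b": 30_000, "brand_c": 20_000}
june_prev_year = {"brand_a": 40_000, "brand_b": 35_000, "brand_c": 25_000}

sos_now = share_of_search(june, "brand_a")
sos_yoy = sos_now - share_of_search(june_prev_year, "brand_a")  # YoY change in share
```

Because the metric is a share, category-level growth or decline cancels out of the ratio, which is what makes the year-over-year comparison meaningful.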
Share of Search builds on the broader insight from Choi and Varian (2012) that search query volumes contain predictive information about real-world economic activity. Binet's contribution was to formalize this for brand-level competitive analysis — moving it from economic forecasting to marketing strategy.
Section 10
Validation & Limitations
Validation Framework
Rascasse employs multiple validation mechanisms to ensure output quality:
- Cross-platform consistency: Signals must be confirmed across at least two independent data sources before being reported with high confidence. Single-source signals are flagged accordingly.
- Temporal stability: Results are validated over time series to filter out single-month noise. Sudden, uncorroborated shifts trigger review rather than automatic reporting.
- Benchmark against public data: Profiles are periodically benchmarked against externally available data — TV ratings, published sales figures, census demographics, and publicly available market research results.
- Quality Factor scoring: Each data point carries a QualFactor that reflects the consistency and breadth of its underlying data. Low-QualFactor data points are flagged in the user interface.
Known Limitations
We believe transparency about limitations is essential to methodological credibility. The following are known constraints of our approach:
Our data reflects online populations. Demographic segments with low digital presence — older populations in developing markets, communities with limited internet access — may be underrepresented in our profiles. We do not extrapolate to offline populations without explicit caveat.
Not all digital platforms provide equal data access. Coverage varies by platform and region. In markets where dominant platforms restrict public data access, our signal diversity is reduced, and confidence intervals widen accordingly.
Demographics are inferred, not directly observed. Confidence varies by data point type and data availability. Data points with strong platform-specific engagement patterns yield more reliable demographic estimates than data points with limited or uniform platform presence.
Most data points are updated monthly or quarterly, not in real-time. This design is intentional — it prioritizes stability and validation over immediacy. For use cases requiring real-time signals, we recommend complementing Rascasse data with platform-specific monitoring tools.
For a complete list of academic references underpinning this methodology, see our Bibliography.