Methodology

Community Reporting & Processing Model

How reports are submitted, normalised, de-duplicated, and stored within the Reverseau dataset.


Processing Overview

Community Report Processing Workflow
Each submission passes through this sequence before appearing in the public dataset.
01 Anonymous Submission
02 Format Standardisation
03 De-Duplication & Abuse Filtering
04 Structured Storage & Aggregation
05 System-Wide Propagation
Reports that fail screening at any stage are held or discarded and do not enter the public dataset.

Submission Pipeline

All Reverseau data originates from community submissions. When someone receives a phone call or SMS, they can submit a report describing that experience. We require no registration, name, or email - submissions are entirely anonymous. This model has been in place since the platform launched in 2014, and it's deliberate: requiring identity would suppress legitimate reports from people targeted by threatening or harassing callers.

Reverseau does not independently verify the factual accuracy of individual submissions. The dataset reflects aggregated community-reported experiences, not confirmed facts about individual callers.

Each submission captures structured fields:

Anonymous by design. No account, email, or identity is required to submit a report. This removes a significant barrier for people who want to warn others but don't want to be identifiable.

Format Standardisation

Phone numbers are normalised on intake to a consistent 10-digit Australian format. The system strips spaces, dashes, parentheses, and international prefixes - so +61 2 9374 1234, (02) 9374-1234, and 0293741234 all resolve to the same record: 0293741234. This ensures every report for the same number is counted together regardless of how it was entered.

Numbers that cannot be normalised to a valid Australian format are rejected at this stage.
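The normalisation step can be illustrated with a minimal Python sketch. The function name `normalise_au_number` is hypothetical, and the validity check here (a leading 0 followed by nine digits) is an assumption standing in for whatever rules the platform actually applies; only the `+61` form of the international prefix is handled.

```python
import re
from typing import Optional


def normalise_au_number(raw: str) -> Optional[str]:
    """Return the 10-digit national form, or None if the input cannot be normalised."""
    # Strip the separators named in the text: spaces, dashes, parentheses.
    cleaned = re.sub(r"[\s\-()]", "", raw)
    # Replace the +61 international prefix with the national leading 0.
    if cleaned.startswith("+61"):
        cleaned = "0" + cleaned[3:]
    # Assumed validity rule: exactly ten digits beginning with 0.
    if re.fullmatch(r"0\d{9}", cleaned):
        return cleaned
    return None  # rejected at this stage


for entered in ("+61 2 9374 1234", "(02) 9374-1234", "0293741234"):
    print(entered, "->", normalise_au_number(entered))
```

All three inputs from the example above map to the same string, so reports entered in different styles land on one record.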

De-Duplication

To prevent a single source from inflating a number's report count, the system checks each incoming submission against recent submissions for the same number. If a submission shares the same origin signals as another submission within the past 72 hours, it is treated as a duplicate and discarded.

De-duplication checks combine submission metadata and content similarity analysis. The goal is to reject repeat submissions from the same source while preserving independent reports from unrelated contributors - even when those reports describe similar experiences.

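Why 72 hours? The window is long enough to absorb a coordinated flood of identical reports, and long enough that a genuine repeat caller (who calls the same person multiple times in a week) generates only one report per incident. It is also short enough that the same contributor can report the number again if the calls resume after the window has passed. This keeps the dataset representative.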
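As a rough illustration of the 72-hour rule, the sketch below compares an incoming submission with earlier ones for the same number. The `Submission` type and its `origin_key` field are hypothetical: the origin signals are modelled as a single opaque string, the content similarity analysis is left out, and treating exactly 72 hours as still inside the window is an assumption.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple

DEDUP_WINDOW = timedelta(hours=72)


class Submission(NamedTuple):
    number: str          # already normalised to the 10-digit form
    origin_key: str      # hypothetical stand-in for the origin signals
    received_at: datetime


def is_duplicate(incoming: Submission, recent: Iterable[Submission]) -> bool:
    """True if an earlier submission matches on number and origin within the window."""
    for earlier in recent:
        age = incoming.received_at - earlier.received_at
        if (
            earlier.number == incoming.number
            and earlier.origin_key == incoming.origin_key
            and timedelta(0) <= age <= DEDUP_WINDOW
        ):
            return True
    return False


t0 = datetime(2024, 1, 1, 12, 0)
first = Submission("0293741234", "origin-a", t0)
repeat = Submission("0293741234", "origin-a", t0 + timedelta(hours=5))
other = Submission("0293741234", "origin-b", t0 + timedelta(hours=5))
print(is_duplicate(repeat, [first]), is_duplicate(other, [first]))
```

A report from a different origin is kept even when it arrives inside the window, which is how independent contributors describing similar experiences are preserved.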

Abuse Filtering

Every submission passes through automated pre-screening before entering the dataset. The screening layer checks for:

Reports that clear automated screening publish promptly. Reports flagged as potentially problematic go to human review. Human moderators have final say - automated assessments are advisory, not binding. Moderation decisions follow the published platform guidelines.

For details on how AI tools assist this screening process, see the AI Transparency Statement.
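The relationship between automated screening and human review described in this section can be sketched as a small routing function. The `Status` values and the `route` function are hypothetical names chosen for illustration, and the actual screening checks are not modelled; the only point shown is that a moderator decision, when present, overrides the automated flag.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    PUBLISHED = "published"
    HELD_FOR_REVIEW = "held_for_review"
    REJECTED = "rejected"


def route(flagged: bool, moderator_approves: Optional[bool] = None) -> Status:
    """Decide a report's status from the automated flag and any moderator decision."""
    # A human decision is final, whatever the automated screening said.
    if moderator_approves is not None:
        return Status.PUBLISHED if moderator_approves else Status.REJECTED
    # No human decision yet: flagged reports wait, clean reports publish.
    return Status.HELD_FOR_REVIEW if flagged else Status.PUBLISHED


print(route(flagged=False))
print(route(flagged=True))
print(route(flagged=True, moderator_approves=True))
```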

Storage & Aggregation

Accepted reports are stored as individual records linked to the reported number. The aggregate profile for each number - classification, confidence level, report count - recalculates automatically each time a new report is accepted. Aggregation applies volume, recency, and category-distribution weighting; the full rules are documented in the Reporting Signal Evaluation Framework.

A number's classification can change as the evidence base grows. A number that starts at "Low Activity" with 3 reports can reach "High Risk" once enough independent reports accumulate and consensus tightens.
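To make the three aggregation inputs concrete, here is a minimal sketch that derives volume, a recency-weighted volume, and the share of the most common category from a list of accepted reports. It is not the rule set from the Reporting Signal Evaluation Framework: the 90-day half-life, the `Report` type, and the category labels are all assumptions, and the sketch deliberately stops short of mapping these signals to a classification.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import NamedTuple, Sequence

HALF_LIFE_DAYS = 90.0  # hypothetical: a report's weight halves every 90 days


class Report(NamedTuple):
    category: str        # hypothetical label, e.g. "scam"
    accepted_at: datetime


def aggregate_signals(reports: Sequence[Report], now: datetime) -> dict:
    """Recompute the raw signals for one number from all of its accepted reports."""
    count = len(reports)
    # Recency: exponential decay by age in days.
    recency_weighted = sum(
        0.5 ** ((now - r.accepted_at).total_seconds() / 86400 / HALF_LIFE_DAYS)
        for r in reports
    )
    # Category distribution: how dominant is the most common category?
    ranked = Counter(r.category for r in reports).most_common(1)
    top_category, top_n = ranked[0] if ranked else (None, 0)
    return {
        "report_count": count,
        "recency_weighted": recency_weighted,
        "top_category": top_category,
        "top_share": top_n / count if count else 0.0,
    }


now = datetime(2024, 6, 1)
reports = [
    Report("scam", now),
    Report("scam", now - timedelta(days=90)),
    Report("telemarketing", now),
]
print(aggregate_signals(reports, now))
```

Because the function recomputes everything from the stored reports, calling it after each accepted report matches the "recalculates automatically" behaviour described above.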

Allocation data stays separate. Telecommunications carrier and service-type metadata comes from ACMA allocation records, not community reports. It describes what the number was allocated for - not who currently holds or uses it.

AI-Assisted Summaries

Phone detail pages may include a summary paragraph describing reported activity. These summaries are generated from aggregated community report data and public telecommunications records using AI-assisted tools. They do not introduce claims beyond the underlying submissions and allocation data - they synthesise what contributors have already reported.

All AI-assisted summaries carry an explicit disclosure: "AI-assisted analysis based on community reports and public data." Summaries are regenerated periodically and reviewed before publication.

Full details: AI Transparency Statement.

Update Propagation

When a report is accepted, four things happen in sequence:

  1. The report is stored as an individual record
  2. The number's aggregate classification and rating recalculate
  3. The number appears on the recently updated feed
  4. Related aggregation pages (state, service type, prefix) reflect the updated data

Historical records remain in the permanent dataset archive. Recalculation is triggered by each new accepted report - there is no fixed batch update cycle.
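The four steps above can be sketched as a single event handler that runs whenever a report is accepted, rather than on a batch schedule. This is an in-memory illustration only: the `Dataset` class, the feed length of 50, and the idea of marking related pages as needing a refresh are assumptions, and the profile here holds just a report count in place of the full classification.

```python
from collections import defaultdict, deque
from typing import Iterable


class Dataset:
    def __init__(self) -> None:
        self.reports = defaultdict(list)      # number -> individual records
        self.profiles = {}                    # number -> aggregate profile
        self.recent_feed = deque(maxlen=50)   # hypothetical feed length
        self.pages_to_refresh = set()         # related aggregation pages

    def accept(self, number: str, report: dict, related_pages: Iterable[str]) -> None:
        # 1. Store the report as an individual record.
        self.reports[number].append(report)
        # 2. Recalculate the aggregate profile (placeholder: count only).
        self.profiles[number] = {"report_count": len(self.reports[number])}
        # 3. Move the number to the front of the recently updated feed.
        if number in self.recent_feed:
            self.recent_feed.remove(number)
        self.recent_feed.appendleft(number)
        # 4. Flag related aggregation pages (state, service type, prefix).
        self.pages_to_refresh.update(related_pages)


ds = Dataset()
ds.accept("0293741234", {"category": "scam"}, ["state:NSW", "prefix:02"])
ds.accept("0293741234", {"category": "scam"}, ["state:NSW", "prefix:02"])
print(ds.profiles["0293741234"], list(ds.recent_feed))
```

Earlier records are appended to and never overwritten, which mirrors the statement that historical records remain in the archive.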

Related Documentation