Community Reporting & Processing Model

Processing Overview

Community Report Processing Workflow

Each submission passes through this sequence before appearing in the public dataset.

01 Anonymous Submission

02 Format Standardisation

03 De-Duplication & Abuse Filtering

04 Structured Storage & Aggregation

05 System-Wide Propagation

Reports that fail screening at any stage are held or discarded and do not enter the public dataset.

Submission Pipeline

All Reverseau data originates from community submissions. When someone receives a phone call or SMS, they can submit a report describing that experience. We require no registration, name, or email - submissions are entirely anonymous. This model has been in place since the platform launched in 2014, and it's deliberate: requiring identity would suppress legitimate reports from people targeted by threatening or harassing callers.

Reverseau does not independently verify the factual accuracy of individual submissions. The dataset reflects aggregated community-reported experiences, not confirmed facts about individual callers.

Each submission captures structured fields:

Phone number - the number being reported, normalised to Australian 10-digit format on submission
Caller type - selected from a fixed category set: Scam, Spam, Nuisance, Suspicious, Uncertain, or Legitimate (see Reporting Signal Evaluation)
Contact channel - how the caller reached the reporter (phone call, missed call, SMS, voicemail, MMS)
Written description - a free-text account of the interaction
Additional context - optional fields including tactics used, expected contact, time window, harm outcomes, and legitimacy verification
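
As a rough illustration, the fields above could be modelled as a record like the one below. This is a sketch only: the names Report, CallerType, and ContactChannel and the shape of the context field are hypothetical, not Reverseau's actual schema. The category and channel values come from the list above, and the record has no identity fields.

```python
from dataclasses import dataclass, field
from enum import Enum


class CallerType(Enum):
    SCAM = "Scam"
    SPAM = "Spam"
    NUISANCE = "Nuisance"
    SUSPICIOUS = "Suspicious"
    UNCERTAIN = "Uncertain"
    LEGITIMATE = "Legitimate"


class ContactChannel(Enum):
    PHONE_CALL = "phone call"
    MISSED_CALL = "missed call"
    SMS = "SMS"
    VOICEMAIL = "voicemail"
    MMS = "MMS"


@dataclass(frozen=True)
class Report:
    phone_number: str          # assumed to be normalised to 10 digits already
    caller_type: CallerType
    channel: ContactChannel
    description: str           # free-text account of the interaction
    # Optional context; keys such as "tactics" or "time_window" are illustrative.
    context: dict = field(default_factory=dict)


example = Report(
    phone_number="0293741234",
    caller_type=CallerType("Scam"),
    channel=ContactChannel.SMS,
    description="Text message claiming an unpaid toll.",
)
```

Using enumerations for caller type and contact channel means a value outside the fixed sets is rejected when the record is built.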

Anonymous by design. No account, email, or identity is required to submit a report. This removes a significant barrier for people who want to warn others but don't want to be identifiable.

Format Standardisation

Phone numbers are normalised on intake to a consistent 10-digit Australian format. The system strips spaces, dashes, parentheses, and international prefixes - so +61 2 9374 1234, (02) 9374-1234, and 0293741234 all resolve to the same record: 0293741234. This ensures every report for the same number is counted together regardless of how it was entered.

Numbers that cannot be normalised to a valid Australian format are rejected at this stage.
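
The normalisation step can be sketched as follows. This is a minimal illustration, not Reverseau's implementation: the function name normalise_au_number is hypothetical, and the validity pattern is a simplified assumption (geographic and mobile numbers beginning 02, 03, 04, 07, or 08, plus 1300 and 1800 numbers) rather than the full Australian numbering plan.

```python
import re
from typing import Optional

# Assumption: a simplified notion of "valid", not the full numbering plan.
VALID_NATIONAL = re.compile(r"(0[23478]\d{8}|1[38]00\d{6})")


def normalise_au_number(raw: str) -> Optional[str]:
    """Return the 10-digit national form, or None if the input is rejected."""
    # Strip spaces, dashes, and parentheses.
    cleaned = re.sub(r"[\s\-()]", "", raw)
    # Replace the international prefix +61 with the national trunk prefix 0.
    if cleaned.startswith("+61"):
        cleaned = "0" + cleaned[3:]
    # Anything that is not a recognised 10-digit number is rejected.
    if VALID_NATIONAL.fullmatch(cleaned):
        return cleaned
    return None
```

With this sketch, "+61 2 9374 1234", "(02) 9374-1234", and "0293741234" all return "0293741234", while an input such as "12345" returns None and would be rejected.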

De-Duplication

To prevent a single source from inflating a number's report count, the system checks each incoming submission against recent submissions for the same number. If it shares the same origin signals as another submission for that number received within the past 72 hours, it is treated as a duplicate and discarded.

De-duplication checks combine submission metadata and content similarity analysis. The goal is to reject repeat submissions from the same source while preserving independent reports from unrelated contributors - even when those reports describe similar experiences.

Why 72 hours? The window is long enough to absorb a flood of identical reports from one source, and to ensure that several reports from one person about the same incident count only once. It is short enough that someone who is contacted again by the same number later in the week can report the new incident, which keeps the dataset representative.
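
The 72-hour window can be sketched as below. This is an illustration under stated assumptions: origin_key is a hypothetical opaque value standing in for whatever origin signals the platform uses, the content similarity analysis is omitted, and the window is measured from the last accepted submission (a discarded duplicate does not extend it).

```python
from datetime import datetime, timedelta

DEDUP_WINDOW = timedelta(hours=72)


class DuplicateFilter:
    def __init__(self):
        # (phone_number, origin_key) -> time of the last accepted submission
        self._last_accepted = {}

    def is_duplicate(self, phone_number, origin_key, submitted_at):
        """Return True if the same origin reported this number within 72 hours."""
        key = (phone_number, origin_key)
        previous = self._last_accepted.get(key)
        if previous is not None and submitted_at - previous < DEDUP_WINDOW:
            return True  # discard: same origin, same number, inside the window
        # Accept and remember when this origin last reported this number.
        self._last_accepted[key] = submitted_at
        return False


dedup = DuplicateFilter()
t0 = datetime(2024, 1, 1, 9, 0)
first = dedup.is_duplicate("0293741234", "origin-a", t0)
repeat = dedup.is_duplicate("0293741234", "origin-a", t0 + timedelta(hours=1))
other = dedup.is_duplicate("0293741234", "origin-b", t0 + timedelta(hours=1))
later = dedup.is_duplicate("0293741234", "origin-a", t0 + timedelta(hours=73))
```

Here only repeat is True: the second origin is an independent report, and the report filed 73 hours later falls outside the window.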

Abuse Filtering

Every submission passes through automated pre-screening before entering the dataset. The screening layer checks for:

Spam content and repetitive phrasing patterns
Personal identifying information (names, addresses, government ID numbers)
Profanity, hate speech, and guideline violations
Near-duplicate content not caught by the de-duplication step

Reports that clear automated screening are published promptly. Reports flagged as potentially problematic go to human review. Human moderators have the final say: automated assessments are advisory, not binding. Moderation decisions follow the published platform guidelines.
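
The routing described above (clear reports publish, flagged reports go to a moderator) might look like the sketch below. The two checks are deliberately crude, hypothetical stand-ins: real spam and personal-information detection would be considerably more involved, and none of these names or thresholds come from the platform.

```python
import re
from collections import Counter


def check_repetition(text):
    # Toy stand-in for spam detection: one word repeated more than five times.
    words = text.lower().split()
    if words and Counter(words).most_common(1)[0][1] > 5:
        return "repetitive_phrasing"
    return None


def check_digit_run(text):
    # Toy stand-in for PII detection: a run of 8+ digits might be an ID number.
    if re.search(r"\d{8,}", text):
        return "possible_pii"
    return None


CHECKS = [check_repetition, check_digit_run]


def screen(text):
    """Return (outcome, flags). Flags are advisory, not a final decision."""
    flags = []
    for check in CHECKS:
        flag = check(text)
        if flag is not None:
            flags.append(flag)
    # A flagged report is queued for a human moderator, who has the final say.
    outcome = "human_review" if flags else "publish"
    return outcome, flags
```

Note that the automated layer in this sketch never discards a report on its own; it only decides whether a person needs to look at it.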

For details on how AI tools assist this screening process, see the AI Transparency Statement.

Storage & Aggregation

Accepted reports are stored as individual records linked to the reported number. The aggregate profile for each number - classification, confidence level, report count - recalculates automatically each time a new report is accepted. Aggregation applies volume, recency, and category-distribution weighting; the full rules are documented in the Reporting Signal Evaluation Framework.

A number's classification can change as the evidence base grows. A number that starts at "Low Activity" with 3 reports can reach "High Risk" once enough independent reports accumulate and consensus tightens.
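
To make the idea of recency and category-distribution weighting concrete, here is a minimal sketch. It is not the documented rule set: the 90-day half-life, the function name aggregate, and the "consensus" measure are all hypothetical, and the mapping from these values to labels such as "High Risk" is left out because it is defined in the Reporting Signal Evaluation Framework.

```python
from collections import defaultdict
from datetime import datetime, timedelta

HALF_LIFE_DAYS = 90.0  # hypothetical: a report's weight halves every 90 days


def aggregate(reports, now):
    """reports: list of (category, submitted_at) tuples for one number."""
    weights = defaultdict(float)
    for category, submitted_at in reports:
        age_days = (now - submitted_at).total_seconds() / 86400
        # Recency weighting: older reports contribute less.
        weights[category] += 0.5 ** (age_days / HALF_LIFE_DAYS)
    total = sum(weights.values())
    if total == 0:
        return {"report_count": 0, "top_category": None, "consensus": 0.0}
    top = max(weights, key=weights.get)
    return {
        "report_count": len(reports),          # volume
        "top_category": top,                   # category distribution
        "consensus": weights[top] / total,     # share held by the top category
    }


now = datetime(2024, 6, 1)
profile = aggregate(
    [("Scam", now), ("Scam", now), ("Legitimate", now - timedelta(days=90))],
    now,
)
```

In this example the two recent Scam reports weigh 1.0 each and the 90-day-old Legitimate report weighs 0.5, so Scam holds 2.0 of 2.5 total weight, a consensus of 0.8. Calling aggregate again after each accepted report mirrors the automatic recalculation described above.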

Allocation data stays separate. Telecommunications carrier and service-type metadata comes from ACMA allocation records, not community reports. It describes what the number was allocated for - not who currently holds or uses it.

AI-Assisted Summaries

Phone detail pages may include a summary paragraph describing reported activity. These summaries are generated from aggregated community report data and public telecommunications records using AI-assisted tools. They do not introduce claims beyond the underlying submissions and allocation data - they synthesise what contributors have already reported.

All AI-assisted summaries carry an explicit disclosure: "AI-assisted analysis based on community reports and public data." Summaries are regenerated periodically and reviewed before publication.

Full details: AI Transparency Statement.

Update Propagation

When a report is accepted, four things happen in sequence:

The report is stored as an individual record
The number's aggregate classification and rating recalculate
The number appears on the recently updated feed
Related aggregation pages (state, service type, prefix) reflect the updated data

Historical records remain in the permanent dataset archive. Recalculation is triggered by each new accepted report - there is no fixed batch update cycle.
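
The four steps can be sketched as a single handler that runs whenever a report is accepted. Everything here is illustrative: the Dataset class, the in-memory structures, the feed length of 50, and the four-digit prefix are assumptions, and the recalculation is reduced to a simple per-category count.

```python
from collections import defaultdict, deque


class Dataset:
    def __init__(self):
        self.reports = defaultdict(list)          # number -> accepted reports
        self.profiles = {}                        # number -> aggregate profile
        self.recently_updated = deque(maxlen=50)  # most recent first
        self.stale_pages = set()                  # aggregation pages to refresh

    def accept(self, number, category, state, service_type):
        # 1. Store the report as an individual record (append-only).
        self.reports[number].append(category)
        # 2. Recalculate the number's aggregate profile.
        counts = {}
        for c in self.reports[number]:
            counts[c] = counts.get(c, 0) + 1
        self.profiles[number] = {
            "report_count": len(self.reports[number]),
            "by_category": counts,
        }
        # 3. Move the number to the front of the recently updated feed.
        if number in self.recently_updated:
            self.recently_updated.remove(number)
        self.recently_updated.appendleft(number)
        # 4. Mark related aggregation pages as needing a refresh.
        self.stale_pages.update({
            ("state", state),
            ("service_type", service_type),
            ("prefix", number[:4]),
        })


ds = Dataset()
ds.accept("0293741234", "Scam", "NSW", "Geographic")
ds.accept("0293741234", "Spam", "NSW", "Geographic")
```

Because the handler runs once per accepted report, the profile is always current without a batch job, and earlier records are never overwritten.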