Processing Overview
Submission Pipeline
All Reverseau data originates from community submissions. When someone receives a phone call or SMS, they can submit a report describing that experience. We require no registration, name, or email - submissions are entirely anonymous. This model has been in place since the platform launched in 2014, and it's deliberate: requiring identity would suppress legitimate reports from people targeted by threatening or harassing callers.
Reverseau does not independently verify the factual accuracy of individual submissions. The dataset reflects aggregated community-reported experiences, not confirmed facts about individual callers.
Each submission captures structured fields:
- Phone number - the number being reported, normalised to Australian 10-digit format on submission
- Caller type - selected from a fixed category set: Scam, Spam, Nuisance, Suspicious, Uncertain, or Legitimate (see the Reporting Signal Evaluation Framework)
- Contact channel - how the caller reached the reporter (phone call, missed call, SMS, voicemail, MMS)
- Written description - a free-text account of the interaction
- Additional context - optional fields including tactics used, expected contact, time window, harm outcomes, and legitimacy verification
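As an illustration only, the fields above could be modelled as a small record type. This is a sketch, not Reverseau's actual schema: the class names, attribute names, and the shape of `additional_context` are assumptions; only the category and channel values come from the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class CallerType(Enum):
    # The fixed category set listed above.
    SCAM = "Scam"
    SPAM = "Spam"
    NUISANCE = "Nuisance"
    SUSPICIOUS = "Suspicious"
    UNCERTAIN = "Uncertain"
    LEGITIMATE = "Legitimate"


class ContactChannel(Enum):
    PHONE_CALL = "phone call"
    MISSED_CALL = "missed call"
    SMS = "SMS"
    VOICEMAIL = "voicemail"
    MMS = "MMS"


@dataclass
class Submission:
    # Hypothetical field names; no identity fields, since reports are anonymous.
    phone_number: str  # normalised 10-digit form
    caller_type: CallerType
    contact_channel: ContactChannel
    description: str  # free-text account of the interaction
    # Optional context (tactics, time window, ...); key names are illustrative.
    additional_context: dict = field(default_factory=dict)


example = Submission(
    phone_number="0293741234",
    caller_type=CallerType.SCAM,
    contact_channel=ContactChannel.SMS,
    description="Text message claiming an unpaid toll.",
)
```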
Format Standardisation
Phone numbers are normalised on intake to a consistent 10-digit Australian format. The system strips spaces, dashes, parentheses, and international prefixes - so +61 2 9374 1234, (02) 9374-1234, and 0293741234 all resolve to the same record: 0293741234. This ensures every report for the same number is counted together regardless of how it was entered.
Numbers that cannot be normalised to a valid Australian format are rejected at this stage.
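A minimal sketch of this normalisation step, for illustration. The function name is hypothetical, and the validity rule used here (exactly ten digits with a leading 0) is an assumption; the platform's actual validation rules are not documented in this section.

```python
import re
from typing import Optional


def normalise_au_number(raw: str) -> Optional[str]:
    """Return the 10-digit form of an Australian number, or None if rejected."""
    # Strip spaces, dashes, and parentheses.
    cleaned = re.sub(r"[\s\-()]", "", raw)
    # Replace an international prefix with the national leading 0.
    if cleaned.startswith("+61"):
        cleaned = "0" + cleaned[3:]
    elif cleaned.startswith("0061"):
        cleaned = "0" + cleaned[4:]
    # Assumed validity rule: exactly ten digits, starting with 0.
    if re.fullmatch(r"0\d{9}", cleaned):
        return cleaned
    return None


# The three spellings from the text resolve to the same record key.
variants = ["+61 2 9374 1234", "(02) 9374-1234", "0293741234"]
normalised = {normalise_au_number(v) for v in variants}
```

Here `normalised` ends up containing the single value `0293741234`, while an input such as `12345` returns `None` and would be rejected.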
De-Duplication
To prevent a single source from inflating a number's report count, the system checks each incoming submission against recent submissions for the same number. If it shares the same origin signals as another submission for that number received within the past 72 hours, it is treated as a duplicate and discarded.
De-duplication checks combine submission metadata and content similarity analysis. The goal is to reject repeat submissions from the same source while preserving independent reports from unrelated contributors - even when those reports describe similar experiences.
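The 72-hour window check can be sketched as follows. This is an illustration under stated assumptions: `origin_key` is a hypothetical opaque fingerprint standing in for the undisclosed origin signals, and the content similarity analysis is not modelled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DEDUP_WINDOW = timedelta(hours=72)


@dataclass(frozen=True)
class Incoming:
    phone_number: str  # normalised 10-digit number
    origin_key: str  # hypothetical stand-in for the origin signals
    received_at: datetime


def is_duplicate(new: Incoming, recent: list) -> bool:
    """True if an earlier submission for the same number shares the origin
    key and arrived no more than 72 hours before the new one."""
    for prior in recent:
        age = new.received_at - prior.received_at
        if (
            prior.phone_number == new.phone_number
            and prior.origin_key == new.origin_key
            and timedelta(0) <= age <= DEDUP_WINDOW
        ):
            return True
    return False


first = Incoming("0293741234", "origin-a", datetime(2024, 1, 1, 9, 0))
repeat = Incoming("0293741234", "origin-a", first.received_at + timedelta(hours=71))
later = Incoming("0293741234", "origin-a", first.received_at + timedelta(hours=73))
other = Incoming("0293741234", "origin-b", first.received_at + timedelta(hours=1))
```

With these values, `repeat` is treated as a duplicate of `first`, while `later` (outside the window) and `other` (a different origin) are kept as independent reports.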
Abuse Filtering
Every submission passes through automated pre-screening before entering the dataset. The screening layer checks for:
- Spam content and repetitive phrasing patterns
- Personal identifying information (names, addresses, government ID numbers)
- Profanity, hate speech, and guideline violations
- Near-duplicate content not caught by the de-duplication step
Reports that clear automated screening are published promptly. Reports flagged as potentially problematic go to human review. Human moderators have the final say - automated assessments are advisory, not binding. Moderation decisions follow the published platform guidelines.
For details on how AI tools assist this screening process, see the AI Transparency Statement.
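The routing described in this section can be sketched roughly as below. The names are hypothetical, and the sketch assumes a flagged report stays unpublished until a moderator approves it; it does not model moderators acting on reports that cleared screening.

```python
from enum import Enum
from typing import Optional


class Route(Enum):
    PUBLISH = "publish"
    HUMAN_REVIEW = "human_review"


def route_after_screening(flags: set) -> Route:
    """No automated flags: publish. Any flag: send to a human moderator."""
    return Route.HUMAN_REVIEW if flags else Route.PUBLISH


def is_published(route: Route, moderator_approved: Optional[bool] = None) -> bool:
    """Automated flags are advisory; for flagged reports the moderator decides."""
    if route is Route.PUBLISH:
        return True
    # Assumption: a flagged report is held until a moderator approves it.
    return moderator_approved is True


clean = route_after_screening(set())
flagged = route_after_screening({"possible_pii"})
```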
Storage & Aggregation
Accepted reports are stored as individual records linked to the reported number. The aggregate profile for each number - classification, confidence level, report count - recalculates automatically each time a new report is accepted. Aggregation applies volume, recency, and category-distribution weighting; the full rules are documented in the Reporting Signal Evaluation Framework.
A number's classification can change as the evidence base grows. A number that starts at "Low Activity" with three reports can move to "High Risk" once enough independent reports accumulate and they converge on the same category.
AI-Assisted Summaries
Phone detail pages may include a summary paragraph describing reported activity. These summaries are generated from aggregated community report data and public telecommunications records using AI-assisted tools. They do not introduce claims beyond the underlying submissions and allocation data - they synthesise what contributors have already reported and what the public records show.
All AI-assisted summaries carry an explicit disclosure: "AI-assisted analysis based on community reports and public data." Summaries are regenerated periodically and reviewed before publication.
Full details: AI Transparency Statement.
Update Propagation
When a report is accepted, four things happen in sequence:
- The report is stored as an individual record
- The number's aggregate classification and rating recalculate
- The number appears on the recently updated feed
- Related aggregation pages (state, service type, prefix) reflect the updated data
Historical records remain in the permanent dataset archive. Recalculation is triggered by each new accepted report - there is no fixed batch update cycle.
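The per-report trigger can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the storage structures, the feed size, and the function name are hypothetical, and the aggregate is reduced to a report count and category tally because the real weighting rules live in the Reporting Signal Evaluation Framework.

```python
from collections import Counter, defaultdict, deque

reports_by_number = defaultdict(list)  # individual records per number
aggregate_by_number = {}  # recalculated aggregate profile per number
recently_updated = deque(maxlen=50)  # feed size is an assumption


def on_report_accepted(number: str, report: dict) -> None:
    # 1. Store the report as an individual record.
    reports_by_number[number].append(report)
    # 2. Recalculate the aggregate (placeholder: counts only, no weighting).
    reports = reports_by_number[number]
    aggregate_by_number[number] = {
        "report_count": len(reports),
        "category_counts": Counter(r["caller_type"] for r in reports),
    }
    # 3. Move the number to the front of the recently updated feed.
    if number in recently_updated:
        recently_updated.remove(number)
    recently_updated.appendleft(number)
    # 4. Related aggregation pages would be refreshed here (not modelled).


on_report_accepted("0293741234", {"caller_type": "Scam"})
on_report_accepted("0400000000", {"caller_type": "Spam"})
on_report_accepted("0293741234", {"caller_type": "Scam"})
```

Each call recalculates immediately, which mirrors the absence of a batch cycle: after the three calls above, the first number has a report count of 2 and sits at the front of the feed.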
- Reporting Signal Evaluation Framework - how classifications are determined from aggregated signals
- Transparency & Data Integrity - moderation, corrections, and dispute handling
- Data Limitations - scope and interpretation boundaries