Our Review Analysis Methodology: How Null Fake Detects Fake Reviews

January 7, 2026 • 12 min read

Last updated: January 10, 2026

At Null Fake, we believe in transparency. Unlike closed-source review analysis tools, our methodology is open for inspection. This article details exactly how we analyze reviews, what signals we look for, and how we calculate our grades.

Open Source: Our complete codebase is available at github.com/stardothosting/nullfake. You can verify everything described in this article by examining the source code directly.

Analysis Pipeline Overview

When you submit an Amazon URL for analysis, our system executes a multi-stage pipeline. Each stage contributes to the final authenticity assessment:

  1. Data extraction from Amazon product and review pages
  2. Timing pattern analysis across all reviews
  3. Natural Language Processing (NLP) for content analysis
  4. Reviewer behavior pattern detection
  5. Statistical anomaly detection
  6. AI-powered synthesis and scoring
  7. Grade calculation and explanation generation

Stage 1: Data Extraction

We extract comprehensive data from each product:

Product Metadata

  • Product title, ASIN, and category
  • Current price and price history
  • Overall rating and rating distribution (1-5 stars)
  • Total review count and verified purchase percentage
  • Seller information and fulfillment method

Review Data

For each review, we capture:

  • Full review text and title
  • Star rating and verification status
  • Review date and purchase date (when available)
  • Helpful votes count
  • Reviewer name and profile link
  • Whether review includes photos or videos
  • Vine Voice status
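
To make those fields concrete, here is a minimal sketch of how a captured review could be represented in code. The structure and field names are illustrative; the actual models live in the open-source repository.

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ReviewRecord:
    # Core content
    title: str
    text: str
    star_rating: int                 # 1-5
    # Provenance
    review_date: date
    purchase_date: Optional[date]    # not always available
    verified_purchase: bool
    vine_voice: bool
    # Engagement and reviewer context
    helpful_votes: int
    reviewer_name: str
    reviewer_profile_url: str
    has_media: bool                  # photos or videos attached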

We process up to 200 reviews per product for performance optimization while maintaining statistical validity. For products with more reviews, we use stratified sampling to ensure representation across time periods and rating levels.
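When a product exceeds that limit, stratified sampling along the lines of the sketch below keeps the sample representative of both time periods and rating levels. It assumes review objects with review_date and star_rating fields, as in the record sketched above; the exact bucketing used in production may differ.

import random
from collections import defaultdict

def stratified_sample(reviews, limit=200):
    """Sample reviews proportionally from (year-month, star rating) strata."""
    strata = defaultdict(list)
    for r in reviews:
        key = (r.review_date.strftime("%Y-%m"), r.star_rating)
        strata[key].append(r)

    sampled = []
    for bucket in strata.values():
        # Allocate slots proportionally to each stratum's share of all reviews
        share = max(1, round(limit * len(bucket) / len(reviews)))
        sampled.extend(random.sample(bucket, min(share, len(bucket))))
    return sampled[:limit]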

Stage 2: Timing Pattern Analysis

Review timing is one of our most reliable signals. We calculate several metrics:

Spike Detection Algorithm

We divide the review timeline into 7-day windows and calculate reviews per window. Then we identify statistical outliers using z-score analysis:

z-score = (reviews_in_window - mean) / standard_deviation
spike_detected = z-score > 2.0

A z-score above 2.0 indicates a review count more than 2 standard deviations above the mean — suspicious activity that warrants further investigation.
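A simplified implementation of this check, bucketing reviews into 7-day windows and flagging outlier windows by z-score, looks roughly like this (illustrative, not the exact production code):

from datetime import timedelta
from statistics import mean, stdev

def detect_spikes(review_dates, window_days=7, z_threshold=2.0):
    """Return the indexes of windows whose review counts are statistical outliers."""
    start, end = min(review_dates), max(review_dates)
    counts = []
    cursor = start
    while cursor <= end:
        window_end = cursor + timedelta(days=window_days)
        counts.append(sum(1 for d in review_dates if cursor <= d < window_end))
        cursor = window_end

    if len(counts) < 2 or stdev(counts) == 0:
        return []
    mu, sigma = mean(counts), stdev(counts)
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > z_threshold]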

Clustering Coefficient

We measure how "bunched" reviews are compared to expected random distribution. High clustering (>0.7) suggests coordinated campaigns; low clustering (<0.3) suggests organic posting patterns.
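The clustering metric can be computed in several ways; one simple proxy, shown below, compares gaps between consecutive reviews against the gap expected if reviews were spread evenly across the product's lifetime. This particular formula is illustrative rather than a line-for-line copy of our implementation.

def clustering_score(review_dates):
    """Fraction of consecutive-review gaps much shorter than the expected gap.

    A rough proxy for "bunching": values near 0 mean evenly spread reviews,
    values near 1 mean most reviews arrive in tight clusters.
    """
    dates = sorted(review_dates)
    if len(dates) < 3:
        return 0.0
    span_days = (dates[-1] - dates[0]).days or 1
    expected_gap = span_days / (len(dates) - 1)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return sum(1 for g in gaps if g < 0.25 * expected_gap) / len(gaps)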

Day-of-Week Analysis

Real reviews distribute relatively evenly across the days of the week. Automated posting often shows strong day-of-week preferences (e.g., 60% of reviews posted on Mondays).
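
A quick way to surface that kind of skew is to check whether any single day of the week accounts for an outsized share of reviews; the 40% threshold below is illustrative.

from collections import Counter

def weekday_skew(review_dates, max_share=0.4):
    """Flag timelines where one weekday holds more than max_share of all reviews."""
    counts = Counter(d.weekday() for d in review_dates)  # 0 = Monday
    top_day, top_count = counts.most_common(1)[0]
    share = top_count / len(review_dates)
    return {"top_weekday": top_day, "share": share, "suspicious": share > max_share}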

Purchase-to-Review Timing

For verified purchases, we analyze time between purchase and review. Legitimate reviewers typically take 7-14 days. Reviews posted within 24-48 hours of purchase are flagged as potentially suspicious.

Stage 3: Natural Language Processing

Our NLP analysis examines multiple linguistic dimensions:

Vocabulary Diversity

We calculate Type-Token Ratio (TTR): the number of unique words divided by total words. AI-generated and templated reviews typically have lower TTR than genuine human reviews.

TTR = unique_words / total_words
Suspicious threshold: TTR < 0.4 for reviews > 50 words
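
In code, the TTR heuristic is a direct translation of that formula and threshold:

import re

def ttr_suspicious(text, min_words=50, threshold=0.4):
    """Apply the Type-Token Ratio heuristic: low TTR on long reviews is suspect."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) <= min_words:
        return False                     # heuristic only applies to longer reviews
    ttr = len(set(words)) / len(words)
    return ttr < threshold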

Sentence Structure Analysis

We analyze sentence length variance and structure patterns. AI-generated content often has unnaturally consistent sentence lengths and follows predictable patterns (introduction → body → conclusion).

AI Detection Markers

We look for phrases strongly associated with AI-generated content:

  • "I recently purchased" (common ChatGPT opener)
  • "In conclusion" / "To sum up" (AI summation patterns)
  • "Exceeded my expectations" (generic AI praise)
  • Excessive hedge words: "overall," "generally," "typically"
  • Perfect grammar with zero typos in long reviews
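
The first several markers reduce to a phrase-matching pass over the review text (perfect grammar in long reviews needs a separate check). The phrase list below mirrors the examples above and is deliberately easy to extend:

AI_MARKER_PHRASES = [
    "i recently purchased",
    "in conclusion",
    "to sum up",
    "exceeded my expectations",
]
HEDGE_WORDS = {"overall", "generally", "typically"}

def ai_marker_hits(text):
    """Count occurrences of known AI-associated phrases and hedge words."""
    lowered = text.lower()
    phrase_hits = sum(lowered.count(p) for p in AI_MARKER_PHRASES)
    hedge_hits = sum(1 for w in lowered.split() if w.strip(".,!?") in HEDGE_WORDS)
    return phrase_hits + hedge_hits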

Specificity Scoring

Genuine reviews mention specific details: exact measurements, particular features, unique use cases. We score specificity by detecting:

  • Numbers and measurements
  • Product-specific feature mentions
  • Comparative references to other products
  • Personal usage scenarios
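
A lightweight way to approximate specificity is to count concrete signals such as numbers, measurements, and comparison or usage language. The patterns below are illustrative examples rather than the full production rule set:

import re

SPECIFICITY_PATTERNS = {
    "numbers": r"\b\d+(?:\.\d+)?\b",
    "measurements": r"\b\d+(?:\.\d+)?\s?(?:cm|mm|in|inch|inches|ft|lb|lbs|kg|g|oz|hours?|days?)\b",
    "comparisons": r"\b(?:compared to|better than|worse than|versus|vs\.?)\b",
    "usage": r"\b(?:i use|i used|every day|daily|for my)\b",
}

def specificity_score(text):
    """Count pattern matches as a rough proxy for how specific a review is."""
    lowered = text.lower()
    return sum(len(re.findall(p, lowered)) for p in SPECIFICITY_PATTERNS.values())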

Sentiment Authenticity

Real reviews show varied emotional expression. AI reviews tend toward neutral, "corporate" language. We analyze emotional authenticity using sentiment intensity scoring.
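
As an example of sentiment intensity scoring, an off-the-shelf lexicon model such as VADER can flag reviews whose emotional tone is suspiciously flat; the library choice here is illustrative, not a statement of what runs in production:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def sentiment_flatness(text):
    """Return the VADER compound score; values hovering near 0.0 on a glowing
    5-star review suggest neutral, "corporate" language rather than genuine enthusiasm."""
    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)["compound"]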

Stage 4: Reviewer Behavior Analysis

We sample reviewer profiles to identify suspicious patterns:

Account Age and Activity

  • New accounts (<3 months) with immediate review activity
  • Burst posting patterns (multiple reviews same day)
  • Single-category reviewers

Rating Distribution

Real reviewers have varied ratings. Reviewers with 100% 5-star reviews or 100% reviews for one brand are flagged.

Cross-Product Patterns

We check if the same reviewers appear across multiple suspicious products — a sign of review farm operations.
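
The reviewer-level checks in this stage reduce to a handful of boolean flags per sampled profile. A hypothetical sketch, with assumed field names, follows:

from datetime import date

def reviewer_flags(account_created, review_history):
    """review_history: list of (review_date, star_rating, brand) tuples."""
    flags = []
    account_age_days = (date.today() - account_created).days
    if account_age_days < 90 and review_history:
        flags.append("new_account_with_reviews")
    dates = [d for d, _, _ in review_history]
    if len(dates) != len(set(dates)):
        flags.append("multiple_reviews_same_day")
    ratings = {r for _, r, _ in review_history}
    if ratings == {5} and len(review_history) >= 3:
        flags.append("all_five_star")
    brands = {b for _, _, b in review_history}
    if len(brands) == 1 and len(review_history) >= 3:
        flags.append("single_brand")
    return flags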

Stage 5: Statistical Anomaly Detection

We compare product metrics against our database of 40,000+ analyzed products:

Rating Distribution Analysis

Most legitimate products have bell-curve rating distributions. Products with J-curve distributions (overwhelmingly 5-star with very few mid-range ratings) warrant scrutiny.
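
A simple shape test captures the J-curve pattern: a very large share of 5-star ratings alongside a hollowed-out middle. The thresholds below are illustrative:

def is_j_curve(distribution, five_star_share=0.8, mid_share=0.1):
    """distribution: dict mapping star value (1-5) to review count."""
    total = sum(distribution.values()) or 1
    five = distribution.get(5, 0) / total
    middle = (distribution.get(2, 0) + distribution.get(3, 0)) / total
    return five > five_star_share and middle < mid_share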

Verification Rate Anomalies

Typical verified purchase rates are 60-80%. Rates above 95% may indicate discount-code manipulation; rates below 40% suggest review solicitation from non-purchasers.

Review Count vs. Product Age

We calculate expected review velocity based on product category and age. Products significantly exceeding expected review rates are flagged.

Stage 6: AI-Powered Synthesis

We use large language models (currently OpenAI's GPT-4) to synthesize the findings from all previous stages. The model:

  • Examines patterns across multiple signals that individually might not be conclusive
  • Identifies context-specific anomalies that require semantic understanding
  • Checks review content for consistency with product claims
  • Generates human-readable explanations for its findings

The AI doesn't make the final decision alone — it provides weighted input that's combined with statistical measures in our scoring algorithm.
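
A minimal sketch of how the precomputed signals might be handed to the model, using the OpenAI Python client, is shown below; the model name and prompt are illustrative, and the production prompts are more detailed:

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def synthesize_findings(signals: dict) -> str:
    """Ask the model to weigh the statistical signals and explain its reasoning."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You assess Amazon review authenticity from precomputed signals. "
                        "Return a 0-100 risk estimate with a short explanation."},
            {"role": "user", "content": json.dumps(signals)},
        ],
    )
    return response.choices[0].message.content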

Stage 7: Grade Calculation

We combine all signals using weighted scoring:

  • Timing Analysis (25%): Very reliable; hard to fake timing patterns
  • Language Patterns (30%): Highly reliable for AI/template detection
  • Reviewer Behavior (20%): Good signal but sample-limited
  • Statistical Anomalies (15%): Useful but context-dependent
  • Verification Rate (10%): Weakest signal; easily manipulated

Final Grade Mapping

Composite scores map to letter grades:

  • Grade A (90-100): High confidence in review authenticity
  • Grade B (80-89): Generally authentic with minor concerns
  • Grade C (70-79): Mixed signals; review with caution
  • Grade D (60-69): Significant authenticity concerns
  • Grade F (0-59): High probability of manipulation
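
Putting the weights and grade bands together, a straightforward composite scorer looks like this (each per-signal score is assumed to already be normalized to a 0-100 scale):

SIGNAL_WEIGHTS = {
    "timing": 0.25,
    "language": 0.30,
    "reviewer_behavior": 0.20,
    "statistical_anomalies": 0.15,
    "verification_rate": 0.10,
}

GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

def composite_grade(scores: dict) -> tuple[float, str]:
    """scores: per-signal authenticity scores on a 0-100 scale."""
    composite = sum(SIGNAL_WEIGHTS[name] * scores[name] for name in SIGNAL_WEIGHTS)
    grade = next(letter for cutoff, letter in GRADE_BANDS if composite >= cutoff)
    return composite, grade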

Accuracy Metrics

We've validated our system against 1,000 manually-verified products:

  • Obvious manipulation detection: 87% accuracy
  • Subtle manipulation detection: 72% accuracy
  • False positive rate: ~5% (legitimate products flagged as suspicious)

We intentionally err toward caution — accepting higher false positives to minimize missed detections. Our philosophy: better to warn about a legitimate product than miss a scam.

Known Limitations

We're transparent about what we can't catch:

  • Sophisticated slow campaigns: Reviews spread gradually over months with natural-looking timing
  • Human-edited AI reviews: AI content manually refined to add specificity and personality
  • Legitimate viral spikes: Products that go viral on TikTok may show patterns similar to manipulation
  • New seller accounts: We have limited data for very new sellers

Continuous Improvement

Our methodology evolves as manipulation tactics change:

  • Monthly algorithm updates based on new patterns
  • User feedback integration for reported errors
  • Database expansion with each new analysis
  • Open-source contributions from the community

You can track our methodology changes and contribute improvements through our GitHub repository.


About the Author

Derek Armitage

Founder & Lead Developer

Derek Armitage is the founder of Shift8 Web, a Toronto-based web development agency. With over 15 years of experience in software development and data analysis, Derek created Null Fake to help consumers identify fraudulent Amazon reviews. He holds expertise in machine learning, natural language processing, and web security. Derek has previously written about e-commerce fraud detection for industry publications and regularly contributes to open-source projects focused on consumer protection.

Credentials:

  • 15+ years software development experience
  • Founder of Shift8 Web (Toronto)
  • Machine learning and NLP specialist
  • Open source contributor