Our Review Analysis Methodology: How Null Fake Detects Fake Reviews

January 7, 2026 • 12 min read

Last updated: January 10, 2026

At Null Fake, we believe in transparency. Unlike closed-source review analysis tools, our methodology is open for inspection. This article details exactly how we analyze reviews, what signals we look for, and how we calculate our grades.

Open Source: Our complete codebase is available at github.com/stardothosting/nullfake. You can verify everything described in this article by examining the source code directly.

Analysis Pipeline Overview

When you submit an Amazon URL for analysis, our system executes a multi-stage pipeline. Each stage contributes to the final authenticity assessment:

  1. Data extraction from Amazon product and review pages
  2. Timing pattern analysis across all reviews
  3. Natural Language Processing (NLP) for content analysis
  4. Reviewer behavior pattern detection
  5. Statistical anomaly detection
  6. AI-powered synthesis and scoring
  7. Grade calculation and explanation generation

Stage 1: Data Extraction

We extract comprehensive data from each product:

Product Metadata

  • Product title, ASIN, and category
  • Current price and price history
  • Overall rating and rating distribution (1-5 stars)
  • Total review count and verified purchase percentage
  • Seller information and fulfillment method

Review Data

For each review, we capture:

  • Full review text and title
  • Star rating and verification status
  • Review date and purchase date (when available)
  • Helpful votes count
  • Reviewer name and profile link
  • Whether review includes photos or videos
  • Vine Voice status
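
To make those fields concrete, here is a minimal sketch of how a captured review could be represented in code. The structure and field names are illustrative; the actual models live in the open-source repository.

from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class ReviewRecord:
    # Core content
    title: str
    text: str
    star_rating: int                 # 1-5
    # Provenance
    review_date: date
    purchase_date: Optional[date]    # not always available
    verified_purchase: bool
    vine_voice: bool
    # Engagement and reviewer context
    helpful_votes: int
    reviewer_name: str
    reviewer_profile_url: str
    has_media: bool                  # photos or videos attached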

We process up to 200 reviews per product for performance optimization while maintaining statistical validity. For products with more reviews, we use stratified sampling to ensure representation across time periods and rating levels.
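When a product exceeds that limit, stratified sampling along the lines of the sketch below keeps the sample representative of both time periods and rating levels. It assumes review objects with review_date and star_rating fields, as in the record sketched above; the exact bucketing used in production may differ.

import random
from collections import defaultdict

def stratified_sample(reviews, limit=200):
    """Sample reviews proportionally from (year-month, star rating) strata."""
    strata = defaultdict(list)
    for r in reviews:
        key = (r.review_date.strftime("%Y-%m"), r.star_rating)
        strata[key].append(r)

    sampled = []
    for bucket in strata.values():
        # Allocate slots proportionally to each stratum's share of all reviews
        share = max(1, round(limit * len(bucket) / len(reviews)))
        sampled.extend(random.sample(bucket, min(share, len(bucket))))
    return sampled[:limit]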

Stage 2: Timing Pattern Analysis

Review timing is one of our most reliable signals. We calculate several metrics:

Spike Detection Algorithm

We divide the review timeline into 7-day windows and calculate reviews per window. Then we identify statistical outliers using z-score analysis:

z-score = (reviews_in_window - mean) / standard_deviation
spike_detected = z-score > 2.0

A z-score above 2.0 indicates a review count more than 2 standard deviations above the mean — suspicious activity that warrants further investigation.
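A simplified implementation of this check, bucketing reviews into 7-day windows and flagging outlier windows by z-score, looks roughly like this (illustrative, not the exact production code):

from datetime import timedelta
from statistics import mean, stdev

def detect_spikes(review_dates, window_days=7, z_threshold=2.0):
    """Return the indexes of windows whose review counts are statistical outliers."""
    start, end = min(review_dates), max(review_dates)
    counts = []
    cursor = start
    while cursor <= end:
        window_end = cursor + timedelta(days=window_days)
        counts.append(sum(1 for d in review_dates if cursor <= d < window_end))
        cursor = window_end

    if len(counts) < 2 or stdev(counts) == 0:
        return []
    mu, sigma = mean(counts), stdev(counts)
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > z_threshold]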

Clustering Coefficient

We measure how "bunched" reviews are compared to expected random distribution. High clustering (>0.7) suggests coordinated campaigns; low clustering (<0.3) suggests organic posting patterns.
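The clustering metric can be computed in several ways; one simple proxy, shown below, compares gaps between consecutive reviews against the gap expected if reviews were spread evenly across the product's lifetime. This particular formula is illustrative rather than a line-for-line copy of our implementation.

def clustering_score(review_dates):
    """Fraction of consecutive-review gaps much shorter than the expected gap.

    A rough proxy for "bunching": values near 0 mean evenly spread reviews,
    values near 1 mean most reviews arrive in tight clusters.
    """
    dates = sorted(review_dates)
    if len(dates) < 3:
        return 0.0
    span_days = (dates[-1] - dates[0]).days or 1
    expected_gap = span_days / (len(dates) - 1)
    gaps = [(b - a).days for a, b in zip(dates, dates[1:])]
    return sum(1 for g in gaps if g < 0.25 * expected_gap) / len(gaps)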

Day-of-Week Analysis

Real reviews distribute relatively evenly across the days of the week. Automated posting often shows strong day-of-week preferences (e.g., 60% of reviews posted on Mondays).
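
A quick way to surface that kind of skew is to check whether any single day of the week accounts for an outsized share of reviews; the 40% threshold below is illustrative.

from collections import Counter

def weekday_skew(review_dates, max_share=0.4):
    """Flag timelines where one weekday holds more than max_share of all reviews."""
    counts = Counter(d.weekday() for d in review_dates)  # 0 = Monday
    top_day, top_count = counts.most_common(1)[0]
    share = top_count / len(review_dates)
    return {"top_weekday": top_day, "share": share, "suspicious": share > max_share}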

Purchase-to-Review Timing

For verified purchases, we analyze time between purchase and review. Legitimate reviewers typically take 7-14 days. Reviews posted within 24-48 hours of purchase are flagged as potentially suspicious.

Stage 3: Natural Language Processing

Our NLP analysis examines multiple linguistic dimensions:

Vocabulary Diversity

We calculate Type-Token Ratio (TTR): the number of unique words divided by total words. AI-generated and templated reviews typically have lower TTR than genuine human reviews.

TTR = unique_words / total_words
Suspicious threshold: TTR < 0.4 for reviews > 50 words
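
In code, the TTR heuristic is a direct translation of that formula and threshold:

import re

def ttr_suspicious(text, min_words=50, threshold=0.4):
    """Apply the Type-Token Ratio heuristic: low TTR on long reviews is suspect."""
    words = re.findall(r"[a-z']+", text.lower())
    if len(words) <= min_words:
        return False                     # heuristic only applies to longer reviews
    ttr = len(set(words)) / len(words)
    return ttr < threshold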

Sentence Structure Analysis

We analyze sentence length variance and structure patterns. AI-generated content often has unnaturally consistent sentence lengths and follows predictable patterns (introduction → body → conclusion).

AI Detection Markers

We look for phrases strongly associated with AI-generated content:

  • "I recently purchased" (common ChatGPT opener)
  • "In conclusion" / "To sum up" (AI summation patterns)
  • "Exceeded my expectations" (generic AI praise)
  • Excessive hedge words: "overall," "generally," "typically"
  • Perfect grammar with zero typos in long reviews
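
The first several markers reduce to a phrase-matching pass over the review text (perfect grammar in long reviews needs a separate check). The phrase list below mirrors the examples above and is deliberately easy to extend:

AI_MARKER_PHRASES = [
    "i recently purchased",
    "in conclusion",
    "to sum up",
    "exceeded my expectations",
]
HEDGE_WORDS = {"overall", "generally", "typically"}

def ai_marker_hits(text):
    """Count occurrences of known AI-associated phrases and hedge words."""
    lowered = text.lower()
    phrase_hits = sum(lowered.count(p) for p in AI_MARKER_PHRASES)
    hedge_hits = sum(1 for w in lowered.split() if w.strip(".,!?") in HEDGE_WORDS)
    return phrase_hits + hedge_hits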

Specificity Scoring

Genuine reviews mention specific details: exact measurements, particular features, unique use cases. We score specificity by detecting:

  • Numbers and measurements
  • Product-specific feature mentions
  • Comparative references to other products
  • Personal usage scenarios
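
A lightweight way to approximate specificity is to count concrete signals such as numbers, measurements, and comparison or usage language. The patterns below are illustrative examples rather than the full production rule set:

import re

SPECIFICITY_PATTERNS = {
    "numbers": r"\b\d+(?:\.\d+)?\b",
    "measurements": r"\b\d+(?:\.\d+)?\s?(?:cm|mm|in|inch|inches|ft|lb|lbs|kg|g|oz|hours?|days?)\b",
    "comparisons": r"\b(?:compared to|better than|worse than|versus|vs\.?)\b",
    "usage": r"\b(?:i use|i used|every day|daily|for my)\b",
}

def specificity_score(text):
    """Count pattern matches as a rough proxy for how specific a review is."""
    lowered = text.lower()
    return sum(len(re.findall(p, lowered)) for p in SPECIFICITY_PATTERNS.values())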

Sentiment Authenticity

Real reviews show varied emotional expression. AI reviews tend toward neutral, "corporate" language. We analyze emotional authenticity using sentiment intensity scoring.
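
As an example of sentiment intensity scoring, an off-the-shelf lexicon model such as VADER can flag reviews whose emotional tone is suspiciously flat; the library choice here is illustrative, not a statement of what runs in production:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def sentiment_flatness(text):
    """Return the VADER compound score; values hovering near 0.0 on a glowing
    5-star review suggest neutral, "corporate" language rather than genuine enthusiasm."""
    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(text)["compound"]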

Stage 4: Reviewer Behavior Analysis

We sample reviewer profiles to identify suspicious patterns:

Account Age and Activity

  • New accounts (<3 months) with immediate review activity
  • Burst posting patterns (multiple reviews same day)
  • Single-category reviewers

Rating Distribution

Real reviewers have varied ratings. Reviewers with 100% 5-star reviews or 100% reviews for one brand are flagged.

Cross-Product Patterns

We check if the same reviewers appear across multiple suspicious products — a sign of review farm operations.
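
The reviewer-level checks in this stage reduce to a handful of boolean flags per sampled profile. A hypothetical sketch, with assumed field names, follows:

from datetime import date

def reviewer_flags(account_created, review_history):
    """review_history: list of (review_date, star_rating, brand) tuples."""
    flags = []
    account_age_days = (date.today() - account_created).days
    if account_age_days < 90 and review_history:
        flags.append("new_account_with_reviews")
    dates = [d for d, _, _ in review_history]
    if len(dates) != len(set(dates)):
        flags.append("multiple_reviews_same_day")
    ratings = {r for _, r, _ in review_history}
    if ratings == {5} and len(review_history) >= 3:
        flags.append("all_five_star")
    brands = {b for _, _, b in review_history}
    if len(brands) == 1 and len(review_history) >= 3:
        flags.append("single_brand")
    return flags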

Stage 5: Statistical Anomaly Detection

We compare product metrics against our database of 40,000+ analyzed products:

Rating Distribution Analysis

Most legitimate products have bell-curve rating distributions. Products with J-curve distributions (overwhelmingly 5-star with very few mid-range ratings) warrant scrutiny.
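
A simple shape test captures the J-curve pattern: a very large share of 5-star ratings alongside a hollowed-out middle. The thresholds below are illustrative:

def is_j_curve(distribution, five_star_share=0.8, mid_share=0.1):
    """distribution: dict mapping star value (1-5) to review count."""
    total = sum(distribution.values()) or 1
    five = distribution.get(5, 0) / total
    middle = (distribution.get(2, 0) + distribution.get(3, 0)) / total
    return five > five_star_share and middle < mid_share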

Verification Rate Anomalies

Typical verified purchase rates are 60-80%. Rates above 95% may indicate discount-code manipulation; rates below 40% suggest review solicitation from non-purchasers.

Review Count vs. Product Age

We calculate expected review velocity based on product category and age. Products significantly exceeding expected review rates are flagged.

Stage 6: AI-Powered Synthesis

We use large language models (currently OpenAI's GPT-4) to synthesize the findings from all previous stages. The model:

  • Examines patterns across multiple signals that individually might not be conclusive
  • Identifies context-specific anomalies that require semantic understanding
  • Checks review content for consistency with product claims
  • Generates human-readable explanations for its findings

The AI doesn't make the final decision alone — it provides weighted input that's combined with statistical measures in our scoring algorithm.
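
A minimal sketch of how the precomputed signals might be handed to the model, using the OpenAI Python client, is shown below; the model name and prompt are illustrative, and the production prompts are more detailed:

import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def synthesize_findings(signals: dict) -> str:
    """Ask the model to weigh the statistical signals and explain its reasoning."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You assess Amazon review authenticity from precomputed signals. "
                        "Return a 0-100 risk estimate with a short explanation."},
            {"role": "user", "content": json.dumps(signals)},
        ],
    )
    return response.choices[0].message.content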

Stage 7: Grade Calculation

We combine all signals using weighted scoring:

  • Timing Analysis (25%): Very reliable; hard to fake timing patterns
  • Language Patterns (30%): Highly reliable for AI/template detection
  • Reviewer Behavior (20%): Good signal but sample-limited
  • Statistical Anomalies (15%): Useful but context-dependent
  • Verification Rate (10%): Weakest signal; easily manipulated

Final Grade Mapping

Composite scores map to letter grades:

  • Grade A (90-100): High confidence in review authenticity
  • Grade B (80-89): Generally authentic with minor concerns
  • Grade C (70-79): Mixed signals; review with caution
  • Grade D (60-69): Significant authenticity concerns
  • Grade F (0-59): High probability of manipulation
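
Putting the weights and grade bands together, a straightforward composite scorer looks like this (each per-signal score is assumed to already be normalized to a 0-100 scale):

SIGNAL_WEIGHTS = {
    "timing": 0.25,
    "language": 0.30,
    "reviewer_behavior": 0.20,
    "statistical_anomalies": 0.15,
    "verification_rate": 0.10,
}

GRADE_BANDS = [(90, "A"), (80, "B"), (70, "C"), (60, "D"), (0, "F")]

def composite_grade(scores: dict) -> tuple[float, str]:
    """scores: per-signal authenticity scores on a 0-100 scale."""
    composite = sum(SIGNAL_WEIGHTS[name] * scores[name] for name in SIGNAL_WEIGHTS)
    grade = next(letter for cutoff, letter in GRADE_BANDS if composite >= cutoff)
    return composite, grade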

Accuracy Metrics

We've validated our system against 1,000 manually-verified products:

  • Obvious manipulation detection: 87% accuracy
  • Subtle manipulation detection: 72% accuracy
  • False positive rate: ~5% (legitimate products flagged as suspicious)

We intentionally err toward caution — accepting higher false positives to minimize missed detections. Our philosophy: better to warn about a legitimate product than miss a scam.

Known Limitations

We're transparent about what we can't catch:

  • Sophisticated slow campaigns: Reviews spread gradually over months with natural-looking timing
  • Human-edited AI reviews: AI content manually refined to add specificity and personality
  • Legitimate viral spikes: Products that go viral on TikTok may show patterns similar to manipulation
  • New seller accounts: We have limited data for very new sellers

Continuous Improvement

Our methodology evolves as manipulation tactics change:

  • Monthly algorithm updates based on new patterns
  • User feedback integration for reported errors
  • Database expansion with each new analysis
  • Open-source contributions from the community

You can track our methodology changes and contribute improvements through our GitHub repository.


About the Author

Derek Armitage

Founder & Lead Developer

Derek Armitage is the founder of Shift8 Web, a Toronto-based web development agency. With over 15 years of experience in software development and data analysis, Derek created Null Fake to help consumers identify fraudulent Amazon reviews. He holds expertise in machine learning, natural language processing, and web security. Derek has previously written about e-commerce fraud detection for industry publications and regularly contributes to open-source projects focused on consumer protection.

Credentials:

  • 15+ years software development experience
  • Founder of Shift8 Web (Toronto)
  • Machine learning and NLP specialist
  • Open source contributor