Building Null Fake: Why We Made It Open Source
December 30, 2024 • Null Fake Team
We got burned by a fake review scam in 2023. Bought a "highly rated" kitchen appliance that broke after two weeks. The 4.8-star rating was built on fake reviews. That pissed us off enough to do something about it.
Why Existing Tools Weren't Good Enough
Fakespot and ReviewMeta exist. They're decent. But they have problems: usage limits (can't analyze more than 10 products per day on free tiers), no transparency (black box algorithms), and they miss sophisticated fakes.
We wanted something unlimited, open source, and more aggressive at catching manipulation. If that meant more false positives, fine. Better to warn about a good product than miss a scam.
Also, we're developers. Building stuff is what we do. This seemed like a good problem to solve.
The Tech Stack
We chose Laravel (PHP framework) because we know it well. Fast development, good ecosystem, easy deployment.
PostgreSQL for data storage. We need to store review data, analysis results, and cache responses. Postgres handles JSON well, which is useful for storing review metadata.
Python with scikit-learn for machine learning. We run linguistic analysis on review text. Python's NLP libraries are better than PHP's.
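To give a feel for what "linguistic analysis" means here, this is a minimal, stdlib-only sketch of two simple signals of the kind a detector might compute: generic-praise density and vocabulary diversity. The phrase list and the function name are illustrative, not the project's actual lexicon or code.

```python
import re

# Illustrative phrase list -- NOT the project's actual lexicon.
GENERIC_PRAISE = [
    "great product", "highly recommend", "works great",
    "five stars", "amazing quality", "best purchase",
]

def language_signals(review: str) -> dict:
    """Score one review on two simple linguistic signals:
    generic-praise density and vocabulary diversity."""
    text = review.lower()
    words = re.findall(r"[a-z']+", text)
    # Count how many canned-praise phrases appear in the review.
    praise_hits = sum(phrase in text for phrase in GENERIC_PRAISE)
    # Type/token ratio: templated or AI-padded text tends to reuse
    # the same words, pushing this ratio down on longer reviews.
    diversity = len(set(words)) / len(words) if words else 0.0
    return {"praise_hits": praise_hits, "type_token_ratio": round(diversity, 3)}

print(language_signals("Great product, highly recommend! Five stars."))
```

In practice you would feed features like these (plus many more) into a trained classifier rather than thresholding them by hand; this just shows the shape of the feature extraction step.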
OpenAI API for advanced language analysis. Their models are good at detecting AI-generated text and analyzing sentiment. Costs money, but worth it for accuracy.
The entire stack runs on a single VPS (for now). If we need to scale, we'll add queue workers and load balancing. But 500 analyses per day fit comfortably on one server.
The Scraping Challenge
Amazon doesn't have an official API for reviews. We have to scrape their website. That's technically allowed (the data is public) but practically difficult (Amazon has aggressive bot detection).
We use rotating proxies and randomized request timing. We also cache aggressively. Once we've analyzed a product, we cache the result for 24 hours. No need to scrape again if someone checks the same product twice.
Scraping breaks occasionally when Amazon changes their HTML structure. We've rebuilt the scraper 3 times in 6 months. It's maintenance overhead, but manageable.
The Analysis Pipeline
When you paste an Amazon URL, here's what happens:
1. We extract the ASIN (Amazon product ID) from the URL.
2. We scrape all reviews (up to 5,000, which covers 99% of products).
3. We run timing analysis (check for spikes and patterns).
4. We run language analysis (detect AI text and generic praise).
5. We sample reviewer profiles (check account age and history).
6. We calculate statistical anomalies (compare to our database of 40,000+ products).
7. We combine all signals into a weighted score.
8. We convert the score to a letter grade (A through F).
The entire process takes 5-15 seconds depending on how many reviews the product has.
Why We Made It Open Source
Transparency matters. If we're telling you a product has fake reviews, you should be able to see how we reached that conclusion.
Our code is on GitHub. You can read the algorithms, check the weighting, and verify we're not doing anything shady.
Open source also means community contributions. People have submitted bug fixes, improved the scraper, and suggested new detection methods. We couldn't build this alone.
Plus, we believe tools like this should be public goods. Everyone deserves access to review analysis, not just people who can afford subscriptions.
The Accuracy Problem
We've manually verified 1,000 products to test our accuracy. Results: 87% accuracy on obvious fakes, 72% on subtle manipulation, 5% false positive rate.
That false positive rate bothers us. 1 in 20 legitimate products gets flagged as suspicious. We could reduce it by being less aggressive, but we'd miss more fakes.
We chose consumer protection over precision. If you skip a good product because we flagged it, you lose a purchase. If you buy a scam because we missed it, you lose money and trust.
The trade-off favors caution.
What We've Learned
Fake review operations are more sophisticated than we expected. They use aged accounts, varied language, and timing strategies that bypass simple detection.
AI-generated reviews are the new frontier. ChatGPT makes it trivial to generate thousands of unique, plausible reviews. Detection is an arms race.
Users want simple answers. We tried showing detailed breakdowns (timing score: 65, language score: 78, etc.). People ignored it. They just want the letter grade.
Caching is essential. Without it, our server would die from scraping load. With it, we handle hundreds of analyses per day on basic hardware.
The Cost Reality
Running this isn't free. Server costs $40/month. OpenAI API costs $50-100/month depending on usage. Domain and SSL are another $20/year.
We're not making money from this (it's free, no ads, no subscriptions). We cover costs out of pocket because we think it's worth doing.
If usage grows significantly, we'll need to figure out sustainability. Maybe donations, maybe optional premium features. For now, we're keeping it completely free.
Future Plans
We want to add support for other platforms (Walmart, eBay, Etsy). The algorithms are platform-agnostic; we just need to build the scrapers.
We're working on browser extensions. One-click analysis without leaving Amazon. Chrome and Firefox first, Safari if there's demand.
We're also building a public API. Other developers can integrate our analysis into their tools. Free for non-commercial use, paid tiers for businesses.
Long-term, we want to build a database of known fake review operations. Track sellers across products, identify patterns, share data with platforms and regulators.
The Limitations We Accept
We can't catch everything. Sophisticated operations that spread reviews over months, use real accounts, and write genuine-sounding text will slip through.
We can't verify product quality. A product can have authentic reviews and still be terrible. We only assess review authenticity.
We can't stop fake reviews from being posted. We can only help you identify them after the fact.
These limitations are inherent to the problem. No tool will ever be perfect. We're just trying to be good enough to be useful.
How You Can Help
Use the tool. The more products we analyze, the better our algorithms get. Each analysis adds to our database and improves detection.
Report errors. If we grade a product wrong, tell us. We investigate every report and adjust our models.
Contribute code. We're on GitHub. If you can improve the scraper, enhance the algorithms, or fix bugs, pull requests are welcome.
Spread the word. The more people use review analysis tools, the less valuable fake reviews become. If buyers can easily detect fakes, sellers stop buying them.
Why This Matters
Fake reviews cost consumers billions annually. They prop up bad products, hurt legitimate sellers, and erode trust in online shopping.
Platforms won't fix this alone. Their incentives are misaligned. They need reviews to drive sales, even if some are fake.
Consumers need tools to protect themselves. That's what we're building. Free, open, and aggressive about catching manipulation.
Try it: paste any Amazon URL and see what we find. The analysis is free, the code is open, and we're constantly improving.
The Honest Truth
This is a side project. We're not a company, we don't have investors, and we're not trying to get acquired. We built this because we needed it and thought others might too.
If it helps you avoid one scam purchase, it was worth building. If it grows into something bigger, great. If not, at least the code is out there for others to use and improve.