Google's criteria for classifying web spam


Krumel
03-20-2008, 11:52 AM
Some "leaks" surfaced recently about how Google identifies and classifies web spam.

* Long domain names
* .info, .cc, .us and other cheap, easy to grab TLDs
* Short registration period (1 year, maybe 2)
* High ratio of ad blocks to content
* Javascript redirects from initial landing pages
* Use of common, high-commercial value spam keywords like "mortgage," "poker," "texas hold 'em," "porn," "student credit cards," and related terms
* Many links to other low quality, spam sites
* Few links to high quality, trusted sites
* High keyword frequencies and keyword densities
* Small amounts of unique content
* Very few direct visits
* Very few links sent out in (non-spam) email to the site
* Registered to people/entities not associated with trusted sites
* Not frequently registered with services like Yahoo! Site Explorer, Google Webmaster Central or Live Webmaster Tools
* Rarely have short, high value domain names
* Often contain many keyword-stuffed subdomains
* More likely to have longer domain names
* More likely to contain multiple hyphens in the domain name
* Less likely to have links from trusted sources
* Less likely to have SSL Security certificates
* Less likely to be in directories like DMOZ, Yahoo!, Librarian's Internet Index, etc.
* Unlikely to have any significant quantity of branded searches
* Unlikely to be bookmarked in services like My Yahoo!, Del.icio.us, Faves.com, etc.
* Unlikely to get featured in social voting sites like Digg, Reddit, Yahoo! Buzz, StumbleUpon, etc.
* Unlikely to have channels on YouTube, communities on Facebook or links from Wikipedia
* Unlikely to be mentioned on major news sites (either with or without link attribution)
* Unlikely to register with Google/Yahoo!/MSN Local Services
* Unlikely to have a legitimate physical address/phone number on the website
* Likely to have the domain associated with emails on blacklists
* Often contain a large number of snippets of "duplicate" content found elsewhere on the web
* Unlikely to contain unique content in the form of PDFs, PPTs, XLSs, DOCs, etc.
* Frequently feature commercially focused content
* Many levels of links away from highly trusted websites
* Rarely contain privacy policy and copyright notice pages
* Rarely listed in Better Business Bureau's Online Directory
* Rarely contain high grade level text content (as measured by metrics like Flesch-Kincaid Reading Level)
* Rarely have small snippets of text quoted on other websites and pages
* Cloaking based on user-agent or IP address is common
* Rarely contain paid analytics tracking software
* Rarely have online or offline marketing campaigns
* Rarely have affiliate link programs pointing to them
* Less likely to have .com or .org extensions
* Almost never have .mil, .edu or .gov extensions
* Rarely have links from domains with .edu or .gov extensions
* Almost never have links from domains with .mil extensions
* Rarely receive high quantities of monthly visits
* Rarely have visits lasting longer than 30 seconds
* Rarely have visitors bookmarking their domains in the browser
* Unlikely to buy significant quantities of PPC ad traffic
* Rarely have banner ad media buys
* Likely to have links to a significant portion of the sites and pages that link to them
* Extremely unlikely to be mentioned or linked-to in scientific research papers
* Unlikely to use expensive web technologies (Microsoft Server & Coding Products that Require a Licensing Fee)
* Likely to be registered by parties who own a very large number of domains
* Unlikely to attract significant return traffic
* More likely to contain malware, viruses or spyware (or any automated downloads)

Source (http://www.seomoz.org/blog/separating-web-spam-from-quality-content-what-are-the-metrics)
I had the chance to see that document a while ago; I see it is only now being discussed....

Shall we discuss these criteria? Do they also apply to .ro?
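
To make the discussion concrete, a few of the domain-name signals from the list above (cheap TLDs, long names, multiple hyphens, many subdomains) can be turned into a toy check. This is only a sketch: the thresholds and the function names are my own assumptions, not anything taken from Google or from the source article, and the naive split on dots does not handle multi-part suffixes such as .co.uk.

```python
# Toy check for a few domain-name signals from the list above.
# All thresholds are illustrative assumptions, not known Google values.

CHEAP_TLDS = {"info", "cc", "us"}  # the TLDs named explicitly in the list


def domain_signals(hostname, long_name_threshold=20, many_subdomains=3):
    """Return a dict of boolean signals for a hostname such as 'a.b.example.info'."""
    labels = hostname.lower().strip(".").split(".")
    tld = labels[-1]
    # Registered name = the label just before the TLD (naive; ignores .co.uk etc.)
    name = labels[-2] if len(labels) >= 2 else labels[0]
    subdomains = labels[:-2]
    return {
        "cheap_tld": tld in CHEAP_TLDS,
        "long_name": len(name) > long_name_threshold,
        "multiple_hyphens": name.count("-") >= 2,
        "many_subdomains": len(subdomains) >= many_subdomains,
    }


def spam_signal_count(hostname):
    """Count how many of the signals above fire for this hostname."""
    return sum(domain_signals(hostname).values())


if __name__ == "__main__":
    for host in ("example.ro", "cheap-poker-mortgage-online-deals.info"):
        print(host, domain_signals(host), spam_signal_count(host))
```

Nothing in these checks depends on the country, so a .ro domain would simply never trigger the cheap-TLD signal unless someone chose to add "ro" to the set.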

kmofo
03-20-2008, 12:52 PM
About these points:

* Unlikely to be bookmarked in services like My Yahoo!, Del.icio.us, Faves.com, etc.
* Unlikely to get featured in social voting sites like Digg, Reddit, Yahoo! Buzz, StumbleUpon, etc.
* Unlikely to have channels on YouTube, communities on Facebook or links from Wikipedia


I would say that is not quite true; there are even techniques for promoting spam pages through social bookmarking.

And I would also add:

* High bounce rate for visits coming from the SERPs.
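
By bounce rate I mean the share of visits that view a single page and leave. A minimal sketch of how it could be computed for search traffic only, assuming a hypothetical session log where each session records its source and its number of pageviews (the field names are made up for the example):

```python
# Bounce rate for visits arriving from search result pages (SERPs).
# The session format ({"source": ..., "pageviews": ...}) is a hypothetical
# example, not the schema of any particular analytics tool.

def serp_bounce_rate(sessions):
    """Return the fraction of SERP sessions with exactly one pageview, or None."""
    serp_sessions = [s for s in sessions if s["source"] == "serp"]
    if not serp_sessions:
        return None  # no search traffic, so the rate is undefined
    bounces = sum(1 for s in serp_sessions if s["pageviews"] == 1)
    return bounces / len(serp_sessions)


if __name__ == "__main__":
    log = [
        {"source": "serp", "pageviews": 1},
        {"source": "serp", "pageviews": 1},
        {"source": "serp", "pageviews": 4},
        {"source": "serp", "pageviews": 1},
        {"source": "direct", "pageviews": 1},  # ignored: not from a SERP
    ]
    print(serp_bounce_rate(log))
```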

kmofo
03-20-2008, 02:11 PM
One more thing: a document is circulating that is said to be a kind of internal guide for evaluating the quality of a website, intended for members of the Google EWOQ group.
I put it here: http://rapidshare.com/files/100926477/google_internal_memo_quality-rater-guidelines.pdf.html .

I only glanced at it; it is dated April 2007 and looks real.

In the WebSpam Guidelines chapter, the document mentions:
1) PPC Pages
2) Parked domains
3) Thin Affiliates
4) Hidden text and hidden links
5) JavaScript redirects
6) Keyword stuffing (including in the URL)
7) 100% frame
8) Sneaky redirects
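
For point 6, a rough way to look at keyword stuffing is to measure how much of the page text a single keyword takes up and how often it repeats in the URL. The sketch below is only an illustration: the function names and the tokenization are my assumptions, and the document does not give any numeric threshold.

```python
import re

# Rough keyword-stuffing measurements for page text and for a URL.
# No threshold is implied; what counts as "stuffed" is left to the reader.


def keyword_density(text, keyword):
    """Fraction of words in `text` that are exactly `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def url_keyword_repeats(url, keyword):
    """Number of times `keyword` appears as a separate token in the URL."""
    tokens = re.split(r"[^a-z0-9]+", url.lower())
    return tokens.count(keyword.lower())


if __name__ == "__main__":
    print(keyword_density("poker poker tips poker", "poker"))
    print(url_keyword_repeats(
        "http://poker-poker.example.info/poker/free-poker.html", "poker"))
```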

Krumel
03-20-2008, 09:31 PM
Well, I think that list was put together based on the WebSpam Guidelines; it is an estimate...