Keyword Filtering
The Problem With Simple Keyword Filtering
This point alone raises many questions that ought to be explored before building a filter. Blocking the word “sex” is a bad idea. This approach is called “keyword blocking”, and it overblocks well beyond what a quality filter can tolerate.
For example, “Sussex, England” would no longer be available (suSEX), nor would sex education, sex abuse, and so on. Similarly, blocking “anal” eliminates analysis (ANALysis), and blocking “breast” eliminates “BREAST cancer”, keeping abreast (aBREAST), and so on.
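To make the overblocking concrete, here is a minimal sketch of keyword blocking as plain substring matching. The blocklist, function names, and sample strings are hypothetical and not taken from any real filter. A whole-word variant is included for comparison: it fixes the embedded matches (Sussex, analysis, abreast) but still blocks legitimate uses of the word itself, such as sex education or breast cancer.

```python
import re

# Hypothetical blocklist, for illustration only.
BLOCKED_WORDS = ["sex", "anal", "breast"]


def naive_keyword_block(text: str) -> list[str]:
    """Return blocked words found anywhere in text via case-insensitive substring matching."""
    lowered = text.lower()
    return [word for word in BLOCKED_WORDS if word in lowered]


def whole_word_block(text: str) -> list[str]:
    """Return blocked words that appear as whole words (regex word boundaries)."""
    return [
        word
        for word in BLOCKED_WORDS
        if re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE)
    ]


samples = [
    "Sussex, England",
    "Statistical analysis",
    "Keeping abreast of the news",
    "Breast cancer screening",
]
for sample in samples:
    print(f"{sample!r}: substring={naive_keyword_block(sample)} "
          f"whole-word={whole_word_block(sample)}")
```

Substring matching flags all four samples. Whole-word matching clears the first three but still flags “Breast cancer screening”, which is why single-word blocking of any kind remains too blunt.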
Let’s look at “breast” in detail:
Statistical Research courtesy of Michael Nellis
Google searches run on April 4, 2004 for the word “breast” and related terms returned the following hit counts:
Category        Search terms                          Hits
Baseline        breast                          20,800,000
Medical/Health  breast + cancer                  4,630,000
                breast + feeding                 2,380,000
                breastfeeding                    1,880,000
                breast + pump                      879,000
                breast + exam                      658,000
                breast + reduction               1,270,000
                breast + implants                  836,000
                breast + enhancement               789,000
Swimming        breast + stroke                  1,480,000
Cooking         breast + chicken                 2,060,000
Fashion         breast + single + double + suit     98,300
Education       breastworks                         13,700
                breastplates                        29,400
Excluding the baseline, these searches total 17,003,400 hits, 81.75 percent of the baseline, for material that cannot possibly have anything to do with pornography.
See the problem?
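The total and percentage can be checked with a few lines of arithmetic. This sketch simply transcribes the hit counts from the table above; the variable names are illustrative. Because one page can match several of these searches, the total is a plain sum of hit counts, not a count of distinct pages.

```python
# Hit counts transcribed from the April 4, 2004 Google searches in the table above.
BASELINE_HITS = 20_800_000  # hits for "breast" on its own

non_porn_hits = {
    "breast + cancer": 4_630_000,
    "breast + feeding": 2_380_000,
    "breastfeeding": 1_880_000,
    "breast + pump": 879_000,
    "breast + exam": 658_000,
    "breast + reduction": 1_270_000,
    "breast + implants": 836_000,
    "breast + enhancement": 789_000,
    "breast + stroke": 1_480_000,
    "breast + chicken": 2_060_000,
    "breast + single + double + suit": 98_300,
    "breastworks": 13_700,
    "breastplates": 29_400,
}

# A plain sum; pages that match several searches are counted more than once.
total = sum(non_porn_hits.values())
share = 100 * total / BASELINE_HITS

print(f"{total:,} hits, {share:.2f}% of the baseline")  # 17,003,400 hits, 81.75% of the baseline
```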
Also notice that, at the time of this writing, the Google ad on this site is not being served. The code is there, but no ad is available. This is probably because Google uses keyword blocking, and all the nasty references to breasts prevent it from serving an ad.
In the editor, Google serves a public service ad, which is what it does when no ad is available. That is what is wrong with keyword blocking.
So what works?
Here is the concept we have evolved. At this time, it is what works best for filtering pornography.
Something has to do something to something.
That is the essence of the phrase filter.
An example filter phrase would be:
[blond,redhead,dog,gay man,girl][sucks,fucks,licks][a big][cock,dong,penis]
This line specifies a whole array of phrases to block: taking one element from each bracketed group, in order, gives 5 × 3 × 1 × 3 = 45 phrases.
Craft the permutations and combinations of these elements and you have a porn “analytical phrase filter”. If a page contains these phrases, you are very likely correct about the nature of the page.
This concept seems to work best.
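One way to see how a single bracketed line expands into many phrases is to enumerate them. This is a minimal sketch assuming each [...] group is a comma-separated list of alternatives and that a phrase takes exactly one alternative from each group, in order; the article does not say whether groups may also be reordered. The function name and the neutral demo pattern are hypothetical.

```python
import itertools
import re


def expand_phrase_filter(pattern: str) -> list[str]:
    """Expand a bracketed filter line into every phrase it describes.

    Assumption: one alternative is taken from each [...] group, in order,
    and alternatives are separated by commas (they may contain spaces).
    """
    groups = [
        [alt.strip() for alt in body.split(",") if alt.strip()]
        for body in re.findall(r"\[([^\]]*)\]", pattern)
    ]
    # itertools.product yields one tuple per combination of choices.
    return [" ".join(choice) for choice in itertools.product(*groups)]


# A neutral, made-up pattern in the same bracket syntax as the filter line above.
phrases = expand_phrase_filter("[cat,dog][chases,bites][a red][ball,stick]")
print(len(phrases))  # 2 * 2 * 1 * 2 = 8
print(phrases[:3])
```

Applied to the example filter line above, the same function would yield 5 × 3 × 1 × 3 = 45 phrases, which is why one line can cover so much ground.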
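Rather than listing every phrase, a filter could compile a bracketed line into one regular expression and scan page text with it. This is a sketch under stated assumptions, not the author's implementation: the function names are hypothetical, words in a phrase may be separated by any whitespace, and the whole phrase must sit on word boundaries. The demo shows the key property of “something does something to something”: the individual words can all appear on a page without triggering the filter unless they form the phrase.

```python
import re


def compile_phrase_filter(pattern: str) -> re.Pattern:
    """Compile a bracketed filter line into one case-insensitive regex.

    Assumptions: one alternative per [...] group, in order; words separated
    by any whitespace; word boundaries at both ends of the phrase.
    """
    groups = []
    for body in re.findall(r"\[([^\]]*)\]", pattern):
        alts = [alt.strip() for alt in body.split(",") if alt.strip()]
        # Let multi-word alternatives (e.g. "a red") match any run of whitespace.
        alts = [r"\s+".join(map(re.escape, alt.split())) for alt in alts]
        groups.append("(?:" + "|".join(alts) + ")")
    if not groups:
        raise ValueError("no [...] groups found in pattern")
    return re.compile(r"\b" + r"\s+".join(groups) + r"\b", re.IGNORECASE)


def page_matches(page_text: str, compiled: re.Pattern) -> bool:
    """Return True if the page contains at least one phrase from the filter."""
    return compiled.search(page_text) is not None


# A neutral, made-up rule in the same bracket syntax as the filter line above.
rule = compile_phrase_filter("[cat,dog][chases,bites][a red][ball,stick]")

print(page_matches("The Dog  chases a red ball.", rule))           # phrase present
print(page_matches("A dog, a cat, a red ball and a stick.", rule))  # words only
print(page_matches("catalog chases a red ball", rule))             # "cat" inside a word
```

The first call prints True and the other two print False: scattered keywords, or a keyword buried inside a longer word, are not enough to block the page.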
