This is a basic MediaWiki scraper that pulls out all readable strings in “p” tags. Since MediaWiki sites turn away obvious scrapers, I used Mechanize to make the request look like it comes from a browser.
Usage: python mediaWikiScrape.py http://en.wikipedia.org/wiki/High-level_programming_language
Download mediaWikiScrape.py
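For anyone who only wants the idea, here is a minimal sketch of the extraction step. It is not the downloadable script: it assumes you already have the page HTML as a string (fetching with Mechanize is not shown), it uses the standard library's html.parser, and the names ParagraphText and extract_paragraphs are made up for this example.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the readable text found inside <p> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_p = False       # True while we are inside a <p>
        self.pieces = []        # text fragments of the open paragraph
        self.paragraphs = []    # finished paragraphs, in document order

    def _flush(self):
        # Join the fragments and collapse runs of whitespace.
        text = " ".join("".join(self.pieces).split())
        if text:
            self.paragraphs.append(text)
        self.pieces = []
        self.in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            if self.in_p:       # a new <p> implicitly closes the old one
                self._flush()
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self._flush()

    def handle_data(self, data):
        if self.in_p:
            self.pieces.append(data)


def extract_paragraphs(html):
    """Return a list with the text of every <p> in the given HTML string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    if parser.in_p:             # last paragraph was never closed
        parser._flush()
    return parser.paragraphs
```

Text inside inline tags such as links or bold is kept, because only the “p” boundaries are tracked.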
Growing up, I always assumed that people were fairly intelligent and able to make good decisions. This was my non-media, friends-of-the-family bias. I saw that people made a decision and followed through with it, based on some form of reasoning. My observations proved that people were intelligent.
As I began to mature, I noticed that people had extreme views on certain issues. I didn’t understand why people got so angry when they talked about these things. I began noticing that the media (print, radio, TV) played a huge role in people’s lives. I was able to conclude that people who like “X” flock together. I noticed that “X” was often a very unappealing subject, at least to my small mind. The bad thing about “X” was that “X” could be something very stupid, racist, hateful and so on. The most amazing thing about “X” was that it created groups and subcultures.
Past my teenage years, I really started to see the truth about “X”. I noticed that I was able to draw many conclusions about the masses. I hypothesized that the shallower “X” is, the more people are drawn to it. This hypothesis contradicts my childhood notion of intelligence. The media speaks to the world, and the masses are led by an “X”.
Somebody once said, “Nobody ever went broke underestimating the stupidity of the American people” *or something to that tune*. The internet has given every individual the ability to be their own media outlet. There are billions and billions *Carl Sagan* of dollars to be made from the many “X”s. Just turn on the TV at prime time, and get out a notebook. You’ll be amazed at your findings.
When you sign up for a website, there’s a good chance that you need to validate your email account. I have a junk email address that I use for such purposes. Using the alternate email saves me from spam, but I still need to manually log into the email account and grab the confirmation. I’ve automated this step, minus the link click. The following code will log into your Gmail account and grab your confirmation link. Note that you can change the folder from the inbox to junk, since confirmation emails sometimes end up there.
Requires:
Sample:
>>> from EmailSearch import getConfirmationLink
>>> getConfirmationLink('google@gmail.com', 'password', 'blackcodeseo.com')
u'http://www.blackcodeseo.com/validate.php?id=218393923'
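If you are curious how something like this can work, here is a rough sketch using only the standard library (imaplib and email). It is not the EmailSearch module itself. The function names are made up for this example, the host imap.gmail.com and the folder name "[Gmail]/Spam" are assumptions about a typical Gmail setup, and Gmail may require IMAP access to be enabled and an app password.

```python
import email
import imaplib
import re
from urllib.parse import urlsplit

# Loose URL pattern: stops at whitespace, quotes and angle brackets.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def find_link(text, domain):
    """Return the first URL whose host is `domain` or a subdomain of it."""
    for url in URL_RE.findall(text):
        host = urlsplit(url).hostname or ""
        if host == domain or host.endswith("." + domain):
            return url
    return None


def message_text(msg):
    """Concatenate the decoded text/* parts of a parsed email message."""
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload(decode=True)
            if payload:
                charset = part.get_content_charset() or "utf-8"
                chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


def get_confirmation_link(user, password, domain, folder="INBOX"):
    """Scan a mailbox folder, newest message first, for a link to `domain`.

    Pass folder="[Gmail]/Spam" (assumed name) to look in the junk folder.
    """
    conn = imaplib.IMAP4_SSL("imap.gmail.com")
    try:
        conn.login(user, password)
        conn.select(folder, readonly=True)
        status, data = conn.search(None, "ALL")
        for num in reversed(data[0].split()):
            status, parts = conn.fetch(num, "(RFC822)")
            msg = email.message_from_bytes(parts[0][1])
            link = find_link(message_text(msg), domain)
            if link:
                return link
        return None
    finally:
        conn.logout()
```

Matching on the parsed host name, rather than searching for the domain anywhere in the URL, avoids picking up unrelated links that merely mention the domain in a query string.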
I’ve started a forum: Black Hat SEO Forum
It’s open to the public, please become a member and share your knowledge.
Update: I patched this code on 3-24-2010
Remember a few years ago when Google had an API allowing for searches from within an application? Then they decided to ditch the project. I wrote an implementation of the old Google search API with one modification: I put no limit on the number of results. Thank you, Andy Pavlo, for your help with this project.
You’ll need:
Download Google Search API For Python
Sample implementation:
>>> from Google import Google, search
>>> results = search('blackcodeseo.com', 3)
>>> for result in results:
...     print 'Title: %s' % (result.title())
...     print 'Url: %s' % (result.url())
...     print 'Description: %s' % (result.description())
...     print
...
Title: Black Code SEO
Url: http://blackcodeseo.com/
Description: Oct 29, 2008 … Black Code SEO. Programatically Automating SEO … Download BlackCodeSeo Navigator. Run python setup.py install …
Title: Have A Question?
Url: http://blackcodeseo.com/have-a-question/
Description: If you have any questions about anything, you can reach me at matt@blackcodeseo. com and I will be happy to reply. Your questions may be posted on the site …
Title: SpiderMonkey « Didier Stevens
Url: http://blog.didierstevens.com/programs/spidermonkey/
Description: The exact post is http://blackcodeseo.com/python-spidermonkey-navigator/. Comment by Matt — Wednesday 29 October 2008 @ 20:56. Thanks. …
>>>
I’ve implemented a very simple automated comment poster. If you don’t make seemingly useful comments, don’t plan on getting too far, as most people moderate their comments. The code WILL fail if you do not note the following:
At the bottom of the file you’ll see a few variables that you’ll need to set.
blogUrl = """HTTP://WWW.YOUR_URL.COM"""
Your blogUrl should be in the http://DOMAIN.com format.
keyword = """YOUR KEYWORD"""
You need to change that to the keyword that you want. This will find all wp blogs similar to yours.
results = 50
You can turn this up to 100 or more.
{'author' : """YOUR AUTHOR NAME""", 'email' : """YOUR@EMAIL.COM""", 'comment' : """YOUR COMMENT"""},
This line can be edited to your liking, and duplicated as many times as you like. I would suggest making at least 20 of these, all a bit different from the last one. Your posts will NOT work if your ‘email’ isn’t in the form of blah@blah.com.
Download the script commentonwordpressblogs.py
If you are trying to do JS in Python, you’re going to run into many hurdles. The one Python JavaScript project that I know of is python-spidermonkey. The project lacks a few important objects: document, window, and navigator. Fortunately, there are those of us who have added to the project and made these modules available, at least in some capacity.
You can find the document and window objects on Didier Stevens’ blog (link below). He’s done an excellent job at porting these modules over to spidermonkey.
I’ve put together a very rudimentary navigator object. The values are hard-coded, but the object makes you appear to be running Firefox on a Mac. If you’d like to change the code in any way, be my guest. I was unable to find a port of this object, so I created one from scratch.
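To give a feel for what “hard-coded” means here, this is a small sketch of such an object as a plain Python class. It is not the code in the download. The class name is made up, the property names mirror the browser’s navigator object, and the values are illustrative guesses at what Firefox 3 on a Mac reports. Binding the object into a python-spidermonkey context is not shown.

```python
class Navigator:
    """Hard-coded stand-in for the browser's `navigator` object.

    All values are illustrative and approximate Firefox 3 on a Mac.
    """

    appCodeName = "Mozilla"
    appName = "Netscape"
    appVersion = "5.0 (Macintosh; en-US)"
    cookieEnabled = True
    language = "en-US"
    platform = "MacIntel"
    product = "Gecko"
    userAgent = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; "
                 "rv:1.9.0.3) Gecko/2008092414 Firefox/3.0.3")

    def javaEnabled(self):
        # Scripts often probe this; report Java as unavailable.
        return False
```

Scripts that sniff the browser usually read userAgent, appName and platform, so those are the values worth keeping consistent with each other if you change them.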
I’ve included all of the configuration files that you will need to work with Didier Stevens’ objects and my own. You need to visit Didier Stevens’ blog to download his code.
Go to http://wwwsearch.sourceforge.net/python-spidermonkey/ and download/untar the project
Go to http://blog.didierstevens.com/programs/spidermonkey/ and follow his instructions
Download BlackCodeSeo Navigator.
Run python setup.py install
Python-SpiderMonkey Implementing Navigator
Update: Automatically Installing Wordpress Forum Post
Automating the install of WordPress saves a great deal of time. You can easily write a script to copy WordPress to a directory, append your vhosts, auto-generate a wp-config.php and restart Apache, so I won’t be including that part of the code in this post. I will cover the initial install, and also the SQL injection to get WordPress up and running.
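For the curious, the wp-config.php step can be sketched in a few lines. This is only an illustration under assumptions: the template is trimmed to the database settings (a real wp-config.php also carries the authentication keys and salts), and the function names and example values are made up.

```python
# Trimmed wp-config.php template: database settings only.
WP_CONFIG_TEMPLATE = """<?php
define('DB_NAME', '{db_name}');
define('DB_USER', '{db_user}');
define('DB_PASSWORD', '{db_password}');
define('DB_HOST', '{db_host}');
$table_prefix = '{table_prefix}';
if ( !defined('ABSPATH') )
    define('ABSPATH', dirname(__FILE__) . '/');
require_once(ABSPATH . 'wp-settings.php');
"""


def php_quote(value):
    """Escape a value for use inside a single-quoted PHP string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def render_wp_config(db_name, db_user, db_password,
                     db_host="localhost", table_prefix="wp_"):
    """Return the text of a minimal wp-config.php for one blog."""
    return WP_CONFIG_TEMPLATE.format(
        db_name=php_quote(db_name),
        db_user=php_quote(db_user),
        db_password=php_quote(db_password),
        db_host=php_quote(db_host),
        table_prefix=php_quote(table_prefix),
    )
```

Writing the returned string to wp-config.php in the blog directory is the only remaining step, for example with open(path, "w").write(render_wp_config("blog1", "wp_user", "secret")).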
Automatically Installing Wordpress Part 1
At this point, you have installed WordPress, but you’ll want to set some options. The next example shows how you can manipulate the base installation of WordPress.
Automatically Installing Wordpress Part 2
Programmatically posting to WordPress saves a TON of time, no more outsourcing. My preferred method of posting to WordPress is via XML-RPC, NOT SQL injection. Michele Ferretti has developed an excellent library for controlling WordPress in this manner. Wordpresslib allows you to: publish new posts, edit old posts, publish draft posts, delete posts, change post categories, get blog and user information, upload multimedia files like movies or photos, get the most recent posts, get the last post, get Trackbacks of posts, and get Pingbacks of posts.
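To show what goes over the wire, here is a small sketch that skips wordpresslib and talks XML-RPC directly with the standard library’s xmlrpc.client. It relies on the metaWeblog.newPost method that WordPress exposes at xmlrpc.php. The helper names, the blog URL and the blog_id default of 0 are assumptions for this example.

```python
import xmlrpc.client


def connect(blog_url):
    """Return an XML-RPC proxy for a WordPress blog (no request is sent yet)."""
    return xmlrpc.client.ServerProxy(blog_url.rstrip("/") + "/xmlrpc.php")


def build_post(title, body, categories=()):
    """Build the content struct that metaWeblog.newPost expects."""
    return {
        "title": title,
        "description": body,            # the post body, HTML allowed
        "categories": list(categories),
    }


def publish_post(server, username, password, post, blog_id=0, publish=True):
    """Create the post and return the new post id reported by the server.

    publish=False saves the post as a draft instead.
    """
    return server.metaWeblog.newPost(blog_id, username, password, post, publish)
```

Usage would look like publish_post(connect("http://example.com"), "admin", "password", build_post("Hello", "<p>First post</p>", ["News"])), with the URL and credentials replaced by your own.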
Automatically Posting To Wordpress
A quick and easy way to create content is syndication. RSS is among the most popular methods of syndication, and the best part about RSS is that you have full control over the content that you are “collecting”. Mark Pilgrim developed a nice RSS parser to use with Python: Universal Feed Parser.
Let’s “subscribe” to 10 RSS feeds from Google’s blog search, with our target keyword being “python”.
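As a rough illustration of the parsing step, here is a sketch that uses the standard library’s xml.etree.ElementTree as a stand-in for Universal Feed Parser (with feedparser you would call feedparser.parse(url) and loop over its entries instead). It assumes a plain RSS 2.0 document that has already been downloaded into a string. The function name and the sample feed are made up, and the blog search feed URL is left out on purpose.

```python
import xml.etree.ElementTree as ET


def parse_rss(xml_text, limit=10):
    """Return up to `limit` items from an RSS 2.0 document as dicts."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "summary": (item.findtext("description") or "").strip(),
        })
        if len(items) == limit:
            break
    return items


# Tiny made-up feed to show the shape of the result.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Blog search: python</title>
<item><title>Learning Python</title><link>http://example.com/a</link>
<description>Notes on learning Python.</description></item>
<item><title>Python tips</title><link>http://example.com/b</link></item>
</channel></rss>"""

for entry in parse_rss(SAMPLE_FEED):
    print(entry["title"], entry["link"])
```

Missing fields come back as empty strings, which keeps the later text-processing steps from having to check for None.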
At this point we can grab a part of speech to replace with our keyword. Doing so will build up our likelihood of dominating the SERPs. This is a good place to point out a few things.
1) Be frugal with the use of your keyword(s).
2) Randomizing the order of the content that you’ve scraped makes it more difficult to flag you as a “scraper”.
3) Change the titles! Bloggers are ALWAYS reporting “scrapers”, so mix it up a bit.
4) When you parse feed URLs, use relevant topics. If you are trying to dominate “pay day loans”, don’t search for blogs on “potato salad”. Try to stay on point with your blog searching. I realize that you are going to be collecting lots of feeds. There is a finite number of blogs, seriously!
For the next chunk of code, you’ll need
Substituting Nouns With A Keyword With WordNet