  • Stars: 273
  • Rank 149,849 (Top 3 %)
  • Language: Ruby
  • License: MIT License
  • Created over 14 years ago
  • Updated almost 7 years ago

PageRankr

Provides an easy way to retrieve Google Page Rank, Alexa Rank, backlink counts, index counts and different types of social signals.

This project is abandoned. If you'd like to take ownership of this project, let me know.

Note: Versions ~> 2.0 and ~> 3.0 used Typhoeus internally, which caused memory leaks and failures on Windows. Version 4.0.0 switches to a Net::HTTP-based library for better compatibility.

Note: Version >= 4.1.0 no longer actively maintains compatibility with Ruby 1.8.X. It will probably still work for the time being.

Note: Version >= 4.2.0 no longer actively maintains compatibility with Ruby < 1.9.3. It will probably still work, but you may need to specify older versions for gems this library depends on in your Gemfile.

Note: Version >= 4.5.0 no longer actively maintains compatibility with Ruby < 2.0.

Check out a little web app I wrote up that uses it or look at the source.

Get it!

    gem install PageRankr

Gemfile

    gem 'PageRankr'

Use it!

    require 'page_rankr'

Backlinks

Backlinks are the result of a search with a query like "link:www.google.com". The number of returned results indicates how many sites point to that URL. If a site is not tracked, nil is returned.

    PageRankr.backlinks('www.google.com', :google, :bing) #=> {:google=>161000, :bing=>208000000}
    PageRankr.backlinks('www.google.com', :yahoo)         #=> {:yahoo=>256300062}
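
Because an untracked site comes back as nil for that engine, it can help to drop those entries before doing any arithmetic on the counts. A minimal sketch, assuming a result hash shaped like the ones above; `tracked_only` is a hypothetical helper and the sample values are made up, so nothing here calls PageRankr itself:

```ruby
# Hypothetical helper, not part of PageRankr: drops engines that returned nil.
def tracked_only(results)
  results.reject { |_engine, count| count.nil? }
end

# Sample data standing in for a PageRankr.backlinks result (values are made up).
sample = { :google => 161000, :bing => nil, :yahoo => 256300062 }

tracked = tracked_only(sample)
total   = tracked.values.inject(0) { |sum, count| sum + count }
```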

If you don't specify a search engine, then all of them are used.

    # this
    PageRankr.backlinks('www.google.com')
        #=> {:google=>23000, :bing=>215000000, :yahoo=>250522337, :alexa=>727036}

    # is equivalent to
    PageRankr.backlinks('www.google.com', :google, :bing, :yahoo, :alexa)
        #=> {:google=>23000, :bing=>215000000, :yahoo=>250522337, :alexa=>727036}

You can also use the alias backlink instead of backlinks.

Valid search engines are: :google, :bing, :yahoo, :alexa (altavista and alltheweb now redirect to yahoo). To get this list you can do:

    PageRankr.backlink_trackers #=> [:alexa, :bing, :google, :yahoo]

Indexes

Indexes are the result of a search with a query like "site:www.google.com". The number of returned results indicates how many pages of a domain are indexed by a particular search engine. If the site is not indexed, nil is returned.

    PageRankr.indexes('www.google.com', :google)       #=> {:google=>4860000}
    PageRankr.indexes('www.google.com', :bing)         #=> {:bing=>2120000}

If you don't specify a search engine, then all of them are used.

    # this
    PageRankr.indexes('www.google.com')
        #=> {:bing=>2120000, :google=>4860000, :yahoo=>4863000}

    # is equivalent to
    PageRankr.indexes('www.google.com', :google, :bing, :yahoo)
        #=> {:bing=>2120000, :google=>4860000, :yahoo=>4863000}

You can also use the alias index instead of indexes.

Valid search engines are: :google, :bing, :yahoo. To get this list you can do:

    PageRankr.index_trackers #=> [:bing, :google, :yahoo]

Ranks

Ranks are ratings that indicate how popular a site is. The most famous example is the Google page rank.

    PageRankr.ranks('www.google.com', :google)        #=> {:google=>10}

If you don't specify a rank provider, then all of them are used.

    PageRankr.ranks('www.google.com', :alexa_us, :alexa_global, :alexa_country, :google, :moz_rank, :domain_authority, :page_authority)
        #=> {:alexa_us=>1, :alexa_global=>1, :alexa_country=>1, :google=>10, :moz_rank=>8, :domain_authority=>100, :page_authority=>97}

    # this also gives the same result
    PageRankr.ranks('www.google.com')
        #=> {:alexa_us=>1, :alexa_global=>1, :alexa_country=>1, :google=>10, :moz_rank=>8, :domain_authority=>100, :page_authority=>97}

You can also use the alias rank instead of ranks.

Valid rank trackers are: :alexa_country, :alexa_global, :alexa_us, :google, :moz_rank, :domain_authority, :page_authority. To get this list you can do:

    PageRankr.rank_trackers #=> [:alexa_us, :alexa_global, :alexa_country, :google, :moz_rank, :domain_authority, :page_authority]

Alexa ranks are descending where 1 is the most popular. Google page ranks are in the range 0-10 where 10 is the most popular. If a site is unindexed then the rank will be nil.
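
Since a lower Alexa rank means a more popular site and an unindexed site yields nil, comparing several sites needs a little care. A minimal sketch, assuming you have already collected one Alexa rank per site into a hash; `sort_by_alexa`, the site names, and the values are all hypothetical:

```ruby
# Hypothetical helper, not part of PageRankr: orders sites from most to least
# popular by Alexa rank (1 is best) and puts unindexed sites (nil) last.
def sort_by_alexa(ranks_by_site)
  ranks_by_site.sort_by { |_site, rank| rank.nil? ? Float::INFINITY : rank }
               .map { |site, _rank| site }
end

# Made-up ranks keyed by site.
sample  = { 'a.example' => 1500, 'b.example' => nil, 'c.example' => 12 }
ordered = sort_by_alexa(sample)
```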

Socials

Social signals are a somewhat oversimplified way of telling how popular a site or page currently is.

    PageRankr.socials('www.google.com', :linked_in)        #=> {:linked_in=>1001}

If you don't specify a social tracker, then all of them are used.

    PageRankr.socials('www.google.com', :google, :linked_in, :pinterest, :stumble_upon, :twitter, :vk)
        #=> {:google=>10000, :linked_in=>1001, :pinterest=>75108, :stumble_upon=>255078, :twitter=>21933764, :vk=>3725}

    # this also gives the same result
    PageRankr.socials('www.google.com')
        #=> {:google=>10000, :linked_in=>1001, :pinterest=>75108, :stumble_upon=>255078, :twitter=>21933764, :vk=>3725}

Valid social trackers are: :google, :linked_in, :pinterest, :stumble_upon, :twitter, :vk. To get this you can do:

    PageRankr.social_trackers #=> [:google, :linked_in, :pinterest, :stumble_upon, :twitter, :vk]

Use it a la carte!

Since version 3, everything should be usable in a much more a la carte manner. If all you care about is Google page rank (which I speculate is common), you can get that all by itself:

    require 'page_rankr/ranks/google'

    tracker = PageRankr::Ranks::Google.new("myawesomesite.com")
    tracker.run #=> 2

Also, once a tracker has run, three values are accessible from it:

    # The value extracted. Tracked is aliased to rank for PageRankr::Ranks, backlink for PageRankr::Backlinks, and index for PageRankr::Indexes.
    tracker.tracked #=> 2

    # The value extracted with the jsonpath, xpath, or regex before being cleaned.
    tracker.raw     #=> "2"

    # The body of the response
    tracker.body    #=> "<html><head>..."

Rate limiting and proxies

One of the annoying things about each of these services is that they really don't like you scraping data from them. In order to deal with this issue, they throttle traffic from a single machine. The simplest way to get around this is to use proxy machines to make the requests.

In PageRankr >= 3.2.0, this is much simpler. The first thing you'll need is a proxy service. Two are provided here. A proxy service must define a proxy method that takes two arguments. It should return a string like http://user:[email protected]:50501.

Once you have a proxy service, you can tell PageRankr to use it. For example:

    PageRankr.proxy_service = PageRankr::ProxyServices::Random.new([
      'http://user:[email protected]:50501',
      'http://user:[email protected]:50501'
    ])

Once PageRankr knows about your proxy service, any request that is made will ask for a proxy from the proxy service. It does this by calling the proxy method. When it calls the proxy method, it passes the name of the tracker (e.g. :ranks_google) and the site that is being looked up. Hopefully, this information is sufficient for you to build a much smarter proxy service than the ones provided (pull requests welcome!).
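
As an illustration of that contract, here is a minimal sketch of a custom proxy service that rotates through its proxies in order. The class name `RoundRobinProxyService` and the proxy URLs are hypothetical; the only thing taken from the description above is a proxy method that receives the tracker name and the site and returns a proxy URL string.

```ruby
# Hypothetical proxy service: hands out proxies in round-robin order.
# The only contract assumed is a `proxy` method that takes the tracker name
# and the site being looked up, and returns a proxy URL string.
class RoundRobinProxyService
  def initialize(proxies)
    raise ArgumentError, 'at least one proxy is required' if proxies.empty?
    @proxies = proxies
    @index   = 0
    @lock    = Mutex.new # guards @index if requests run on several threads
  end

  # Both arguments are ignored here; a smarter service could use them to
  # keep, say, one proxy per tracker.
  def proxy(_tracker_name, _site)
    @lock.synchronize do
      chosen = @proxies[@index % @proxies.size]
      @index += 1
      chosen
    end
  end
end

service = RoundRobinProxyService.new([
  'http://proxy-a.example:8080', # placeholder URLs
  'http://proxy-b.example:8080'
])
```

It would then be assigned the same way as the provided services, with PageRankr.proxy_service = service.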

Fix it!

If you ever find that something is broken, it should be much easier to fix with version >= 1.3.0. For example, if the xpath used to look up a backlink is broken, just override the method for that class to provide the correct xpath.

    module PageRankr
      class Backlinks
        class Bing
          def xpath
            "//my/new/awesome/@xpath"
          end
        end
      end
    end

Extend it!

If you ever come across a site that provides a rank or backlinks, you can hook that class up to automatically be used with PageRankr. PageRankr does this by looking up all the classes namespaced under Backlinks, Indexes, and Ranks.

    require 'page_rankr/backlink'

    module PageRankr
      class Backlinks
        class Foo
          include Backlink

          # This method is required
          def url
            "http://example.com/"
          end

          # This method specifies the parameters for the url. It is optional, but likely required for the class to be useful.
          def params
            {:q => tracked_url}
          end

          # You can use a method named either xpath, jsonpath, or regex with the appropriate query type
          def xpath
            "//backlinks/text()"
          end

          # Optionally, you could override the clean method if the current implementation isn't sufficient
          # def clean(backlink_count)
          #   #do some of my own cleaning
          #   super(backlink_count) # strips non-digits and converts it to an integer or nil
          # end
        end
      end
    end

    PageRankr::Backlinks::Foo.new("myawesomesite.com").run #=> 3
    PageRankr.backlinks("myawesomesite.com", :foo)[:foo]   #=> 3

Then, just make sure you require the class and PageRankr and whenever you call PageRankr.backlinks it'll be able to use your class.
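
For a feel of what the default cleaning step does before you decide to override clean, here is a stand-alone approximation based only on the comment in the example above ("strips non-digits and converts it to an integer or nil"). The function name `clean_count` is hypothetical and the gem's real implementation may differ.

```ruby
# Stand-alone approximation of the cleaning step described above.
# Not the gem's code: it only mirrors "strip non-digits, return Integer or nil".
def clean_count(raw)
  digits = raw.to_s.gsub(/\D/, '')
  digits.empty? ? nil : digits.to_i
end

clean_count('About 1,230,000 results') #=> 1230000
clean_count('no results')              #=> nil
```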

Note on Patches/Pull Requests

  • Fork the project.
  • Make your feature addition or bug fix.
  • Add tests for it. This is important so I don't break it in a future version unintentionally.
  • Commit. Do not mess with the Rakefile, version, or history. (If you want to have your own version, that is fine, but bump the version in a commit by itself so I can ignore it when I pull.)
  • Send me a pull request. Bonus points for topic branches.

TODO Version 5

  • Detect request throttling

Shout Out

Gotta give credit where credit's due!

Original inspiration from:

Copyright

Copyright (c) 2010 Allen Madsen. See LICENSE for details.
