Repository Details

A tool for scraping emails, social media accounts, and much more information from websites using Google Search Results.

Clauneck

Clauneck Information Scraper

Clauneck is a Ruby gem designed to scrape specific information from a series of URLs, either directly provided or fetched from Google search results via SerpApi's Google Search API. It extracts and matches patterns such as email addresses and social media handles from the web pages, and stores the results in a CSV file.

Unlike Google Chrome extensions that need you to visit webpages one by one, Clauneck excels in bringing the list of websites to you by leveraging SerpApi's Google Search API.
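
As a rough illustration of the pattern matching described above, the sketch below pulls an email address and a few social media handles out of a page's HTML. The regular expressions and the extract_info helper are assumptions made for this example; the patterns Clauneck actually uses may differ.

```ruby
# Hypothetical patterns for illustration; Clauneck's real expressions may differ.
EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/
SOCIAL_PATTERNS = {
  "Instagram" => %r{instagram\.com/([A-Za-z0-9_.]+)},
  "Twitter"   => %r{twitter\.com/([A-Za-z0-9_]+)},
  "Github"    => %r{github\.com/([A-Za-z0-9-]+)}
}.freeze

# Returns a hash mapping each information type to its first match, or nil.
def extract_info(html)
  info = { "Email" => html[EMAIL_PATTERN] }
  SOCIAL_PATTERNS.each do |type, pattern|
    info[type] = html[pattern, 1] # first capture group, nil when absent
  end
  info
end

html = '<a href="mailto:hello@example.com">Mail</a> ' \
       '<a href="https://twitter.com/example_user">Twitter</a>'
extract_info(html).each { |type, value| puts "#{type}: #{value.inspect}" }
```

Types that are not found come back as nil, which corresponds to the null label in the output file.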


The End Result

The script writes the results to a CSV file. If it cannot find a piece of information on a website, it labels that field as null. If an unknown error occurs along the way (connection errors, encoding errors, etc.), the fields are filled with error.

Website      Information                       Type of Information
serpapi.com  [email protected]                 Email
serpapi.com  serpapicom                        Instagram
serpapi.com  serpapicom                        Facebook
serpapi.com  serp_api                          Twitter
serpapi.com  null                              Tiktok
serpapi.com  channel/UCUgIHlYBOD3yA3yDIRhg_mg  Youtube
serpapi.com  serpapi                           Github
serpapi.com  serpapi                           Medium
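
The labeling rules above can be sketched with Ruby's standard CSV library. This is a minimal sketch, not the gem's code: the rows_for helper and the :error marker are hypothetical, and the Youtube error row is made up to show the error label.

```ruby
require "csv"

# Hypothetical helper: turns one website's findings into CSV rows.
# `findings` maps an information type to a value, nil (not found), or :error.
def rows_for(website, findings)
  findings.map do |type, value|
    label = case value
            when nil    then "null"  # nothing found on the page
            when :error then "error" # connection or encoding problem
            else value
            end
    [website, label, type]
  end
end

findings = { "Twitter" => "serp_api", "Tiktok" => nil, "Youtube" => :error }

csv_text = CSV.generate do |csv|
  csv << ["Website", "Information", "Type of Information"]
  rows_for("serpapi.com", findings).each { |row| csv << row }
end

puts csv_text
```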

Prerequisites

Since SerpApi offers free credits that renew every month, and lists of free public proxies are available online, this tool is technically free to use. With a free SerpApi account, you may extract data from approximately 10,000 pages (100 results per search page, and up to 100 search pages).

  • For collecting URLs to scrape, one of the following is required:
    • SerpApi API Key: You may Register to Claim Free Credits
    • List of URLs in a text document (The URLs should be Google web cache links that start with https://webcache.googleusercontent.com)
  • For scraping URLs, one of the following is required:
    • List of Proxies in a text document (You may use public proxies. Only HTTP proxies are accepted.)
    • Rotating Proxy IP

Installation

Add this line to your application's Gemfile:

gem 'clauneck'

And then execute:

$ bundle install

Or install it yourself as:

$ gem install clauneck

Basic Usage

You can use Clauneck as a command line tool or within your Ruby scripts.

Basic Command line usage

In the command line, use the clauneck command with options as follows:

clauneck --api_key YOUR_SERPAPI_KEY --output results.csv --q "site:*.ai AND inurl:/contact OR inurl:/contact-us"

Basic Ruby script usage

In your Ruby script, call the Clauneck.run method:

require 'clauneck'

api_key = "<SerpApi API Key>" # Visit https://serpapi.com/users/sign_up to get free credits.
params = {
  "q": "site:*.ai AND inurl:/contact OR inurl:/contact-us"
}

Clauneck.run(api_key: api_key, params: params)

Advanced Usage

Using Advanced Search Parameters

You can visit the Documentation for SerpApi's Google Search API to get insight on which parameters you can use to construct searches.

Using Advanced Search Operators

Google supports a range of search operators in queries. They enhance your ability to customize your search and get more precise results. For example, the search query "site:*.ai AND inurl:/contact OR inurl:/contact-us" will search for websites ending with .ai at the /contact or /contact-us paths.

You may check out Google Search Operators: The Complete List (44 Advanced Operators) for a list of more operators.
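
If you build such queries for several top-level domains or paths, a small helper can assemble the string for you. The build_query method below is a hypothetical convenience written for this example and is not part of Clauneck; it only reproduces the shape of the example query.

```ruby
# Hypothetical helper, not part of Clauneck: combines a TLD filter
# with one or more URL-path filters using Google search operators.
def build_query(tld:, paths:)
  path_filter = paths.map { |path| "inurl:#{path}" }.join(" OR ")
  "site:*.#{tld} AND #{path_filter}"
end

query = build_query(tld: "ai", paths: ["/contact", "/contact-us"])
params = { q: query } # same shape as the params hash passed to Clauneck.run

puts params[:q]
```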

Using Proxies for Scraping in a Text Document

You can utilize your own proxies for scraping web caches of the links you have acquired. Only HTTP proxies are accepted. The proxies should be in the following format:

http://username:password@ip:port
http://username:password@another-ip:another-port

or if they are public proxies:

http://ip:port
http://another-ip:another-port

You can add the --proxy option in the command line to utilize the file:

clauneck --api_key YOUR_SERPAPI_KEY --proxy proxies.txt --output results.csv --q "site:*.ai AND inurl:/contact OR inurl:/contact-us"

or use the rotating proxy link directly:

clauneck --api_key YOUR_SERPAPI_KEY --proxy "http://username:password@ip:port" --output results.csv --q "site:*.ai AND inurl:/contact OR inurl:/contact-us"

You may also use it in a script:

api_key = "<SerpApi API Key>" # Visit https://serpapi.com/users/sign_up to get free credits.
params = {
  "q": "site:*.ai AND inurl:/contact OR inurl:/contact-us"
}
proxy = "proxies.txt"

Clauneck.run(api_key: api_key, params: params, proxy: proxy)

or directly use the rotating proxy link:

api_key = "<SerpApi API Key>" # Visit https://serpapi.com/users/sign_up to get free credits.
params = {
  "q": "site:*.ai AND inurl:/contact OR inurl:/contact-us"
}
proxy = "http://username:password@ip:port"

Clauneck.run(api_key: api_key, params: params, proxy: proxy)

If no proxy is provided, the system IP address will be used. This can work for small-scale projects, but it is not recommended.
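
Before a long run, it may help to check that every line of a proxy file follows the format described in this section. The sketch below uses Ruby's standard URI library; the parse_proxies helper is an assumption made for this example, not a Clauneck method, and the addresses are documentation placeholders.

```ruby
require "uri"

# Hypothetical helper (not part of Clauneck): keeps well-formed HTTP proxies only.
def parse_proxies(lines)
  lines.map(&:strip).reject(&:empty?).filter_map do |line|
    uri = URI.parse(line)
    # Only plain HTTP proxies are accepted, so other schemes are skipped.
    next unless uri.scheme == "http" && uri.host

    { host: uri.host, port: uri.port, user: uri.user, password: uri.password }
  rescue URI::InvalidURIError
    nil # malformed line: drop it
  end
end

lines = [
  "http://user:secret@192.0.2.10:8080",
  "http://192.0.2.11:3128",
  "socks5://192.0.2.12:1080"
]
proxies = parse_proxies(lines)
proxies.each { |proxy| puts "#{proxy[:host]}:#{proxy[:port]}" }
```

With a real file, the lines could come from File.readlines("proxies.txt").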

Using Google Search URL to Scrape links with SerpApi

Instead of providing search parameters, the user can pass a Google Search URL directly (the --google_url option). SerpApi's Google Search API will then collect the web cache links from that search.
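
To see how a Google Search URL relates to search parameters, the query string of the URL can be decoded into a hash. This is only a sketch of that relationship using Ruby's standard URI library; the params_from_google_url helper is hypothetical, and how Clauneck and SerpApi process the URL internally is not shown here.

```ruby
require "uri"

# Hypothetical helper: reads the query string of a Google Search URL into a hash.
def params_from_google_url(google_url)
  uri = URI.parse(google_url)
  URI.decode_www_form(uri.query.to_s).to_h
end

url = "https://www.google.com/search?q=site%3A*.ai+inurl%3A%2Fcontact&num=100"
params_from_google_url(url).each { |key, value| puts "#{key} = #{value}" }
```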

Using URLs to Scrape in a Text Document

The user may utilize their own list of URLs to be scraped. The URLs should start with https://webcache.googleusercontent.com, with one URL per line. For example:

https://webcache.googleusercontent.com/search?q=cache:LItv_3DO2N8J:https://serpapi.com/&cd=10&hl=en&ct=clnk&gl=cy
https://webcache.googleusercontent.com/search?q=cache:_gaXFsYVmCgJ:https://serpapi.com/search-api&cd=9&hl=en&ct=clnk&gl=cy
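
A URL list can be checked against this requirement before it is handed to the tool. The sketch below is an assumption-based example, not Clauneck code: partition_cache_urls is a hypothetical helper that separates lines with the required prefix from everything else.

```ruby
CACHE_PREFIX = "https://webcache.googleusercontent.com"

# Hypothetical helper: splits lines into accepted cache links and rejected entries.
def partition_cache_urls(lines)
  lines.map(&:strip).reject(&:empty?).partition { |url| url.start_with?(CACHE_PREFIX) }
end

lines = [
  "https://webcache.googleusercontent.com/search?q=cache:LItv_3DO2N8J:https://serpapi.com/&cd=10&hl=en&ct=clnk&gl=cy",
  "https://serpapi.com/search-api",
  ""
]
accepted, rejected = partition_cache_urls(lines)

puts "accepted: #{accepted.length}, rejected: #{rejected.length}"
```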

You can find cached links manually from Google Searches.


Options

Clauneck accepts the following options:

  • --api_key: Your SerpApi key. It is required if you're not providing the --urls option.
  • --proxy: Your proxy file or proxy URL. Defaults to system IP if not provided.
  • --pages: The number of pages to fetch from Google using SerpApi. Defaults to 1.
  • --output: The CSV file in which to store the results. Defaults to output.csv.
  • --google_url: The Google URL that contains the webpages you want to scrape. It should be a Google Search Results URL.
  • --urls: The URLs you want to scrape. If provided, the gem will not fetch URLs from Google.
  • --help: Shows the help message and exits.
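
The defaults and the --api_key requirement listed above can be summarized in a short sketch built on Ruby's standard OptionParser. This is an illustration of the documented behavior under the assumption that the flags take plain values; the parse_options method is hypothetical and is not the gem's actual command-line code.

```ruby
require "optparse"

# Hypothetical parser mirroring the documented flags; not the gem's actual code.
def parse_options(argv)
  options = { pages: 1, output: "output.csv" } # documented defaults
  OptionParser.new do |opts|
    opts.on("--api_key KEY")         { |v| options[:api_key] = v }
    opts.on("--proxy PROXY")         { |v| options[:proxy] = v }
    opts.on("--pages N", Integer)    { |v| options[:pages] = v }
    opts.on("--output FILE")         { |v| options[:output] = v }
    opts.on("--google_url URL")      { |v| options[:google_url] = v }
    opts.on("--urls FILE")           { |v| options[:urls] = v }
  end.parse(argv)

  # The API key is only optional when a URL list is supplied.
  if options[:api_key].nil? && options[:urls].nil?
    raise ArgumentError, "--api_key is required unless --urls is given"
  end
  options
end

p parse_options(["--api_key", "YOUR_SERPAPI_KEY", "--pages", "3"])
```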

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/serpapi/clauneck.


License

The gem is available as open source under the terms of the MIT License.
