• Stars
    1,919
  • Rank 24,163 (Top 0.5 %)
  • Language
    Python
  • License
    GNU General Publi...
  • Created about 7 years ago
  • Updated 4 months ago

Repository Details

A library that scrapes Linkedin for user data

Linkedin Scraper

Scrapes Linkedin User Data

Installation

pip3 install --user linkedin_scraper

Versions 2.0.0 and earlier were published as linkedin_user_scraper and can be installed via pip3 install --user linkedin_user_scraper

Setup

First, set the location of your chromedriver:

export CHROMEDRIVER=~/chromedriver
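
Before starting a browser session, it can help to confirm that the variable is actually set and points at a file. The snippet below is a minimal sketch using only the standard library; the helper name resolve_chromedriver is hypothetical and is not part of linkedin_scraper.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_chromedriver(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the expanded CHROMEDRIVER path if it points at a file, else None.

    Hypothetical helper for illustration only; it is not part of linkedin_scraper.
    """
    raw = env.get("CHROMEDRIVER")
    if not raw:
        return None
    # "~" is normally expanded by the shell at export time, but expand it
    # again in case the value was quoted.
    path = Path(raw).expanduser()
    return path if path.is_file() else None


if __name__ == "__main__":
    found = resolve_chromedriver()
    print(found if found else "CHROMEDRIVER is unset or does not point at a file")
```

Passing the environment in as a mapping keeps the check easy to exercise without touching the real environment.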

Sponsor

Scrape public LinkedIn profile data at scale with Proxycurl APIs.

• Scraping public profiles has been battle-tested in court in the hiQ v. LinkedIn case.
• GDPR, CCPA, SOC2 compliant
• High rate limit - 300 requests/minute
• Fast - APIs respond in ~2s
• Fresh data - 88% of data is scraped in real time; the other 12% is no older than 29 days
• High accuracy
• Tons of data points returned per profile

Built for developers, by developers.

Usage

To use it, just create the class.

Sample Usage

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()

email = "user@example.com"  # placeholder address
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
person = Person("https://www.linkedin.com/in/joey-sham-aa2a50122", driver=driver)

NOTE: The account used to log in should have its language set to English to make sure everything works as expected.

User Scraping

from linkedin_scraper import Person
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5")

Company Scraping

from linkedin_scraper import Company
company = Company("https://ca.linkedin.com/company/google")
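
Person and Company take different kinds of LinkedIn URLs, so a quick check of the URL before constructing either object can catch mix-ups early. The following is a rough sketch based only on the URL shapes shown in this README (/in/ for people, /company/ for companies); the function classify_linkedin_url is hypothetical, and LinkedIn may use other URL forms that it does not recognize.

```python
from typing import Optional
from urllib.parse import urlparse


def classify_linkedin_url(url: str) -> Optional[str]:
    """Return "person", "company", or None for anything else.

    Hypothetical helper; it only knows the two URL shapes used in this README.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if parsed.scheme not in ("http", "https"):
        return None
    # Accept linkedin.com and regional subdomains such as ca.linkedin.com.
    if host != "linkedin.com" and not host.endswith(".linkedin.com"):
        return None
    if parsed.path.startswith("/in/"):
        return "person"
    if parsed.path.startswith("/company/"):
        return "company"
    return None


if __name__ == "__main__":
    print(classify_linkedin_url("https://www.linkedin.com/in/andre-iguodala-65b48ab5"))
    print(classify_linkedin_url("https://ca.linkedin.com/company/google"))
```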

Job Scraping

from linkedin_scraper import Job, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "user@example.com"  # placeholder address
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
input("Press Enter")
job = Job("https://www.linkedin.com/jobs/collections/recommended/?currentJobId=3456898261", driver=driver, close_on_complete=False)

Job Search Scraping

from linkedin_scraper import JobSearch, actions
from selenium import webdriver

driver = webdriver.Chrome()
email = "user@example.com"  # placeholder address
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
input("Press Enter")
job_search = JobSearch(driver=driver, close_on_complete=False, scrape=False)
# job_search contains jobs from your logged in front page:
# - job_search.recommended_jobs
# - job_search.still_hiring
# - job_search.more_jobs

job_listings = job_search.search("Machine Learning Engineer") # returns a list of `Job` objects from the first page of results

Scraping sites where login is required first

  1. Run ipython or python
  2. In ipython/python, run the following code (you can modify it if you need to specify your driver)
from linkedin_scraper import Person
from selenium import webdriver
driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver, scrape=False)
  3. Log in to LinkedIn
  4. [OPTIONAL] Log out of LinkedIn
  5. In the same ipython/python session, run
person.scrape()

This is needed because LinkedIn blocks viewing certain profiles without having previously signed in. Setting scrape=False prevents the profile from being scraped automatically, but Chrome still opens the LinkedIn page. You can log in and log out; the cookie stays in the browser, and this won't affect your profile views. When you then run person.scrape(), it scrapes the profile and closes the browser. If you want to keep the browser open so you can scrape other profiles, run it as

person.scrape(close_on_complete=False)

so it doesn't close.

NOTE: For version >= 2.1.0, scraping can also occur while logged in. Beware that users will be able to see that you viewed their profile.
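
Keeping the browser open is mostly useful when several profiles are scraped in one session. Here is a minimal sketch of that loop, using only the scrape=False and close_on_complete=False options described above. The function scrape_profiles and its person_factory parameter are hypothetical; the factory is passed in so the loop can be tried with a stand-in class before pointing it at the real Person.

```python
from typing import Any, Callable, Iterable, List


def scrape_profiles(
    urls: Iterable[str],
    driver: Any,
    person_factory: Callable[..., Any],
) -> List[Any]:
    """Scrape each URL with one shared driver and return the scraped objects.

    Hypothetical helper. person_factory is expected to behave like
    linkedin_scraper.Person: accept (url, driver=..., scrape=...) and
    expose scrape(close_on_complete=...).
    """
    people = []
    for url in urls:
        # scrape=False defers scraping so it can be triggered explicitly.
        person = person_factory(url, driver=driver, scrape=False)
        # close_on_complete=False keeps the browser open for the next URL.
        person.scrape(close_on_complete=False)
        people.append(person)
    return people


# Intended use (assumes you are already logged in, as described above):
#
#   from linkedin_scraper import Person
#   from selenium import webdriver
#
#   driver = webdriver.Chrome()
#   people = scrape_profiles(list_of_urls, driver, Person)
#   driver.quit()  # close the browser yourself once finished
```

Since nothing closes the browser automatically in this setup, the caller is responsible for calling driver.quit() at the end.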

Scraping sites and login automatically

From version 2.4.0 on, actions is a part of the library that allows signing into LinkedIn first. The email and password can be passed as arguments to the function. If not provided, both will be prompted for in the terminal.

from linkedin_scraper import Person, actions
from selenium import webdriver
driver = webdriver.Chrome()
email = "user@example.com"  # placeholder address
password = "password123"
actions.login(driver, email, password) # if email and password aren't given, it'll prompt in the terminal
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver=driver)
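
Hardcoding a password in a script is easy to leak, so one option is to read the credentials from environment variables and fall back to the terminal prompt when they are missing. The sketch below is only an illustration: the variable names LINKEDIN_EMAIL and LINKEDIN_PASSWORD and the helper load_credentials are assumptions, not something the library defines.

```python
import os
from typing import Mapping, Optional, Tuple


def load_credentials(
    env: Mapping[str, str] = os.environ,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (email, password) from the environment, using None for missing values.

    LINKEDIN_EMAIL and LINKEDIN_PASSWORD are hypothetical variable names
    chosen for this example.
    """
    email = env.get("LINKEDIN_EMAIL") or None
    password = env.get("LINKEDIN_PASSWORD") or None
    return email, password


# Intended use with the login call shown above:
#
#   email, password = load_credentials()
#   actions.login(driver, email, password)
#
# This relies on the behaviour described above, where missing credentials
# are prompted for in the terminal. Whether passing None explicitly triggers
# that prompt is an assumption here.
```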

API

Person

A Person object can be created with the following inputs:

Person(linkedin_url=None, name=None, about=[], experiences=[], educations=[], interests=[], accomplishments=[], company=None, job_title=None, driver=None, scrape=True)

linkedin_url

This is the linkedin url of their profile

name

This is the name of the person

about

This is the small paragraph about the person

experiences

These are the person's past experiences. A list of linkedin_scraper.scraper.Experience

educations

These are the person's past education entries. A list of linkedin_scraper.scraper.Education

interests

These are the person's interests. A list of linkedin_scraper.scraper.Interest

accomplishments

These are the person's accomplishments. A list of linkedin_scraper.scraper.Accomplishment

company

This is the most recent company or institution they have worked at.

job_title

This is the most recent job title they have.

driver

This is the driver used to scrape the LinkedIn profile. A driver using Chrome is created by default. However, if a driver is passed in, that will be used instead.

For example

driver = webdriver.Chrome()
person = Person("https://www.linkedin.com/in/andre-iguodala-65b48ab5", driver = driver)

scrape

When this is True, scraping happens automatically when the object is created. To scrape later instead, set it to False and call the scrape() function on the Person object.

scrape(close_on_complete=True)

This is the meat of the code: calling this function scrapes the profile. If close_on_complete is True (which it is by default), the browser will close upon completion. If you want to scrape other profiles, set it to False so you can keep using the same driver.
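
Once a profile has been scraped, the fields listed above can be gathered into a plain dictionary for logging or storage. This is a minimal sketch that assumes the constructor inputs are also available as attributes of the same names on the object; the helper person_to_dict is hypothetical, and the demo uses a SimpleNamespace stand-in rather than a real Person.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Field names taken from the Person inputs documented above.
PERSON_FIELDS = (
    "linkedin_url",
    "name",
    "about",
    "experiences",
    "educations",
    "interests",
    "accomplishments",
    "company",
    "job_title",
)


def person_to_dict(person: Any) -> Dict[str, Any]:
    """Collect the documented fields into a dict, using None for missing ones.

    Hypothetical helper. It assumes each field is exposed as an attribute
    with the same name as the constructor input.
    """
    return {field: getattr(person, field, None) for field in PERSON_FIELDS}


if __name__ == "__main__":
    # Stand-in object so the example runs without a browser.
    demo = SimpleNamespace(name="Example Person", job_title="Engineer")
    print(person_to_dict(demo))
```

The list-valued fields hold library objects such as Experience and Education, so the resulting dict would need further conversion before being written out as JSON.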

Company

Company(linkedin_url=None, name=None, about_us=None, website=None, headquarters=None, founded=None, company_type=None, company_size=None, specialties=None, showcase_pages=[], affiliated_companies=[], driver=None, scrape=True, get_employees=True)

linkedin_url

This is the LinkedIn URL of the company's page

name

This is the name of the company

about_us

The description of the company

website

The website of the company

headquarters

The headquarters location of the company

founded

When the company was founded

company_type

The type of the company

company_size

How many people are employed at the company

specialties

What the company specializes in

showcase_pages

Pages that the company owns to showcase their products

affiliated_companies

Other companies that are affiliated with this one

driver

This is the driver used to scrape the LinkedIn company page. A driver using Chrome is created by default. However, if a driver is passed in, that will be used instead.

get_employees

Whether to get all the employees of the company

For example

driver = webdriver.Chrome()
company = Company("https://ca.linkedin.com/company/google", driver=driver)

scrape(close_on_complete=True)

This is the meat of the code: calling this function scrapes the company. If close_on_complete is True (which it is by default), the browser will close upon completion. If you want to scrape other companies, set it to False so you can keep using the same driver.
