Intoli Article Materials
This repository holds supplementary article materials, such as code files, for posts on the Intoli Blog; basically anything that doesn't quite warrant the creation of its own repository. These materials are often also available for download on intoli.com, but the repository offers an alternative way to browse through the files. Additionally, you can watch or star this repository to be notified of new updates on our blog.
Our Latest Article
This is our most recent article; we hope that you'll enjoy it!
- Performing Efficient Broad Crawls with the AOPIC Algorithm - Learn how to efficiently allocate your bandwidth to the most important pages encountered during a broad crawl.
Articles
- Analyzing One Million robots.txt Files - Explores downloading and analyzing the robots.txt files for the Alexa top one million websites.
- Breaking Out of the Chrome/WebExtension Sandbox - Uses some JavaScript trickery to break out of the browser extension context and directly modify a webpage's native properties.
- Email Spy - An open source Chrome/Firefox Web Extension that lets you find contact emails for any domain with a single click.
- Extending CircleCI's API with a Custom Microservice on AWS Lambda - Explains how to deploy a Node.js Express app as a microservice on AWS Lambda.
- Fantasy Football for Hackers - Scrapes Fantasy Football projections and uses them to simulate league dynamics and calculate baseline subtracted values for players to use as a draft strategy.
- How to Clear the Chrome Browser Cache With Selenium WebDriver/ChromeDriver - Demonstrates how to clear the Chrome browser cache with Selenium.
- How to Clear the Firefox Browser Cache With Selenium WebDriver/geckodriver - Demonstrates how to clear the Firefox browser cache with Selenium.
- How to Run a Keras Model in the Browser with Keras.js - Shows how to export a Keras neural network model and use it in the browser with keras-js.
- Implementing a Custom Waiting Action in Nightmare JS - Learn how to have your Nightmare JS tests wait until the network is silent.
- It is not possible to detect and block Chrome headless - An updated exploration of techniques to avoid detection.
- JavaScript Injection with Selenium, Puppeteer, and Marionette in Chrome and Firefox - An exploration of different browser automation methods to inject JavaScript into webpages.
- Making a YouTube MP3 Downloader with Exodus, FFmpeg, and AWS Lambda - Walks through the process of building a YouTube MP3 bookmarklet using AWS Lambda.
- Making Chrome Headless Undetectable - Bypasses some common Chrome Headless tests by injecting JavaScript into pages before the test code has a chance to run.
- No API Is the Best API — The elegant power of Power Assert - Looks at how Power Assert can be used to get rich contextual error messages without the need to use a specialized assertion API.
- Performing Efficient Broad Crawls with the AOPIC Algorithm - Learn how to efficiently allocate your bandwidth to the most important pages encountered during a broad crawl.
- Recreating Python's Slice Syntax in JavaScript Using ES6 Proxies - Explores how Proxies work in JavaScript, and walks through the process of building the Slice package for negative indexing and extended slicing in JavaScript.
- Running FFmpeg on AWS Lambda for 1.9% the cost of AWS Elastic Transcoder - Develops an AWS Lambda function that can transcode video and audio on the fly.
- Running Selenium with Headless Chrome - Demonstrates how to run Google Chrome in headless mode using Selenium in Python.
- Running Selenium with Headless Chrome in Ruby - A Ruby flavored version of our headless Chrome setup guide.
- Scraping User-Submitted Reviews from the Steam Store - Walks through the process of building an advanced Scrapy spider for the purpose of scraping user reviews from the Steam Store.
- Understanding Neural Network Weight Initialization - Explores the effects of neural network weight initialization strategies.
- Using Firefox WebExtensions with Selenium - A guide to launching Firefox with extensions preloaded using Selenium.
- Using Google Chrome Extensions with Selenium - A simple guide to launching Google Chrome with extensions preloaded using Selenium.
- Using Puppeteer to Scrape Websites with Infinite Scrolling - Learn how to scrape an infinitely scrolling data feed with a headless browser.
- Using Webpack to Render Markdown in React Apps - A short tutorial about automatically rendering Markdown documents for usage in React apps.
- Why I Still Don't Use Yarn - Benchmarks npm, pnpm, and yarn for installation time and storage space given a few common project configurations.
Honorable Mentions
These are articles where we don't have any supplementary materials available, but that we still highly recommend.
- A Brief Tour of Grouping and Aggregating in Pandas - Learn how to use pandas to easily slice up a dataset and quickly extract useful statistics.
- Building Data Science Pipelines with Luigi and Jupyter Notebooks - Learn about the Luigi task runner and how to use Jupyter notebooks in your Luigi workflows.
- Dangerous Pickles — Malicious Python Serialization - A light introduction to the Python pickle protocol, the Pickle Machine, and constructing malicious pickles.
- Designing The Wayback Machine Loading Animation - A walkthrough of how we helped The Internet Archive design a new loading animation for the Wayback Machine.
- Fantasy Football for Hackers II - An interactive visualization of Average Draft Position vs Season Projections.
- Finding Pareto Optimal Blogs on Hacker News - An analysis of submissions on Hacker News for the purpose of identifying high quality technology blogs.
- How Are Principal Component Analysis and Singular Value Decomposition Related? - Explores the relationship between singular value decomposition and principal component analysis.
- How F5Bot Slurps All of Reddit - A guest post in which the creator of F5Bot explains in detail how it works, and how it's able to scrape millions of Reddit comments per day in real time.
- How to Test If Your Website Logs Errors to the Console - Use Nightmare JS to write useful Mocha-based console output tests.
- Markov's and Chebyshev's Inequalities Explained - A look at why Chebyshev's Inequality holds true and some potential applications.
- Patching a Linux Kernel Module - A case study in debugging and patching kernel-level issues on Linux.
- Predicting Hacker News article success with neural networks and TensorFlow - An interactive tool that uses TensorFlow to predict how well submissions will do on Hacker News based on their titles.
- Running Selenium with Headless Firefox - A look at connecting Selenium WebDriver to Firefox's headless mode.
- Saving Images from a Headless Browser - Learn how to save any image from a headless browser in this Puppeteer tutorial.
- Terminal Recorders: A Comprehensive Guide - An in-depth comparison of different methods to record animations of terminal sessions.