  • Stars: 394
  • Rank 109,295 (Top 3 %)
  • Language: Python
  • Created about 1 year ago
  • Updated 3 months ago

Repository Details

Replace Splunk in your small company with this one weird trick!

Automatically Download and Analyze Log Files from Remote Machines

This application is designed to collect and analyze logs from remote machines hosted on Amazon Web Services (AWS) and other cloud hosting services.

Note: This application was designed for Pastel Network's log files, but it can be adapted to other log files by modifying the parsing functions and data models and by specifying the location and names of the log files to download. It works with log files in a standard line-oriented format, where each entry is on its own line and contains a timestamp, a log level, and a message. The application has been tested with several gigabytes of log files from dozens of machines and can process them all in minutes. It is designed for Ubuntu 22.04+ but can be adapted to other Linux distributions.

Demo Screenshot:

Customization

To adapt this application for your own use case, refer to the included sample log files and compare them to the parsing functions in the code. You can also modify the data models to store log entries as desired.
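As a rough illustration of what a parsing function for the line-oriented format described above might look like, here is a minimal sketch. The line layout (`YYYY-MM-DD HH:MM:SS [LEVEL] message`), the `LogEntry` type, and the `parse_log_line` name are all assumptions made for this example; they are not taken from the project's code, and the pattern would need to be adjusted to your own log files.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Hypothetical layout: "2023-05-01 12:34:56 [ERROR] message text".
# Adjust this pattern to match the timestamp and level style of your logs.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"\[(?P<level>[A-Z]+)\]\s+"
    r"(?P<message>.*)$"
)


class LogEntry(NamedTuple):
    timestamp: datetime
    level: str
    message: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not match the layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        timestamp=datetime.strptime(match["timestamp"], "%Y-%m-%d %H:%M:%S"),
        level=match["level"],
        message=match["message"],
    )
```

Returning `None` for lines that do not match lets the caller skip continuation lines or stack traces instead of failing on them.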

Features

The application consists of various Python scripts that perform the following functions:

  • Connect to Remote Machines: Using the boto3 library for AWS instances and an Ansible inventory file for non-AWS instances, the application establishes SSH connections to each remote machine.
  • Download and Parse Log Files: Downloads specified log files from each remote machine and parses them. The parsed log entries are then queued for database insertion.
  • Insert Log Entries into Database: Uses SQLAlchemy to insert the parsed log entries from the queue into an SQLite database.
  • Process and Analyze Log Entries: Processes and analyzes log entries stored in the database, offering functions to find error entries and create views of aggregated data based on specified criteria.
  • Generate Network Activity Data: Fetches and processes network activity data from each remote machine.
  • Expose Database via Web App using Datasette: Once the database is generated, it can be shared over the web using Datasette.
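The queue-to-database and analysis steps in the list above can be sketched roughly as follows. To stay dependency-free, this sketch uses the standard library's `queue` and `sqlite3` modules rather than the SQLAlchemy models the project uses; the table name, column names, view name, and both function names are hypothetical.

```python
import queue
import sqlite3


def drain_queue_into_db(entries: queue.Queue, conn: sqlite3.Connection) -> int:
    """Insert every queued (machine, timestamp, level, message) tuple.

    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_entries ("
        "machine TEXT, timestamp TEXT, level TEXT, message TEXT)"
    )
    batch = []
    while True:
        try:
            batch.append(entries.get_nowait())
        except queue.Empty:
            break
    # One executemany call is much cheaper than one INSERT per entry.
    conn.executemany("INSERT INTO log_entries VALUES (?, ?, ?, ?)", batch)
    conn.commit()
    return len(batch)


def create_error_summary_view(conn: sqlite3.Connection) -> None:
    """Create a view that counts ERROR entries per machine."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS error_counts_by_machine AS "
        "SELECT machine, COUNT(*) AS error_count FROM log_entries "
        "WHERE level = 'ERROR' GROUP BY machine"
    )


if __name__ == "__main__":
    q = queue.Queue()
    q.put(("node01", "2023-05-01T12:00:00", "ERROR", "disk full"))
    q.put(("node02", "2023-05-01T12:01:00", "INFO", "started"))
    db = sqlite3.connect(":memory:")
    print(drain_queue_into_db(q, db), "entries inserted")
    create_error_summary_view(db)
    print(db.execute("SELECT * FROM error_counts_by_machine").fetchall())
```

A view defined this way shows up in Datasette like a table, so aggregated data can be browsed without extra code.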

Compatibility

The tool is compatible with both AWS-hosted instances and any list of Linux instances stored in a standard Ansible inventory file with the following structure:

all:
  vars:
    ansible_connection: ssh
    ansible_user: ubuntu
    ansible_ssh_private_key_file: /path/to/ssh/key/file.pem
  hosts:
    MyCoolMachine01:
      ansible_host: 1.2.3.41
    MyCoolMachine02:
      ansible_host: 1.2.3.19

(AWS instances and hosts from an inventory file can be used together.)
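To show how the inventory structure above maps to per-host SSH settings, here is a small sketch. It assumes the YAML has already been loaded into a dictionary (for example with PyYAML's `yaml.safe_load`, which is an assumption about tooling, not something the README states); the function name and the output keys are hypothetical.

```python
from typing import Any, Dict, List


def hosts_from_inventory(inventory: Dict[str, Any]) -> List[Dict[str, str]]:
    """Flatten an Ansible-style inventory dict into one record per host.

    Group-level vars are applied to every host; host-level vars override them.
    """
    group = inventory["all"]
    shared = group.get("vars") or {}
    hosts = []
    for name, host_vars in (group.get("hosts") or {}).items():
        # A host listed without any vars loads from YAML as None.
        merged = {**shared, **(host_vars or {})}
        hosts.append(
            {
                "name": name,
                # Ansible falls back to the host name when ansible_host is absent.
                "address": merged.get("ansible_host", name),
                "user": merged["ansible_user"],
                "key_file": merged["ansible_ssh_private_key_file"],
            }
        )
    return hosts
```

Each returned record carries what an SSH client needs to connect: an address, a user name, and a private key path.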

Warning

To simplify the code, the tool deletes all downloaded log files and generated databases each time it runs, then downloads every log file again from scratch. This can consume significant bandwidth, depending on the size of your log files. However, the design relies heavily on parallel processing and concurrency, so it runs quickly even when connecting to dozens of remote machines and downloading hundreds of log files.
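The concurrency mentioned above could look something like the following sketch, which fans one download function out over many hosts with a thread pool. This is only an illustration of the general pattern, not the project's actual implementation; `fetch_all`, its parameters, and the `fetch_one` callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, Iterable


def fetch_all(
    hosts: Iterable[str],
    fetch_one: Callable[[str], Any],
    max_workers: int = 16,
) -> Dict[str, Any]:
    """Run fetch_one(host) for every host in parallel threads.

    Returns a dict mapping each host to its result, or to the exception
    it raised, so one unreachable machine does not abort the whole run.
    """
    results: Dict[str, Any] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(fetch_one, host) for host in hosts}
        for host, future in futures.items():
            try:
                results[host] = future.result()
            except Exception as exc:
                results[host] = exc
    return results
```

Threads suit this workload because downloading over SSH is dominated by network waits rather than CPU time.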

Usage

The tool is designed for Ubuntu 22.04+. First, install the requirements:

python3 -m venv venv
source venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install wheel
pip install -r requirements.txt

You will also need to install Redis:

sudo apt install redis -y

And install Datasette to expose the results as a website:

sudo apt install pipx -y && pipx ensurepath && pipx install datasette

To run the application every 30 minutes as a cron job, execute:

crontab -e

And add the following line:

*/30 * * * * . $HOME/.profile; /home/ubuntu/automatic_log_collector_and_analyzer/venv/bin/python /home/ubuntu/automatic_log_collector_and_analyzer/automatic_log_collector_and_analyzer.py >> /home/ubuntu/automatic_log_collector_and_analyzer/log_$(date +\%Y-\%m-\%dT\%H_\%M_\%S).log 2>&1
