• Stars: 122
• Rank: 282,552 (Top 6 %)
• Language: PowerShell
• License: Apache License 2.0
• Created over 12 years ago
• Updated 4 months ago

Repository Details

Parallel Data Processing in PowerShell

SplitPipeline

PowerShell v2.0+ module for parallel data processing. Split-Pipeline splits the input, processes the parts in parallel pipelines, and outputs the results. It can work without collecting the whole input, so the input may be large or even infinite.

Quick Start

Step 1: Get and install.

SplitPipeline is available as the PSGallery module SplitPipeline. In PowerShell 5.0+, or with PowerShellGet, you can install it with this command:

Install-Module SplitPipeline

SplitPipeline is also available as the NuGet package SplitPipeline. Download it with NuGet tools or directly. In the latter case, save the package as ".zip", unzip it, and use the directory "tools/SplitPipeline".

Step 2: In a PowerShell command prompt import the module:

Import-Module SplitPipeline

Step 3: Take a look at help:

help -full Split-Pipeline

Step 4: Try these three commands. They all do the same job, simulating an operation on each item that takes a long time but uses little processor time:

1..10 | . {process{ $_; sleep 1 }}
1..10 | Split-Pipeline {process{ $_; sleep 1 }}
1..10 | Split-Pipeline -Count 10 {process{ $_; sleep 1 }}

The output of all three commands is the same: the numbers from 1 to 10 (Split-Pipeline does not guarantee the same order without the switch Order). But the elapsed times differ. Let's measure them:

Measure-Command { 1..10 | . {process{ $_; sleep 1 }} }
Measure-Command { 1..10 | Split-Pipeline {process{ $_; sleep 1 }} }
Measure-Command { 1..10 | Split-Pipeline -Count 10 {process{ $_; sleep 1 }} }

The first command takes about 10 seconds.

The performance of the second command depends on the number of processors, which is used as the default split count. For example, with 2 processors it takes about 6 seconds.
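The timing above can be approximated with simple arithmetic. The following is a minimal sketch, assuming every item takes the same time and the items are spread evenly across the pipelines. Get-EstimatedSeconds is a hypothetical helper written for this illustration; it is not part of SplitPipeline, and it gives only a rough lower bound that ignores startup overhead.

```powershell
# Hypothetical helper (not part of SplitPipeline): rough lower bound for the
# elapsed time of equal, sleep-like jobs split across parallel pipelines.
function Get-EstimatedSeconds {
    param(
        [int]$ItemCount,
        [int]$SplitCount,
        [double]$SecondsPerItem = 1
    )
    # Assumption: each pipeline processes at most
    # ceil(ItemCount / SplitCount) items, one after another.
    [Math]::Ceiling([double]$ItemCount / $SplitCount) * $SecondsPerItem
}

# The default split count is the number of processors.
$processors = [Environment]::ProcessorCount
Get-EstimatedSeconds -ItemCount 10 -SplitCount $processors

Get-EstimatedSeconds -ItemCount 10 -SplitCount 2    # 5
Get-EstimatedSeconds -ItemCount 10 -SplitCount 10   # 1
```

The measured times quoted in this section are about a second above these estimates, which is consistent with the overhead of creating and feeding the parallel pipelines.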

The third command takes about 2 seconds. For such sleeping jobs the number of processors matters little; the split count is what matters, and increasing it, up to a point, improves overall performance. For processor-intensive jobs, the split count normally should not exceed the number of processors.
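To see both effects on your own machine, a sketch like the following may help. It assumes the SplitPipeline module is installed, and it uses only the Count parameter and the Order switch mentioned above. The split counts 1, 2, 5, 10 are arbitrary choices for illustration, and the measured times will vary.

```powershell
# Assumes the SplitPipeline module is installed (see Step 1).
Import-Module SplitPipeline

# Without -Order, results may arrive in any order, so sort before comparing.
$unordered = 1..10 | Split-Pipeline -Count 10 {process{ $_; sleep 1 }}
($unordered | Sort-Object) -join ' '

# With -Order, the output follows the input order.
$ordered = 1..10 | Split-Pipeline -Count 10 -Order {process{ $_; sleep 1 }}
$ordered -join ' '

# Measure the same sleeping job with several split counts.
foreach ($count in 1, 2, 5, 10) {
    $time = Measure-Command {
        1..10 | Split-Pipeline -Count $count {process{ $_; sleep 1 }}
    }
    '{0,2} pipelines: {1:n1} s' -f $count, $time.TotalSeconds
}
```

For sleeping jobs like this one, the elapsed time should drop as the split count grows. For processor-bound scripts, expect the gain to level off near the number of processors.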

More Repositories

1. Invoke-Build (PowerShell, 603 stars): Build Automation in PowerShell
2. PowerShellTraps (PowerShell, 422 stars): Collection of PowerShell traps and oddities
3. Mdbc (PowerShell, 136 stars): MongoDB Cmdlets for PowerShell
4. FarNet (C#, 128 stars): Far Manager framework for .NET modules and scripts in PowerShell, F#, JavaScript.
5. PowerShelf (PowerShell, 110 stars): PowerShell Script Tools
6. Helps (PowerShell, 41 stars): PowerShell Help Builder
7. PS-GuiCompletion (PowerShell, 40 stars): A graphical menu for PowerShell tab completions.
8. PsdKit (PowerShell, 39 stars): PowerShell data (psd1) tool kit
9. Ldbc (PowerShell, 39 stars): LiteDB Cmdlets, the document store in PowerShell
10. FarNet.FSharp.PowerShell (F#, 20 stars): F# friendly PowerShell Core helper
11. Xmlips (PowerShell, 14 stars): XML in PowerShell
12. Invoke-Build.template (PowerShell, 4 stars): Invoke-Build script template
13. BsonFile (PowerShell, 4 stars): BSON/JSON file collections in MongoDB, PowerShell module
14. FarLite (PowerShell, 3 stars): PowerShell module with LiteDB browser in Far Manager
15. FarDescription (C#, 3 stars): Far Manager style descriptions of files and directories
16. ClearScript (PowerShell, 2 stars): ClearScript and FarNet lab for Far Manager scripting in JavaScript
17. FarMongo (PowerShell, 2 stars): PowerShell module with MongoDB browser in Far Manager
18. FarGit (PowerShell, 1 star): Deprecated and replaced by https://github.com/nightroman/FarNet/tree/main/GitKit
19. FarNet.FSharp.Data (PowerShell, 1 star): FSharp.Data package for FarNet.FSharpFar
20. FarNet.template (PowerShell, 1 star): FarNet module and script template
21. FarNet.FSharp.Charting (F#, 1 star): FarNet friendly FSharp.Charting extension