• Stars
    153
  • Rank 243,368 (Top 5 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created almost 4 years ago
  • Updated 2 months ago

Repository Details

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML parser

More Repositories

1

casr

Collect crash (or UndefinedBehaviorSanitizer error) reports, triage, and estimate severity.
Rust
279
star
2

web-scraper-chrome-extension

Web data extraction tool implemented as a Chrome extension
JavaScript
171
star
3

oss-sydr-fuzz

OSS-Sydr-Fuzz - OSS-Fuzz fork for hybrid fuzzing (fuzzer+DSE) open source software.
C
127
star
4

Futag

FUTAG (FUzzing Target Automated Generator) - an automatic generator of fuzzing wrappers for libraries
Python
51
star
5

scrapy-puppeteer

Library that helps use puppeteer in scrapy.
Python
43
star
6

pu4spark

Positive-Unlabeled Learning for Apache Spark
Scala
40
star
7

rop-benchmark

ROP Benchmark is a tool to compare ROP compilers
Python
36
star
8

crusher

Python
35
star
9

qdt

QEMU Development Toolkit
Python
34
star
10

atr4s

Toolkit with state-of-the-art Automatic Terms Recognition methods in Scala
Scala
33
star
11

spark-openstack

Scripts to set up a Spark cluster (any version) in any OpenStack environment with optional useful tools.
Jinja
30
star
12

juliet-dynamic

Juliet C/C++ Dynamic Test Suite
23
star
13

qemu-gui

GUI for QEMU
C++
20
star
14

hdl-benchmarks

Collection of open HDL modules, subsystems and microprocessors (benchmarks) that are used for related tools testing.
Verilog
17
star
15

michman

Service for distributed systems deployment; part of Asperitas
Go
17
star
16

natch

Natch: a tool for determining the attack surface
Shell
16
star
17

sydr-benchmark

Sydr benchmark applications
C++
15
star
18

quix86

An x86-64 instruction decoder.
C
15
star
19

cotea

cotea: Ansible control tool
Python
14
star
20

EcgLib

Python
13
star
21

centos6.9-build-docker

CentOS 6.9 build Docker environment to distribute portable Linux binaries
Dockerfile
11
star
22

swat

SWAT - System-Wide Analysis Toolkit
C
11
star
23

proceedings

Proceedings of ISP RAS LaTeX Template
TeX
10
star
24

v8-aotc

V8 ahead-of-time compilation project
C++
10
star
25

scrapy-puppeteer-service

A special service that runs puppeteer instances.
JavaScript
10
star
26

tact

C
8
star
27

lingvodoc-react

JavaScript
7
star
28

texterra-py

Texterra python sdk
Python
7
star
29

utopia-hls

Utopia: a High-Level Synthesis framework
C++
7
star
30

lingvodoc

More advanced Python version for Dialeqt project
JavaScript
7
star
31

riscv-avs

RISC-V Architecture Verification Suite (AVS)
Assembly
7
star
32

microtesk-old

MicroTESK: Specification-Based Framework for Developing Test Program Generators
7
star
33

tm

Regularized multilingual Probabilistic Semantic Analysis Scala implementation.
HTML
6
star
34

TrustedDynamic

Dockerfile
6
star
35

proceedings-md

Automatic Markdown to DOCX converter that follows the ISP RAS proceedings design requirements
TypeScript
6
star
36

clouni

Cloud Unifier Tool for Service Orchestration
Python
5
star
37

dedoc-utils

Useful utilities for automatic document images processing
Python
5
star
38

FuzzedDataProviderCS

FuzzedDataProvider for C#, inspired by Google's FuzzedDataProvider.
C#
5
star
39

parmasan

Mirror repository with parmasan project
C++
5
star
40

microtesk

MicroTESK: Specification-Based Framework for Developing Test Program Generators
Java
5
star
41

gocotea

gocotea: Ansible control tool on Golang
Go
5
star
42

endometrium-dataset-analysis

This repository is dedicated to the analysis of the EndoNuke dataset
Jupyter Notebook
4
star
43

esoc

Ethernet Switch on Configurable Logic
Stata
3
star
44

angiocells_analysis

Jupyter Notebook
3
star
45

libosuction

A tool for stripping dynamic libraries of unneeded symbols
C
3
star
46

news-page-dataset

3
star
47

I3S

Python
2
star
48

parmasan-remake

Mirror repository with patched remake for parmasan
C
2
star
49

minimap2_index_modifier

C
2
star
50

hls-idct

Inverse Discrete Cosine Transform (IDCT) algorithm implementations are written in languages for High-Level Synthesis (HLS) and Hardware Construction (HC) tools.
Verilog
2
star
51

sv-tests

Test suites based on Verilog and SystemVerilog standards
Verilog
1
star
52

cv

Klever Continuous Verification Framework
Python
1
star
53

flagsup

Build flags extractor and summarizer.
Python
1
star
54

mammo_crop

Jupyter Notebook
1
star
55

dedockerfiles

Collection of dockerfiles for dedoc group projects
Dockerfile
1
star
56

qdt-guest-agent

C++
1
star
57

NetBlox

Java
1
star
58

staccato

Fork for the STACCATO project of University of Michigan
C
1
star
59

flint

Scalable machine learning framework
Scala
1
star
60

gephi-graphson

Importer and exporter plugins for Gephi for GraphSON format
Java
1
star
61

PTAHA

Patent Timesaving Automatic Helpful Apparatus
R
1
star
62

RISC-V-nML

RISC-V nML is a specification of the RISC-V ISA in the nML architecture description language.
1
star