  • Stars: 553
  • Rank: 77,349 (Top 2 %)
  • Language: C++
  • License: Apache License 2.0
  • Created over 1 year ago
  • Updated 3 months ago


Repository Details

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.

Advanced Literate Machinery

Introduction

The ultimate goal of our research is to build a system with high-level intelligence, i.e., one that can read, think, and create, and that is so advanced it could one day even surpass human intelligence. We call this kind of system Advanced Literate Machinery (ALM).

To start, we focus on teaching machines to read from images and documents. In the years to come, we will explore ways of endowing machines with the intellectual capabilities of thinking and creating, catching up with and surpassing GPT-4.

This project is maintained by the 读光 OCR Team (读光-Du Guang means “Reading The Light”) in the Language Technology Lab, Alibaba DAMO Academy.


Visit our 读光-Du Guang Portal to experience online demos for OCR and Document Understanding.

Recent Updates

2023.4 Release

  • GeoLayoutLM (GeoLayoutLM: Geometric Pre-training for Visual Information Extraction, CVPR 2023. paper): We propose a multi-modal framework, named GeoLayoutLM, for Visual Information Extraction (VIE). In contrast to previous methods for document pre-training, which usually learn geometric representation in an implicit way, GeoLayoutLM explicitly models the geometric relations of entities in documents.

2023.2 Release

  • LORE-TSR (LORE: Logical Location Regression Network for Table Structure Recognition, AAAI 2022. paper): We model Table Structure Recognition (TSR) as a logical location regression problem and propose a new algorithm called LORE (LOgical location REgression network), which for the first time combines logical location regression with spatial location regression of table cells.
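
To make the idea of a "logical location" concrete, here is a minimal sketch, not the LORE model itself. It assumes each table cell is described by a hypothetical tuple of inclusive indices (row_start, row_end, col_start, col_end); this tuple format and the helper name build_grid are illustrative assumptions, not part of the released code. Given such locations, the table structure can be rebuilt as a grid:

```python
from typing import List, Tuple

# Hypothetical cell format (an assumption for illustration):
# (text, row_start, row_end, col_start, col_end), all indices inclusive.
Cell = Tuple[str, int, int, int, int]


def build_grid(cells: List[Cell]) -> List[List[str]]:
    """Rebuild a table grid from the logical locations of its cells."""
    n_rows = max(cell[2] for cell in cells) + 1
    n_cols = max(cell[4] for cell in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for text, r0, r1, c0, c1 in cells:
        # A spanning cell fills every grid position it covers.
        for r in range(r0, r1 + 1):
            for c in range(c0, c1 + 1):
                grid[r][c] = text
    return grid


cells = [
    ("Item", 0, 0, 0, 0),
    ("Price", 0, 0, 1, 2),  # header cell spanning two columns
    ("Pen", 1, 1, 0, 0),
    ("USD", 1, 1, 1, 1),
    ("1.50", 1, 1, 2, 2),
]
grid = build_grid(cells)
for row in grid:
    print(row)
```

In this toy setting the logical locations are given by hand; in the approach described above, they would be regressed by the network together with the spatial locations of the cells.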

2022.9 Release

  • MGP-STR (Multi-Granularity Prediction for Scene Text Recognition, ECCV 2022. paper): Based on ViT and a tailored Adaptive Addressing and Aggregation module, we explore an implicit way of incorporating linguistic knowledge by introducing subword representations to facilitate multi-granularity prediction and fusion in scene text recognition.
  • LevOCR (Levenshtein OCR, ECCV 2022. paper): Inspired by Levenshtein Transformer, we cast the problem of scene text recognition as an iterative sequence refinement process, which allows for parallel decoding, dynamic length change and good interpretability.
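
The name LevOCR refers to Levenshtein edit operations. As a rough illustration only (this is the classic edit-distance algorithm, not the LevOCR model), the sketch below counts how many character insertions, deletions, and substitutions separate a noisy recognition result from the target text, which is the kind of gap an iterative refinement process has to close. The function name and example strings are assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between strings a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        prev = curr
    return prev[-1]


# A noisy reading with one wrong character needs a single edit.
print(levenshtein("he1lo", "hello"))
# A reading with a missing character also needs a single edit.
print(levenshtein("helo", "hello"))
```

A small distance means that only a few edits are needed, which is why refining an initial prediction can be cheaper than decoding the whole sequence again.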