• Stars
    60
  • Rank 503,336 (Top 10 %)
  • Language
    Python
  • Created about 4 years ago
  • Updated about 4 years ago

Repository Details

Many natural language processing tasks rely on sentence boundary detection (SBD). Although amazing libraries like spaCy provide state-of-the-art SBD, they often depend on the output of text extractors (e.g., PDF text extractors or OCR). The quality of these extractors greatly influences the quality of SBD and, as a consequence, the performance of downstream models. To help address this problem, we fine-tuned a T5 model from the Hugging Face Hub that attempts to reconstruct “broken sentences”.
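
To make the problem concrete, here is a minimal, standard-library-only sketch of how extractor line breaks can mislead a simple sentence splitter. It does not use the fine-tuned T5 model: `naive_sentences` and `rejoin_broken_lines` are hypothetical helpers written for this illustration, and the rejoining rule is only a crude heuristic stand-in for what a learned model would attempt.

```python
import re


def naive_sentences(text: str) -> list[str]:
    """Split on line breaks and on whitespace that follows ., ! or ?."""
    pieces = re.split(r"\n+|(?<=[.!?])\s+", text)
    return [p.strip() for p in pieces if p.strip()]


def rejoin_broken_lines(text: str) -> str:
    """Toy heuristic (an assumption, not the repository's method):
    merge a line into the previous one unless the previous line
    already ends with terminal punctuation."""
    merged: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if merged and not merged[-1].endswith((".", "!", "?")):
            merged[-1] = merged[-1] + " " + line
        else:
            merged.append(line)
    return " ".join(merged)


# Made-up example of extractor output with hard line breaks mid-sentence.
extracted = (
    "Sentence boundary detection is\n"
    "sensitive to line breaks. Extractors often\n"
    "insert them mid-sentence."
)

broken = naive_sentences(extracted)
repaired = naive_sentences(rejoin_broken_lines(extracted))

print(len(broken), broken)
print(len(repaired), repaired)
```

On this input the splitter returns four fragments before rejoining and two complete sentences after. A punctuation rule like this fails on abbreviations, headings, and OCR noise, which is the kind of case a learned reconstruction model is meant to handle.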