
A curated list of awesome deep learning based papers on text detection and recognition.

 

Text Detection

  • Papers are sorted by published date.
  • IC is short for ICDAR.
  • Score is the F1-score for the localization task.
    • (L) marks a score taken from the leaderboard.
    • An (L) score is listed when the leaderboard score differs from the score reported in the paper.
  • *CODE means official code, and CODE(M) means that a trained model is provided.
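For readers unfamiliar with the metric, here is a minimal sketch of how a localization F1-score can be computed. It assumes axis-aligned boxes, greedy one-to-one matching, and an IoU threshold of 0.5; the function names and the matching rule are illustrative assumptions, and the official ICDAR protocols differ in detail (for example, polygon boxes and "don't care" regions).

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_f1(predictions, ground_truths, iou_threshold=0.5):
    """F1-score with greedy one-to-one matching (an assumed, simplified rule)."""
    matched = set()  # indices of ground-truth boxes already claimed
    true_positives = 0
    for pred in predictions:
        best_idx, best_iou = None, iou_threshold
        for idx, gt in enumerate(ground_truths):
            if idx in matched:
                continue
            score = iou(pred, gt)
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truths) if ground_truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical boxes: two of three predictions match the two ground truths.
preds = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
gts = [(0, 0, 10, 10), (21, 21, 31, 31)]
f1 = detection_f1(preds, gts)
```

Because evaluation details vary between papers and the leaderboard, scores computed this way are only comparable when the same matching rule is used throughout.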
Conf. Date Title IC13 IC15 Resources
'14-ECCV 14/10/07 Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees      
'15-CVPR 15/06/01 Symmetry-based text line detection in natural scenes 0.8043   PRJ
CODE
'16-TIP 15/10/12 Text-Attentional Convolutional Neural Networks for Scene Text Detection 0.8165    
'15-ICCV 15/12/13 Text Flow : A Unified Text Detection System in Natural Scene Images 0.8025    
'16-arXiv 16/03/31 Accurate Text Localization in Natural Image with Cascaded Convolutional TextNetwork 0.86    
'16-CVPR 16/04/14 Multi-Oriented Text Detection with Fully Convolutional Networks 0.83 0.54 *TORCH(M)
'16-CVPR 16/04/22 Synthetic Data for Text Localisation in Natural Images 0.847
(L)0.8359
  CODE
DB
'16-arXiv 16/06/29 Scene Text Detection Via Holistic, Multi-Channel Prediction 0.8433 0.6477  
'16-ECCV 16/09/12 Detecting Text in Natural Image with Connectionist Text Proposal Network 0.8215 0.6085 *CAFFE(M)
CAFFE
TF(M)
TF
DEMO
BLOG(CH)
'17-AAAI 16/11/21 TextBoxes: A fast text detector with a single deep neural network 0.85
(L)0.8767
  *CAFFE(M)
TF
BLOG(KR)
'18-TM 17/03/03 Arbitrary-Oriented Scene Text Detection via Rotation Proposals 0.9125 0.8020 *CAFFE
'17-CVPR 17/03/04 Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection   0.7064  
'17-CVPR 17/03/19 Detecting Oriented Text in Natural Images by Linking Segments 0.853 0.75
(L)0.7636
*TF(M)
TF(M)
SLIDE
VIDEO
'17-arXiv 17/03/24 Deep Direct Regression for Multi-Oriented Scene Text Detection 0.86 0.81  
'17-arXiv 17/04/03 Cascaded Segmentation-Detection Networks for Word-Level Text Spotting 0.86 0.71  
'17-CVPR 17/04/11 EAST: An Efficient and Accurate Scene Text Detector   0.8072
(L)0.8038
TF(M)
TF
PYTORCH(M)
PYTORCH
DEMO
KERAS(M)
VIDEO
'17-ICIP 17/05/15 WordFence: Text Detection in Natural Images with Border Awareness 0.86    
'17-arXiv 17/06/30 R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection 0.8773 0.8254 TF(M)
CAFFE(M)
'17-CVPR 17/07/21 Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild 0.85 0.63  
'17-arXiv 17/08/17 Deep Scene Text Detection with Connected Component Proposals 0.919    
'17-ICCV 17/08/22 WordSup: Exploiting Word Annotations for Character based Text Detection 0.9064 0.7816  
'17-ICCV 17/09/01 Single Shot Text Detector with Regional Attention 0.8704 0.7691 *CAFFE(M)
PYTORCH
VIDEO
'17-arXiv 17/09/11 Fused Text Segmentation Networks for Multi-oriented Scene Text Detection   0.8414  
'17-ICCV 17/10/13 WeText: Scene Text Detection under Weak Supervision 0.869
(L)0.8313
   
'17-ICCV 17/10/22 Self-organized Text Detection with Minimal Post-processing via Border Learning 0.84   *KERAS(M)
'17-ICDAR 17/11/11 Deep Residual Text Detection Network for Scene Text 0.9117
(L)0.8925
   
'18-AAAI 17/11/12 Feature Enhancement Network: A Refined Scene Text Detector 0.9161    
'17-arXiv 17/11/30 ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene   0.759  
'18-AAAI 18/01/04 PixelLink: Detecting Scene Text via Instance Segmentation 0.881 0.8519 *TF(M) TF
'18-CVPR 18/01/05 FOTS: Fast Oriented Text Spotting with a Unified Network 0.925 0.8984 PYTORCH
PYTORCH
VIDEO
'18-TIP 18/01/09 TextBoxes++: A Single-Shot Oriented Scene Text Detector 0.88 0.829
(L)0.8475
*CAFFE(M)
'18-CVPR 18/02/27 Multi-Oriented Scene Text Detection via Corner Localization and Region Segmentation 0.88 0.843 *PYTORCH(M)
'18-CVPR 18/03/09 An end-to-end TextSpotter with Explicit Alignment and Attention 0.9 0.87 *CAFFE(M)
'18-CVPR 18/03/14 Rotation-Sensitive Regression for Oriented Scene Text Detection 0.89 0.838 *CAFFE(M)
'18-arXiv 18/04/08 Detecting Multi-Oriented Text with Corner-based Region Proposals 0.876 0.845 *CAFFE(M)
'18-arXiv 18/04/24 An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches 0.92 0.86  
'18-IJCAI 18/05/03 IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection   0.9047  
'18-arXiv 18/06/07 Shape Robust Text Detection with Progressive Scale Expansion Network   0.8721 PRJ
'18-ECCV 18/07/04 TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes   0.826 PYTORCH
'18-ECCV 18/07/06 Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes 0.917 0.86  
'18-ECCV 18/07/10 Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping 0.892    
'19-AAAI 18/11/21 Scene Text Detection with Supervised Pyramid Context Network 0.921 0.872  
'19-TIP 18/12/04 TextField: Learning A Deep Direction Field for Irregular Scene Text Detection   0.824 *CAFFE(M)
'19-CVPR 19/03/21 Towards Robust Curve Text Detection with Conditional Spatial Expansion      
'19-CVPR 19/03/28 Shape Robust Text Detection with Progressive Scale Expansion Network   0.857 TF(M)
'19-CVPR 19/04/03 Character Region Awareness for Text Detection 0.952 0.869 *PYTORCH(M)
VIDEO
PYTORCH
TF(M)
KERAS
BLOG_CH
BLOG_KR
BLOG_KR
BLOG_KR
'19-CVPR 19/04/13 Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes   0.877  
'19-CVPR 19/06/16 Learning Shape-Aware Embedding for Scene Text Detection   0.877  
'19-CVPR 19/06/16 Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation 0.917 0.876  
'19-ICCV 19/08/16 Efficient and Accurate Arbitrary-Shaped Text Detection with Pixel Aggregation Network   0.829  
'19-ICCV 19/09/02 Geometry Normalization Networks for Accurate Scene Text Detection   0.8852  
'20-AAAI 19/11/20 Real-time Scene Text Detection with Differentiable Binarization   0.847  

 

Text Recognition

  • Papers are sorted by published date.
  • IC is short for ICDAR.
  • Score is word-accuracy for recognition task.
    • For results on the IC03, IC13, and IC15 datasets, papers evaluate on different numbers of samples,
      but the table does not distinguish between them.
  • *CODE means official code, and CODE(M) means that a trained model is provided.
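As a minimal sketch of the word-accuracy metric: a prediction counts as correct only when the whole word matches the ground truth. The function name and the case-insensitive default are assumptions made for illustration; individual papers differ on case sensitivity, punctuation handling, and lexicon use.

```python
def word_accuracy(predictions, ground_truths, case_sensitive=False):
    """Fraction of words whose predicted string exactly matches the label."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    if not ground_truths:
        return 0.0

    def normalize(text):
        # Assumed convention: compare case-insensitively unless told otherwise.
        return text if case_sensitive else text.lower()

    correct = sum(
        normalize(pred) == normalize(gt)
        for pred, gt in zip(predictions, ground_truths)
    )
    return correct / len(ground_truths)


# Hypothetical recognizer output for four cropped word images.
preds = ["hello", "W0rld", "Text", "ocr"]
labels = ["Hello", "World", "text", "OCR"]
accuracy = word_accuracy(preds, labels)
strict_accuracy = word_accuracy(preds, labels, case_sensitive=True)
```

The gap between the two results shows why the comparison convention matters when reading the scores in the table.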
Conf. Date Title SVT IIIT5k IC03 IC13 Resources
'15-ICLR 14/12/18 Deep structured output learning for unconstrained text recognition 0.717   0.896 0.818 TF
SLIDE
VIDEO
'16-IJCV 15/05/07 Reading text in the wild with convolutional neural networks 0.807   0.933 0.908 KERAS
'16-AAAI 15/06/14 Reading Scene Text in Deep Convolutional Sequences          
'17-TPAMI 15/07/21 An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition 0.808 0.782 0.894 0.867 TORCH(M)
TF
TF
TF
TF
PYTORCH
PYTORCH(M)
BLOG(KR)
'16-CVPR 16/03/09 Recursive Recurrent Nets with Attention Modeling for OCR in the Wild 0.807 0.784 0.887 0.9  
'16-CVPR 16/03/12 Robust scene text recognition with automatic rectification 0.819 0.819 0.901 0.886 PYTORCH
PYTORCH
'16-CVPR 16/06/27 CNN-N-Gram for Handwriting Word Recognition 0.8362       VIDEO
'16-BMVC 16/09/19 STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition 0.836 0.833 0.899 0.891  
'17-arXiv 17/07/27 STN-OCR: A single Neural Network for Text Detection and Text Recognition 0.798 0.86   0.903 *MXNET(M)
PRJ
BLOG
'17-IJCAI 17/08/19 Learning to Read Irregular Text with Attention Mechanisms          
'17-arXiv 17/09/06 Scene Text Recognition with Sliding Convolutional Character Models 0.765 0.816 0.845 0.852  
'17-ICCV 17/09/07 Focusing Attention: Towards Accurate Text Recognition in Natural Images 0.859 0.874 0.942 0.933  
'18-CVPR 17/11/12 AON: Towards Arbitrarily-Oriented Text Recognition 0.828 0.87 0.915   TF
'17-NIPS 17/12/04 Gated Recurrent Convolution Neural Network for OCR 0.815 0.808 0.978   *TORCH(M)
'18-AAAI 18/01/04 Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition 0.844 0.836 0.915 0.908  
'18-AAAI 18/01/04 SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network   0.87 0.931 0.929  
'18-CVPR 18/05/09 Edit Probability for Scene Text Recognition 0.875 0.883 0.946 0.944  
'18-TPAMI 18/06/25 ASTER: An Attentional Scene Text Recognizer with Flexible Rectification 0.936 0.934 0.945 0.918 *TF(M)
PYTORCH
'18-ECCV 18/09/08 Synthetically Supervised Feature Learning for Scene Text Recognition 0.871 0.894 0.947 0.94  
'19-AAAI 18/09/18 Scene Text Recognition from Two-Dimensional Perspective 0.821 0.92   0.914  
'19-AAAI 18/11/02 Show, Attend and Read: A Simple and Strong Baseline for Irregular Text Recognition 0.845 0.915   0.91 *TORCH(M)
'19-CVPR 18/12/14 ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification 0.902 0.933   0.913 PRJ
'19-PR 19/01/10 MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition 0.883 0.912 0.950 0.924 *PYTORCH(M)
'19-ICCV 19/04/03 What is wrong with scene text recognition model comparisons? dataset and model analysis 0.875   0.949 0.936 *PYTORCH(M)
BLOG_KR
'19-CVPR 19/04/18 Aggregation Cross-Entropy for Sequence Recognition 0.826 0.823 0.921 0.897 *PYTORCH
'19-CVPR 19/06/16 Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition 0.845 0.838 0.921 0.918  
'19-ICCV 19/08/06 Symmetry-constrained Rectification Network for Scene Text Recognition 0.889 0.944 0.95 0.939  
'20-AAAI 19/12/28 TextScanner: Reading Characters in Order for Robust Scene Text Recognition 0.895 0.926   0.925  
'20-AAAI 19/12/21 Decoupled Attention Network for Text Recognition 0.892 0.943 0.95 0.939 *PYTORCH(M)
'20-AAAI 20/02/04 GTC: Guided Training of CTC 0.929 0.955 0.952 0.943  

 

End-to-End Text Recognition

  • Papers are sorted by published date.
  • IC is short for ICDAR.
  • Score is F1-score for generic task.
  • *CODE means official code, and CODE(M) means that a trained model is provided.
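For end-to-end results, a detection usually has to be both well localized and correctly transcribed to count as a true positive. The sketch below assumes that each candidate pair already carries its IoU, that the IoU threshold is 0.5, and that transcriptions are compared case-insensitively; the function name and input layout are hypothetical and do not reproduce the official ICDAR protocol.

```python
def end_to_end_f1(matches, num_predictions, num_ground_truths, iou_threshold=0.5):
    """F1-score where a match needs enough overlap AND the same text.

    matches: list of (iou, predicted_text, ground_truth_text) tuples,
             one per candidate prediction/ground-truth pair (assumed layout).
    """
    true_positives = sum(
        1
        for overlap, predicted, expected in matches
        if overlap >= iou_threshold and predicted.lower() == expected.lower()
    )
    precision = true_positives / num_predictions if num_predictions else 0.0
    recall = true_positives / num_ground_truths if num_ground_truths else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical pairs: only the first is both well localized and correctly read.
pairs = [
    (0.8, "Exit", "EXIT"),   # good overlap, same text ignoring case
    (0.7, "shop", "stop"),   # good overlap, wrong transcription
    (0.3, "open", "open"),   # right transcription, poor overlap
]
score = end_to_end_f1(pairs, num_predictions=4, num_ground_truths=5)
```

This stricter criterion is one reason end-to-end scores sit well below detection-only scores for the same method.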
Conf. Date Title IC03 IC13 IC15 Resources
'12-ICPR 12/11/11 End-to-end text recognition with convolutional neural networks 0.67     *CODE
'14-ECCV 14/09/06 Deep Features for Text Spotting 0.75     PRJ
MATLAB
'15-IJCV 15/05/07 Reading Text in the Wild with Convolutional Neural Networks 0.70 0.77   KERAS
'15-TPAMI 15/10/30 Real-time Lexicon-free Scene Text Localization and Recognition   0.542 0.156  
'16-arXiv 16/04/10 TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild   0.6843 0.4718
(L)0.533
*CAFFE(M)
'17-AAAI 16/11/21 TextBoxes: A fast text detector with a single deep neural network   0.84   TF
*CAFFE(M)
BLOG_KR
'17-ICCV 17/07/13 Towards End-to-end Text Spotting with Convolution Recurrent Neural Network   0.8459   VIDEO
'17-ICCV 17/10/22 Deep TextSpotter An End-to-End Trainable Scene Text Localization and Recognition Framework   0.77 0.47 VIDEO
*CAFFE(M)
'18-CVPR 18/01/05 FOTS: Fast Oriented Text Spotting with a Unified Network   0.8477 0.6533 VIDEO
TF(M)
'18-TIP 18/01/09 TextBoxes++: A Single-Shot Oriented Scene Text Detector   0.8465 0.519 *CAFFE(M)
'18-CVPR 18/03/09 An end-to-end TextSpotter with Explicit Alignment and Attention   0.86 0.63 *CAFFE(M)
'18-TPAMI 18/06/25 ASTER: An Attentional Scene Text Recognizer with Flexible Rectification     0.64 *TF(M)
'18-ECCV 18/07/06 Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes   0.865 0.624  
'19-ICCV 19/08/24 Towards Unconstrained End-to-End Text Spotting     0.6994 BLOG_KR
'19-ICCV 19/10/17 Convolutional Character Networks     0.7108 *PYTORCH(M)
'19-ICCV 19/10/27 TextDragon: An End-to-End Framework for Arbitrary Shaped Text Spotting     0.6537  
'20-AAAI 19/11/21 All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting   0.841 0.641  
'20-AAAI 20/02/12 Text Perceptron: Towards End-to-End Arbitrary-Shaped Text Spotting   0.858 0.651  

Others

  • Papers are sorted by published date.
  • *CODE means official code, and CODE(M) means that a trained model is provided.
Conf. Date Title Description Resources
'14-NIPS 14/06/09 Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition Dataset PRJ
'17-ECCV 17/02/13 End-to-End Interpretation of the French Street Name Signs Dataset Dataset (FSNS) *TF(M)
'17-arXiv 17/04/11 Attention-based Extraction of Structured Information from Street View Imagery FSNS *TF(M)
TF
TF
LUA
BLOG_KR
'17-CVPR 17/07/21 Unambiguous Text Localization and Retrieval for Cluttered Scenes Text Retrieval  
'17-AAAI 17/10/22 Detection and Recognition of Text Embedded in Online Images via Neural Context Models Dataset PRJ
'18-CVPR 17/11/17 Separating Style and Content for Generalized Style Transfer Font Style  
'17-arXiv 17/12/06 Detecting Curve Text in the Wild New Dataset and New Solution Dataset (CTW 1500) PRJ
'18-AAAI 17/12/14 SEE: Towards Semi-Supervised End-to-End Scene Text Recognition FSNS PRJ
*CHAINER(M)
'17-CVPR 18/06/07 Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks Document Layout PRJ
'18-CVPR 18/06/19 DocUNet: Document Image Unwarping via A Stacked U-Net Document Dewarping PRJ
'18-CVPR 18/06/19 Document Enhancement using Visibility Detection Document Enhancement PRJ
'18-IJCAI 18/06/22 Multi-Task Handwritten Document Layout Analysis Document Layout  
'18-ECCV 18/07/09 Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes Dataset PRJ
'19-AAAI 18/12/03 EnsNet: Ensconce Text in the Wild Text Removal DB
'19-CVPR 18/12/14 Spatial Fusion GAN for Image Synthesis Dataset DB
'19-AAAI 19/01/27 Hierarchical Encoder with Auxiliary Supervision for Table-to-text Generation: Learning Better Representation for Tables TableToText  
'19-AAAI 19/01/27 A Radical-aware Attention-based Model for Chinese Text Classification Chinese Character Classification  
'19-CVPR 19/02/25 Handwriting Recognition in Low-resource Scripts using Adversarial Learning Handwriting Recognition TF
'19-CVPR 19/03/27 Tightness-aware Evaluation Protocol for Scene Text Detection Evaluation CODE
'19-ICCV 19/05/31 Scene Text Visual Question Answering Dataset ICDAR_DB
'19-CVPR 19/06/16 DynTypo: Example-based Dynamic Text Effects Transfer Text Effects PRJ
VIDEO
'19-CVPR 19/06/16 Typography with Decor: Intelligent Text Style Transfer Text Effects *PYTORCH(M)
'19-CVPR 19/06/16 An Alternative Deep Feature Approach to Line Level Keyword Spotting Keyword Spotting  
'19-ICCV 19/07/23 GA-DAN: Geometry-Aware Domain Adaptation Network for Scene Text Detection and Recognition Domain Adaptation  
'19-ICCV 19/09/17 Chinese Street View Text: Large-scale Chinese Text Reading with Partially Supervised Learning Dataset ICDAR_DB
'19-ICCV 19/10/02 Large-scale Tag-based Font Retrieval with Generative Feature Learning Font Retrieval  
'19-ICCV 19/10/27 TextPlace: Visual Place Recognition and Topological Localization Through Reading Scene Texts Place Recognition DB
'19-ICCV 19/10/27 DewarpNet: Single-Image Document Unwarping With Stacked 3D and 2D Regression Networks Document Dewarping *PYTORCH(M)

Other lists

Tutorial Materials

Source: https://github.com/hwalsuklee/awesome-deep-text-detection-recognition