Better regex to match numbers in 03_classification

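The previous regex did not fully match a decimal number unless it used "E" notation, because the fractional part was only consumed when an exponent followed it. The fractional part and the exponent are now independently optional, and the exponent accepts an optional "+" or "-" sign.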
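
For illustration, here is a minimal sketch comparing the two patterns on a few sample strings. It assumes the doubled backslashes in the diff are JSON escaping from the notebook file, so the patterns are written with single backslashes as they would appear in Python source. The names `OLD_PATTERN`, `NEW_PATTERN`, and `replace_numbers` are hypothetical helpers for this example only and are not part of the notebook.

```python
import re

# Pattern before this commit: the fractional part is only matched
# when an exponent (without a sign) follows it.
OLD_PATTERN = re.compile(r'\d+(?:\.\d*(?:[eE]\d+))?')

# Pattern after this commit: the fractional part and the exponent are
# independently optional, and the exponent may carry a "+" or "-" sign.
NEW_PATTERN = re.compile(r'\d+(?:\.\d*)?(?:[eE][+-]?\d+)?')


def replace_numbers(text, pattern):
    """Replace every match of `pattern` in `text` with the token NUMBER."""
    return pattern.sub('NUMBER', text)


if __name__ == "__main__":
    for sample in ["42", "3.14", "1.5e10", "2e5", "1e-3", "6.02E+23"]:
        old = replace_numbers(sample, OLD_PATTERN)
        new = replace_numbers(sample, NEW_PATTERN)
        print(f"{sample!r}: old={old!r} new={new!r}")
```

With the old pattern, "3.14" is split into two matches around the dot, while the new pattern replaces it with a single NUMBER. One side effect worth keeping in mind: because `\d*` can match nothing, the new pattern also consumes a trailing dot, so "5." becomes "NUMBER" rather than "NUMBER.".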
main
Ian Beauregard 2020-08-11 09:57:53 -04:00 committed by GitHub
parent a102114c62
commit d7afbd511d
1 changed file with 1 addition and 1 deletion

@@ -2342,7 +2342,7 @@
" for url in urls:\n",
" text = text.replace(url, \" URL \")\n",
" if self.replace_numbers:\n",
" text = re.sub(r'\\d+(?:\\.\\d*(?:[eE]\\d+))?', 'NUMBER', text)\n",
" text = re.sub(r'\\d+(?:\\.\\d*)?(?:[eE][+-]?\\d+)?', 'NUMBER', text)\n",
" if self.remove_punctuation:\n",
" text = re.sub(r'\\W+', ' ', text, flags=re.M)\n",
" word_counts = Counter(text.split())\n",