Digit Detective: How to Use Regular Expressions in Python to Find Digits

Mastering Regex for Extracting Numbers in Python


What are Regular Expressions?

Have you ever needed to find particular characters, words, or patterns of characters inside a piece of text?

A Regular Expression, also referred to as 'regex' or 'regexp', is a clever 'wildcard' expression used for matching and parsing strings.

Why Regular Expressions?

Flexibility in pattern matching
Regular expressions offer a wide range of metacharacters for defining matching criteria, which allows precise and versatile pattern matching.
Easy data extraction
Regular expressions make it easy to extract specific portions of text from larger strings based on defined patterns, which is useful for parsing structured data such as log files and CSVs.

Regular Expressions Quick Guide

^            Matches the beginning of a line
$            Matches the end of the line
.            Matches any character (except a newline, by default)
\s           Matches whitespace
\S           Matches any non-whitespace character
*            Repeats a character zero or more times
*?           Repeats a character zero or more times (non-Greedy)
+            Repeats a character one or more times 
+?           Repeats a character one or more times (non-Greedy)
[aeiou]      Matches a single character in the listed set
[^XYZ]       Matches a single character not in the listed set
[a-z0-9]     The set of characters can include a range
(            Indicates where string extraction is to start
)            Indicates where string extraction is to end
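
To make the quick guide a little more concrete, here is a small sketch that tries a few of these metacharacters with Python's re module. The sample line and the email address in it are made up for illustration only.

```python
import re

# Hypothetical sample line, invented for this illustration
text = "From: alice@example.com Sat Jan 5 09:14:16 2008"

# ^ anchors the pattern to the beginning of the line
starts_with_from = re.search(r'^From:', text) is not None

# \S+ matches a run of non-whitespace characters; the parentheses
# mark the part of the match that findall should extract
addresses = re.findall(r'^From: (\S+@\S+)', text)

# Greedy vs non-greedy: .+ grabs as much as it can, .+? as little as it can
greedy = re.findall(r'^F.+:', text)
non_greedy = re.findall(r'^F.+?:', text)

# [aeiou] matches a single character from the listed set
vowels = re.findall(r'[aeiou]', 'regex')

print(starts_with_from)  # True
print(addresses)         # ['alice@example.com']
print(greedy)            # ['From: alice@example.com Sat Jan 5 09:14:']
print(non_greedy)        # ['From:']
print(vowels)            # ['e', 'e']
```

Notice how the greedy version runs all the way to the last colon in the line, while the non-greedy version stops at the first one.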

A practical Example

We have a text file named sample.txt with text and numbers scattered throughout.

Why should you learn to write programs? 7746
12 1929 8827
Writing programs (or programming) is a very creative 
7 and rewarding activity.  You can write programs for 
many reasons, ranging from making your living to solving
8837 a difficult data analysis problem to having fun to helping 128
someone else solve a problem.  This book assumes that 
everyone needs to know how to program ...

Suppose we want to extract the numbers scattered throughout the file and add them up. To do that, we need to:

  1. open the file

  2. read it line by line

  3. find all the numbers in each line

  4. get the total for each line

  5. add the line totals together

By breaking down our approach step-by-step, we can efficiently calculate the total of scattered numbers from the text file.

To use regular expressions, we first have to import Python's built-in re module.

import re

We let the user enter the file name.

import re
fname = input('Enter file name:')
fhand = open(fname)  # opening the file

Now that the file is open, we can scan through it line by line to find the numbers embedded within the text.

import re
fname = input('Enter file name:')
fhand = open(fname)

for line in fhand:
    num = re.findall('[0-9]+',line)

A regular expression is now used to find numbers. Let's see what it does.

[0-9]+

This regular expression finds sequences of digits in the text. The square brackets define a character class, and the range 0-9 inside them matches any single digit from 0 to 9. The plus sign then means 'match one or more occurrences of the preceding element', so the whole pattern matches each unbroken run of digits.
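
As a quick check of that explanation, here is a minimal sketch that runs the pattern on one line from sample.txt, and then runs it again without the plus sign to show what the plus contributes. The variable names are only for illustration.

```python
import re

line = '8837 a difficult data analysis problem to having fun to helping 128'

# With the plus sign, each unbroken run of digits is one match
numbers = re.findall('[0-9]+', line)
print(numbers)  # ['8837', '128']

# Without the plus sign, every digit is matched on its own
single_digits = re.findall('[0-9]', '12 1929')
print(single_digits)  # ['1', '2', '1', '9', '2', '9']
```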

Let's print the result and see what we get.

import re
fname = input('Enter file name:')
fhand = open(fname)

for line in fhand:
    num = re.findall('[0-9]+',line)
    print(num)

For each line, findall returns the numbers it found as the elements of a list. Lines that contain no numbers produce an empty list, so before going further we need to skip those. Python treats an empty list as false, so a simple check on the list is enough to ignore them.

import re
fname = input('Enter file name:')
fhand = open(fname)

for line in fhand:
    num = re.findall('[0-9]+',line)
    if num:
        print(num)

Now the numbers are extracted, but at this point they are still strings, so they have to be converted to integers before we can add them.

import re
fname = input('Enter file name:')
fhand = open(fname)

for line in fhand:
    num = re.findall('[0-9]+',line)
    if num:
        intnum = [int(digit) for digit in num]
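
To see why the conversion matters, here is a small sketch using the list that findall would return for the second line of sample.txt. Trying to sum the strings directly fails, while the converted integers add up as expected. The error_seen flag is just a helper for this illustration.

```python
num = ['12', '1929', '8827']  # what findall returns: a list of strings

# sum() starts from the integer 0, so adding a string to it raises TypeError
error_seen = False
try:
    sum(num)
except TypeError:
    error_seen = True
print(error_seen)  # True

# After converting each string to an int, the numbers can be added
intnum = [int(n) for n in num]
linetotal = sum(intnum)
print(intnum)     # [12, 1929, 8827]
print(linetotal)  # 10768
```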

Because some lines contain several numbers, we first add up the numbers in each line to get a line total, before calculating the overall total.

import re
fname = input('Enter file name:')
fhand = open(fname)

for line in fhand:
    num = re.findall('[0-9]+',line)
    if num:
        intnum = [int(digit) for digit in num]
        linetotal = sum(intnum)

Now we can get the overall total.

import re
fname = input('Enter file name:')
fhand = open(fname)

total = 0
for line in fhand:
    num = re.findall('[0-9]+',line)
    if num:
        intnum = [int(digit) for digit in num]
        print(intnum)
        linetotal = sum(intnum)
        total = total + linetotal
print("Overall total is:", total)
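
If you would like to reuse this logic, one possible way to package it is sketched below. The function names sum_numbers and sum_numbers_in_file are my own, not anything built into Python, and the with statement is a small change from the script above so that the file is closed automatically. The sample list holds the lines of sample.txt so the sketch can run without the file.

```python
import re

def sum_numbers(lines):
    """Return the sum of every run of digits found in an iterable of lines."""
    total = 0
    for line in lines:
        # findall returns [] for lines without digits, and sum of nothing is 0
        total += sum(int(n) for n in re.findall('[0-9]+', line))
    return total

def sum_numbers_in_file(fname):
    """Open the named file, sum its numbers, and close it afterwards."""
    with open(fname) as fhand:
        return sum_numbers(fhand)

# The lines of sample.txt, so the sketch runs without reading a file
sample = [
    'Why should you learn to write programs? 7746',
    '12 1929 8827',
    'Writing programs (or programming) is a very creative',
    '7 and rewarding activity.  You can write programs for',
    'many reasons, ranging from making your living to solving',
    '8837 a difficult data analysis problem to having fun to helping 128',
    'someone else solve a problem.  This book assumes that',
    'everyone needs to know how to program ...',
]

print("Overall total is:", sum_numbers(sample))  # Overall total is: 27486
```

With the real file on disk, calling sum_numbers_in_file('sample.txt') should give the same total.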

I hope you found this article useful! Thanks for stopping by.