Release of the extract_lines Python module

Yesterday I wrote up a page about extract_lines, a Python module I developed recently to automate some text extraction tasks.

Many engineers have to analyze tabular or line-oriented human-readable reports and source code. For instance, suppose we want to extract constant coefficients from a C source file (snippet taken from Bessel functions of the first kind of order 0 from uClibc):

static const double
u00  = -7.38042951086872317523e-02, /* 0xBFB2E4D6, 0x99CBD01F */
u01  =  1.76666452509181115538e-01, /* 0x3FC69D01, 0x9DE9E3FC */
u02  = -1.38185671945596898896e-02, /* 0xBF8C4CE8, 0xB16CFA97 */
u03  =  3.47453432093683650238e-04, /* 0x3F36C54D, 0x20B29B6B */
u04  = -3.81407053724364161125e-06, /* 0xBECFFEA7, 0x73D25CAD */
u05  =  1.95590137035022920206e-08, /* 0x3E550057, 0x3B4EABD4 */
u06  = -3.98205194132103398453e-11, /* 0xBDC5E43D, 0x693FB3C8 */
v01  =  1.27304834834123699328e-02, /* 0x3F8A1270, 0x91C9C71A */
v02  =  7.60068627350353253702e-05, /* 0x3F13ECBB, 0xF578C6C1 */
v03  =  2.59150851840457805467e-07, /* 0x3E91642D, 0x7FF202FD */
v04  =  4.41110311332675467403e-10; /* 0x3DFE5018, 0x3BD6D9EF */

A common case would be to transform that data so it can be more easily used within a spreadsheet or another program, in a form such as this:

u00,-7.38042951086872317523e-02
u01,1.76666452509181115538e-01
u02,-1.38185671945596898896e-02
u03,3.47453432093683650238e-04
u04,-3.81407053724364161125e-06
u05,1.95590137035022920206e-08
u06,-3.98205194132103398453e-11
v01,1.27304834834123699328e-02
v02,7.60068627350353253702e-05
v03,2.59150851840457805467e-07
v04,4.41110311332675467403e-10

Whenever data must be extracted from that kind of document, it usually involves a big mess of cut-and-paste sprinkled with search and replace. For the most part, it is a job of manual editing, especially if it is a one-time affair. However, what happens when we must automate the task? Tools such as sed, grep and awk within UNIX shell pipelines can do the job, but they are somewhat inflexible and pretty hard to use. Furthermore, they don’t allow the programmer to easily process the data within the same program that does the extraction.
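To make the idea of extracting and processing in the same program concrete, here is a minimal sketch that uses only Python's standard re module. It does not use extract_lines and does not reflect its API. The names SOURCE, COEFF_RE, extract_coefficients and to_csv are made up for this illustration, and the pattern assumes one coefficient per line, written as in the snippet above.

```python
import re

# A few lines of the C snippet shown earlier, used as sample input.
SOURCE = """\
static const double
u00  = -7.38042951086872317523e-02, /* 0xBFB2E4D6, 0x99CBD01F */
u01  =  1.76666452509181115538e-01, /* 0x3FC69D01, 0x9DE9E3FC */
v04  =  4.41110311332675467403e-10; /* 0x3DFE5018, 0x3BD6D9EF */
"""

# Hypothetical pattern: an identifier, "=", a signed decimal literal with an
# optional exponent, then "," or ";". Anything after that is ignored.
COEFF_RE = re.compile(
    r"^\s*(\w+)\s*=\s*([-+]?\d+\.\d+(?:[eE][-+]?\d+)?)\s*[,;]"
)


def extract_coefficients(text):
    """Return a list of (name, literal) pairs, one per matching line."""
    pairs = []
    for line in text.splitlines():
        match = COEFF_RE.match(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs


def to_csv(pairs):
    """Format the pairs as 'name,value' lines."""
    return "\n".join(f"{name},{value}" for name, value in pairs)


pairs = extract_coefficients(SOURCE)
print(to_csv(pairs))

# The extracted values are ordinary Python objects, so further processing
# can happen right here, for example converting the literals to floats.
values = {name: float(literal) for name, literal in pairs}
```

The "static const double" line is skipped because it contains no "=" after the first word, so only the coefficient lines end up in the output.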

The extract_lines module was written to facilitate the automation of these tasks in Python with regular expressions. Several examples are provided on the project page. Take a look 🙂
