Automatic Generation of Regular Expressions from Examples

We provide two online tools for different flavors of this problem:

Data Extraction

You provide:

  1. a set of input strings;
  2. for each string, the (possibly empty) substring to be extracted.

The tool attempts to infer a general data extraction pattern from the examples. It generates a regular expression that can be applied to a data stream to extract only the portions that match the inferred pattern.
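
To make the input and output concrete, here is a minimal Python sketch of how a generated expression could be checked against the examples it was inferred from. It does not show how the tool works internally. The example strings, the helper names (extract, agrees_with_examples) and the candidate expression are all hypothetical, and the sketch assumes that only the first match in each string is extracted.

```python
import re

# Hypothetical examples: (input string, substring to extract).
# An empty string means that nothing should be extracted.
EXAMPLES = [
    ("order 4521 shipped", "4521"),
    ("invoice 77 paid", "77"),
    ("no number here", ""),
]


def extract(pattern, text):
    """Return the first substring of text matched by pattern, or "" if none."""
    match = re.search(pattern, text)
    return match.group(0) if match else ""


def agrees_with_examples(pattern, examples):
    """True if pattern extracts exactly the expected substring for every example."""
    return all(extract(pattern, text) == expected for text, expected in examples)


# Hand-written stand-in for an expression the tool might generate.
CANDIDATE = r"\d+"

if __name__ == "__main__":
    print(agrees_with_examples(CANDIDATE, EXAMPLES))
```

An expression that agrees with every example is only a candidate: whether it generalizes to unseen strings depends on how representative the examples are.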

Based on "Automatic Synthesis of Regular Expressions from Examples", to appear in IEEE Computer (IEEE Xplore link; short preliminary version at ACM GECCO 2012).

Regex Golf

You provide:

  1. a set of strings to be matched;
  2. a set of strings not to be matched.

The tool does not attempt to infer any general pattern. It generates a regular expression that classifies the provided strings, in the same spirit as http://regex.alf.nu/ and http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313-part2.ipynb
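
As an illustration of what "classifies the provided strings" means, the following Python sketch checks a candidate expression against the two sets. It is not the tool's algorithm. The word lists, the function name classifies and the candidate expression are hypothetical, and the sketch assumes unanchored matching (the expression may match anywhere in the string).

```python
import re

# Hypothetical input sets.
TO_MATCH = ["foo", "food", "football"]
TO_REJECT = ["bar", "baz", "of"]


def classifies(pattern, to_match, to_reject):
    """True if pattern matches every string in to_match and none in to_reject."""
    regex = re.compile(pattern)
    matches_all = all(regex.search(s) for s in to_match)
    matches_none = not any(regex.search(s) for s in to_reject)
    return matches_all and matches_none


# Hand-written stand-in for a generated expression.
CANDIDATE = r"fo"

if __name__ == "__main__":
    print(classifies(CANDIDATE, TO_MATCH, TO_REJECT))
    # In regex golf, shorter expressions are preferred among correct ones.
    print(len(CANDIDATE))
```

Because only the provided strings matter, a correct expression may exploit accidental features of the two sets and need not describe any meaningful pattern.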

Based on "Playing Regex Golf with Genetic Programming", to appear at ACM GECCO 2014. Finalist at Humies 2014 - Human Competitive Results Produced by Genetic and Evolutionary Computation.