The user provides a dataset composed of one or more examples. An example is typically a text file, but it may also be a chunk of text entered by the user in a dedicated window. The user annotates each example with the desired matches, i.e., exactly those portions of text that should be extracted.
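To make the notion of an annotated example concrete, here is a minimal sketch in Python. The `Example` class, its field names, and the use of character offsets are assumptions made for illustration; the tool's actual internal representation is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """Hypothetical container for one annotated example."""
    text: str
    # Desired matches as (start, end) character offsets, end exclusive.
    matches: tuple

    def extracted(self):
        """Return the annotated substrings, in order."""
        return [self.text[start:end] for start, end in self.matches]


# One example with two annotated portions (the phone numbers).
ex = Example("Call 555-1234 or 555-9876 today", ((5, 13), (17, 25)))
print(ex.extracted())
```

Everything outside the annotated spans is implicitly text that should not be extracted.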

The tool attempts to generalize beyond the provided examples, i.e., it attempts to infer the general pattern that the user wants to extract. In other words, the input is a sort of incomplete specification of a general text extraction problem.

When the user clicks "Evolve!", the tool provides, after a reasonably short time, a regex with the (hopefully) desired behavior. It also provides accuracy measures of the proposed solution on the input data.
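As an illustration of what an accuracy measure on the input data could look like, the following sketch computes precision and recall of a regex over annotated examples. The choice of these two measures, the exact-span matching criterion, and the function name `evaluate` are assumptions; the measures the tool actually reports may differ.

```python
import re


def evaluate(pattern, examples):
    """Precision and recall of `pattern` on annotated examples.

    `examples` is a list of (text, wanted) pairs, where `wanted` is a set
    of (start, end) spans. A found span counts as correct only if it
    coincides exactly with an annotated span (an assumption of this sketch).
    """
    tp = fp = fn = 0
    for text, wanted in examples:
        # Ignore empty matches, which cannot correspond to an annotation.
        found = {m.span() for m in re.finditer(pattern, text)
                 if m.end() > m.start()}
        tp += len(found & wanted)
        fp += len(found - wanted)
        fn += len(wanted - found)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


data = [("Call 555-1234 or 555-9876 today", {(5, 13), (17, 25)})]
print(evaluate(r"\d{3}-\d{4}", data))  # both annotated spans found exactly
print(evaluate(r"\d+", data))          # only fragments found, none exact
```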

If you are interested in classifying full strings without generalizing (as opposed to extracting from strings, with generalization), then you should try our regex golf tool.

Internally, the application makes use of Genetic Programming (GP).

The examples are split into two subsets: the training set, used to drive a GP search within a population of candidate regular expressions, and the validation set, used to choose the best regex in the current population. The search runs for a predefined number of iterations.
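The training/validation scheme can be sketched as follows. This is a deliberately simplified evolutionary search, not the tool's actual GP engine: candidates are flat lists of regex atoms rather than syntax trees, and the building blocks, the F1 fitness, the even split, and all names and parameter values are assumptions chosen for illustration.

```python
import random
import re

CLASSES = [r"\d", r"[a-z]", r"-", r" "]  # assumed building blocks
QUANTIFIERS = ["", "+"]
MAX_ATOMS = 6                             # keeps candidates short


def random_atom(rng):
    return rng.choice(CLASSES) + rng.choice(QUANTIFIERS)


def mutate(candidate, rng):
    """Replace, insert, or delete one atom."""
    child = list(candidate)
    op = rng.choice(["replace", "insert", "delete"])
    if op == "replace":
        child[rng.randrange(len(child))] = random_atom(rng)
    elif op == "insert" and len(child) < MAX_ATOMS:
        child.insert(rng.randint(0, len(child)), random_atom(rng))
    elif op == "delete" and len(child) > 1:
        del child[rng.randrange(len(child))]
    return child


def crossover(a, b, rng):
    """Join a prefix of `a` with a suffix of `b`."""
    child = a[:rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    return child[:MAX_ATOMS] or list(a)


def f1(candidate, examples):
    """F1 score of exact-span extraction on (text, wanted_spans) pairs."""
    pattern = re.compile("".join(candidate))
    tp = fp = fn = 0
    for text, wanted in examples:
        found = {m.span() for m in pattern.finditer(text)}
        tp += len(found & wanted)
        fp += len(found - wanted)
        fn += len(wanted - found)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def evolve(examples, generations=30, pop_size=50, seed=0):
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    half = max(1, len(shuffled) // 2)
    training = shuffled[:half]
    validation = shuffled[half:] or training  # fallback for one example
    population = [[random_atom(rng) for _ in range(rng.randint(1, 4))]
                  for _ in range(pop_size)]
    for _ in range(generations):  # predefined number of iterations
        # The training set drives the search: keep the fitter half.
        population.sort(key=lambda c: f1(c, training), reverse=True)
        parents = population[:pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    # The validation set picks the best regex in the final population.
    best = max(population, key=lambda c: f1(c, validation))
    return "".join(best)


examples = [
    ("Call 555-1234 or 555-9876 today", {(5, 13), (17, 25)}),
    ("fax 123-4567", {(4, 12)}),
    ("dial 999-0000 now", {(5, 13)}),
    ("no number here", set()),
]
print(evolve(examples))
```

Selecting the final regex on held-out examples, rather than on the ones that drove the search, is what discourages solutions that merely memorize the training annotations.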