I got a "Server is busy!" error.
We built this web app as a demonstrator for our paper presented at GECCO 2012. It is not designed to handle hundreds of concurrent requests, but we are working toward making the prototype able to tolerate a reasonable amount of traffic. Follow our Twitter account to be notified about improvements (but please do not expect huge scalability: we are a small group with limited resources).
Are you going to release the code?
Yes! The source code is now available on GitHub!
The generated regexes look unoptimized. Why?
There are some optimizations that could make a given regex look better, e.g., [\d\d\.\d] → [\d\.]. Currently, our prototype does not apply any of these optimizations.
Note, however, that optimizing the regexes during the evolution could be counterproductive, because it could reduce diversity and hence hamper the ability of GP to explore the space of solutions.
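As a quick check that a simplification like [\d\d\.\d] → [\d\.] preserves meaning, the sketch below compares the two character classes with `java.util.regex` on every `char` value from U+0000 to U+FFFE. A character class always matches exactly one character, so testing single characters is enough for this pair. The class and method names are illustrative only and are not part of our prototype.

```java
import java.util.regex.Pattern;

public class CharClassSimplification {
    // Counts the single characters on which the two patterns disagree.
    static int countMismatches(String regexA, String regexB) {
        Pattern a = Pattern.compile(regexA);
        Pattern b = Pattern.compile(regexB);
        int mismatches = 0;
        // Every char value except U+FFFF (the loop stops at Character.MAX_VALUE).
        for (char c = 0; c < Character.MAX_VALUE; c++) {
            String s = String.valueOf(c);
            if (a.matcher(s).matches() != b.matcher(s).matches()) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Unoptimized class with \d repeated three times vs. its simplified form.
        System.out.println(countMismatches("[\\d\\d\\.\\d]", "[\\d\\.]")); // prints 0
    }
}
```

Zero mismatches means the redundant \d entries add nothing to the class; as noted above, the prototype deliberately leaves such redundancy in place during evolution.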
Should the examples be representative?
Yes, definitely. As with any approach that synthesizes a solution from a set of examples alone, the quality of the generated solution depends on the quality of the examples. In particular, the examples should be representative of the actual use expected for the solution. Of course, this is a very intuitive and informal description of the issue.
Why do you use some examples for "validation" purposes? Why not use all the examples for learning?
Validation examples are necessary for assessing the generalization ability of solutions generated based only on learning examples. This information is an essential component of the criterion for choosing the best solution among those generated by the evolutionary search.
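A minimal sketch of the idea, assuming a simplified setting (not necessarily how the web app represents examples) in which each example is a whole string labelled "should match" or "should not match". The actual selection criterion is described in our papers; here the names `Example`, `accuracy` and `selectBest`, the two candidate regexes, and the shorter-regex tie-break are all hypothetical. The point it shows is that two candidates that are both perfect on the learning examples can be told apart only by the validation examples.

```java
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

public class ValidationSelection {
    // Hypothetical example representation: a string and whether the regex should match all of it.
    record Example(String text, boolean shouldMatch) {}

    // Learning examples: the only ones the (hypothetical) evolutionary search would see.
    static final List<Example> LEARNING = List.of(
        new Example("3.14", true), new Example("42.0", true), new Example("abc", false));

    // Validation examples: held out, used only to compare finished candidates.
    static final List<Example> VALIDATION = List.of(
        new Example("1.5", true), new Example("1..2", false), new Example("7.", false));

    // Two candidates assumed to have come out of evolution on LEARNING.
    static final List<String> CANDIDATES = List.of("[\\d\\.]+", "\\d+\\.\\d+");

    // Fraction of examples classified correctly by whole-string matching.
    static double accuracy(String regex, List<Example> examples) {
        Pattern p = Pattern.compile(regex);
        long correct = examples.stream()
            .filter(e -> p.matcher(e.text()).matches() == e.shouldMatch())
            .count();
        return (double) correct / examples.size();
    }

    // Highest validation accuracy wins; ties go to the shorter regex.
    static String selectBest(List<String> candidates, List<Example> validation) {
        Comparator<String> byValidation = Comparator.comparingDouble(r -> accuracy(r, validation));
        Comparator<String> shorterFirst = Comparator.<String>comparingInt(String::length).reversed();
        return candidates.stream().max(byValidation.thenComparing(shorterFirst)).orElseThrow();
    }

    public static void main(String[] args) {
        for (String r : CANDIDATES) {
            System.out.println(r + "  learning=" + accuracy(r, LEARNING)
                + "  validation=" + accuracy(r, VALIDATION));
        }
        System.out.println("selected: " + selectBest(CANDIDATES, VALIDATION));
    }
}
```

With these inputs both candidates score 1.0 on the learning examples, but only \d+\.\d+ also rejects 1..2 and 7., so it is the one selected.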
Why don't you run the evolution on the client side, using JavaScript?
Unfortunately, the JavaScript regular expression engine does not support possessive quantifiers. We prefer possessive quantifiers because they allow faster evaluations than greedy and lazy quantifiers. Since the evolutionary search performed by GP requires each candidate regex to be applied to each example, evaluating the candidates without possessive quantifiers would require a tremendous amount of time, too much to be practical.
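The sketch below illustrates with `java.util.regex` why possessive quantifiers can save work: a++ grabs as many characters as it can and never gives them back, so a failing match attempt is abandoned instead of retrying other ways of splitting the input. The patterns are textbook examples, not regexes produced by our tool, and the printed timings depend on the JVM and machine, so no specific figures are claimed.

```java
import java.util.regex.Pattern;

public class PossessiveVsGreedy {
    // True if the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Wall-clock time of one whole-string match attempt, in microseconds.
    static long timeMicros(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        long start = System.nanoTime();
        p.matcher(input).matches();
        return (System.nanoTime() - start) / 1_000;
    }

    public static void main(String[] args) {
        // 20 'a's followed by 'c': neither pattern can match, since both need a final 'b'.
        String input = "a".repeat(20) + "c";
        // Greedy: on failure the engine may retry many ways of splitting the a's among iterations.
        String greedy = "(a+)+b";
        // Possessive: a++ keeps every 'a' it consumed, so there is nothing to retry.
        String possessive = "(a++)+b";
        System.out.println("greedy:     " + matchesWhole(greedy, input) + ", " + timeMicros(greedy, input) + " us");
        System.out.println("possessive: " + matchesWhole(possessive, input) + ", " + timeMicros(possessive, input) + " us");
    }
}
```

Possessive quantifiers can change the result, not just the speed: a+a matches aaa, whereas a++a does not, because a++ consumes the last a and refuses to return it.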
What is Genetic Programming? Which parameter values do you use?
For details about the algorithm used by this web app, please read our IEEE Computer and GECCO papers. Note that some of the parameter values used in this web app differ from those presented in the papers, because they are tuned to obtain a good regex as soon as possible, rather than a good and compact regex later.
The regular expression generated does not work! There are too many + in it! Are you generating wrong regular expressions?
No, the generated regular expression is syntactically correct. However, it requires a regular expression engine that supports possessive quantifiers, such as Java's.
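For instance, a regex containing possessive quantifiers can be used directly with Java's `java.util.regex` package. The regex \d++\.\d++ below is a hypothetical example of the kind of output the tool may produce, not an actual result; JavaScript, by contrast, rejects \d++ as a syntax error. The helper name `extractAll` is illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UsePossessiveRegex {
    // Returns every substring of text matched by regex, scanning left to right.
    static List<String> extractAll(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical generated regex: each "++" is a possessive quantifier,
        // which java.util.regex supports natively.
        String regex = "\\d++\\.\\d++";
        System.out.println(extractAll(regex, "pi is 3.14, e is 2.72")); // prints [3.14, 2.72]
    }
}
```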