Gallery

Actual Example 1 (Text)

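Input Strings (the whitespace-separated tokens of the following passage, with punctuation and citation markers included):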

“A regular expression (shortened as regex or regexp;[1] sometimes referred to as rational expression[2][3]) is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" or "find and replace" operations on strings, or for input validation. Regular expression techniques are developed in theoretical computer science and formal language theory.” - Wikipedia

Learned regex:

(expres{2}ion(\[2]\[3]\))?|pat{2}erns?|(ope)?ration(al|s)|(r|R)egular|string(-searching|s,)|validation\.|regex(p;\[1])?|theor(etical|y\.)|"find"?|(sequ|sci)ence|(charac|compu)ters?|sometimes|input|developed|for(mal)?|techniques|(\(short|a?r|t|u?sp?|l?anguag)e(cifies|place"|ned|(fer{2}e)?d|xt\.)?|(algo|o)r(ithms)?|(th|m)at(ch)?|to|(a|i|U)?s(ual{2}y|uch)?|in|on|and|of|by|A|a)
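
One way to sanity-check the learned regex above is to split the passage on whitespace and confirm that every token is accepted in full. The following is a minimal sketch using Python's standard `re` module; the names `TEXT`, `LEARNED`, and `unmatched` are illustrative only, and the assumption that the inputs are whitespace-separated tokens is inferred from the shape of the regex rather than stated by the tool.

```python
import re

# Passage from Example 1, copied verbatim (citation markers are part of the input).
TEXT = (
    'A regular expression (shortened as regex or regexp;[1] sometimes referred to as '
    'rational expression[2][3]) is a sequence of characters that specifies a match '
    'pattern in text. Usually such patterns are used by string-searching algorithms '
    'for "find" or "find and replace" operations on strings, or for input validation. '
    'Regular expression techniques are developed in theoretical computer science and '
    'formal language theory.'
)

# Learned regex from Example 1, split across lines only for readability.
LEARNED = (
    r'(expres{2}ion(\[2]\[3]\))?|pat{2}erns?|(ope)?ration(al|s)|(r|R)egular|'
    r'string(-searching|s,)|validation\.|regex(p;\[1])?|theor(etical|y\.)|"find"?|'
    r'(sequ|sci)ence|(charac|compu)ters?|sometimes|input|developed|for(mal)?|techniques|'
    r'(\(short|a?r|t|u?sp?|l?anguag)e(cifies|place"|ned|(fer{2}e)?d|xt\.)?|'
    r'(algo|o)r(ithms)?|(th|m)at(ch)?|to|(a|i|U)?s(ual{2}y|uch)?|in|on|and|of|by|A|a)'
)

pattern = re.compile(LEARNED)

# Assumption: each input string is one whitespace-separated token.
tokens = TEXT.split()

# Collect every token the regex does not accept as a whole.
unmatched = [tok for tok in tokens if pattern.fullmatch(tok) is None]
print(unmatched)  # an empty list means every token is covered
```

`fullmatch` is used rather than `search` so that a token counts as covered only when the regex consumes it entirely.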

Actual Example 2 (Optimality vs. Readability: URLs)

Input Strings (12):

http://1.alpha.com

http://2.alpha.com

http://3.alpha.com

http://4.beta.com

http://5.beta.com

http://6.beta.org

http://7.beta.org

https://1.alpha.com

https://2.alpha.com

https://3.alpha.com

https://4.beta.com

https://5.beta.com

Learned regexes (3), sorted by decreasing optimality and increasing readability:

1: ht{2}ps?:/{2}(1|2|3|4|5|6|7)\.(alph|bet)a\.c?o(m|rg)

2: (ht{2}ps?:/{2}(1|2|3)\.alpha\.com|ht{2}ps?:/{2}((4|5)\.beta\.com|(6|7)\.beta\.org))

3: ((ht{2}ps:/{2}(1|2|3)|ht{2}p:/{2}(1|2|3))\.alpha\.com|(ht{2}ps|ht{2}p):/{2}4\.beta\.com|(ht{2}ps|ht{2}p):/{2}5\.beta\.com|ht{2}p:/{2}(6|7)\.beta\.org)
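
The trade-off between the three regexes can be probed with a short script. The sketch below (plain Python `re`; the `PROBES` list is a hypothetical pair of URLs that are not among the 12 inputs) checks that each regex accepts all inputs, then records which regexes also accept the extra URLs.

```python
import re

# The 12 input strings from Example 2.
INPUTS = [
    "http://1.alpha.com", "http://2.alpha.com", "http://3.alpha.com",
    "http://4.beta.com", "http://5.beta.com",
    "http://6.beta.org", "http://7.beta.org",
    "https://1.alpha.com", "https://2.alpha.com", "https://3.alpha.com",
    "https://4.beta.com", "https://5.beta.com",
]

# The three learned regexes, keyed by their number in the list above.
REGEXES = {
    1: r"ht{2}ps?:/{2}(1|2|3|4|5|6|7)\.(alph|bet)a\.c?o(m|rg)",
    2: r"(ht{2}ps?:/{2}(1|2|3)\.alpha\.com|ht{2}ps?:/{2}((4|5)\.beta\.com|(6|7)\.beta\.org))",
    3: r"((ht{2}ps:/{2}(1|2|3)|ht{2}p:/{2}(1|2|3))\.alpha\.com|(ht{2}ps|ht{2}p):/{2}4\.beta\.com"
       r"|(ht{2}ps|ht{2}p):/{2}5\.beta\.com|ht{2}p:/{2}(6|7)\.beta\.org)",
}

# Hypothetical probe URLs that do NOT appear in the input set.
PROBES = ["https://6.beta.org", "http://1.beta.org"]

# Does each regex accept every input string in full?
covers_inputs = {
    k: all(re.fullmatch(rx, s) for s in INPUTS) for k, rx in REGEXES.items()
}

# Which probes does each regex accept beyond the inputs?
extra_accepts = {
    k: [p for p in PROBES if re.fullmatch(rx, p)] for k, rx in REGEXES.items()
}

print(covers_inputs)
print(extra_accepts)
```

For these two probes, regex 1 accepts both, regex 2 accepts only `https://6.beta.org`, and regex 3 accepts neither, so the more compact patterns also generalize further beyond the inputs.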

Actual Example 3 (Repeating Substrings)

Input Strings (4):

waabbccddaabbccddr

waabbcffggvcffggvcffggvddaabbccddaabbccddr

waabbcffggffggvcffggffggvcffggffggvddaabbccddaabbccddr

waabbcffgeegeevcffgeegeevcffgeegeevddaabbccddaabbccddr

Learned regex:

(wa{2}b{2}cf{2}g{1,2}((f{2}g{2}vcf{2}g{2}){2}f{2}g{2}v|(vcf{2}g{2}){2}v|e{2}ge{2}vcf{2}(ge{2}){2}vcf{2}(ge{2}){2}v)d{2}|w)(a{2}b{2}c{2}d{2}){2}r
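
As a quick check that the counted repetitions in this regex reproduce the inputs, the sketch below (plain Python `re`; variable names are illustrative) matches each of the four strings in full.

```python
import re

# The four input strings from Example 3.
INPUTS = [
    "waabbccddaabbccddr",
    "waabbcffggvcffggvcffggvddaabbccddaabbccddr",
    "waabbcffggffggvcffggffggvcffggffggvddaabbccddaabbccddr",
    "waabbcffgeegeevcffgeegeevcffgeegeevddaabbccddaabbccddr",
]

# Learned regex from Example 3, split across lines only for readability.
LEARNED = (
    r"(wa{2}b{2}cf{2}g{1,2}((f{2}g{2}vcf{2}g{2}){2}f{2}g{2}v|(vcf{2}g{2}){2}v"
    r"|e{2}ge{2}vcf{2}(ge{2}){2}vcf{2}(ge{2}){2}v)d{2}|w)(a{2}b{2}c{2}d{2}){2}r"
)

pattern = re.compile(LEARNED)

# True for each input that the regex accepts as a whole string.
results = [pattern.fullmatch(s) is not None for s in INPUTS]
print(results)
```

The shared suffix `(a{2}b{2}c{2}d{2}){2}r` covers the `aabbccddaabbccddr` tail common to all four strings, while the first group distinguishes the prefixes.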

Actual Example 4 (Scalability: 50 Random Strings, with lengths between 1 and 100)

Input Strings (50):

Learned regex:

(((c{0,2}e{1,2}c{0,2}|c{1,3})?d{1,3}((c{3}d|c)?e{1,3}(c{3}e{3}dc|cd|ced|c{1,2}(ded{3})?|c|de|d)?|cd{2}|e?c(dc|d)?|ec|c)?|c?e{1,2}(ce|c{1,2})?|c)?b{1,5}((d{3}cbcd{2}cdb|dc{2}|d{2}|dbd|d)?e{1,3}((b{2}d|b|de)c{1,2}(ec{2}edebed(ec){2}d{2}bdcdedbdced{2}ebc{2}b{2}edbe{2}dbdcbdedb{2}dec{2}|ec{2}ed{2}e{2}dbcecbd{2}bdc{2}beb(edc){2}db(bed){2}ce{2}bed{3}c{2}d{2}edbdeb{3}|ebcd{2}ed(bd){2}decd{3}b{2}deb{2}ecbc(db){2}ce{2}b{4}(ebd){2}bcedbc{2}bd{4}bcbdc)|deb{2}(ed){2}cbde{3}cbdbcebcd{2}ecbd{2}ed{2}cb{2}debe{2}cbdb{2}e{2}c(cde){2}ecbc{3}db{2}|bdbcede{2}d{2}cbedebcdbeb{2}cd|cdbc{2}e{2}cdbcdc{3}e{2}(bd){2}cbcd(bd){2}ebcdeb(ec{2}){2}b{3}ed|(bd{2}cdcebce{2}b{2}c{2}ecbecebe(e{2}b{2}d)?c{2}b|cede{2}d{2}cedb(cbd){2})c{2}d(ced{2}c{2}b(bce){2}e|de{2}becedebe{2}cebd{2}bc{2}b(ed){2}c{2}bebdcd{4}(bc){2}c{3}de{2}dcdedb{2})|c?c{2}b{2}|(de){2}e{2}dcebd{2}bedce{2}bd|ce{2}cdcbdeced{3}cde)|(db{3}de{2}be|(db)?db(be){3}d)?c{1,2}(ecb{2}c{2}bde{3}bdbedbdebcbe{2}db{3}dcb{2}ce{2}d{2}ec{4}ecdbd|debcebdbe{2}c{3}ecbc{2}bedc{3}ebdbcb(ce){2}|be{2}d{2}ebec{2}edebc{2}d{2}e{2}bce{3}decebedbe{4}c{2}edbe{2}dc{2}b{2}cbdb)|((db{2})?cdeb|(c{2}db{2}d{2}cb|e)de(e{2}becb{3}dc|c{3})|ede{3}d{2}cdbc{2}be{3}b|(ebd{3}cbd{3}|dbc{2}db)c{2}b(c{2}de{3})?|c)ce(dbd{2}b{4}e{2}c{3}db{2}(e{2}d){2}bdeb{3}d{2}ebcde{2}ce{4}d{2}(eb){2}db{2}decdeb{2}edebedcb{3}ed(be){2}eb{4}c|(dcbec{2}|dbe{2}cd)c{2}d(cecbdbcdc{2}becd{2}b(cd){3}cbdc{2}d(dc){2}c(bd){2}edbec(be){3}d{2}bebdcec{4}de{2}bdedc|cdecedbdcd{2}ec{2}e(cd){2})|ecdce{2}cdc{2}d{2}bdcbd|d{3}ecb|cbe{2}|db)|e?dcbcdbd{2}b{2}d(bed){2}bcb{2}de{2}cbecebd{2}c{2}bedeb(dc){3}ecbe{3}dbe{2}bcbedc{2}(e{2}d){2}cbcebe{2}c{3}d{5}|(e{3}c{3}|c{0,2}e(be{2}|c)?|cbebc|c{3}|c)?d{1,4}((b{2}|cbdb{2}|cbdb|c{2}|bc{2}|cb|b)?e{1,3}(d{0,3}c(bedce{2}bcedecdbcd{2}ed{5}(ced){2}dbce{2}(bd){2}(dbc){2}cde(dc){2}e|de{2}dbe{2}dc)ebe(cecbcd{2}b{2}dbede{2}d|dc(bc){2}bdc{3}de{3})(cb{2}){1,2}(cd{2}e{2}dcb{2}(de){2}c{2}b{2}ec{2}dbecedb{2}cb{3}(cde){2}d{2}bcd{2}cecdbe)?|((ce){2}ecedbe{3}bc{2}b{2}dcedeb{2}ec{2}be(edb){2}decdbcb|b
{2}cebc)?c{2}d((be){2}cecb{2}cdc|bdcebce{2}b{2}cdcb(cbde){2}(dc){2}e{2}b(cd){2}ce{3}|d(db){2}b(be){2}e)|((ce){3}(eb){2}db{3}d{2}bec{2}ebcd{2}ce{2}dcdec{2}e|debced{3}ecdeb{3}dcdb{2}deced{2}bde{2}cb{2}e{3}c{3}ed{2}cde{2}dceb{2}dedc(ec{2}){2}edb{2}cd)c{2}b(cbc{3}edc{2}dbeb|b{2}c{3})|cb{2}debecdc{2})|(b?(ecd)?(cb)?ede{3}cbdb{3}cb{2}|cbdec{2}d|cde{3}cdcb)ce(d(de){2}ecebcdbd{2}ed{3}c{2}ecdedbebd{4}ce(cd){2}(db){2}c{2}d{2}bc(bd){2}debce{2}bd|db{2}de{3}be{2}cbebce{3}c{2}d{2}bec{2}bc|(be){2}e{2}b{2}dceb)|(edb{2})?c(dedce{5}cdecb{2}decbd{2}b{3}cdecbdcd{2}ecb{2}cbd{2}e{2}c{2}be{2}c|d)|(b{2}cde(dc){2}edebc)?ec(d(c{2}e){2}dc{2}eb{2}ed{2}ce{3}c{3}ecbd(ce){2}d{2}b{2}c{2}ebecdbcecdc{2}edbde(ec){2}dced|d{2}ecd{2}cb(bd){2}c{2}d{2}bcbde{3}b{2}c{2}db{2}ecdbdec(eb){2}(bd){2}ce{3}c{2}d{2}b(de){2}bcb)|c?ecbedcbdebd{2}bcbecd{2}becdbdc{2}bc{6}bc{3}e{2}bd(cb){2}(cd{2}c){2}d{3}bc{2}b{3}dec{2}db(ed){2}e{2}bec(dce){2}ed)?|(e?ecb(eb{2}cec)?|e{2})?cd(bdeb{2}e{4}db{3}c{4}ebec(dbedcd){2}ed{2}cebec{2}bed{3}ede{2}bdc{2}d{2}|decb{2}dce{2}bdcedec{2}dbcbe(ed){2}ec{2}b(db){2}bdbedcecd{2}bcdecd{2}|(ede{3})?ed{3}b{3}(db){2}ec{2}dcbcdec{2}bdcecdec|d)|ebe(bc){2}bdb{2}edcebedeb(ebd){2}b|c{2}eb{2}ce{2}dcedb{2}dc{2}e{2}b{2}c|ebdbc)?|c)