About
This website performs Machine Learning of Non-trivial Optimal Regular Expressions.
These learned Regular Expressions are:
- Descriptive
- Non-trivial
- Optimal or near-optimal
- Executable
- Matching or Exact Matching
- Learned from positive example input strings
This has not been done before—it’s a breakthrough in Computer Science and Machine Learning.
This solves the Regular Expression Induction (REI) problem in a significant and practical way.
Up to 23 regexes are learned for each input set—providing a choice between Optimality, Readability, and Abstractions.
Definitions
- Descriptive: The input strings can be reconstructed from the learned regex.
- Optimal: The shortest regex based on Significant Length.
- Executable: Compatible with standard regex engines.
- Matching: Matches all input strings.
- Exact Matching: Matches all and only the input strings.
- Abstractions: Use of character classes (e.g., \d, \w) and ranges (e.g., [3-5bg-j]).
- Plain Length: Total characters in the regex.
- Significant Length: Count of input string characters in the regex.
- Expansion Factor: (matched strings count) ÷ (original input strings count). For exact matches: 1.0X.
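The Matching, Exact Matching, and Expansion Factor definitions above can be illustrated with a minimal Python sketch. It is not the site's implementation. It assumes full-string matching (`re.fullmatch`) and counts matched strings by brute force over a small, hypothetical alphabet and maximum length, so it only works for toy inputs. The helper names (`is_matching`, `matched_strings`, `is_exact_matching`, `expansion_factor`) are invented for illustration.

```python
import re
from itertools import product


def is_matching(regex, inputs):
    """Matching: the regex fully matches every input string."""
    return all(re.fullmatch(regex, s) for s in inputs)


def matched_strings(regex, alphabet, max_len):
    """Enumerate every string over `alphabet` (length 0..max_len) that the regex matches.

    Assumption: a bounded brute-force universe stands in for "all matched strings".
    """
    pattern = re.compile(regex)
    found = set()
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            candidate = "".join(chars)
            if pattern.fullmatch(candidate):
                found.add(candidate)
    return found


def is_exact_matching(regex, inputs, alphabet, max_len):
    """Exact Matching: the regex matches all and only the input strings."""
    return matched_strings(regex, alphabet, max_len) == set(inputs)


def expansion_factor(regex, inputs, alphabet, max_len):
    """(matched strings count) / (original input strings count)."""
    return len(matched_strings(regex, alphabet, max_len)) / len(set(inputs))


inputs = ["a1", "a2", "a3"]
alphabet = "a0123456789"  # hypothetical universe for the brute-force count

print(is_exact_matching(r"a[1-3]", inputs, alphabet, 2))  # exact: factor is 1.0
print(expansion_factor(r"a[1-3]", inputs, alphabet, 2))
print(is_matching(r"a\d", inputs))                        # matching, but not exact
print(expansion_factor(r"a\d", inputs, alphabet, 2))      # 10 matched / 3 inputs
```

In this sketch `a[1-3]` is an Exact Matching regex for the inputs, while the more abstract `a\d` still matches every input but also accepts seven extra strings, which is what an Expansion Factor above 1.0X expresses.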
Notes
- Purpose:
  - As close to optimal as possible
  - Executable
  - Readable by humans
  - Supports human analysis of string/sequences
  - Introduces a new form of explainable machine learning
  - Shortest regex determined by Significant Length
- Significant vs. Plain Length example for input string aab:

  | Id | Regex | Significant Length | Plain Length |
  |----|-------|--------------------|--------------|
  | 1 | aab | 3: aab | 3: aab |
  | 2 | a{2}b | 2: a{2}b | 5: a{2}b |

  - Using Plain Length, shortest is aab (3 < 5)
  - Using Significant Length, shortest is a{2}b (2 < 3)
- Why Significant Length? It emphasizes structure in the regex.
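The aab comparison can be reproduced with a rough Python sketch. The site's exact counting rules for Significant Length are not spelled out here, so this sketch makes a simplifying assumption: every character of the regex counts except those inside a `{n}` or `{n,m}` repetition quantifier. Other syntax (groups, alternation, character classes) is not handled, and the function names are hypothetical.

```python
import re

# Assumption: repetition quantifiers such as {2} or {2,5} are regex syntax,
# not input string characters, so they are excluded from Significant Length.
QUANTIFIER = re.compile(r"\{\d+(?:,\d*)?\}")


def plain_length(regex):
    """Plain Length: total characters in the regex."""
    return len(regex)


def significant_length(regex):
    """Significant Length (simplified): characters left after dropping quantifiers."""
    return len(QUANTIFIER.sub("", regex))


candidates = ["aab", "a{2}b"]

for regex in candidates:
    print(regex, significant_length(regex), plain_length(regex))

shortest_plain = min(candidates, key=plain_length)
shortest_significant = min(candidates, key=significant_length)
print(shortest_plain)        # aab
print(shortest_significant)  # a{2}b
```

Under this assumption the two measures disagree exactly as in the example: Plain Length prefers the literal aab, while Significant Length prefers a{2}b, the form that exposes the repetition.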