Course: Regular expression VHDL engine
Learn to build a regex processing VHDL module with runtime-reconfigurable pattern matching and Unicode support, using AI-generated Python scripts to create configurations from regexes.
Description
This course teaches how to create a text-processing pipeline of VHDL modules that support UTF-8 (Unicode) and can search for regular expression (regex) patterns in the text stream.
You will learn how to create a reconfigurable finite-state machine (FSM) that allows us to upload a new regex pattern at runtime.
Furthermore, we’ll reduce the pipeline’s data width by creating a classifier VHDL module that reclassifies 21-bit Unicode symbols into custom VHDL types that require fewer bits.
Finally, we’ll make a top-level testbench that uploads the configuration bytes for a given regex pattern to the regex engine and classifier module before streaming a UTF-8 test file through the pipeline.
This course is only available in the VHDLwhiz Membership.
The membership subscription gives you access to this and many other courses and VHDL resources.
You pay monthly to access the membership and can cancel the automatic renewal anytime. There is no lock-in period or hidden fees.
No FPGA board is required as this course is a pure simulation exercise.
Software used in the course
I use Windows 11 in the course. All the other software is available for free for Windows and Linux:
- Questa – Intel FPGA Edition (includes Starter Edition) (Any version of ModelSim or QuestaSim will work)
- Microsoft Visual Studio Code (Any editor will do)
- Python
- ChatGPT Codex (Any free or premium AI chatbot can do it)
Course outline
The overview below shows the lessons in this course.
1 - Introduction
Welcome to the course! Let's talk about regex and what we'll create.
2 - Regex to NFA and DFA
We'll use Thompson's construction algorithm to convert the regular expression into a non-deterministic finite automaton (NFA) and then into a deterministic finite automaton (DFA).
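To give a feel for the two steps, here is a minimal Python sketch of Thompson's construction followed by the subset construction. It is not the course's script: the function names are made up for illustration, and it assumes the regex is already in postfix form with an explicit '.' for concatenation (so '.', '|', '*' and '+' cannot be used as literal characters).

```python
from itertools import count

EPS = None  # label used for epsilon transitions


def thompson(postfix):
    """Build an NFA from a postfix regex using '.', '|', '*' and '+' operators."""
    ids = count()
    trans = {}  # state -> list of (label, next_state)

    def new_state():
        s = next(ids)
        trans[s] = []
        return s

    stack = []  # NFA fragments as (start, accept) pairs
    for tok in postfix:
        if tok == ".":  # concatenation: link first accept to second start
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            trans[a1].append((EPS, s2))
            stack.append((s1, a2))
        elif tok == "|":  # alternation: two parallel paths that merge again
            s2, a2 = stack.pop()
            s1, a1 = stack.pop()
            s, a = new_state(), new_state()
            trans[s] += [(EPS, s1), (EPS, s2)]
            trans[a1].append((EPS, a))
            trans[a2].append((EPS, a))
            stack.append((s, a))
        elif tok in ("*", "+"):  # repetition: loop back from accept to start
            s1, a1 = stack.pop()
            s, a = new_state(), new_state()
            trans[s].append((EPS, s1))
            if tok == "*":
                trans[s].append((EPS, a))  # zero repetitions allowed
            trans[a1] += [(EPS, s1), (EPS, a)]
            stack.append((s, a))
        else:  # literal symbol
            s, a = new_state(), new_state()
            trans[s].append((tok, a))
            stack.append((s, a))
    start, accept = stack.pop()
    return trans, start, accept


def eps_closure(trans, states):
    """All states reachable from `states` through epsilon transitions only."""
    todo, seen = list(states), set(states)
    while todo:
        s = todo.pop()
        for label, nxt in trans[s]:
            if label is EPS and nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return frozenset(seen)


def subset_construction(trans, start, accept):
    """Convert the NFA to a DFA whose states are sets of NFA states."""
    d_start = eps_closure(trans, {start})
    dfa, todo = {}, [d_start]
    while todo:
        cur = todo.pop()
        if cur in dfa:
            continue
        dfa[cur] = {}
        labels = {lab for s in cur for lab, _ in trans[s] if lab is not EPS}
        for lab in labels:
            move = {n for s in cur for l2, n in trans[s] if l2 == lab}
            dfa[cur][lab] = eps_closure(trans, move)
            todo.append(dfa[cur][lab])
    accepting = {s for s in dfa if accept in s}
    return dfa, d_start, accepting


def dfa_accepts(dfa, start, accepting, text):
    """Run the DFA over the whole text and report a full match."""
    state = start
    for ch in text:
        if ch not in dfa[state]:
            return False
        state = dfa[state][ch]
    return state in accepting


# (ab)+ in postfix form is "ab.+"
dfa, d_start, accepting = subset_construction(*thompson("ab.+"))
print(len(dfa))                                     # 3 DFA states
print(dfa_accepts(dfa, d_start, accepting, "abab"))  # True
print(dfa_accepts(dfa, d_start, accepting, "aba"))   # False
```

The DFA is the form that maps naturally to hardware, since it is in exactly one state at a time and takes one transition per input symbol.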
3 - Regex with branching and merging NFA paths
Let's walk through a slightly different regex that requires the NFA to have parallel paths that converge in the end before the accepting state.
4 - Regex testbench and module entity
First, we'll create a regex VHDL engine implementation that only supports ASCII and a static search pattern. But let's start with the testbench.
5 - Static pattern FSM
To get started on the VHDL regex engine, we'll first implement a finite-state machine (FSM) supporting only a static pattern: (ab)+.
6 - FSM implementation challenges
Whether a substring is a full match depends on a variable number of characters before and after, and that’s tricky to handle in hardware.
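The problem is easy to see in a small software model. The sketch below is a hypothetical Python stand-in for a streaming (ab)+ matcher, not the VHDL from the course; the state names and the end-of-stream sentinel are assumptions. It consumes one character per step, like a clocked FSM, and can only report a match once a later character fails to extend it.

```python
def find_ab_plus(text):
    """Return (start, end) index pairs (end exclusive) of maximal (ab)+ matches."""
    state = "IDLE"   # IDLE, GOT_A, GOT_AB (one or more "ab" seen), PENDING_A
    start = 0        # index where the current candidate began
    last_end = 0     # end of the last complete "ab" repetition
    matches = []
    # The NUL sentinel models end-of-stream so a trailing match is flushed.
    for i, ch in enumerate(text + "\0"):
        if state == "IDLE":
            if ch == "a":
                start, state = i, "GOT_A"
        elif state == "GOT_A":
            if ch == "b":
                last_end, state = i + 1, "GOT_AB"
            elif ch == "a":
                start = i          # restart the candidate at this 'a'
            else:
                state = "IDLE"
        elif state == "GOT_AB":
            if ch == "a":
                state = "PENDING_A"  # might be the start of another "ab"
            else:
                matches.append((start, last_end))
                state = "IDLE"
        else:  # PENDING_A
            if ch == "b":
                last_end, state = i + 1, "GOT_AB"
            else:
                # The pending 'a' did not extend the match, so report what we had.
                matches.append((start, last_end))
                if ch == "a":
                    start, state = i, "GOT_A"
                else:
                    state = "IDLE"
    return matches


print(find_ab_plus("xxababaxab"))  # [(2, 6), (8, 10)]
```

Note how the match ending at index 6 is only known at index 7, one character after a trailing 'a' that turned out not to belong to it. A hardware implementation has to remember the start and the last valid end in the same way.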
7 - Class map outline
Our regex module shall support Unicode (UTF-8). But won't that make the data width very wide? Not necessarily.
8 - Class map config and testbench
The subset of Unicode characters this module can classify shall be runtime reconfigurable. Let's use a daisy-chained shift register for that.
9 - Class map classifier logic
The classifier logic will use parallel comparators to check if the input Unicode character matches a codepoint we are looking for.
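As a rough behavioural model of that idea, here is a Python sketch. The ClassMap name, the slot list, and the choice of class 0 as a catch-all "other" class are assumptions made for illustration; the course's class_t encoding may differ.

```python
class ClassMap:
    """Maps 21-bit Unicode code points to small class numbers."""

    def __init__(self, codepoints):
        # One slot per code point of interest, like one comparator each.
        self.slots = list(codepoints)

    @property
    def class_bits(self):
        # Largest class number is len(slots), since class 0 means "other".
        return max(1, len(self.slots).bit_length())

    def classify(self, codepoint):
        # In hardware, all comparisons would happen in the same clock cycle.
        hits = [codepoint == slot for slot in self.slots]
        for index, hit in enumerate(hits):
            if hit:
                return index + 1
        return 0  # no comparator matched


cm = ClassMap([ord("a"), ord("b"), 0x00E6])  # 'a', 'b', 'æ'
print(cm.class_bits)           # 2 bits instead of 21
print(cm.classify(ord("b")))   # 2
print(cm.classify(0x1F600))    # 0 (not a code point we care about)
```

With only three code points of interest, the downstream regex engine needs a 2-bit symbol instead of the full 21-bit code point.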
10 - TXT to Graphviz using AI
We'll use AI (ChatGPT) to generate the Python scripts for us. Let's start with the graph visualizer tools.
11 - NFA and DFA generator scripts
Using ChatGPT Codex in VSCode, we'll create two more Python scripts to derive the NFA and DFA graphs from regex patterns.
12 - Regex FSM states config
To make the regex FSM reconfigurable, we'll add a configuration port and an array to store the config data for each state in registers.
13 - Regex config mapping
Let's use nested generate statements to map the config shift register bytes to the record members in the states array.
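The byte layout used in the course is not shown on this page, so the following Python sketch uses a purely hypothetical one: for each state, one next-state byte per input class followed by a flags byte whose bit 0 marks an accepting state. It only illustrates the general idea of flattening per-state records into a byte stream and mapping them back.

```python
NUM_CLASSES = 4  # hypothetical number of input classes per state


def pack_states(states):
    """Flatten state records into config bytes (hypothetical layout)."""
    out = bytearray()
    for st in states:
        if len(st["next"]) != NUM_CLASSES:
            raise ValueError("each state needs one next-state per class")
        out.extend(st["next"])                  # NUM_CLASSES next-state bytes
        out.append(1 if st["accept"] else 0)    # flags byte, bit 0 = accepting
    return bytes(out)


def unpack_states(blob):
    """Map config bytes back to state records, one fixed-size slice per state."""
    size = NUM_CLASSES + 1
    if len(blob) % size:
        raise ValueError("config length is not a whole number of states")
    return [
        {
            "next": list(blob[i:i + NUM_CLASSES]),
            "accept": bool(blob[i + NUM_CLASSES] & 1),
        }
        for i in range(0, len(blob), size)
    ]


states = [
    {"next": [0, 1, 0, 0], "accept": False},
    {"next": [0, 1, 2, 0], "accept": False},
    {"next": [0, 1, 0, 0], "accept": True},
]
blob = pack_states(states)
print(len(blob))                       # 15 bytes: 3 states x 5 bytes
print(unpack_states(blob) == states)   # True
```

The fixed slice size per state is what makes the mapping regular enough to express with nested loops, which is the role the nested generate statements play in the VHDL.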
14 - Regex config data simulation
It's time to update the testbench to upload the dynamic configuration bytes for a DFA graph to the regex engine VHDL module.
15 - Regex test data procedure
In this lesson, we'll convert the send_str procedure in the regex testbench to send class_t type symbols instead of ASCII characters.
16 - Reconfigurable FSM
Finally, we can complete the runtime reconfigurable implementation of the FSM process by using the configuration data.
17 - Top module
Now that we have all the modules needed to build a UTF-8 text-processing regex pipeline, we'll put them in a top-level VHDL file.
18 - Top testbench config data
Fortunately, we can reuse procedures from the regex and class_map testbenches to upload configuration data to the top module.
19 - Top testbench UTF-8 text test
Let's test the VHDL regex engine by streaming a UTF-8 (Unicode) text file through the pipeline.
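For reference, this is roughly what decoding a UTF-8 byte stream into code points involves. The Python sketch below is an illustrative model, not the course's VHDL: it handles one byte per step, the function name is made up, and it deliberately skips checks for overlong encodings and surrogates.

```python
def decode_utf8_stream(data):
    """Decode UTF-8 bytes one at a time into a list of code points."""
    codepoints = []
    remaining = 0  # continuation bytes still expected
    value = 0      # code point bits collected so far
    for byte in data:
        if remaining == 0:
            if byte < 0x80:                 # 0xxxxxxx: single-byte (ASCII)
                codepoints.append(byte)
            elif byte >> 5 == 0b110:        # 110xxxxx: 2-byte sequence
                value, remaining = byte & 0x1F, 1
            elif byte >> 4 == 0b1110:       # 1110xxxx: 3-byte sequence
                value, remaining = byte & 0x0F, 2
            elif byte >> 3 == 0b11110:      # 11110xxx: 4-byte sequence
                value, remaining = byte & 0x07, 3
            else:
                raise ValueError(f"invalid leading byte 0x{byte:02X}")
        else:
            if byte >> 6 != 0b10:           # continuation must be 10xxxxxx
                raise ValueError(f"invalid continuation byte 0x{byte:02X}")
            value = (value << 6) | (byte & 0x3F)
            remaining -= 1
            if remaining == 0:
                codepoints.append(value)
    if remaining:
        raise ValueError("stream ended in the middle of a character")
    return codepoints


print([hex(cp) for cp in decode_utf8_stream("aæ€😀".encode("utf-8"))])
# ['0x61', '0xe6', '0x20ac', '0x1f600']
```

A 4-byte sequence carries 3 + 6 + 6 + 6 = 21 payload bits, which is where the 21-bit symbol width comes from.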



