Branch Prediction Elimination

By: Kirk Weedman

www.hdlexpress.com

Below is my progress of the Method to Eliminate Branch Predictions

1. Nov. 2015: Working on a patent "A Novel Concept to Eliminate Branch MisPredictions in Pipelined CPUs".
So far I can't find a patent method like this. This method is very practical for FPGAs, and maybe even for modern CPUs. This will improve CPU performance as branch mispredictions can cause modern CPU's to lose significant performance. I'm currently working on designing a new 5 stage pipeline CPU that will demonstrate the method showing how smooth CPU branches could be in the future. A PowerPoint presentation has been finished describing two methods and I'm just going over it again before working on documentation and maybe creating a patent. Whether I decide to get a patent will depend on feedback from other engineers and engineering professors I know.

The new logic/pipeline smoothly transitions to a branch-taken or branch_not_taken at the end of the Execute stage, without stalls (caused by the branch instruction) or flushing the pipeline. It will smoothly switch even if there are multiple back-to-back branches in the pipeline or even if there is a branch-taken in the current pipeline sequence followed by an immediate branch-taken in the next instruction sequence. It's a unique design as far as I can tell.

The method is applicable to deeper CPU pipelines as well. These two methods are mostly hardware with a minimum of one new CPU instruction. There are no software techniques used or needed such as loop unrolling, or other branch misprediction optimization techniques.

Jan 4, 2016: First working RTL simulations.

Jan 8, 2016: Continued testing/debugging in Verilog/ModelSim simulation. Methods 1 and 2 seem to be working well so far. I'm also investigating a new technique (Method 3) that would allow pipelined processors to be redesigned with Method 3, and also allow it to work with existing compiled code (no need for the 1 new instruction(s) that Methods 1 & 2 use). It looks promising. It appears I have one main hurdle, but I think it's doable.

Jan 10, 2016: Preparing a PowerPoint presentation to explain Methods 1 & 2 & show working simulation to 3 engineering university professors in WA State to get their input/feedback.

Jan. 17, 2016: Working on implementing Method 3 (no new instruction needed as in previous methods). Working on Verilog code for new pipelined CPU with Method 3 as well as a simulation testbench.

Jan. 25, 2016: I will admit that under certain conditions there may be a stall of a few clocks, but nothing like flushing a pipeline, etc.. due to a misprediction. There are no mispredictions in this design. There are also certain conditions, depending on the type of CPU, where some instructions that follow a branch can execute faster than normal. (I'll have to explain that one day). Currently debugging Method 3 in simulation

Jan. 28, 2016: First working simulation of Method 3 in a new 5 stage pipeline (no extra CPU instruction like was used in Methods 1 & 2). This doesn't mean it's all debugged yet. The simulation included a loop and some IF/ELSE statements inside the loop. I will need to simulate different tests. I discovered that for the CPU designed, that it should really have another stage between the Fetch and Decode. There is some logic used with an adder (due to Method 3) currently in the Decode stage that would cause delays in an FPGA/ASIC that could be greatly reduced by putting it in a new stage before Decode. This is because the Decode stage needs the results of the adder. Putting it in Fetch is not good either and would cause a longer Fetch stage delay. To save time to show that Method 3 works, I haven't done this, but hope to someday. I'm more concerned about getting the method to work than designing a new CPU right now.

Feb 9, 2016: Modifying/debugging ISA. Most of the instructions appear to be working. Still slower than it should be due to no forwarding logic yet. Also fixes/changes/additions to the C compiler that generates the code. These changes will allow me to write more C code for testing. Pipeline still fixed at 5 stages for now. Began working on some forwarding logic.

Feb. 15, 2016: Added register forwarding. Will work on adding other types of forwarding. Started creating technial drawings. Made a Visio drawing of the pipeline showing the new logic for branching as well as forwarding logic and control signals used in the pipeline.

Feb. 22, 2016: Fixes/changes to KPU55 ISA and KCC. Updates to PPT presentation. Created a 29 page Word document (from the PPT presentation) explaining Methods 1 & 2. Still need to add Method 3 to PPT and Word docs. Word doc is long because it has LOTS of diagrams/illustration.

Mar. 1, 2016: Updated Word document to include Method3 as well as the PowerPoint presentation. Researching how these methods may apply to OoOE processors.

Oct 24, 2016: Actively working on the new Out of Order Architecture while waiting for opportunity to give a PowerPoint on the Method(s) to professors at a University in WA.

Jan 19, 2017: Busy working on the new Out of Order CPU. Still need to arrange time & give presentation to WA state professors.