Testing Long-Running PHP Code with Generators

by Sebastian Kurfürst on 09.08.2021

Sometimes I write longer-running process logic in PHP, often in the style of a processing loop. This usually leads to a fairly long function that packs in a lot of functionality but is still quite readable.

An example of such a processing loop is implemented in the new Flowpack.DecoupledContentStore, a two-stack CMS package for Neos that we are currently developing. Inside, there is the NodeRenderOrchestrator, whose job is to figure out what needs to be done in order to render all pages, and then to check whether all pages were rendered successfully. A rough version of this control loop is shown below:

  1. function orchestrateRendering() {
  2. // try at most 10 major iterations
  3. for ($i = 0; $i < 10; $i++) {
  4.  
  5. // find out what needs to be done in order to converge to a known-good state
  6. $pagesToRender = findUnrenderedPages();
  7.  
  8. if (count($pagesToRender) === 0) {
  9. // we are finished!
  10. exit(0);
  11. }
  12. addRenderingJobsToJobQueue($pagesToRender);
  13.  
  14. while (!allJobsProcessed()) {
  15. sleep(1);
  16. }
  17. }
  18. }

This code has several properties that make it hard to test:

  • It runs for quite a long time, and only terminates when everything is done.
  • The code has an exit() call inside, which would also terminate the whole test run.
  • In line 14, it assumes that there are other workers which run in parallel to this code - which is hard to simulate in a single-threaded PHP environment.

Additionally, to provoke certain error scenarios, we often want to run the code only up to a certain position, then provoke an error, and then let it continue running to see whether it handles the error gracefully.

Initial Idea: Extract Some Functions

As a first attempt to make this code more testable, it seemed useful to extract functions for some of the behavior above. This turned out to be rather difficult, because much of the logic is encoded in the outer loop, that is, in how the different code sections above interact on a high level. Additionally, we would have needed to use reflection to call these inner methods, or alternatively to override quite a few of them.

These solution ideas have a big drawback: we need to tear apart the control loop just for the sake of testing.

So, what are our other options? Somehow, it would be good to say something like: "Please run until a certain point in time, then interrupt the execution. After I have done some stuff in my test case, resume the execution at exactly this point in time."

Generators to the Rescue

It turns out that this suspend-and-resume behavior is possible in PHP (and in other languages that support generators) using the yield keyword. Let's look at a quick example of how generators work:

  1. <?php
  2. function gen(): \Generator {
  3. echo "Gen: start\n";
  4. yield 1;
  5. echo "Gen: after yield 1\n";
  6. yield 2;
  7. echo "Gen: after yield 2\n";
  8. }
  9.  
  10. echo "outer: begin\n";
  11. $myGenerator = gen();
  12. echo "outer: after initial function invocation\n";
  13. $res = $myGenerator->current();
  14. echo "Result: $res\n\n";
  15.  
  16. $myGenerator->next();
  17. echo "outer: after next()\n";
  18. $res = $myGenerator->current();
  19. echo "Result: $res\n\n";
  20.  
  21. $myGenerator->next();
  22. echo "outer: after next()\n";
  23. $res = $myGenerator->current();
  24. echo "Result: $res\n\n";

DISPLAYED OUTPUT:

outer: begin
outer: after initial function invocation
Gen: start
Result: 1
 
Gen: after yield 1
outer: after next()
Result: 2
 
Gen: after yield 2
outer: after next()
Result: 

Take a moment to digest what you are seeing here:

  • Calling gen() does not run any of the function body yet; it only returns a Generator object (note that "outer: after initial function invocation" is printed before "Gen: start"). The body starts executing on the first call to $myGenerator->current() and runs up to (and including) the first yield statement. Then, the function invocation is suspended at the yield position (line 4).
  • The currently yielded value can be accessed using $myGenerator->current().
  • You can advance the generator (resume the suspended function) by calling $myGenerator->next(). This resumes the suspended gen() function and runs it up to the next yield statement, or to the end of the function if there is none. Once the function has finished, current() returns null, which is why the last "Result:" line is empty.

If this sounds like magic, it kind of is :-) Generators need to be supported by the language runtime itself, because the runtime needs a way to suspend a function while keeping its stack frame, so that it can be resumed at a later point in time.
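
To make the "keeps its stack" part more tangible, here is a minimal sketch. The function name countUpTo() is made up for this illustration and is not part of the project; it only shows that a local variable survives every suspension:

```php
<?php
declare(strict_types=1);

// Hypothetical example: a generator whose local variable survives each suspension.
function countUpTo(int $limit): \Generator
{
    $current = 1;               // lives in the suspended function's frame
    while ($current <= $limit) {
        yield $current;         // suspend here and hand the value to the caller
        $current++;             // runs only after the caller resumes the generator
    }
}

$collected = [];
foreach (countUpTo(3) as $value) {
    // foreach drives the generator via valid(), current() and next() for us
    $collected[] = $value;
}

echo implode(', ', $collected), "\n"; // prints "1, 2, 3"
```

Because $current is never reset between yields, the caller sees 1, 2, 3 even though control leaves the function after every value.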

Generators for Testing Control Loops

Now, back to our original question. We can emit some events from our processing logic, and we need an outside runtime to handle these events for us. Pretty much like this:

  function orchestrateRendering(): \Generator {
      // try at most 10 major iterations
      for ($i = 0; $i < 10; $i++) {
          yield new BeginNextIterationEvent();

          // find out what needs to be done in order to converge to a known-good state
          $pagesToRender = findUnrenderedPages();

          if (count($pagesToRender) === 0) {
              // we are finished!
              yield new ExitEvent(0);
              return;
          }
          addRenderingJobsToJobQueue($pagesToRender);

          yield new RenderingQueueFilledEvent();

          while (!allJobsProcessed()) {
              sleep(1);
          }
      }
  }
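
The events themselves can be very small value objects. The following is only a sketch of how they might look: the class names and getExitCode() are taken from the snippets in this post, but the class bodies are my assumption here and may differ from the actual implementation in the package:

```php
<?php
declare(strict_types=1);

// Assumed shape of the events; only the names come from the article.
final class BeginNextIterationEvent
{
}

final class RenderingQueueFilledEvent
{
}

final class ExitEvent
{
    private int $exitCode;

    public function __construct(int $exitCode)
    {
        $this->exitCode = $exitCode;
    }

    public function getExitCode(): int
    {
        return $this->exitCode;
    }
}

$event = new ExitEvent(0);
echo get_class($event), ' with exit code ', $event->getExitCode(), "\n";
```

Using one class per event means the runtime can decide what to do with a plain instanceof check, and only ExitEvent needs to carry any data.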

Now, our runtime to execute the generator can look like this:

  • In the simple (production) case, we pull all values (=Events) from the generator in a loop, as fast as possible.
  • In the testing case, we pull all values (=Events) from the generator in a loop until we hit our "breakpoint". This way, we can interrupt processing code at the exact position where we need it, then do our adjustments, and then resume it again.

  public function run(\Generator $generator): void
  {
      while ($generator->valid()) {
          $currentEvent = $generator->current();
          if ($currentEvent instanceof ExitEvent) {
              exit($currentEvent->getExitCode());
          }
          // try to read next event
          $generator->next();
      }
  }

  public function runUntilEventEncounteredForTesting(\Generator $generator, string $eventClassName): void
  {
      while ($generator->valid()) {
          $currentEvent = $generator->current();
          if ($currentEvent instanceof ExitEvent) {
              // we do not exit here, but stop iterating the generator in all cases
              return;
          }
          if (is_a($currentEvent, $eventClassName)) {
              // stop here; the caller can continue the generator later on
              return;
          }
          // try to read next event (if not stopped)
          $generator->next();
      }
  }
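
To show how a test can use such a breakpoint, here is a self-contained sketch. Everything in it (FakeRenderingState, QueueFilledEvent, orchestrate(), runUntil()) is a simplified stand-in invented for this illustration, not the code from the package. The test pauses the loop after the queue is filled, plays the role of the parallel workers, and then resumes. Note that in this sketch the test calls next() itself to step past the breakpoint before running on; otherwise the runtime would stop at the same event again.

```php
<?php
declare(strict_types=1);

// All names below are hypothetical stand-ins so that the sketch runs on its own.
final class QueueFilledEvent
{
}

final class ExitEvent
{
    private int $exitCode;

    public function __construct(int $exitCode)
    {
        $this->exitCode = $exitCode;
    }

    public function getExitCode(): int
    {
        return $this->exitCode;
    }
}

final class FakeRenderingState
{
    /** @var string[] pages that still need rendering */
    public array $unrendered = [];
    /** @var string[] jobs waiting for a worker */
    public array $queue = [];
}

function orchestrate(FakeRenderingState $state): \Generator
{
    for ($i = 0; $i < 10; $i++) {
        if (count($state->unrendered) === 0) {
            yield new ExitEvent(0);
            return;
        }
        $state->queue = $state->unrendered;
        yield new QueueFilledEvent();
        while (count($state->queue) > 0) {
            usleep(1000); // not reached below: the test drains the queue first
        }
    }
}

// Runs the generator until an ExitEvent or an event of the given class shows up.
function runUntil(\Generator $generator, string $eventClassName): ?object
{
    while ($generator->valid()) {
        $event = $generator->current();
        if ($event instanceof ExitEvent || is_a($event, $eventClassName)) {
            return $event;
        }
        $generator->next();
    }
    return null;
}

$state = new FakeRenderingState();
$state->unrendered = ['home', 'about'];
$generator = orchestrate($state);

// 1. Run to the breakpoint: jobs are queued, nothing is rendered yet.
$first = runUntil($generator, QueueFilledEvent::class);

// 2. Play the role of the parallel workers while the loop is suspended.
$state->queue = [];
$state->unrendered = [];

// 3. Step past the breakpoint, then let the loop run to its end.
$generator->next();
$last = runUntil($generator, QueueFilledEvent::class);

echo get_class($first), ' -> ', get_class($last), "\n"; // prints "QueueFilledEvent -> ExitEvent"
```

An error scenario would look the same, except that step 2 would leave a page unrendered or put the queue into a broken state before resuming.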

Closing Thoughts

At first I was unsure whether this approach is a good one, because I had not seen it in practice before. However, after using it for some days now, it still feels quite elegant to me - so I am quite happy with it so far. If you have further feedback, please let me know on Twitter @skurfuerst :-)

You can check out the full source code of our long-running processes (part of Flowpack.DecoupledContentStore) inside NodeRenderOrchestrator.php and NodeRenderer.php - and the runtime to execute and interrupt these processes can be found in InterruptibleProcessRuntime.php.