Making a Timesheet Parser with the Hoa Compiler

I have been keeping my timelogs in a plain-text timesheet format as follows:

2019-02-11
09:00 [JIRA-1234] Adding some functionality
10:00 [standup]
10:15 [JIRA-1234] Fixing that annoying bug
11:00 [JIRA-2134] Review
12:00 [lunch]
13:00 [JIRA-1234] @pairing
14:00 [confused]
18:00 [finish]

2019-02-12
09:00 ...

Basically, it's much quicker to log my time realistically. I don't need to continually break my concentration and assign time to tickets as I work on them, or strain to remember (or make up) what I did at the end of the day, or even at the end of the week. It also means I can record what really happened, rather than logging random events on the ZZ-22 "catch-all" ticket, where the information is lost to the powers of analysis.

The only problem is that every week I need to translate this into not one but two JIRAs, an operation that involves a huge amount of clicking and waiting and confusion and ppaaiinn.

So, pain once a week instead of pain every day. But there is no reason that this situation cannot be ameliorated - we can parse the timesheet. Once we parse the timesheet we can sync it automatically with JIRA: my Monday morning trauma is at an end, and our project managers can be happier, as I can accurately curate and log my time every day, effortlessly.

Parsing the Timesheet

Why do we want to parse the timesheet? We want to extract the information from it, and eventually produce a data structure like:

[
    '2019-01-01' => [
        'entries' => [
            [ 'time' => '10:00', 'category' => 'AN-1234', 'comment' => 'foobar' ],
            [ 'time' => '11:00', 'category' => 'lunch', 'comment' => 'foobar' ],
        ]
    ],
    '2019-01-02' => [
        'entries' => [
            [ 'time' => '10:00', 'category' => 'AN-456', 'comment' => 'foobar' ],
        ]
    ]
]

Once we have structured data we can do something useful with it.
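As an example of "something useful", here is a minimal sketch that totals the minutes spent per category from a structure shaped like the one above. It assumes, as the timesheet format suggests, that each entry lasts until the next entry of the same day starts, so a closing entry such as [finish] contributes nothing. The function names are hypothetical.

```php
<?php
// Sketch: total minutes per category. ASSUMPTION: each entry runs until the
// next entry on the same day starts; the last entry of a day adds nothing.
function minutesPerCategory(array $timesheet): array
{
    $totals = [];
    foreach ($timesheet as $day) {
        $entries = $day['entries'];
        $count = count($entries);
        for ($i = 0; $i < $count - 1; $i++) {
            $start = toMinutes($entries[$i]['time']);
            $end = toMinutes($entries[$i + 1]['time']);
            $category = $entries[$i]['category'];
            $totals[$category] = ($totals[$category] ?? 0) + ($end - $start);
        }
    }

    return $totals;
}

// Converts "HH:MM" into minutes since midnight.
function toMinutes(string $time): int
{
    [$hours, $minutes] = array_map('intval', explode(':', $time));

    return $hours * 60 + $minutes;
}

$timesheet = [
    '2019-02-11' => [
        'entries' => [
            ['time' => '09:00', 'category' => 'JIRA-1234', 'comment' => 'Adding some functionality'],
            ['time' => '10:00', 'category' => 'standup', 'comment' => null],
            ['time' => '10:15', 'category' => 'JIRA-1234', 'comment' => 'Fixing that annoying bug'],
            ['time' => '11:00', 'category' => 'finish', 'comment' => null],
        ],
    ],
];

print_r(minutesPerCategory($timesheet));
```

Treating each line as a start time keeps the plain-text format as terse as it is now; durations only need to be derived when reporting.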

We could use regular expressions to extract the data, but, well, that might not end well. Instead we are going to use a compiler, and we are not going to write any parsing code in PHP at all.

The HOA Compiler

The HOA Compiler is an amazing library which can take a grammar in the form of a .pp file and use it to parse input into an AST (see here for good and detailed documentation).

The timesheet document is composed of one or more date entries (of the form YYYY-MM-DD), and each date entry in turn contains a list of entry items, each defining the time and, optionally, a category, a comment and one or more tags.

Let's skip straight to it:

%token newline            \n
%token space              \s
%token date               [0-9]{4}-[0-1][0-9]-[0-3][0-9] -> entry

%token entry:time         [0-9]{1,2}:[0-9]{1,2}
%token entry:break        \n\n -> default
%token entry:newline      \n
%token entry:space        \s
%token entry:text         [a-zA-Z0-9'"\h.-]+
%token entry:tag          @[a-zA-Z0-9-_]+
%token entry:bracket_     \[ -> category

%token category:name      [A-Za-z-_0-9]+
%token category:_bracket  \] -> entry

#document:
    date()*

#date:
    <date> <newline>? entry()*

#entry:
    <time> <space>? category()? <space>? <text>? tag()* (<newline> | <break> )?

#category:
    <bracket_> <name> <_bracket>

#tag:
    <space>? <tag>

So first we have the tokens, which are PCRE (regex) patterns. These define lexemes, which are like the "atoms" of our grammar. We then define the rules which combine these atoms; rules prefixed with # become nodes in the AST (more on this later). Note the following:

  • The document has zero or more (*) date() rules.
  • Each date() rule is composed of a <date> followed by zero or one (?) newlines, followed by zero or more entry() rules.
  • Each entry() rule must have a valid <time> token, followed by zero or one spaces, followed by an optional category() rule, followed by... etc.

Did you notice the -> symbols? These are namespace transitions: when the lexer encounters a date token it switches to the entry namespace, and from then on it only considers tokens in that namespace until another transition (such as the break token's -> default) moves it elsewhere. This is necessary to stop tokens conflicting (you don't want to interpret a date token inside a category, for example). The compiler also allows you to transition back to the previous namespace using __shift__ (see the docs for more info).

Namespaces are essential, and are what really help make the compiler a much better option than simple regular expressions.
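To make the idea concrete, here is a toy lexer in plain PHP (not Hoa's implementation) that applies the same rule: it only tries the tokens of the current namespace, takes the first pattern that matches at the current offset, and follows a -> transition when the matched token declares one. The token table is a hypothetical, cut-down copy of the grammar above.

```php
<?php
// Toy lexer illustrating namespace transitions. NOT Hoa's lexer; the token
// table is a cut-down, hypothetical copy of the grammar's tokens.
// Each token: [name, PCRE pattern, namespace to switch to (or null)].
const TOKENS = [
    'default' => [
        ['newline', '\n', null],
        ['date', '[0-9]{4}-[0-1][0-9]-[0-3][0-9]', 'entry'],
    ],
    'entry' => [
        ['time', '[0-9]{1,2}:[0-9]{1,2}', null],
        ['break', '\n\n', 'default'],
        ['newline', '\n', null],
        ['space', '\s', null],
        ['bracket_', '\[', 'category'],
        ['text', '[a-zA-Z0-9\h.-]+', null],
    ],
    'category' => [
        ['name', '[A-Za-z0-9_-]+', null],
        ['_bracket', '\]', 'entry'],
    ],
];

function tokenize(string $input): array
{
    $tokens = [];
    $namespace = 'default';
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        $matched = false;
        // Only tokens of the current namespace are tried; the first match wins.
        foreach (TOKENS[$namespace] as [$name, $pattern, $next]) {
            // \G anchors the match at the current offset.
            if (preg_match('/\G(?:' . $pattern . ')/', $input, $m, 0, $offset)) {
                $tokens[] = $namespace . ':' . $name;
                $offset += strlen($m[0]);
                if ($next !== null) {
                    $namespace = $next; // the "->" transition
                }
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            throw new RuntimeException(sprintf(
                'Unexpected input at offset %d in namespace "%s"', $offset, $namespace
            ));
        }
    }

    return $tokens;
}

echo implode(PHP_EOL, tokenize("2019-02-11\n10:15 [JIRA-1234] Fixing bugs\n")), PHP_EOL;
```

Note how JIRA-1234 is lexed as category:name: inside the category namespace the time and text patterns are simply not candidates.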

You may notice that we parse the category as a rule composed of three tokens, whereas the tag is a single token with the @ included. There is no particular reason for this other than laziness. But let's look at the difference when the AST is rendered:

Tag:

#tag
>  token(entry:space,  )
>  token(entry:tag, @barfoo)

Category:

#category
>  token(entry:bracket_, [)
>  token(category:name, AA-1234)
>  token(category:_bracket, ])

The information we really want from the above two examples is the name - barfoo and AA-1234 respectively. With the category we can easily extract this information from the token in the AST, but with the tag we need to perform additional processing (e.g. ltrim('@barfoo', '@')) in order to obtain the tag name (barfoo).

But wait, how did we get here?

Parsing the Timesheet

In order to do anything useful, we want to get our hands on an AST (Abstract Syntax Tree). This is the data structure containing all of our data, more-or-less neatly organized into a tree of nodes (remember these are defined in the grammar with the # prefix, e.g. #entry); each node contains the tokens (and their values) and child nodes matched by its rule.

We use the HOA Compiler as follows:

use Hoa\Compiler\Llk\Llk;
use Hoa\File\Read;

$compiler = Llk::load(new Read(__DIR__ . '/../../resources/timesheet.pp'));
$ast = $compiler->parse($string); // $string holds the contents of the timesheet
// profit!

Once we have the AST we can visualize it using the Dump visitor provided by HOA:

use Hoa\Compiler\Visitor\Dump;

$dumper = new Dump();
echo $dumper->visit($ast);

Producing something like this:

>  #document
>  >  #date
>  >  >  token(date, 2019-01-01)
>  >  >  token(entry:newline, 
)
>  >  >  #entry
>  >  >  >  token(entry:time, 10:00)
>  >  >  >  token(entry:space,  )
>  >  >  >  token(entry:text, Fo)
>  >  >  >  token(entry:newline, 
)

Walking the AST

The AST variable is a TreeNode and can be traversed easily with helper methods such as getChildren(). To do something useful with it you will probably want to walk the tree; the basic idea is something like the following:


use Hoa\Compiler\Llk\TreeNode;

class TreeWalker
{
    public function walk(TreeNode $node): array
    {
        $dates = [];
        foreach ($node->getChildren() as $childNode) {
            // rule nodes are identified by their #-prefixed name
            if ($childNode->getId() === '#date') {
                $dates[] = $this->walkDate($childNode);
            }
        }

        return $dates;
    }

    private function walkDate(TreeNode $node): array
    {
        $date = ['date' => null, 'entries' => []];
        foreach ($node->getChildren() as $childNode) {
            // token nodes carry the token name and the matched value
            if ($childNode->getValueToken() === 'date') {
                $date['date'] = new DateTimeImmutable($childNode->getValueValue());
            }
            if ($childNode->getId() === '#entry') {
                $date['entries'][] = $this->walkEntry($childNode);
            }
        }

        return $date;
    }

    private function walkEntry(TreeNode $node): array
    {
        // etc.
        return [];
    }
}

This is a simplified version, see here for the complete version.

Note that we progressively build our data set and extract information from the tokens in the tree.
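The walker above leaves walkEntry() open. Below is a hedged sketch of what it could look like, including the tag-name clean-up (ltrim) discussed earlier. To keep it runnable without Hoa, FakeTreeNode is a hypothetical stand-in for Hoa\Compiler\Llk\TreeNode that exposes only getId(), getValueToken(), getValueValue() and getChildren(); the hand-built tree roughly mirrors what the grammar would produce for a line like 14:00 [JIRA-1234] Review @pairing @remote.

```php
<?php
// Hypothetical stand-in for Hoa\Compiler\Llk\TreeNode (same method names,
// only the ones used here). Rule nodes have ids like "#entry"; leaves are "token".
final class FakeTreeNode
{
    private $id;
    private $value;
    private $children;

    public function __construct(string $id, ?array $value = null, array $children = [])
    {
        $this->id = $id;
        $this->value = $value;
        $this->children = $children;
    }

    public function getId(): string { return $this->id; }
    public function getValueToken(): ?string { return $this->value['token'] ?? null; }
    public function getValueValue(): ?string { return $this->value['value'] ?? null; }
    public function getChildren(): array { return $this->children; }
}

// Builds a leaf token node.
function tokenNode(string $name, string $value): FakeTreeNode
{
    return new FakeTreeNode('token', ['token' => $name, 'value' => $value]);
}

function walkEntry(FakeTreeNode $node): array
{
    $entry = ['time' => null, 'category' => null, 'comment' => null, 'tags' => []];

    foreach ($node->getChildren() as $child) {
        switch ($child->getId()) {
            case 'token':
                if ($child->getValueToken() === 'time') {
                    $entry['time'] = $child->getValueValue();
                } elseif ($child->getValueToken() === 'text') {
                    // text may include trailing horizontal whitespace
                    $entry['comment'] = trim($child->getValueValue());
                }
                break;
            case '#category':
                // the name token sits between the two bracket tokens
                foreach ($child->getChildren() as $part) {
                    if ($part->getValueToken() === 'name') {
                        $entry['category'] = $part->getValueValue();
                    }
                }
                break;
            case '#tag':
                // the tag token includes the "@", so strip it
                foreach ($child->getChildren() as $part) {
                    if ($part->getValueToken() === 'tag') {
                        $entry['tags'][] = ltrim($part->getValueValue(), '@');
                    }
                }
                break;
        }
    }

    return $entry;
}

$entryNode = new FakeTreeNode('#entry', null, [
    tokenNode('time', '14:00'),
    tokenNode('space', ' '),
    new FakeTreeNode('#category', null, [
        tokenNode('bracket_', '['), tokenNode('name', 'JIRA-1234'), tokenNode('_bracket', ']'),
    ]),
    tokenNode('space', ' '),
    tokenNode('text', 'Review '),
    new FakeTreeNode('#tag', null, [tokenNode('tag', '@pairing')]),
    new FakeTreeNode('#tag', null, [tokenNode('space', ' '), tokenNode('tag', '@remote')]),
    tokenNode('newline', "\n"),
]);

print_r(walkEntry($entryNode));
```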

Summary

Now that we have walked the AST we have a data structure suited to our needs, and the next step is to build some rudimentary reporting and then integrate with the JIRA API. Along the way the above will probably change significantly - but fortunately it is now easy to change.

The official documentation provides a much greater depth of knowledge than this blog post does, but it is perhaps useful to see it explained from a different perspective.

Here's to great future hopes of increased productivity, powered by HOA.

Phpactor Extensions

Over the past month or so I have been gradually migrating Phpactor to use Extensions.

This started because I wanted to add Language Server capabilities to Phpactor, but having two RPC mechanisms in the same application seemed overkill, so I decided to extract everything into extensions in order that all of the components could be easily reused and recombined (so that a phpactor-language-server standalone application could be created).

In addition I wanted the ability to add framework and tool specific functionality, which doesn't belong in the main distribution. This all pointed the way to having user extensions.

Installing Extensions

Writing an Extension

Extensions have a few key attributes:

  1. The extension package should have a package type of phpactor-extension and an extra attribute phpactor.extension_class which points to...
  2. The extension class which implements Phpactor\Container\Extension.

That's it. The extension class simply registers services with a DI container (similar to Pimple but with tags and parameters) and declares its configuration (using something like the Symfony OptionsResolver).
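Those two composer attributes are all that is needed to find the extension class. As a hypothetical sketch (not Phpactor's actual discovery code), the check could look like this when given a decoded composer.json:

```php
<?php
// Hypothetical sketch: return the extension class for a package if it is a
// Phpactor extension, otherwise null. Not Phpactor's real discovery code.
function extensionClass(array $package): ?string
{
    if (($package['type'] ?? null) !== 'phpactor-extension') {
        return null;
    }
    $class = $package['extra']['phpactor.extension_class'] ?? null;

    return is_string($class) ? $class : null;
}

$composerJson = <<<'JSON'
{
    "name": "acme/stupid-completion-extension",
    "type": "phpactor-extension",
    "extra": {
        "phpactor.extension_class": "Acme\\Extension\\StupidCompletion\\StupidCompletionExtension"
    }
}
JSON;

$package = json_decode($composerJson, true);
var_dump(extensionClass($package));
```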

Stupid Completor

DISCLAIMER: Phpactor is not currently stable, and some packages have no tagged release at all.

Let's make a completion extension. This extension will accept some configuration, stupid_completor.items, and will return these items as suggestions every time it is invoked.

First of all we will need to require the phpactor/container package (this is the only strict requirement) and phpactor/completion-extension (as we are building a completor), and ensure our composer file has the following attributes:

  1. A type of phpactor-extension
  2. An extra property with the FQN of the extension class.

It might look something like this:

{
    "name": "acme/stupid-completion-extension",
    "description": "Stupid Completion Support",
    "license": "MIT",
    "type": "phpactor-extension",
    "minimum-stability": "dev",
    "require": {
        "phpactor/container": "^1.0",
        "phpactor/completion-extension": "~0.1"
    },
    "autoload": {
        "psr-4": {
            "Acme\\Extension\\StupidCompletion\\": "lib/"
        }
    },
    "extra": {
        "phpactor.extension_class": "Acme\\Extension\\StupidCompletion\\StupidCompletionExtension"
    }
}

NOTE: the completion extension has no release at the time of writing, so minimum-stability: dev is currently required.

We need to create a completor class to provide our stupid suggestions, let's put it in lib/Completion/StupidCompletion.php:

<?php

namespace Acme\Extension\StupidCompletion\Completion;

use Generator;
use Phpactor\Completion\Core\Completor;
use Phpactor\Completion\Core\Suggestion;

class StupidCompletion implements Completor
{
    private $suggestions;

    public function __construct(array $suggestions)
    {
        $this->suggestions = $suggestions;
    }

    public function complete(string $source, int $byteOffset): Generator
    {
        foreach ($this->suggestions as $suggestion) {
            yield Suggestion::create($suggestion);
        }
    }
}

Now we need the extension class, which integrates our completor. As with the above, it should go in lib/StupidCompletionExtension.php:

<?php

namespace Acme\Extension\StupidCompletion;

use Acme\Extension\StupidCompletion\Completion\StupidCompletion;
use Phpactor\Container\Container;
use Phpactor\Container\ContainerBuilder;
use Phpactor\Container\Extension;
use Phpactor\Extension\Completion\CompletionExtension;
use Phpactor\MapResolver\Resolver;

class StupidCompletionExtension implements Extension
{
    public const PARAM_ITEMS = 'stupid_completor.items';

    public function load(ContainerBuilder $container)
    {
        $container->register('stupid_completor.stupid_completor', function (Container $container) {
            return new StupidCompletion(
                $container->getParameter(self::PARAM_ITEMS)
            );
        }, [ CompletionExtension::TAG_COMPLETOR => []]);
    }

    public function configure(Resolver $schema)
    {
        $schema->setDefaults([
            self::PARAM_ITEMS => [
                'hello', 'goodbye'
            ]
        ]);
    }
}

Note that above:

  1. We tag our completor with CompletionExtension::TAG_COMPLETOR so that the completion extension can pick it up. Anything that is "public" in an extension is exposed as a public constant, including tags and services (TAG_* and SERVICE_*).
  2. We set some default configuration; when used with Phpactor, this can be set in .phpactor.yml as stupid_completor.items.
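Tags are what let one extension discover services registered by another. The toy container below (not phpactor/container; its class, method and tag names are hypothetical) sketches the idea: the "completion" side lazily collects every service carrying a completor tag, and the "stupid completion" side only has to register a tagged service.

```php
<?php
// Toy DI container with tagged services. NOT phpactor/container; all names
// here are hypothetical and only illustrate the tagging idea.
final class ToyContainer
{
    /** @var array<string, Closure> service id => factory */
    private $factories = [];
    /** @var array<string, array<string, array>> tag => (service id => attributes) */
    private $tags = [];
    /** @var array<string, mixed> already-created services */
    private $services = [];

    public function register(string $id, Closure $factory, array $tags = []): void
    {
        $this->factories[$id] = $factory;
        foreach ($tags as $tag => $attributes) {
            $this->tags[$tag][$id] = $attributes;
        }
    }

    public function get(string $id)
    {
        // Factories run lazily on first request; the result is shared.
        if (!array_key_exists($id, $this->services)) {
            $this->services[$id] = ($this->factories[$id])($this);
        }

        return $this->services[$id];
    }

    public function taggedIds(string $tag): array
    {
        return $this->tags[$tag] ?? [];
    }
}

$container = new ToyContainer();

// "Completion" side: collects every service tagged 'completor' (hypothetical tag name).
$container->register('completion.completors', function (ToyContainer $c) {
    $completors = [];
    foreach (array_keys($c->taggedIds('completor')) as $id) {
        $completors[] = $c->get($id);
    }
    return $completors;
});

// "Stupid completion" side: registers a tagged service. An array stands in
// for new StupidCompletion([...]) so the sketch runs without Phpactor.
$container->register('stupid_completor.stupid_completor', function () {
    return ['hello', 'goodbye'];
}, ['completor' => []]);

print_r($container->get('completion.completors'));
```

Because factories run lazily, the collecting service can be registered before the tagged service it collects, so in this sketch the order in which extensions are loaded does not matter.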

Testing it Out

You could probably now push your extension to Packagist, or add it as a path repository in Phpactor's extensions/extensions.json file (which is actually a composer.json file):

    "repositories": [
        {
            "type": "path",
            "url": "\/home\/daniel\/www\/phpactor\/stupid-completor-extension"
        }
    ]

Once this is done you are ready to install it with:

$ ~/.vim/plugged/phpactor/bin/phpactor extension:install acme/stupid-completion-extension

Note that Phpactor will load extensions based on the contents of the file extensions/extensions.php - if you experience issues you may want to disable the extension temporarily in this file.

Making a Standalone Application

Sometimes you might create an extension which can be used standalone. This is beneficial for user testing, and when the extension is useful on its own without Phpactor.

Our standalone application will provide completion results over Phpactor's RPC protocol and will need the command line interface, so require the following:

$ composer require phpactor/completion-rpc-extension phpactor/console-extension

Create a standalone RPC application for stupid completion: just create the following file in bin/stupid-completion:

#!/usr/bin/env php
<?php

use Acme\Extension\StupidCompletion\StupidCompletionExtension;
use Phpactor\Container\PhpactorContainer;
use Phpactor\Extension\Completion\CompletionExtension;
use Phpactor\Extension\Console\ConsoleExtension;
use Phpactor\Extension\Logger\LoggingExtension;
use Phpactor\Extension\Rpc\RpcExtension;
use Phpactor\FilePathResolverExtension\FilePathResolverExtension;
use Symfony\Component\Console\Application;

require __DIR__ . '/../vendor/autoload.php';

$container = PhpactorContainer::fromExtensions([
    StupidCompletionExtension::class,
    CompletionExtension::class,
    ConsoleExtension::class,
    RpcExtension::class,
    LoggingExtension::class,
    FilePathResolverExtension::class,
], []);

$application = new Application();
$application->setCommandLoader(
    $container->get(ConsoleExtension::SERVICE_COMMAND_LOADER)
);
$application->run();

Note that:

  1. We instantiate a PhpactorContainer.
  2. We manually add all the required extensions (the container will shout at you if any extensions are missing).
  3. We create a new Symfony Application and retrieve the command loader from the console extension.
  4. We run the application.

Make it executable with chmod a+x bin/stupid-completion and now you have a stupid RPC completor!

$ echo '{"action": "complete", "parameters": {"source": "<?php ", "offset": 2}}' | ./bin/stupid-completion rpc --pretty
{
    "version": "1.0.0",
    "action": "return",
    "parameters": {
        "value": {
            "suggestions": [
                {
                    "type": null,
                    "name": "hello",
                    "label": "hello",
                    "short_description": null,
                    "class_import": null,
                    "info": null
                },
                {
                    "type": null,
                    "name": "goodbye",
                    "label": "goodbye",
                    "short_description": null,
                    "class_import": null,
                    "info": null
                }
            ],
            "issues": []
        }
    }
}
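The RPC exchange above is plain JSON, so driving it from another PHP script is mostly a matter of encoding and decoding. A minimal sketch, with hypothetical helper names and the payload shapes copied from the example above (the response is trimmed to the fields used):

```php
<?php
// Sketch: build the "complete" request shown above and extract suggestion
// names from a response. Helper names are hypothetical.
function completeRequest(string $source, int $offset): string
{
    return json_encode([
        'action' => 'complete',
        'parameters' => ['source' => $source, 'offset' => $offset],
    ]);
}

function suggestionNames(string $responseJson): array
{
    $response = json_decode($responseJson, true, 512, JSON_THROW_ON_ERROR);
    if (($response['action'] ?? null) !== 'return') {
        throw new RuntimeException('Unexpected RPC action: ' . ($response['action'] ?? 'none'));
    }

    return array_column($response['parameters']['value']['suggestions'], 'name');
}

echo completeRequest('<?php ', 2), PHP_EOL;

$response = '{"version":"1.0.0","action":"return","parameters":{"value":{"suggestions":'
    . '[{"name":"hello","label":"hello"},{"name":"goodbye","label":"goodbye"}],"issues":[]}}}';
print_r(suggestionNames($response));
```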

Summary

Extensions should allow Phpactor to be extended in all sorts of ways, as well as providing a very fast way to create entirely new applications based on Phpactor functionality.

The above extension omits tests for the completor and the extension itself. For a simple(ish) working example see the behat extension.

Rephpactor

TL;DR

Phpactor 1.0 will have no features at all, but it will provide a way to install extensions. All current Phpactor functionality will be extracted to extensions.

Background

One problem with Phpactor has always been that it has not been extensible - it is not possible to, for example, install a Behat extension, or a Phpspec or Symfony extension.

It is not that the infrastructure isn't there internally - it is, and it was based on the precedent set by Phpbench (which was in turn influenced by other things, notably Behat, Symfony, Pimple, etc).

Phpbench could easily be included as a dependency of your project, which meant that it was easy to include an extension in your project as you would any other library.

Phpactor is a standalone project: you (generally) install it in one place and use it everywhere. While you could add new dependencies to the Phpactor project itself, it would not be a good idea because you would have conflicts when updating.

Scaling

Another problem has been that Phpactor has been aggregating functionality, and as time has gone on I have wished that I could drop certain things, or introduce new domain-specific features.

Another long-standing problem has been the lack of code fixers (prettifiers). While I have been tempted to write a Phpactor CS Fixer, it would only have been able to do the absolute minimum to fix the grossest formatting errors in generated code. So it makes far more sense to make use of existing tools such as php-cs-fixer and phpcs - but it makes much less sense to bind them to Phpactor, as people will want to use one or the other (often depending on project requirements).

The Language Server

Recently I have been playing with a Phpactor Language Server Protocol (LSP) implementation. I have introduced this into the develop branch, and it generally works quite well. The biggest advantage is that it opens Phpactor up to other text editors with no additional effort, and it means ultimately not having to maintain a phpactor.vim plugin.

The disadvantage is that it's a long running process, and at the moment at least the original Phpactor is more stable.

Anyway - it leads to a problem where more code is added to the core which duplicates existing functionality and introduces more noise. It would be much better if the language server were optional.

Extensions

So this weekend I played with the idea of introducing an embedded composer. After checking out Beau Simensen's embedded composer, I managed to get a stripped-down embedded composer working in a prototype project: rephpactor.

Rephpactor

Rephpactor (which will hopefully become Phpactor 1.0) will look something like this with no extensions installed:

Rephpactor

Usage:
  command [options] [arguments]

Options:
  -h, --help            Display this help message
  -q, --quiet           Do not output any message
  -V, --version         Display this application version
      --ansi            Force ANSI output
      --no-ansi         Disable ANSI output
  -n, --no-interaction  Do not ask any interactive question
  -v|vv|vvv, --verbose  Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug

Available commands:
  help               Displays help for a command
  list               Lists commands
 extension
  extension:install  Install extension
  extension:update   Update extensions
  extension:search   Search available extensions
  extension:list     List installed extensions

There is absolutely nothing there! It's amazing.

After initially installing, you will be able to use the extension:install command to add packages from Packagist (only packages with the phpactor-extension type are permitted):

$ ./bin/rephpactor extension:install phpactor/language-server-extension

The installed extensions can then be listed:

$ ./bin/rephpactor extension:list
+--------------------------------------+-----------+--------------------------------------+
| Name                                 | Version   | Description                          |
+--------------------------------------+-----------+--------------------------------------+
| phpactor/language-server-extension   | 1.0.x-dev | LSP compatible language server       |
| phpactor/completion-extension        | 1.0.x-dev | Completion framework                 |
| phpactor/worse-reflection-extension  | 1.0.x-dev | Completors and other terrible things |
+--------------------------------------+-----------+--------------------------------------+

Profit

This change, when it makes it to Phpactor, will make it possible to support more diverse domains. So for example, Symfony DI Completion, or Behat "feature to step jumping". Things get even more interesting at the language-server level.

It would be easy to create for example a PHPStan extension for the language server (and fulfil the LSP APIs for diagnostics) or a php-cs-fixer extension (and fulfil the LSP APIs for code formatting). It would even be possible to add completors based on existing tools (such as Psalm).

The most important thing is that, by removing pretty much everything from Phpactor by default, we can release a stable 1.0 version, and there will be much rejoicing.

Feature Agnostic

As a footnote, Phpactor would also be agnostic to function: it would no longer need to do anything related to PHP code development, and would essentially just provide a way to install extensions and bootstrap commands.