The Text Analytics Pipeline (TAP) software has improved significantly over the last few months. HETA project Technical Lead Andrew Gibson (QUT) has been working on a batch mode that can process an unlimited number of documents from Amazon S3 cloud storage, opening the way for more scalable analytics across millions of documents.
The TAP client for Python (tapclipy) is also being improved so that the batch mode can be used from within Python. The latest version of TAP can always be found at https://github.com/heta-io/tap and the Python client at https://github.com/heta-io/tapclipy. If you’re looking to try bleeding-edge features, check out the develop branch; for a stable version, use the master branch.
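As a minimal sketch of the branch choice described above, the following Python helper builds the `git clone` command for either repository. The helper name `clone_command` and its `stable` flag are hypothetical and exist only for illustration; the repository URLs and branch names are the ones given above, and `--branch` and `--single-branch` are standard `git clone` options.

```python
# Illustrative helper (not part of TAP or tapclipy): build a git clone
# command for the stable (master) or bleeding-edge (develop) branch.
REPOS = {
    "tap": "https://github.com/heta-io/tap",
    "tapclipy": "https://github.com/heta-io/tapclipy",
}


def clone_command(project, stable=True):
    """Return the git clone command, as a list of arguments, for a project."""
    branch = "master" if stable else "develop"
    return ["git", "clone", "--branch", branch, "--single-branch", REPOS[project]]


if __name__ == "__main__":
    # Print the command for the develop branch of TAP.
    print(" ".join(clone_command("tap", stable=False)))
    # To actually run it, pass the list to subprocess.run(..., check=True).
```

Returning the command as a list of arguments rather than a single string means it can be passed directly to `subprocess.run` without shell quoting concerns.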
TAP has previously suffered from a lack of clear documentation, as do many research software projects. This issue is now being addressed by Mike Powell (QUT) together with Sarah Taylor (RMIT). The revised documentation can be found at https://heta-io.github.io/tap/ and includes instructions for getting started with Docker or from the source code, as well as helpful information for developers.