Barney D. Media

A Quick Note on Markov Chain Efficiency and Pre-computing

Fairly recently I built a web scraper in Node.js to grab content from a single domain and store it in MySQL, more or less as a learning exercise. The obvious application for me was to build a custom Lorem Ipsum generator (a Markov chain generator) for our department (Marketing) so our designers could pull it as filler text. This led to a problem that perhaps should have been obvious from the start: Markov chains take a lot of time to generate.

“A lot” is a relative term, but each paragraph took at least 3-5 seconds to generate. That would be totally unacceptable to a user generating an average of 4 paragraphs (up to 20-second page loads), but luckily there was an easy workaround.
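For anyone curious where that time goes, the core of a word-level Markov chain generator is only a few lines. This is a rough sketch, not the generator I actually used; the training text and function names are made up for illustration:

// minimal word-level Markov chain sketch
var chain = {};

// build the chain: map each word to the list of words that can follow it
function train(text) {
  var words = text.split(/\s+/);
  for (var i = 0; i < words.length - 1; i++) {
    if (!chain[words[i]]) { chain[words[i]] = []; }
    chain[words[i]].push(words[i + 1]);
  }
}

// walk the chain from a random starting word
function generate(wordCount) {
  var keys = Object.keys(chain);
  var word = keys[Math.floor(Math.random() * keys.length)];
  var out = [word];
  for (var i = 1; i < wordCount; i++) {
    var next = chain[word];
    if (!next || next.length === 0) { break; }
    word = next[Math.floor(Math.random() * next.length)];
    out.push(word);
  }
  return out.join(' ');
}

train('the quick brown fox jumps over the lazy dog the quick red fox');
console.log(generate(20));

Trivial on a toy corpus, but building and walking a chain over an entire scraped domain on every page request is where the seconds pile up.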

Pre-computing/Caching to the Rescue

The only practical way to make this a service was to pre-compute a bunch of Markov chains into cache files and serve them dynamically. This preserved a sense of randomness without having to be truly random. We ended up using about 17 cores across 3 machines to run the generation processes; they were started manually, left to run overnight, and stopped when we came back in the morning. This yielded a couple hundred megabytes of text, corresponding to tens of thousands of sentences.

Making it User Friendly

To make it user friendly and easy to access, we split the generated content into 96 text files and returned them from a simple PHP page on our already running Apache installation.

Why 96 files though?

We reasoned that splitting the generated text into smaller files would improve performance and reduce memory usage: rotate the file being read every quarter hour (there are 96 quarter-hours in a day, hence 96 files) and pull randomized sentences to form paragraphs of random length. The number of paragraphs was user selectable, and the page returned nearly instantly. We didn’t bother to profile anything since it was running on an internal server and there were no processing spikes during use, but if it were a public utility it would be a good idea to do so.
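Picking the active file is just a matter of mapping the current time to an index. The service itself was a PHP page, but the idea looks like this in JavaScript terms (the file naming scheme here is hypothetical):

// map the current time to one of 96 files: 24 hours x 4 quarter-hours
var now = new Date();
var fileIndex = now.getHours() * 4 + Math.floor(now.getMinutes() / 15); // 0..95
var fileName = 'lorem-' + fileIndex + '.txt';
console.log(fileName);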

My Experience with Custom Android ROMs

Many years ago I had an iPod Touch that I had jailbroken into doing a bunch of cool but needless things. I really liked iOS at the time (and still do), but I think the jailbreaking experience needlessly put me off from trying Android. Another thing that still rightly prevents people from trying Android is the number and severity of non-removable apps and needless UI skins applied by both carriers and manufacturers.

There is an obvious incentive for both to install promotional apps, and honestly that’s OK. Really. I can live with it, just as I live with the crapware that PC manufacturers often load onto new systems, as long as you let me uninstall it as soon as I get my new phone. Most users are fearful of rooting their phones, or even incapable of it, and it really shouldn’t be necessary just to remove an app I don’t want from hardware I own.

That being said, my first Android phone was a Nexus 5. Other than battery life, it was spectacular. There was almost nothing on it I couldn’t do, and it only suffered minor crashes (generally individual apps) every few weeks. Both points were in stark contrast to my experiences with iOS, and especially jailbroken iOS. Suddenly it wasn’t impossible to browse the contents of a zip file on my phone, load an SNES emulator with all my games, or run a port scan on the local LAN. It was incredible, but a few years later it needed to be replaced because of a blown speaker, a failing mic, and a power button that kept getting stuck and causing a restart loop. Several years of good service is all I could realistically ask for, though.

That was when I bought an LG G2, an “outdated” model, on Amazon for around $250 brand new. I use quotes around “outdated” because it still had quantitatively better specs than the iPhone 6s Plus. If you don’t believe me, by all means verify for yourself: iPhone 6s+ vs. LG G2

Rooting the LG G2

Rooting was ridiculously easy and foolproof. I downloaded an app called Stump Root and let it do its thing for a few minutes. No computer or USB cable necessary. This let me remove the crapware installed by AT&T and LG, as well as open up other system utilities and options. Immediately afterward I installed SuperSU to lock down root access, which was also a pushbutton process. It should be noted that rooting isn’t this easy for every phone, but it was for this one. My Nexus 5 took a bit more effort to root, but not too much. If you plan to root, I suggest researching the root process for a phone before buying it.

First Android ROM: Resurrection Remix

I’m really not sure where the name for this one comes from, but it has a lot of really neat features built in. Some of the features I liked the most:

  • The app launcher wheel (kind of like OS X’s Dock)
  • Ability to add custom launcher buttons to the unlock screen
  • Options for changing notification light behaviors (very granular)
  • Options for changing vibration behaviors
  • Slide the status bar to change screen brightness; no need to even open the pull-down menu!
  • Options to unlock when paired with specific bluetooth devices (like my car)
  • Expanded lock patterns for enhanced security
  • Privacy Guard, which prevents apps from leeching any details or sensor data you don’t want them to have
  • OS theme manager

There are a ton more features, but I found that this list was 90% of what I used and liked. After a while, though, I noticed there were a lot of bugs in the OS, and some features like the app launcher wheel were things I just didn’t use enough to care if I lost them. I used it for about 6 months, then had to call it quits because of the crashes and the worsening battery life. This led me to try CyanogenMod, the project that Resurrection Remix was based on.

Second Android ROM: CyanogenMod

I’ve been using CyanogenMod (CM) for several weeks now, and it’s awesome. Apparently, the majority of the features I liked from Resurrection Remix (RR) were actually implemented first in CM. This led to much rejoicing as I discovered that the only differences from the feature list above are the first two bullets, both features I didn’t use much after trying them out for a bit.

The other major differences between the two ROMs were battery life and the number of daily crashes. CM is basically as stable as the stock Android 5 ROM on my Nexus 5, while RR was increasingly buggy. Battery life is also exceptional on CM; I can generally use my phone normally and still be above 90% battery after lunch, and I only charge the phone once every couple of days now. Hopefully CM will stay solid in the months to come, but if not, it’s a pretty minor ordeal to reinstall: just restart, install, boot up, and reinstall the apps from the Play Store. Overall, I would highly recommend CyanogenMod for any Android phone that comes with a less-than-desirable manufacturer operating system.

WordPress Site – SimpleSked

I recently put together a theme for a local startup that I’m doing some work alongside. Their idea is to make scheduling hourly employees easy using some simple algorithms and basic machine learning techniques.

The bulk of the actual app will run on Sails.js Node instances with distributed MySQL and Redis, rendered via Angular.js on the web and through Ionic on mobile, but the front-end site will be WordPress for ease of updates and other integrations. The theme I built uses the fantastic Bootstrap and FontAwesome frameworks, and is therefore responsive and fairly minimalistic.

Check out the screen captures below or visit the website: http://simplesked.com



[Screenshots: SimpleSked WordPress Site, Desktop View and Mobile View]

Hubot, Heroku, and Carl

Recently at work, some of us adopted HipChat over Microsoft Lync. We needed something that could handle groups and not crash 10 times a day, and HipChat provided that and a lot more. Once we discovered that it also offered an API, we of course started thinking about ways to automate common tasks with it, purely for professional purposes of course. That’s when I ran across Hubot for HipChat, and then Triatomic.

Triatomic made it ridiculously easy to deploy and customize a Heroku dyno (a kind of instance) with Hubot and Redis running on it. We ended up finding a better free Redis instance, adding a handful of npm packages, and then introducing it to the group. It was a smash hit, and we nicknamed it Carl after the emoticon in HipChat, which is itself derived from the ATHF character.

If you have ever wondered what it would be like to have your own robot throwing witty remarks, sarcastic comments, and random Google image search results into chat, I would highly suggest giving Hubot/Triatomic a test. It takes about 5 minutes to get a free Triatomic dyno running, and if you’re running Slack, IRC, Campfire, or something else, there are adapter packages available for most systems on npm.
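For a feel of what customizing the bot looks like, a Hubot script is just a function exported from a file in the bot’s scripts directory. This is a toy example; the triggers and replies here are made up, not Carl’s actual scripts:

// scripts/hello.js -- a minimal hypothetical Hubot script
module.exports = function(robot) {

  // respond only when addressed directly, e.g. "carl ping"
  robot.respond(/ping/i, function(res) {
    res.send('pong');
  });

  // listen to all room chatter for a keyword
  robot.hear(/shiny/i, function(res) {
    res.send('Ooh, shiny!');
  });
};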

Node.js, C++/CUDA, and memory handling performance

In learning about C++ and all of its intricacies, one might be tempted to ask: is it worth the extra effort? I was certainly asking the question, so I decided to answer it as well. Currently, I’m using a Node.js script to monitor Bitfinex Bitcoin exchange prices, so I decided to see how it handled loading the historical data points into memory.

I have often read that Node.js shouldn’t be used for computation-heavy applications, but I was never really sure why (other than JavaScript only supporting floating-point math). It turns out that JavaScript’s memory handling carries a lot of overhead, which is obvious and documented, but I didn’t realize just how much overhead until now.

In order to generate moving averages and other technical analysis, I’m loading a sizable chunk of the history into memory before working through it. For the sake of experiment, I tried to load all 5,000,000 data points (timestamp, price, volume) into memory from the CSV. Node.js choked at ~1.5M points, using 2.5GB of RAM, and slowed to a mere crawl. C++ fared much better with a very similar operation: all 5M data points loaded in about 30 seconds while taking only ~0.75GB of RAM. This gives me a lot of hope, since I was originally planning to run the averages with CUDA, which would naturally limit my memory capacity to the size of the video card’s memory (3.5GB usable on an Nvidia GTX 970).

For those interested, both file operations were nearly identical: using built-in methods from Node.js and C++ respectively to read file streams into lines, then entries, which were parsed into ints/floats.
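The Node.js side looked roughly like this sketch (assuming a headerless CSV of timestamp,price,volume; the file name is made up):

var fs = require('fs');
var readline = require('readline');

var points = [];

// stream the CSV line by line instead of reading the whole file at once
var rl = readline.createInterface({
  input: fs.createReadStream('bitfinex-history.csv')
});

rl.on('line', function(line) {
  var parts = line.split(',');
  // one object per row is exactly where the memory overhead piles up
  points.push({
    timestamp: parseInt(parts[0], 10),
    price: parseFloat(parts[1]),
    volume: parseFloat(parts[2])
  });
});

rl.on('close', function() {
  console.log('Loaded ' + points.length + ' data points');
});

Storing each row as a plain object carries per-object overhead; preallocated typed arrays (e.g. Float64Array, one per column) would likely fare much better at the cost of less convenient code.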

Node.js Script: Find-Feed-Photoshop now on Github

I have been working on exporting all the PSDs and raw files from my external drives into a more compressed format (Save for Web – JPEG), just for the sake of usability. The process is fairly annoying, since even a Photoshop droplet can only load a maximum of 250 files at once, and I’m dealing with tens of thousands to be processed. A simple Node.js script seemed like the answer.

It’s somewhat cross-platform, since I’m writing it on a Mac and planning to run it on Windows, though it would probably have to be adapted a bit for someone running Photoshop under Wine. Node.js makes it pretty easy to normalize paths and is otherwise cross-platform compatible, so a little preparation goes a long way toward compatibility.
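As a quick illustration of what I mean (the paths here are hypothetical):

var path = require('path');

// path.join and path.normalize emit the right separator for the host OS
var exportDir = path.join(process.cwd(), 'exports', 'jpeg');
var psdFile = path.normalize('scans/2014/IMG_0001.psd');

console.log(exportDir); // backslashes on Windows, forward slashes elsewhere
console.log(psdFile);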

Check out the repo: Github Repo

Bitcoin, C++, CUDA, and Node.js

Something that has been on my mind lately is trading Bitcoin using the REST APIs built into most exchanges.

Building an automated system to trade a handful of times per day based on market ebbs and flows is my plan for now. None of the exchanges are set up (as far as I can tell) to allow ultra-high-frequency trading, but that’s probably a good thing for all involved. You can read about some of the controversy from Mark Cuban; the gist is that some think it almost guarantees profits for those investing billions to do it, but at the expense of individuals, mutual funds, and pretty much everyone else. Bottom line: it would discourage trading in a fledgling market.

First, I’ll go for very few bitcoins and low daily trade volumes; later, I’ll build in some forecasting to account for price shifts during larger trades. Since commissions are percentage-based, I stand to make money even on small buys. Eventually I would like to port some of this logic to work with the NYSE, FOREX, or the like, but it will be a while before I can invest that much (>$30k for HFT).

Forecast Model

The models I have found for HFT range from middle-school math to Ph.D.-level lambda calculus, but I tend to think simple works better in a system that isn’t largely forecastable. To this end, I’m taking the opportunity to learn enough C++ and CUDA to map out the effectiveness of a ton of “magic numbers” in base algorithms and compare what their historical performance would have been. I’ll take some of the winners and apply them to incoming data, and eventually give one of them real dollars and bitcoins to play with.
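As a toy example of what I mean by sweeping magic numbers (the moving-average strategy and the data here are stand-ins, not my actual model):

// naive sweep: score simple moving-average windows against price history
function sma(prices, i, window) {
  var sum = 0;
  for (var j = i - window + 1; j <= i; j++) { sum += prices[j]; }
  return sum / window;
}

// buy when price crosses above the SMA, sell when it drops below;
// the "score" for this magic number is the final portfolio value
function backtest(prices, window) {
  var cash = 1000, btc = 0;
  for (var i = window; i < prices.length; i++) {
    var avg = sma(prices, i, window);
    if (prices[i] > avg && cash > 0) { btc = cash / prices[i]; cash = 0; }
    else if (prices[i] < avg && btc > 0) { cash = btc * prices[i]; btc = 0; }
  }
  return cash + btc * prices[prices.length - 1];
}

var history = [230, 232, 229, 235, 240, 238, 244, 250, 247, 252]; // stand-in data
for (var w = 2; w <= 5; w++) {
  console.log('window ' + w + ' -> $' + backtest(history, w).toFixed(2));
}

Each candidate window gets scored against the same history; scale that loop up to thousands of parameter combinations over millions of data points and it becomes obvious why C++ and CUDA are attractive.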

Free code sample!

I won’t divulge too much about how I’m planning to do it, but I will throw in some code for monitoring an exchange. You’ll notice there are some bits that aren’t really needed or used, but they’re thrown in for the sake of future extensibility. It seems like it would be worth it to tie in multiple exchanges and, at some point, build models for multiple currencies like Litecoin and whatever else. This is written for use with Node.js, so you’ll have to do the whole npm install thing for each dependency. Also, make sure you’re not running Node.js 0.12; it doesn’t like the Request.js library.



// requirements
// -----------------------------------------

var cli = require('cli');
var request = require('request');
var async = require('async');
var fs = require('fs');


// structure
// -----------------------------------------

var main = {};
main.baseUrl = 'https://api.bitfinex.com/v1';
main.btcPriceUrl = main.baseUrl + '/pubticker/btcusd';
main.lastTimestamp = 0;
main.openConnections = 0;
main.serialDelay = 1020;
main.logPrices = true;
main.contPull = true;
main.outputDir = './price_logs/';

// make folder for the output
if (!fs.existsSync(main.outputDir)){
    fs.mkdirSync(main.outputDir);
}

main.wStream = fs.createWriteStream(main.outputDir + 'BTC-price-' + Date.now() + '.log');



// moving parts
// -----------------------------------------

getBtcPrice(main.logPrices, main.contPull);

// this function recursively calls itself indefinitely, unless interrupted in terminal
function getBtcPrice(logPrices, contPull) {

  // add to the active connection counter, in case of async stuff later on
  main.openConnections++;

  request(main.btcPriceUrl, function(err, res, body) {
    // the request finished (success or failure), so release the counter here
    main.openConnections--;

    if ( !err && res.statusCode == 200 ) {

      var btc = JSON.parse(body);

      // parse the timestamp so comparisons are numeric, not string-based
      var timestamp = parseFloat(btc.timestamp);

      // sometimes BFX returns JSON that is out of date
      if (timestamp > main.lastTimestamp) {

        main.lastTimestamp = timestamp;

        // parse -> print to screen && log to file
        console.log( 
          '\nBitfinex@' + btc.timestamp
          + '\n////////////////////////////////////////////////\n\n'
          + 'Last -> ' + btc.last_price 
          + '\nBid/Ask -> ' + btc.bid + '/' + btc.ask 
          + '\nMid/Spread -> ' + btc.mid + '/' + Math.round((btc.ask - btc.bid) * 100) / 100
          + '\nHigh/Low -> ' + btc.high + '/' + btc.low
          + '\n\n////////////////////////////////////////////////\n\n'
        );

        // only write to the log file if logging was requested
        if (logPrices) {
          main.wStream.write(body + '\n');
        }

      }

    } else {
      console.log( 'Bitfinex request failed... \n\nResponse: \n' 
        + JSON.stringify(res) 
        + '\n\n Body: \n' 
        + JSON.stringify(body) );
    }

    // start the process over after a delay, even if this request failed
    if ( contPull === true ) {
      setTimeout(function() {
        getBtcPrice(logPrices, contPull);
      }, main.serialDelay);
    }
  });
}

Beginning the Yearly Performance Analysis

Every year we do a year-in-review analysis of our email marketing team’s performance. This year we’re looking back on 2014, and I know we have already seen some positive improvements. When I first started working here, almost three years ago now, open rates department-wide averaged 17%. Since then we have been doing a monthly results recap with the writers and managers. This year, most months averaged 43% open rates department-wide.

One other aspect I personally add to the process is a simple cyclical analysis and a multiple linear regression analysis. The regression gives us a formula to use in campaign forecasting, tells us which factors to consider when sending, and helps sort outliers for reporting purposes later on. The cyclical analysis just tells us which forecasting offsets to use each month.

I recently found a couple of Node.js libraries that can handle multiple linear regression, so in the future we may have a nightly analysis run that generates new forecasting models for the next day. That’s a bit far off, but it would be exciting.
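If the library route falls through, the math itself is small enough to hand-roll. Here is a minimal normal-equations sketch; the predictors (send hour, subject length) and the numbers are hypothetical:

// multiple linear regression via the normal equations: solve (X'X) b = X'y
// X rows: [1, predictor1, predictor2, ...]; y: observed open rates
function regress(X, y) {
  var n = X.length, k = X[0].length;
  var A = [], v = [], i, j, r;

  // build A = X'X (k x k) and v = X'y (k x 1)
  for (i = 0; i < k; i++) {
    A[i] = [];
    v[i] = 0;
    for (j = 0; j < k; j++) { A[i][j] = 0; }
  }
  for (r = 0; r < n; r++) {
    for (i = 0; i < k; i++) {
      v[i] += X[r][i] * y[r];
      for (j = 0; j < k; j++) { A[i][j] += X[r][i] * X[r][j]; }
    }
  }

  // Gaussian elimination with partial pivoting on [A | v]
  for (var col = 0; col < k; col++) {
    var pivot = col;
    for (r = col + 1; r < k; r++) {
      if (Math.abs(A[r][col]) > Math.abs(A[pivot][col])) { pivot = r; }
    }
    var tmpRow = A[col]; A[col] = A[pivot]; A[pivot] = tmpRow;
    var tmpVal = v[col]; v[col] = v[pivot]; v[pivot] = tmpVal;
    for (r = col + 1; r < k; r++) {
      var f = A[r][col] / A[col][col];
      for (j = col; j < k; j++) { A[r][j] -= f * A[col][j]; }
      v[r] -= f * v[col];
    }
  }

  // back-substitution to recover the coefficients
  var b = [];
  for (i = k - 1; i >= 0; i--) {
    var s = v[i];
    for (j = i + 1; j < k; j++) { s -= A[i][j] * b[j]; }
    b[i] = s / A[i][i];
  }
  return b; // [intercept, coefficient1, coefficient2, ...]
}

// hypothetical example: open rate ~ send hour + subject length
var X = [[1, 9, 40], [1, 14, 55], [1, 10, 35], [1, 16, 60], [1, 8, 30]];
var y = [0.45, 0.38, 0.44, 0.35, 0.47];
console.log(regress(X, y));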

Two obvious caveats to that are the floating-point inaccuracies inherent in all JavaScript calculations and the single-threaded nature of Node. I’ll have to do a little digging to figure out whether our numbers can tolerate floating-point error, but if nothing else, I can always go looking for a Python library.

Local Business Site: Photoworks

A friend contracted me to create a website and design the typical business set (card, brochure, etc.) for him, and I was happy to oblige. I ended up creating a responsive WordPress theme and hosting it on a typical shared-server setup. It’s been good for him since it’s low maintenance, there is a GUI for him to make minor updates, and of course the WordPress install will be plenty expandable for him in the future.

Twitter Bootstrap and Whiteboard made the coding process fairly quick and easy.

[Screenshot: Photoworks WordPress site]

Another Email Template Update

Our online program needed an updated email template that could link to a few streaming self-help-type sessions, covering things like how an advisor can help plan course load and how to apply for financial aid.

I designed and coded something that would be easy to make responsive and to update in the future; planning generally goes a long way when creating new templates.


[Screenshot: WebinarsV2 email template]