In closed-off, inaccessible academic circles, linguist bros openly hate on computer bros for building models of language that are based on “probability” and that don’t take into account the “underlying structure of language”. The hardcore nativist bros believe that every language shares the same underlying structure and that corpus-based language software is useless.

Researchers who use “machine learning algorithms” (a fancy way of saying ‘statistically teaching a computer to predict future behavior based on previous behavior’) to understand something in the world, without giving a shit about the ‘meaning of that behavior’, get hated on by the people who study that meaning 4 a living. Whenever someone comes along and tries to model the behavior of something they don’t deeply understand, the community that studies that behavior gets upset. It’s a natural, and maybe even healthy, reaction.
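To make the ‘predict future behavior from previous behavior’ idea a bit more concrete, here’s a toy bigram model: it just counts which word follows which and turns those counts into probabilities. This is a minimal sketch with a made-up three-sentence corpus; it is not the rap corpus or the n-gram models behind the generator in this post, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark the start and end of a sentence
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Maximum-likelihood estimate of P(next | prev); empty if prev was never seen."""
    total = sum(counts[prev].values())
    return {w: c / total for w, c in counts[prev].items()} if total else {}

def predict_next(counts, prev):
    """Most likely next word given the previous one (None if prev is unseen)."""
    probs = next_word_probs(counts, prev)
    return max(probs, key=probs.get) if probs else None

# Hypothetical toy corpus, just for illustration
corpus = [
    "happy linguists hate the models",
    "happy linguists study the meaning",
    "sad linguists study the syntax",
]
counts = train_bigrams(corpus)
print(next_word_probs(counts, "happy"))  # {'linguists': 1.0}
print(predict_next(counts, "study"))     # the
```

Nothing in there knows what an adjective or a verb is; “happy” comes before “linguists” purely because it did so in the data.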

The models built by the math and computer bros have had real success in producing viable irl products. And, to troll the linguist people, the computer guys are now getting kinda good at looking at the “underlying structure” of language (mainly the syntax of sentences) and coupling it with their statistical models to understand language better.

Take, for example, this simple sentence structure in tree format:

Instead of deciding statistically that “happy” comes before a word like “linguists”, a model that relies on sentence structure allows for more “dynamic” modelling of language. A *very* simple model, whose only structure is the one above, can still produce some cool things if used correctly. Rules:

S -> NP VP

NP -> Adj N

VP -> V N
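Here’s roughly what those three rules look like as a tiny random sentence generator: start from S and keep rewriting symbols until only words are left. It’s a minimal sketch, and the word lists in `LEXICON` are hypothetical stand-ins, not the vocabulary the actual script pulled from its corpus. (The example bursts in this post also slip in words like “the” and “a”, which these three rules alone wouldn’t produce.)

```python
import random

# The three rewrite rules from the post; each right-hand side is a list of symbols.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Adj", "N"]],
    "VP": [["V", "N"]],
}

# Hypothetical toy lexicon: terminal categories mapped to example words.
LEXICON = {
    "Adj": ["happy", "heavy", "open", "private"],
    "N": ["linguists", "clouds", "routes", "pools", "cow"],
    "V": ["buy", "find", "see", "produce"],
}

def expand(symbol, rng):
    """Recursively rewrite a symbol until only words remain."""
    if symbol in LEXICON:
        # Terminal category: pick a word for it
        return [rng.choice(LEXICON[symbol])]
    # Non-terminal: pick one of its right-hand sides and expand each child
    rhs = rng.choice(GRAMMAR[symbol])
    words = []
    for child in rhs:
        words.extend(expand(child, rng))
    return words

rng = random.Random(0)  # fixed seed so runs are repeatable
for _ in range(3):
    print(" ".join(expand("S", rng)))
```

Every output has the shape Adj N V N no matter which words get picked; that guaranteed shape is what the structure buys you over pure word-after-word statistics.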

So, using these very simple rules, I wrote a script that generates short rhyming rap bursts following this sentence structure (combining rhyme algorithm stuff w/ language modelling stuff). Here are some examples I generated from a rap corpus plus publicly available n-gram models:

  • classic mouth pull the plow //
    fat red round buy the cow //
    late night crowd make the rounds //
    open routes find the lounge //
  • good stream down beat the count //
    sparkling crowd buy the cow //
    heavy clouds produce a growl //
    common grounds give a cow //
  • vocal booth cut the loops //
    next man do divide the schools //
    certain lieu cut the jews //
    expensive route pull the school //
  • major news combine the attributes //
    several pews preserve the rule //
    own lust to get some value //
    long arc through give the proofs //
  • bad tone to send a group //
    your own tissue ensure the use //
    so many crew bring the groups //
    private pools see no two //

All of these (more or less) follow the general rule S -> NP VP (NP -> Adj N, VP -> V N). And they rhyme! Some are nonsensical, and this is a long way off from anything *really* interesting, but it’s a world I want to learn about and do cool shit in.
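The post doesn’t spell out how its rhyme algorithm works, so here’s one hedged guess at the general idea: pick a rhyme sound, then keep sampling sentences from the grammar until the last word lands on that sound (rejection sampling). The spelling-based `rhyme_key` is a crude assumption of mine; a real rhymer would more likely compare phonemes from a pronouncing dictionary like CMUdict. The lexicon and all function names are hypothetical.

```python
import random
import re

GRAMMAR = {"S": [["NP", "VP"]], "NP": [["Adj", "N"]], "VP": [["V", "N"]]}
# Hypothetical toy lexicon
LEXICON = {
    "Adj": ["heavy", "open", "private", "late", "vocal"],
    "N": ["clouds", "crowd", "cow", "plow", "pools", "schools", "tools", "fools"],
    "V": ["buy", "find", "see", "pull", "cut"],
}

def rhyme_key(word):
    """Crude spelling-based rhyme key: last vowel cluster plus everything after it."""
    match = re.search(r"[aeiou]+[^aeiou]*$", word.lower())
    return match.group(0) if match else word.lower()

def expand(symbol, rng):
    """Recursively rewrite a symbol until only words remain."""
    if symbol in LEXICON:
        return [rng.choice(LEXICON[symbol])]
    words = []
    for child in rng.choice(GRAMMAR[symbol]):
        words.extend(expand(child, rng))
    return words

def rhyming_burst(n_lines, rng, max_tries=1000):
    """Generate n_lines grammatical lines whose final words share a rhyme key."""
    # Group nouns by rhyme key and keep only keys with enough distinct nouns
    groups = {}
    for noun in LEXICON["N"]:
        groups.setdefault(rhyme_key(noun), []).append(noun)
    keys = [k for k, nouns in groups.items() if len(nouns) >= n_lines]
    if not keys:
        raise ValueError(f"no rhyme group has {n_lines} distinct nouns")
    key = rng.choice(keys)

    lines, used = [], set()
    while len(lines) < n_lines:
        # Rejection sampling: resample until the last word rhymes and is fresh
        for _ in range(max_tries):
            words = expand("S", rng)
            if rhyme_key(words[-1]) == key and words[-1] not in used:
                break
        else:
            raise RuntimeError(f"gave up looking for a rhyme on -{key}")
        lines.append(" ".join(words) + " //")
        used.add(words[-1])
    return lines

for line in rhyming_burst(4, random.Random(7)):
    print(line)
```

Rejection sampling is wasteful, but it keeps the grammar and the rhyme constraint fully separate, so you can swap in a bigger grammar or a smarter rhyme test without touching the other part.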

If you want to fuck around with the thing, visit this link.


  1. Love your stuff man, another really interesting post. As someone who’s deeply interested in the fine line between computer-generated music and that subjective… editing that imparts “soul” to randomly generated beat sequences, I think this is an interesting lyrical comparison.

    Have you thought about exploring this further? For example, you could add a “rhyming module” that triggers a random rhyme on a given word, or an alliterator module. Then, by triggering the different modules randomly (or under user direction), you could generate couplets that also include other rap-style techniques.

    I’d be interested in mapping something like this to a drum machine so that each syllable triggers a different instrument in a set pattern.

    I’d be really interested in working with you on some of this stuff, man. Hit me up x

