Dedicated Syntax Rap Robot

In closed-off, inaccessible academic circles, linguist brohs openly hate on computer brohs for creating models of language that are based on “probability” and that don’t take into account the “underlying structure of language”.  The hardcore nativist bros believe that every language has the same underlying structure and that corpus-based language software is useless.

Researchers that use “machine learning algorithms” (a fancy way to say ‘teaching a computer to predict future behavior based on previous behavior’, statistically) to understand something in the world without giving a shit about the ‘meaning of that behavior’ are hated on by the people that study that meaning 4 a living.  When a person comes along and tries to model the behavior of something they do not have a deep understanding of, the community that studies the behavior will get upset.  It is a good and maybe healthy reaction.

The models generated by the math and computer bros have had success in terms of producing viable irl products.  To troll the linguist people, computer guys are now getting kinda good at looking at the “underlying structure” of language (mainly the syntax of sentences) and coupling it with their statistical models to understand language better.

Take for example this simple sentence structure in tree-format:

Instead of deciding statistically that “happy” comes before a word like “linguists”, a model relying on sentence structure allows for “dynamic”-y modelling of language.  A *very* simple model, whose only structure is the one above, can still produce some cool things if used correctly.  Rules:

S  -> NP VP

NP -> Adj N

VP -> V N
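A minimal Python sketch of expanding these three rules into a four-word line. The lexicon is a hypothetical toy word list, not the corpus the bars below came from, and the sketch follows the rules literally, so there's no determiner before the object noun.

```python
import random

# The three rewrite rules: each non-terminal expands to a fixed sequence.
GRAMMAR = {
    "S": ["NP", "VP"],
    "NP": ["Adj", "N"],
    "VP": ["V", "N"],
}

# Hypothetical toy lexicon; a real generator would draw these from a corpus.
LEXICON = {
    "Adj": ["classic", "heavy", "late", "open"],
    "N": ["mouth", "clouds", "crowd", "plow", "cow"],
    "V": ["pull", "buy", "find", "make"],
}


def expand(symbol, rng):
    """Rewrite `symbol` recursively until only words are left."""
    if symbol in GRAMMAR:
        words = []
        for child in GRAMMAR[symbol]:
            words.extend(expand(child, rng))
        return words
    # Part-of-speech slot: pick a word for it.
    return [rng.choice(LEXICON[symbol])]


rng = random.Random(0)
for _ in range(3):
    print(" ".join(expand("S", rng)))
```

Every output has the shape Adj N V N. The bars below slip a “the” or “a” in front of the object, which a rule like VP -> V Det N would capture.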

So, using these very simple rules, I created a script to generate short rhyming rap bursts following this sentence structure (combining rhyme algorithm stuffs w/ language modelling stuffs).  Here are some examples I generated from the rap corpus + publicly available n-gram models:

  • classic mouth pull the plow //
    fat red round buy the cow //
    late night crowd make the rounds //
    open routes find the lounge //
  • good stream down beat the count //
    sparkling crowd buy the cow //
    heavy clouds produce a growl //
    common grounds give a cow //
  • vocal booth cut the loops //
    next man do divide the schools //
    certain lieu cut the jews //
    expensive route pull the school //
  • major news combine the attributes //
    several pews preserve the rule //
    own lust to get some value //
    long arc through give the proofs //
  • bad tone to send a group //
    your own tissue ensure the use //
    so many crew bring the groups //
    private pools see no two //

All of these follow the general rule S -> NP VP (NP -> Adj N, VP -> V N).  And they rhyme!  Some are nonsensical, and this is a long way off from anything that would be *really* interesting, but it’s a world I want to learn about and do cool shit in.
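One way to bolt a rhyme constraint onto the grammar, as a sketch: only let the final noun come from words that share a stressed vowel. The vowel table is hand-written and hypothetical (a real version would read it out of a pronouncing dictionary), and the n-gram ranking mentioned above is left out.

```python
import random

# Hypothetical hand-written table: word -> vowel of its last stressed
# syllable, using ARPAbet-style symbols.
LAST_STRESSED_VOWEL = {
    "plow": "AW", "cow": "AW", "crowd": "AW", "lounge": "AW", "growl": "AW",
    "loops": "UW", "schools": "UW", "rule": "UW",
}

ADJ = ["classic", "heavy", "late", "open"]
SUBJ = ["mouth", "clouds", "night", "routes"]
VERB = ["pull", "buy", "find", "make"]


def rhyming_bars(vowel, n, rng):
    """Build n Adj-N-V-N bars that all end on a word with the same stressed vowel."""
    endings = [w for w, v in LAST_STRESSED_VOWEL.items() if v == vowel]
    if not endings:
        raise ValueError(f"no words ending on vowel {vowel!r}")
    bars = []
    for _ in range(n):
        words = [rng.choice(ADJ), rng.choice(SUBJ), rng.choice(VERB), rng.choice(endings)]
        bars.append(" ".join(words) + " //")
    return bars


for bar in rhyming_bars("AW", 4, random.Random(1)):
    print(bar)
```

Matching on the vowel alone is why “count” and “cow” can land in the same burst: the rhyme is loose on purpose.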

If you want to fuck around with the thing, visit this link.

top 5 rhyme-iest bars [detected thru ‘complex’ computational methods]

So, as part of a new project I’m working on, it’s been important to go through lines in rap that have lots of rhymes and don’t use big words. Here are some random bars from random songs that fit these criteria and sound nice.

1. Benzino – Bang Ta Dis

blunts bitches clips guns
bars bricks whips funds

Syllables Per Word: 1.125
Rhyme Density: 0.78

2. Jay-Z – Parkin Lot Pimpin

big thangs thick chains
aint shit changed get brain in the four dot six range

Syllables Per Word: 1
Rhyme Score: 0.67

3. MF DOOM – Meat Grinder

wild west style fest ya’ll best to lay low
hey bro day glo set the bet pay dough

Syllables Per Word: 1.05
Rhyme Score: 0.85

4. Boo – Boo & Gotti Freestyle [off the Fast/Furious soundtrack]

pop off two clips top off new six
rock frost blue wrist still cop two bricks

Syllables Per Word: 1
Rhyme Score: 0.93

5. Cam’Ron – Bout it Bout it

get your legs snapped arm twist ribs cracked
wig tapped play fair day care kids napped

Syllables Per Word: 1
Rhyme Score: 0.93
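The post doesn’t say how “Syllables Per Word” is computed. A crude stand-in, assumed here, counts runs of vowel letters in each word. It reproduces the Benzino number (9 syllables over 8 words) but miscounts silent-e words like “wake”, so a pronouncing dictionary would be the sturdier choice.

```python
import re


def count_syllables(word):
    """Rough syllable count: number of vowel-letter runs (y counted as a vowel).

    Known blind spot: silent e, so "wake" comes out as 2.
    """
    runs = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(runs))


def syllables_per_word(bar):
    """Average syllables per word across a bar of lyrics."""
    words = re.findall(r"[a-z']+", bar.lower())
    if not words:
        return 0.0
    return sum(count_syllables(w) for w in words) / len(words)


print(syllables_per_word("blunts bitches clips guns bars bricks whips funds"))  # 1.125
```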

Chance The Rapper Dislikes Fox News

Chance The Rapper, a young artist out of Chicago, has released a new mixtape that’s pretty good and “soulful”.  It’s fun to listen to and there are a few really high points that make the listener feel good, I think.  He is a good rapper, in technical terms.  I think he is supposed to be “psychedelic” but listenable: inoffensive/un-weird enough to listen to around a group of people with mainstream-y tastes, but cool enough (for now, at least) to be proud of liking him.

The intro song Good Ass Intro has about 12 instances of Chance pronouncing the rhotic coda consonant {looser, another, brother, mother, glitter, clutter, colour, baller, butter, stainer, studded, mater}.  In other words, he pronounces the ‘r’ in words that end with ‘r’.  Dropping the coda (last consonant) ‘r’ is characteristic of non-rhotic accents, and African American Vernacular English (AAVE) is a non-rhotic accent.

Like other non-rhotic varieties, the rhotic consonant /r/ is usually dropped when not followed by a vowel;

Non-rhotic accents are generally stigmatized, and speakers ‘embarrassed’ of them will converge toward Standard English by pronouncing the ‘r’.  Unlike G-Dropping rates, which have lots of good literature, I have no idea what a common ‘R-Drop’ rate for AAVE is or what its relation to socioeconomic conditions is.  There are some unsubstantiated theories about non-rhotic consonants re: class warfare.

Another event that may have influenced Southern dialectical patterns, particularly desegregation, which was accompanied by turmoil in the South from the 1950s through the 1970s. The civil rights struggle seems to have caused both African Americans and southern Whites to stigmatize linguistic variables associated with the other group.

Later on in the mixtape, Chance mentions that he hates Fox News and that Matt Lauer isn’t properly reporting the problems in Chicago.  Which is cool.  And mirrors the sensibilities of the young rap fan with opinions on politics, probably.  But to me, even though they are technically sloppy, the other kids out of Chicago do a way better job of capturing the desperation and chaos Matt Lauer isn’t reporting.  Chance’s use of the rhotic consonant probably says a lot about his relationship to the stuff happening in Chicago, I think.  It almost doesn’t matter that he is from there, at least musically.  Seems like he is pretty amorphous geographically.


Can Hipsters ‘make’ a Street Rapper?

Chief Keef is a famous rapper out of Chicago who started off making paranoia-driven gangster raps and quickly transitioned to molly-induced happy party songs after he got paid.  Keef’s early stuff (BANG, Back From the Dead) is especially great as a look into the paranoia that drives divisions between these kids.  These differences are partly geographical and partly ‘merit-based’ {If you ain’t poppin pistols, I ain’t rocking wit ya}.

After a lengthy Gawker profile and two excellent tapes, Keef became thinkpiece fodder.  The central thesis of outraged black people with rap/culture opinions was that “white people shouldn’t/aren’t equipped to discuss violent rap”.

“Brian “B.Dot” Miller, who is black, and an editor at Rap Radar, took Sargent to task directly, tweeting at him to “please stop writing about MY culture,” bemoaning “cultural tourists writing about the music of MY culture” and “outsiders like yourself in hipster media that get a hard-on by overanalyzing black music.”

Whatever.  In the widely circulated New Republic article, a (black) blogger raised similar concerns re: white hipsters writing about Chief Keef:

“Motherfuckers see us as ONE fucking unit and THAT is what we want ‘white bloggers’ to understand. Someone sees Waka and then kills Treyvon … Y’all don’t know that fuckin’ struggle of being judged based on someone else’s actions and you NEVER will … You will never understand. Never feel the pain, shame, guilt … You get to be just you. But in America no matter how hard I try someone is ALWAYS judgin based on my skin and when the Chief Keefs appear, people are thinking OMG look at what years of oppression and demoralization have done to a group. They think: niggers.”

The guy who killed Treyvon [sic] probably hated black people way before Flockaveli dropped.  But, whatever.  Both of these people are implicitly assuming that (white) bloggers *made* Chief Keef: that the sustained interest in Keef’s work was due to white people who write on the internet, and that without the interest of these *bloggers*, Keef would not matter.  The fear, I assume, is that the *power* held by these White Bloggers could give a platform to an ecosystem of similar rappers “without artistic merit” (representing the worst of the worst of black culture).  The truth is that Keef built his fanbase in a wholly organic way, among listeners similar to him, and that white bloggers DO NOT matter in terms of creating a “street rapper” like Chief Keef.

I believe that people who comment on YouTube videos, good or bad, provide a solid, measurable way to understand content.   Social media data mining may or may not be a bullshit thing to study, but I think if the results (especially ones at either extreme) make sense and pass the ‘eye test’, it’s probably something worth exploring further.  One thing that I think makes sense to look at, especially for videos from ‘street’ rappers, is whether the people commenting type in some sort of unique, measurable way.

Ebonics is a rule-governed language that can sometimes be studied on paper.  For example, something like G-Dropping can be looked at in text.  Another rule of ‘Ebonics’ has to do with word-initial fricatives.  This is a fancy way to say that words like {This} get pronounced {dis} in spoken language.  And sometimes, because this is an example where the spelling of the word follows the general rules of sound and stuff, people will actually write {dis}.

Writing a little script to extract comments from a YouTube video, we can find how often users use words like [da, dis, dat] instead of [the, this, that].  It ends up working really well to distinguish artists, stylistically.  The chart below shows rappers that have a high ‘da’ score (street rappers, generally), medium ‘da’ scores (mixed fan bases) and low ‘da’ scores (generally backpackers or ‘barely rap’ bros):

High /da/ | Medium /da/ | Low /da/
Soulja Slim | GZA | Ras Kass
Guerilla Maab | Esham | Atmosphere
Beanie Sigel | Missy Elliott | Beastie Boys
Eightball & MJG | RZA | Brother Ali
Pastor Troy | Ghostface | Lupe Fiasco

Stylistically, the High /da/ guys seem to be similar and might be classified under the umbrella term ‘street rapper’, although maybe unfairly.  It turns out that as a quick separator of styles this metric works really well.  It relies on a phonological rule of AAVE, so it is probably ‘wrong’ to call the [th] -> [d] phenomenon a “spelling mistake”.  This is a systematic rule of AAVE (just like any other phonological rule in any language) that in this instance happens to be data-mineable through text.
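A minimal sketch of the /da/ scoring step, assuming the comments have already been pulled down (fetching them from YouTube is left out). The normalization isn’t spelled out in the post, so this assumes the score is the share of {da, dis, dat} tokens among all {da, dis, dat, the, this, that} tokens.

```python
import re

DIALECT = {"da", "dis", "dat"}
STANDARD = {"the", "this", "that"}


def da_score(comments):
    """Fraction of da/dis/dat among all da/dis/dat/the/this/that tokens.

    `comments` is an iterable of comment strings. Returns 0.0 when none of
    the six words appear.
    """
    dialect = standard = 0
    for text in comments:
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in DIALECT:
                dialect += 1
            elif token in STANDARD:
                standard += 1
    total = dialect + standard
    return dialect / total if total else 0.0


print(da_score(["dis song da best", "that beat is hard", "the hook tho"]))  # 0.5
```

Matching is on whole words, so something like “data” doesn’t count as a /da/ token.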

If the ‘hipster media theory’ holds up and Keef’s fan base was cultivated through white-people blogosphere link sharing, his initial work should NOT have a High /da/ score.  However, this is not the case.  The BANG mixtape has consistently High /da/ scores, which indicates that it was probably kids similar to Keef who listened to him first.  So while the people writing about him online now may be mostly white nerds, the people who fell in love with him initially were black kids like Chief Keef.

The High /da/ guys have a median score of ~0.15 and Low /da/ guys have a median score of ~0.01. Keef’s BANG mixtape has a weighted average /da/ score of ~0.18.  This allows us to classify him as a ‘street rapper’.  His song Setz Up has a /da/ score of 0.13.   Looking through the responses to this song, we find this particular comment below which has an instance of /da/ AND /dat/.  It also specifically explains in detail one of the gang references in the song:

[Screenshot: YouTube comment on Setz Up]

A song riddled with gang stuffs appealing to kids that are hyper-aware of these references.  The ‘hipster media’ didn’t make these kids care about Keef or understand these references.  Most likely, Keef’s music initially represented a reality to the kids in his city.
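The mixtape-level /da/ number above is described as a weighted average, but the weights aren’t given. One plausible reading, assumed here, weights each song by how many da/dis/dat/the/this/that tokens its comments contained (the same as pooling every comment), while the group figures are plain medians. The sample numbers are made up.

```python
from statistics import median


def weighted_da_score(songs):
    """Weighted mean of per-song /da/ scores.

    `songs` is a list of (score, token_count) pairs; token_count is assumed
    to be the number of da/dis/dat/the/this/that tokens behind each score.
    """
    total = sum(count for _, count in songs)
    if total == 0:
        return 0.0
    return sum(score * count for score, count in songs) / total


# Made-up per-song numbers, just to show the calculation.
tape = [(0.2, 100), (0.1, 300)]
print(weighted_da_score(tape))
print(median([0.1, 0.15, 0.2]))
```

Weighting by token count keeps a song with three comments from counting as much as one with three hundred.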

I think it is important to relate these High /da/ scores to actual lyrical content from songs.   We see that the fans’ response to High /da/ rappers seems to follow a general trend: High /da/ score rappers are all generally ‘street rappers’.  We need to find a way to link the lyrical content from these songs to the particular responses.  Ideally, we should find that High /da/ scores in YouTube commentary are correlated with some sort of particular word usage in songs.

There must be certain trends in word usage that can be measured.  For example, I’m sure the word {nigger} is almost exclusively limited to songs by black guys.  Not sure if there are any exclusive ‘white’ words, since white artists probably don’t have a similar exclusive claim on any lexical items.

No reason we can’t look at this scientifically.  All you really have to do is get good enough datasets for ‘white’ raps and ‘black’ raps.  Mathematically, of course, {Black} ∩  {White} = ∅ ⇔ One-drop Rule.  So, once we have these two datasets we can run some cool machine learning algorithms to train a computer to identify specific ‘white’ and ‘black’ characteristics.
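The classifier behind the scores below isn’t named in the post. As an illustration only, here is a from-scratch multinomial Naive Bayes with add-one smoothing. The training “lyrics” are placeholder tokens, and the outputs are log-probabilities, so they aren’t on the same scale as the Black/White scores reported below.

```python
import math
from collections import Counter


def train(docs):
    """docs: list of (label, text) pairs. Returns doc counts, word counts, vocab."""
    doc_counts = Counter()
    word_counts = {}
    vocab = set()
    for label, text in docs:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocab.add(word)
    return doc_counts, word_counts, vocab


def log_scores(model, text):
    """Log P(label) plus the sum of log P(word | label), for every label."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total = sum(counts.values())
        score = math.log(doc_counts[label] / n_docs)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing out a label.
            score += math.log((counts[word] + 1) / (total + len(vocab)))
        scores[label] = score
    return scores


# Placeholder training data; real training would use full lyric sets.
model = train([
    ("black", "aa bb aa"),
    ("black", "aa cc"),
    ("white", "dd ee dd"),
    ("white", "ee ff"),
])
print(log_scores(model, "aa bb"))
```

The label with the higher log score wins; comparing two artists means averaging these over their songs.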

We know from the earlier chart that Pastor Troy is a High /da/ score guy and that Atmosphere is a Low /da/ score guy.  Ideally, using the text classification tool, Pastor Troy should score as more ‘black’ and less ‘white’ than Atmosphere.  It turns out he does.  Considerably.

[Screenshot: text classification scores for Pastor Troy and Atmosphere]

With average scores:

Artist | Black | White
Atmosphere | 7.62 | 27.45
Pastor Troy | 47.56 | 1.81

The data supports our intuition with regard to Pastor Troy.  It seems that Pastor Troy, a High /da/ guy, also has High ‘Black’ scores and Low ‘White’ scores.  Does this extend to other ‘street’ rappers?  If we use an arbitrary /da/ cutoff of 0.05 (about 25% of the songs in our 1500+ song dataset score above it), we see that High /da/ scores generally correlate with Low ‘White’ scores.  That is, how the fans talk about an artist is directly correlated with the actual lyrical content.   A pretty sweet discovery.

[Screenshot: /da/ scores vs. ‘White’ scores for the mined songs]

We see that Low ‘White’ scores (0-15) correlate with High /da/ scores (>0.05).  That is, in our dataset, 92% of songs with a /da/ score greater than 0.05 have a White score less than 15.  Pretty great evidence that the way fan bases discuss a street artist predicts the kind of lyrical content that artist has.  Without even listening to a song, we can know what kind of song we are dealing with just by how the fans are interacting with the work.
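The 92% figure reads as a conditional rate: of the songs above the /da/ cutoff, the share that land under the White-score cutoff. A sketch of that calculation, assuming each song is a dict with hypothetical `da` and `white` keys; the sample rows are made up.

```python
def conditional_rate(songs, da_cutoff=0.05, white_cutoff=15):
    """Share of songs with da > da_cutoff whose white score is < white_cutoff.

    Returns None if no song clears the /da/ cutoff.
    """
    high_da = [s for s in songs if s["da"] > da_cutoff]
    if not high_da:
        return None
    low_white = sum(1 for s in high_da if s["white"] < white_cutoff)
    return low_white / len(high_da)


# Made-up rows, just to show the shape of the data.
songs = [
    {"da": 0.10, "white": 5.0},
    {"da": 0.20, "white": 22.0},
    {"da": 0.07, "white": 3.0},
    {"da": 0.01, "white": 40.0},
]
print(conditional_rate(songs))
```

This is P(White < 15 | /da/ > 0.05) within the dataset; the reverse direction (how many low-White songs have high /da/ scores) is a separate number.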

The blogosphere simply cannot ‘break’ a street artist.  Any shit-talk to the contrary is without merit.

I am Giving A Talk At Boston College March 14th

The event is scheduled to take place in Gasson Hall Rm. 305 at Boston College on March 14, 2013. It will begin at 6:30 PM.  The event is hosted by the Boston College Economics Association.  If you are in Boston or around Boston please come by and listen to me talk about RapMetrics.  I will talk about past projects and potential new ones.

This is exciting for me and these kinds of things let me know that this project isn’t useless and that it is worth doing.

Should We Be Impressed By Eminem in Renegade?

Sometimes Eminem’s verse in Renegade is used by racists to prove that rap is “complex” and that it should be looked at as “real art”.  Probably stems from the fact that these ppl started off by listening to things that were not rap and they didn’t have the ability to contextualize rap music critically by putting value on things like “creativity” or “concepts”, historically/artistically etc.  Then the value of the music comes in the form of “technicality” and guys like Eminem who complain (in Renegade at least) about being too famous, become examples of “great rap music” or something. These racists show HOW COMPLEX Eminem is by showing his “intricate” rhyme patterns like the ones below:

[Screenshot: Eminem’s Renegade verse with its rhyme pattern marked]

A 5-syllable rhyme pattern that spans multiple words {lucrative lyrics, youth in hysterics, views in his marriage, you shouldn’t hear it, food for the spirit…}.  A bunch of internal rhymes and assonance and other cool TECHNICAL things that don’t matter because Eminem is complaining about dumb things like being too famous.  It IS hard to reproduce as a rhyme pattern because of how many words are used, and it takes HARD MATH and other algorithms to try and reproduce it.  It gave me incentive to venture into 5-, 6- and 7-syllable rhymes (I’m still very new to that realm).  I have a bunch of rhymes (~7k) at if you want to see.  Here are some:

troop to the era
loot to the spirit
lose to the lyric
dude with the fearless
deuce in my earring
duke and the merit
cruise like a steering
truce with the karat
loose i aint sparing
who gave us clearance
boom of the serum
blues if you hearing
drew in the parrot
use what you hearing
new and the sheriff
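Multi-syllable rhymes like these can be approximated by comparing the last few vowel sounds of two phrases and ignoring consonants. A sketch under that assumption; the pronunciations are a tiny hand-written, simplified table (stress marks dropped), not the generator behind the list above.

```python
# Hypothetical hand-written vowel sequences (ARPAbet-style, simplified).
VOWELS = {
    "loot": ["UW"], "lose": ["UW"], "dude": ["UW"], "to": ["AH"],
    "the": ["AH"], "with": ["IH"], "spirit": ["IH", "IH"],
    "lyric": ["IH", "IH"], "fearless": ["IH", "AH"],
}


def vowel_tail(phrase, k):
    """Last k vowel sounds of a phrase."""
    seq = [v for word in phrase.lower().split() for v in VOWELS[word]]
    return seq[-k:]


def rhyme_match(a, b, k=5):
    """Count positions where the last k vowels agree, aligned from the end."""
    tail_a, tail_b = vowel_tail(a, k), vowel_tail(b, k)
    return sum(x == y for x, y in zip(reversed(tail_a), reversed(tail_b)))


print(rhyme_match("loot to the spirit", "lose to the lyric"))      # 5
print(rhyme_match("loot to the spirit", "dude with the fearless"))  # 3
```

Aligning from the end matters because the rhyme lives at the end of the bar; a partial score like 3 out of 5 is the kind of slant rhyme the list leans on.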

I wrote about Chicago Rap Gang Stuff

Gang references in rap are not new.  Rap beef is not new.  Rap beef crossing over into real life isn’t new either.  But the specific intersection of the three, with the ability to view/understand it all digitally, might be.  The YouTube wave of Chicago rappers shouting out blocks and gangs in their songs, and their ability to gain fans within the city who are hyper-aware of all the references, has created an interesting opportunity.

With good algorithms to detect these references and aggregate them, YouTube comments (!) end up containing a ton of interesting data.  Gang shootouts with respective positive and negative affiliation explicitly referenced, street corners mentioned, coded terminology thrown around and the ability to cross-reference a user’s posting history with particular songs they liked or commented on.  It’s weird new territory with weird ethical questions that need to be considered.  Perhaps better than I have already.

I wrote about it all at the link at the top.  Please read and share.

Jadakiss/Styles-P 4 syllable rhyme stuff

I guess ever since Volume 5 of the And1 series I’ve been a big fan of Styles P.  For one summer, I think this was the song we listened to the most.  Of course, most of the ideas meant nothing to me then (even now?) but it came out at a special time when streetball tapes were important and the music off those tapes reflected what was ‘hot’.  All of this before Professor ruined And1 basketball, streetball and the delicate intersection of the two for me.

None of that is very interesting, but Jada and Styles both rhyme in cool ways.  My favorite thing they do is the 4-syllable rhymes with unstressed middle syllables where, usually, almost all of the words are 1 syllable long.  For example:

I got shit that could wake up the deaf
that’ll knock down the door and break up the steps

Certain words like {up} and {the} are not stressed because they work mostly as function words (I guess).  This is essentially a 2-syllable rhyme {wake+deaf, break+steps} and the middle words just carry rhythm and sound cool.  Important to note that all of these words are monosyllabic.  I think this is harder to do and actually sounds better.  Computationally, it takes a lot more work to combine four words together like this and have it sound good/‘make sense’.  It is easy to find instances of 2-word combinations that match the rhyme pattern above.
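The “function words don’t count” idea can be sketched as a rhyme key: keep only the vowels of stressed syllables, and treat words on a function-word list as unstressed even when a dictionary marks them stressed. The pronunciation table and function-word list here are hypothetical and tiny.

```python
# Hypothetical mini dictionary: word -> list of (vowel, stressed?) pairs.
PRON = {
    "wake": [("EY", True)], "break": [("EY", True)],
    "up": [("AH", True)], "the": [("AH", False)],
    "deaf": [("EH", True)], "steps": [("EH", True)],
}

# Words demoted to unstressed regardless of the dictionary entry.
FUNCTION_WORDS = {"up", "the", "a", "and", "in", "to", "of"}


def stress_key(phrase, demote=True):
    """Tuple of stressed vowels: the rhyme 'skeleton' of a phrase."""
    key = []
    for word in phrase.lower().split():
        if demote and word in FUNCTION_WORDS:
            continue
        key.extend(v for v, stressed in PRON[word] if stressed)
    return tuple(key)


print(stress_key("wake up the deaf"))                # ('EY', 'EH')
print(stress_key("break up the steps"))              # ('EY', 'EH')
print(stress_key("wake up the deaf", demote=False))  # ('EY', 'AH', 'EH')
```

Without the demotion step, “up” sneaks into the key and the two-vowel skeleton the lines share disappears.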

Example rhymes for ‘wake up the deaf’:

  • betrayed a contempt
  • break and confess
  • brain to forget
  • behavior and threats
  • changing the s
  • pacing the decks
  • gauging success
  • changing defense
  • painting attempts
  • retaining a sense
  • player selects
  • navy corvette
  • awaited with dread
  • latency when
  • occupational stress
  • gratefully stretched
  • hasten to bess
  • haman against
  • hastily said
  • behavior is set
  • betrayed a contempt
  • behavior condemned
  • betraying the best
  • behavior with men
  • base his defense
  • say it reflects
  • retaining a sense
  • changing defense
  • painting attempts
  • sailing from thence
  • occasional yelps
  • playing yourself
  • creating a head
  • later regrets
  • occasional sketch
  • patients with ed
  • gauging success
  • ladies respect
  • engage in pretense
  • awaited with dread
  • named a hotel

4-word combinations are going to require some boring programming and probably some math and stuff like that to do in a non-shitty way.  Here are some Jadakiss/Styles-P 4-syllable rhymes and what the COMPLEX RapMetrics algorithm returns as possible rhyme results:


Everyday I need an ounce and a half
S.P.; the only flower that you know, with a bounce in a half

One of the great opening bars from one of my favorite albums ever.  Again, same idea here; the rhyme is essentially {ounce+half, bounce+half} with two words in the middle to carry rhythm and make it a 4-syllable rhyme.  It is one of my favorite rhyme schemes and an interesting one because the OW sound is pretty rare in general.  Here are some computer generated rhymes for that scheme.  Notice, again, there are no 4-word combinations.  That’s because it is actually harder to do computationally, and, in my opinion, harder to do as a writer.

Example rhymes for the rhyme scheme ‘ounce and a half’:

    • mountain of cash
    • mountain attracts
    • mountains perhaps
    • counted the stacks
    • down the attacks
    • crowded in fast
    • hours of hand
    • flower or grass
    • council advanced
    • hour is passed
    • flowery branch
    • powerful am
    • doubted this last
    • flower or grass
    • south of sudan
    • housing began
    • counter and slammed
    • powerful am
    • flowery lands
    • flowers and scraps
    • powerful stand

A really cool rhyme and actually probably has potential to be bent in other weird ways.
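Why longer word combinations get expensive, as a sketch: brute-forcing every n-word sequence from a vocabulary of V words means checking V**n candidates, and a stressed-vowel key alone lets plenty of word salad through, so something like a language model would still need to rank what survives. The lexicon is a hypothetical 12-word toy where None marks unstressed filler.

```python
from itertools import product

# Hypothetical toy lexicon: word -> stressed vowel (None = unstressed filler).
LEXICON = {
    "ounce": "AW", "bounce": "AW", "count": "AW", "crowd": "AW",
    "half": "AE", "cash": "AE", "stacks": "AE", "fast": "AE",
    "and": None, "a": None, "the": None, "in": None,
}


def stress_key(words):
    """Stressed vowels of a word sequence, fillers skipped."""
    return tuple(LEXICON[w] for w in words if LEXICON[w] is not None)


def search(target, n_words):
    """Check every n-word sequence; return (matches, number checked)."""
    matches, checked = [], 0
    for combo in product(LEXICON, repeat=n_words):
        checked += 1
        if stress_key(combo) == target:
            matches.append(" ".join(combo))
    return matches, checked


two, n2 = search(("AW", "AE"), 2)
four, n4 = search(("AW", "AE"), 4)
print(n2, n4)                      # 144 20736
print("ounce and a half" in four)  # True
```

Going from 2 to 4 words multiplies the search by 144 even on 12 words, and most 4-word matches (“and and ounce half”) are junk.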


So I roll ’em up, back to back, fat as I could
You got beef with Styles P, I come to splatter the hood

Same basic idea except now the 2nd line has a 2-syllable word {splatt-er}.

Example rhymes for the rhyme scheme ‘fat as I could’:

      • philanthropy would
      • fabric that could
      • faster and looked
      • fantasy look
      • fabulous goods
      • diameter should
      • calendar full
      • classes secured
      • tangible goods
      • carryin books
      • salinger book
      • natalie woods
      • management would
      • happening hook
      • mechanical bull
      • family good
      • splatter the hood
      • allison pushed
      • gallagher took
      • chapter i look
      • galloping hoofs
      • strategy shook
      • daddy assured
      • haven’t you looked
      • masculine foot
      • kasparov could
      • natural looks

Another sweet rhyme.


Motherfucker understand it’s full service to you
I don’t smoke the weed if it ain’t, purple or blue

Again, from ‘Get High’, same 4-syllable rhyme with 2 unstressed syllables in the middle.  This time the first word in this ‘rhyme scheme’ is 2 syllables long instead of one, which is cool.  This is what the very complex RapMetrics algorithm returned as possible results for rhymes:

Example rhymes for the rhyme scheme ‘purple or blue’:

    • preservatives to
    • permanent rules
    • purposely moved
    • purchased a few
    • persons consume
    • preferred to commute
    • determining u
    • conservative coup
    • determining two
    • encircling blue
    • recurrence is due
    • purchase this new
    • certainly smoothed
    • uncertainty due
    • cursing the rude
    • circle ensues
    • permanent wounds
    • certainly huge
    • germans produce
    • preferred to commute
    • circular stool
    • earliest jews
    • heard to allude

This is a cool ‘technique’ and there are some really cool things that can be done with it.  It is actually harder to do 4-syllable rhymes of only monosyllabic words computationally, and I think it is also harder to do them well as a writer.  I guess that is the main point of this ‘article’.  Sometimes having to simplify language and use smaller words to convey meaning/rhythm is actually way harder to do.  The simplest Styles P lines have stuck with me the longest:

Bitch think I don’t smile ’cause my tooth chipped//
Bitch, I don’t smile cause my heart chipped//