Lighting a Path Forward

About 2 years ago, I accidentally wrote a children’s book. (Accidentally is a strange word: rather, I never intended to write a children’s book, but when an idea sticks with me, the only way to get it out of my head is to do it.)

My significant other is part of a company that makes circuit stickers. Sometimes after dinner, I would beta test his products. I would do this in my spare time, one hand fumbling with a sticky circuit, the other hand clicking for cat clips on the computer. Once the work was done, I’d tuck away the circuit stickers along with the knowledge of how to build these circuits.

And then one night I dreamt of a dark world, and a poem about bringing light to a world full of darkness. The poem stuck in my head. I wrote it down, and from there an idea grew…

And now, 2 years later, *drumroll* I present my STEAM electronics kit: “When Thea LED the Way”.

This is an interactive craftbook that comes with electrical components. The heroine’s name is Thea Prom, a take on Prometheus, the Greek Titan who brought fire to mankind. Likewise, the reader helps Thea bring light to her darkened world.

In the craftbook “When Thea LED the Way”, the reader enters Thea’s world and becomes part of her journey. The reader must help Thea bring light to her world by building circuits. As Thea grows and gains confidence in her knowledge of circuits, the reader does as well.

Of course, the kit comes with everything you need to help Thea on her journey (circuit stickers, copper tape, etc.). It’s like the movie “The NeverEnding Story”, where a boy reading a fantasy book has to actively participate in order to save the fantasy world.

I’m thrilled to see the book in print and the kit being manufactured. In the process of getting to this point, I had to oversee the design of many different parts, from the electronic components, to the package design, to the book itself.

When I wrote the SIFT algorithm, my short-sighted goal was to get a publication so I could receive my PhD. Thankfully, my PhD advisors convinced me to make a website so that the research and medical community could actually use SIFT. At the time, I didn’t see the importance of deploying the algorithm. Only later when I saw the utility the SIFT website had in other people’s research, did I realize the impact of deploying an idea.

So while I am excited that a factory is ready to manufacture this kit, I realize that without a website to get the word out, the project would have little impact. Thus, I’m doing a crowdfunding campaign for this project. If you can help me spark confidence in children who might not normally consider engineering, then this craftbook will have achieved the impact I’m hoping for.




There’s No Glass Ceiling if you Leave the Room

My friends have been asking me why I resigned from my job with no other job lined up.

It started when ants found their way into our home. I could see a solid trail of ants determinedly racing towards a destination (turns out someone had spilled soda underneath the fridge).

The ants looked something like this:

Marching in single file, the ants were focused on the sugar goldpile at the end of their trail. There were many of them, and they were undeterred. As in a well-established field of research, everyone raced toward a final goal because they knew the reward was guaranteed.

After we cleaned up the soda, we saw a huge dropoff of ants in our home. However, one explorer ant (“scout ant” if you’re an entomologist) found a fragrant apple, and established a sparse trail leading a few other pioneering ants to our apple. This is akin to finding a promising, new research area. You’re not guaranteed success, but if you do, then many ants will eventually follow.

I have been lucky to be an explorer ant throughout most of my career. I established burgeoning fields and developed disruptive technologies.

The Job I Left Behind

When I started my job at GIS over 7 years ago, my job was funded by so-called “hard money” (guaranteed funding), which allowed me to be an explorer ant. I was given resources and allowed to try new ideas, so I executed some pretty interesting projects. Besides continuing research in genetic variation [1,2,3], I studied whether one could use online drug reviews to measure drug performance [4], created a framework for emancipating facts from papers locked behind paywalls (think SciHub with a legal theory) [5], and even looked at the patterns of Facebook memes [6].

However, the funding model at the institute changed; it started to transition to grant funding. The grant funding scheme was not guaranteed for a PI; funding instead was allocated top-down and focused on directed, big-money topics such as heart disease and cancer.

When I started writing grants, I found myself using buzzwords and adding unnecessary tasks to pad grants. Rather than designing the project the way it should be, I would bloat projects to consume the available funding.

The environment was changing, too. When resources are tight, people’s behaviors change. I looked around and saw certain behaviors emerging as people struggled to survive – this is not what I wanted to become.

Inevitably, I started to change: my science was changing and my efficiency was changing. And I didn’t like this change.

Large projects were winning funding, but I couldn’t be passionate about them because they were not risky or groundbreaking enough for me. Funding agencies were after success metrics like number of patents filed and licensees signed. They wanted a sure win, just like the ants rushing to our sticky soda sugar mine.

However, going into a heavily researched, well-established area? It’s just not me.

I could no longer be an explorer ant, and was worried I’d become:


I knew what I didn’t want to become, but what did I want to become?

I went through a lot of soul-searching. I applied for jobs, but nothing seemed to click.

A couple of incidents happened around this time.

  1. I heard a female entrepreneur speak, which is a rare sight. She was realistic, open, and honest about the challenges in her life. What impressed me was that she had defined her own unique path.

  2. On a long flight, I watched the movie Moana on my dinky little airplane TV. It is about a girl finding her own identity (and saving her village in the process). These are my favorite lyrics in the whole movie:

    Like Moana, I have “journeyed farther” and have accomplished a lot in my career. My environment may change, but I should not let it change me. I still want to explore, to push boundaries, and to do ground-breaking things.

  3. My partner heard me singing “Moana” so often that he knew the lyrics without seeing the movie. Out of love, or perhaps to salvage his own ears, he suggested I take some time off. The way he put it was “if I didn’t make you happy, you’d leave me in a heartbeat. So if your job doesn’t make you happy, why not take a break instead of holding out for the right new job to come along?”

So I resigned.

The Future

(This is the shortest section because my story is still being written…)

So yes, I am unemployed! I am taking a step back to look at the big picture. I ain’t gonna lie: it’s scary like jumping off a cliff without a safety net.

Since my resignation, I’ve been rediscovering me. I have worn many hats: industry, academic, and clinical. I have always been defined by a role; now I will define my own role.

I have a general strategy. I’ve accepted a few part-time consulting jobs that I’m excited about, because they let me work on interesting projects in new areas with amazing people. I am also very lucky to be a bioinformatician, because all I need to keep creating is a laptop and access to cloud-based supercomputers.


Being a consultant is like being a tour guide: bringing others to newly discovered areas. I’ll also use some time to explore the unknown.

Most importantly, I’m keeping time for myself. There are some personal things on my bucket list I want to tick off. Some are genomic and disease-related; some are not. But in the end, I want to find something that challenges me and grows me.

Sure, like any explorer, I may fail.

But I’ll never know if I don’t try, right?


1. SIFT missense predictions for genomes. Nature Protocols 11:1-9.
2. Phen-Gen: Combining Phenotype and Genotype to Predict Causal Variants in Rare Disorders. Nature Methods 11:935.
3. Predicting the effects of frameshifting indels. Genome Biology 13:R9.
4. Assessment of Web-Based Consumer Reviews as a Resource for Drug Performance S. Adusumalli, H. Lee, Q. Hoi, S.L. Koo, I.B. Tan, P.C. Ng (2015) Journal of Medical Internet Research 17:e211
5. Enabling Public Access to Non-Open Biomedical Literature via Idea-Expression Dichotomy and Fact Extraction. Association for the Advancement of Artificial Intelligence Workshop on Scholarly Big Data [FactPub]
6. Information Evolution in Social Networks. The 9th ACM International Conference on Web Search and Data Mining


Reverse engineering contingency (2×2) table from Odds Ratio (OR)

Given the odds ratio (OR), we will calculate the individual cells in the contingency table (a,b,c,d).

The totals in the margins of the table are known; a, b, c, and d are unknown and are what we want to calculate.

Odds Ratio = (a/c) / (b/d)

Exposure | Cases | Controls | Total
Exposed | a | b | total_exposed
Unexposed | c | d | total_unexposed
Total | total_cases | total_controls | total_participants

If you’re getting the OR from a paper, the paper usually reports total_exposed, total_unexposed, total_cases, and total_participants.

In that case, you can derive a, b, c, and d.

Solving for a:

Exposure | Cases | Controls | Total
Exposed | a | total_exposed – a | total_exposed (a+b)
Unexposed | total_cases – a | total_unexposed – total_cases + a | total_unexposed (c+d)
Total | total_cases (a+c) | total_controls (b+d) | total_participants

So now, the equation for OR can be written in terms of a and the known numbers:

OR = (a * d) / (b * c)
OR = (a * (total_unexposed – total_cases + a)) / ((total_exposed – a) * (total_cases – a))

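If you have the values for OR, total_exposed, total_unexposed, total_cases, and total_controls, you can solve for a using the quadratic formula. Multiplying out the equation above and collecting terms gives:

(OR – 1) * a^2 – (OR * (total_exposed + total_cases) + total_unexposed – total_cases) * a + OR * total_exposed * total_cases = 0

Of the two roots, only one keeps every cell of the table non-negative, and that root is the value of a. (If OR = 1, the equation is linear and a = total_exposed * total_cases / total_participants.)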

Once you solve for a, solving for b, c, and d is trivial.
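To make the quadratic step concrete, here is a minimal Python sketch. It is not the code behind the calculator on this page, and the function name `cells_from_odds_ratio` is hypothetical. It rearranges the OR equation into a quadratic in a, keeps the root for which every cell is non-negative, and rounds it to a whole count, so the recovered table only approximately reproduces the OR.

```python
import math


def cells_from_odds_ratio(odds_ratio, total_exposed, total_unexposed, total_cases):
    """Recover the cells (a, b, c, d) of a 2x2 table from an odds ratio and margins.

    Layout, as in the tables above:
                   Cases   Controls
        Exposed      a        b
        Unexposed    c        d
    """
    e, u, k = total_exposed, total_unexposed, total_cases
    if odds_ratio <= 0 or min(e, u, k) <= 0:
        raise ValueError("odds ratio and totals must be positive")
    # OR * (e - a) * (k - a) = a * (u - k + a), rearranged as
    # qa * a^2 + qb * a + qc = 0
    qa = odds_ratio - 1
    qb = -(odds_ratio * (e + k) + u - k)
    qc = odds_ratio * e * k
    if abs(qa) < 1e-12:
        roots = [-qc / qb]  # OR == 1: the equation is linear
    else:
        disc = qb * qb - 4 * qa * qc
        if disc < 0:
            raise ValueError("no real solution for these inputs")
        # Numerically stable form of the quadratic formula
        q = -0.5 * (qb + math.copysign(math.sqrt(disc), qb))
        roots = [q / qa, qc / q]
    # b = e - a, c = k - a and d = u - k + a must all be non-negative
    lo, hi = max(0, k - u), min(e, k)
    valid = [r for r in roots if lo <= r <= hi]
    if not valid:
        raise ValueError("no table with these totals has this odds ratio")
    a = round(valid[0])  # cells are counts, so round to a whole number
    b, c = e - a, k - a
    d = u - c
    return a, b, c, d


print(cells_from_odds_ratio(1.97, 207, 245, 31))  # (19, 188, 12, 233)
```

Picking the root by the non-negativity constraint works because, with the margins fixed, a·d/(b·c) increases steadily as a grows from its smallest to its largest allowed value, so only one root can land in that range. The `q` form of the quadratic formula is used instead of the textbook one to avoid losing precision when OR is close to 1.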

Try it out!

Calculator: Deriving cells of a 2×2 Contingency Table from Odds Ratio


I came across this problem when reading an Alzheimer’s paper.

Looking at ApoE ε4 carriers (n=452), smokers have an OR of 1.97 for dementia compared to non-smokers.

Because this was a population study, I wanted to know how many smokers got dementia, and how many non-smokers got dementia. If I got the individual cells, I could calculate this.

Out of the 452 ApoE ε4 carriers, 207 were smokers (45.8% of 452) and 31 had dementia (6.9% of 452).

From this,

  • OR = 1.97
  • total_exposed = 207
  • total_unexposed = 245
  • total_cases (those with dementia) = 31
  • total_controls (those without dementia) = 421

I plugged these values into the calculator above to get:

Dementia Non-Dementia Total
Smoking 19 188 207
Non-smoking 12 233 245
Total 31 421 452

In this population-based study, 9% (19/207) of the smokers had dementia while 5% (12/245) of the nonsmokers had dementia.
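As a quick sanity check on the rounded table, the short Python sketch below recomputes the odds ratio and the share of each group with dementia (the helper name `summarize_table` is made up for illustration). Because the cells have to be whole numbers, the odds ratio comes back as about 1.96 rather than the published 1.97.

```python
def summarize_table(a, b, c, d):
    """Odds ratio and per-row risk for a 2x2 table laid out as in the tables above."""
    odds_ratio = (a * d) / (b * c)
    risk_exposed = a / (a + b)  # share of the exposed row who are cases
    risk_unexposed = c / (c + d)  # share of the unexposed row who are cases
    return odds_ratio, risk_exposed, risk_unexposed


odds, smokers, non_smokers = summarize_table(19, 188, 12, 233)
print(f"OR from rounded cells: {odds:.2f}")  # 1.96
print(f"smokers with dementia: {smokers:.1%}")  # 9.2%
print(f"non-smokers with dementia: {non_smokers:.1%}")  # 4.9%
```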


Alzheimer’s book: 100 Simple Things You Can Do to Prevent Alzheimer’s and Age-Related Memory Loss

The book “100 Simple Things You Can Do to Prevent Alzheimer’s and Age-Related Memory Loss” was written in 2010.

You can find all of the advice on the Internet, and in more succinct terms.

The author Jean Carper lists “100 actions” you can take to prevent Alzheimer’s. Each action is written as a chapter. My biggest pet peeve was the repetition in the list. It’s as if she thought the reader had memory loss, so she repeats the same thing 3 or 4 times.

For example, one recommendation is to ‘use your brain’. OK, got it. She repeats this theme at least 3 times as part of her list of 100; chapters “Build Cognitive Reserve”, “Get a Higher Education”, “Google Something” are all the same.

Another example: exercise to improve memory. This message to exercise is repeated in chapters “Be a Busy Body”, “Prevent and Control Diabetes”, “Enjoy Exercise”, “Avoid inactivity”, and “Watch your Waist.”

The irony is that this book is targeted at an ‘intelligent’ reader: it mentions a lot of scientific studies. I appreciate that, but if your reader can understand words like insulin and inflammation, don’t you think the reader would notice filler pages?

There are not 100 things — maybe 20 things at best. It’s like the editor told the author — we can only sell a booklet of tips for 30 things for $5. But if you can make it a list of 100, then we can sell the book for $20. Boo! Hiss!

Jean Carper did mention some scientific studies that I’d like to follow up on, especially those specific to ApoE4.

Specific for ApoE4

  1. Alcohol is bad for Alzheimer’s

    8-14 beverages / week : 37% lower risk of dementia, but this does not apply to E4
    > 14 beverages / week : Doubles the odds for Alzheimer’s
    Adults who usually drink lightly or moderately, but go on occasional binges are 3x more likely to develop dementia
    Drinking with ApoE4 pushes Alzheimer’s 4-6 years earlier

  2. High homocysteine levels combined with ApoE4 are bad. Taking vitamin B can lower homocysteine levels.
  3. Mice genetically destined to get Alzheimer’s (check if ApoE4) that were fed nicotinamide (an over-the-counter form of niacin) zipped through mazes and did not get Alzheimer’s. There are studies in progress to see if niacin-packed foods delay Alzheimer’s. Eat niacin-rich foods like tuna, salmon, turkey breast, sardines, peanuts, halibut, and chicken.
  4. Don’t eat fast food (tested this in ApoE4 mice)
  5. ApoE4 carriers are especially susceptible to tiny blows to head
  6. Wear a nicotine patch for ApoE4: the cognitive boost is greater in carriers of 2 copies. (Hmm, I don’t know how I feel about this.)

Treatments/Foods for Alzheimer’s in general

Ways to Measure

  1. Ankle-brachial index (ABI) test: people with low ABI readings are more likely to get vascular dementia / Alzheimer’s
  2. Measure C-reactive protein. Keep it low (if you have too much inflammation, C-reactive protein levels become high)
  3. Balance (how long can you stand on one foot)
  4. Systolic blood pressure > 140 mm Hg in midlife is a stronger predictor of dementia. Ideally, < 120 systolic, < 80 diastolic
  5. Keep homocysteine levels low or take vitamin B
  6. PET scans show deposits before any signs of mental impairment


Alzheimer’s Books: Tangles and The Little Girl in the Radiator

I read two books on Alzheimer’s from the caretaker’s perspective:

  1. The Little Girl in the Radiator
  2. Tangles

Tangles and The Little Girl in the Radiator are written from the perspective of the caregiver. Tangles is written by Sarah Leavitt about her mother. The main caretaker is her father.

In The Little Girl in the Radiator, the son (Martin Slevin) is the main caretaker.

I enjoyed Radiator Girl more because of its humor. Tangles is a comic book and I’m not a big fan of comic books. Both were good; tears were flowing at the end.

Here are some things that stood out for me:

Loss of Appetite
In both books, the patient loses her appetite, and so is in real danger of getting weaker. This may be partly because the patient’s sense of smell is gone, so flavor and taste come from the tongue. As a result, both patients loved sweet food (sweet candies in Tangles and chocolate biscuits in The Little Girl in the Radiator). The lengths to which the patients would go for sweet food are bittersweet. In Tangles, the mother eats sweet candies without taking the wrapper off, while in The Little Girl in the Radiator, the mother buys 50 packets of chocolate biscuits.

Managing the bodily functions of an Alzheimer’s patient is hard. In Tangles, the mother’s hygiene has deteriorated to a point where there are bits of feces and the patient is oblivious. The author mentions multiple incidents, and I think the mother even pooed in her underwear. I’m not sure why they didn’t put the mother in diapers earlier.

There are some nice pictures of a raised toilet seat, which only a comic book could capture so well.

I wonder if a bidet toilet would have helped the patient remain clean. I wonder if there are bidet toilets for elderly. You’d have to be used to using one too, and I know I’m not used to it.

Also in Tangles, keeping the mother clean is a challenge. The mother can’t brush her own teeth, so her breath stinks. The comic book pictures a bathtub, which must be difficult to get in and out of. I wonder if a shower stall is easier for someone with Alzheimer’s. Hygiene seems to be a challenge, and they hire two caretakers for the mother when the father is away at work or needs a break.

Pets offer love
In both books, pets seem to help. In Tangles, Sarah’s mother recognizes the cat instead of her own daughter. In The Little Girl in the Radiator, the patient adopts an unruly dog for a few months and is the one person who can relate to the dog. There are some pretty touching stories about the dog stealing the Sunday roast, but then finding the lost patient!

Lost concept of time / Waking up at night
In both books, the patient can’t sleep and wakes up in the middle of the night, expecting a hair appointment or having a full-on conversation. In Tangles, the husband turns on the TV all night to entertain his affected wife. This can be very hard on the caretaker because the caretaker needs rest too.

Interesting Observations by the Author of The Little Girl in the Radiator:
Suggestible and Easily Duped
This may depend on how social the patient is. In the Little Girl in the Radiator, the author describes how his mother is easily swindled by door-to-door salesmen. Thankfully, he had power of attorney and was able to dispute charges which the mother had agreed to.
Know the laws in your state and country and assert them! Get power of attorney so that any legal agreement that the patient enters into isn’t binding.
Another example of how impressionable the patient is: whatever she watches on television becomes reality for her.

Repetitive actions that speak to a larger insecurity
Martin Slevin was astute enough to interpret deeper meaning from his mother’s obsessions. The title “Little Girl in the Radiator” is a fixation that his mother has — that there’s a little girl trapped in the radiator. Martin learns from his mother in her last days that ‘she’ is the little girl in the radiator — that she feels trapped and can’t come out. There are other repetitive patterns such as asking to go to the hairdresser (the mother was always looking her best and well-coiffed) and locking her son out of the house when she felt insecure or that he was a threat.


ApoE4 and increased risk of Alzheimer’s

My significant other (SO) has a good chance of getting Alzheimer’s, based on his family history and genetics (he’s ApoE4 homozygous). We discovered this when he got his DNA tested.

So my goal (that sounds really ambitious – maybe I should say “aim”? Trying again…)

My aim is to find preventive measures for Alzheimer’s, specifically for ApoE4 carriers. My partner is 42 years old right now. Can we prevent Alzheimer’s or delay it? If we start early, maybe he can have another 5 years of sanity. I know taking care of him will be exhausting. I’m trying to spend some time now to see if I can delay/prevent his dementia.

Yes, drugs are in development. Even if a drug gets to the market by the time he needs it, it could have side effects, be too pricey, or may not work effectively on him.

So if there are natural, preventive measures, then why not give it a try? I’m going to read the research, and using my biology/genetics/disease background, see if it makes sense at a biology/molecular level. Maybe we can do an experiment!

Let’s start at the beginning — how did we find out that my SO is at risk for Alzheimer’s?

The regulations are always changing on whether a person has the right to know if they’re at risk for Alzheimer’s. In 2017, the FDA permitted 23andMe to provide Alzheimer’s predictions [1, 2].

My SO had been tested before this approval. We were still able to get his Alzheimer’s risk (and other risks) by:

  1. Getting the raw 23andMe data.
    (DNA never changes, so the information is always there; regulation only prevented customers from seeing it.)
  2. Getting the interpretations from Promethease, a 3rd party service, for $5.
  3. Looking up the APOE4 genotype in the Promethease report.

Early 23andMe customers got Alzheimer’s predictions (2007-2013), but he ordered his test during the FDA ban period.

If you want to save money and not pay 23andMe’s $199 ancestry+health offering, you could pay for 23andMe’s $99 ancestry service, and then pay $5 to Promethease for the disease risk predictions. The diseases in the 23andMe report are well-studied, so no matter if you use Promethease or 23andMe, the results should be the same. (No guarantees as we haven’t paid for the $199 service, so I can’t check it.)

However, Promethease’s report isn’t easy to read. The easiest way is to open the interactive report (report_ui2.html) and see how many copies of APOE ε4 he has. ε4 is bad, while ε3 and ε2 are OK. Unfortunately, my SO had 2 copies of APOE ε4.

Honestly, unless you have a strong background in genetics or Alzheimer’s, I’d pay the extra money to get the 23andMe report, because this is the type of stuff you don’t want to misinterpret. (I saw the early reports from 2007-2013, and they’re much clearer than Promethease.)

Still, looking specifically at Alzheimer’s and Promethease’s $5 report, I could tell that Promethease was looking at the right genetic mutations. Because he has two copies of the ‘bad’ allele, his risk for Alzheimer’s disease is increased 12x. He also has a family history of Alzheimer’s, so unfortunately, I think it’s only a matter of time.

Here are the resources I’ve found so far:

  1. ALZFORUM has a lot of academic papers on ApoE4’s role in Alzheimer’s, which is great!
  2. – People who have ApoE4 come to share on this forum. I especially like “Our Stories”, where people share what diet and exercise regimens they’re trying.

    Community Biomarker Archive is where people share their cholesterol, weight, and other body markers. I wish there were some test results that measure memory/cognitive decline, so people could know if what they’re doing/eating works.

Do you know any other resources out there for ApoE4?


Choosing the Right College

Is it worth paying more for a “top 10” school?

You get accepted to a private school / Ivy League that will cost a fortune. You’re also accepted by a public university which doesn’t have as good a reputation but costs much less. Does it matter which one you choose?

One factor in your decision is if the school will increase your chance of being successful. A school could be ‘worth it’ if it produces successful people, which we define as its alumni appearing in Wikipedia.

To answer this question, we identified the number of ‘successful’ graduates for each college (alumni appearing in Wikipedia) and calculated the likelihood of appearing in Wikipedia for an alumnus of a given college.

Method Details:
The likelihood of a college alumnus appearing in Wikipedia is calculated as a relative ratio.
If the relative ratio is 1, the number of alumni observed in Wikipedia matches what’s expected based on the college size.
If the relative ratio is greater than 1, the number of alumni in Wikipedia is higher than expected based on the college size, and this school increases your chance of success.
Equations here
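The equations themselves aren’t reproduced here, so the following Python sketch is only one plausible reading of the description above, not necessarily the exact formula used in the analysis. It assumes the expected count is the college’s share of all students multiplied by the total number of college alumni found in Wikipedia. The function name and the numbers in the example are hypothetical.

```python
def relative_ratio(alumni_in_wikipedia, college_size, total_wikipedia_alumni, total_students):
    """Observed / expected number of a college's alumni in Wikipedia.

    ASSUMPTION: "expected" scales with college size, i.e. the college's share
    of all students times the total number of college alumni in Wikipedia.
    """
    expected = total_wikipedia_alumni * college_size / total_students
    return alumni_in_wikipedia / expected


# Hypothetical college: 1% of all students, so 10 of 1,000 Wikipedia alumni expected
print(relative_ratio(50, 1_000, 1_000, 100_000))  # 5.0
```

Under this reading, a ratio of 5.0 means five times as many alumni in Wikipedia as the college’s size alone would predict.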

The table below shows the colleges with the most Wikipedia enrichment.

For example, Harvard has a relative ratio of 50, which means that alumni are 50x more likely to appear in Wikipedia than expected.

College | Enrichment in Wikipedia (relative ratio)
American Conservatory Theater | 124.63
Harvard College | 50.38
Curtis Institute of Music | 59.97
Columbia University | 33.50
Juilliard School | 33.90
Yale University | 24.37
San Francisco Art Institute | 25.80
Princeton University | 20.05
Manhattan School of Music | 16.17
New England Conservatory of Music | 16.12
California Institute of the Arts | 14.62
Stanford University | 12.06
California Institute of Technology | 12.96
Massachusetts Institute of Technology | 11.27
Swarthmore College | 12.14
Northwestern University | 10.23
Amherst College | 11.03
Golden Gate University | 12.22
Williams College | 9.99
Bennington College | 10.70
Trinity College | 10.26
Cleveland Institute of Music | 12.57
Shimer College | 14.20
Johns Hopkins University | 8.20
University of Chicago | 7.96
Brown University | 7.96
Sarah Lawrence College | 8.83
Cooper Union for the Advancement of Science and Art | 9.25
Vassar College | 8.39
Columbia College | 8.71
Duke University | 7.45
Dartmouth College | 7.65
Wesleyan University | 7.87
Berklee College of Music | 7.43
Goddard College | 8.66
Oberlin College | 7.53
Rhode Island School of Design | 7.68
Georgetown University | 6.36
Pomona College | 7.04
Haverford College | 7.25
Reed College | 7.05
San Francisco Conservatory of Music | 8.36
Brandeis University | 5.82
University of Pennsylvania | 5.36
Cornell University | 5.30
Barnard College | 5.89
Bowdoin College | 5.99
National Defense University | 6.69
University of California Berkeley | 4.95
University of Southern California | 4.77
University of California Los Angeles | 4.75
St. John's College | 6.72
American University | 4.92
Davidson College | 5.56
Art Center College of Design | 5.49
Occidental College | 5.28
Wellesley College | 5.16
Bard College | 5.17
Hastings College | 5.57
Howard University | 4.67
University of Notre Dame | 4.50
Pontifical College Josephinum | 8.50
University of Michigan | 3.99
Smith College | 4.55
University of Rochester | 4.15
School of Visual Arts | 4.39
University of Miami | 4.04
Westminster College | 5.02
Marietta College | 4.78
University of Virginia | 3.90
New York University | 3.80
Middlebury College | 4.45
Mannes College of Music | 4.84
Southern Methodist University | 3.89
United States Military Academy | 4.10
Union College | 4.56
United States Naval Academy | 3.99
Kenyon College | 4.43
Naval Postgraduate School | 4.43
Illinois College | 4.78
Boston College | 3.65
United States Air Force Academy | 3.91
Wake Forest University | 3.75
Boston Conservatory | 4.92
University of Richmond | 3.96
College of the Holy Cross | 3.99
Wheaton College | 3.97
Mitchell College | 4.52
Colgate University | 3.93
Tulane University | 3.47
Lake Forest College | 4.21
Morehouse College | 3.92
Hampshire College | 4.16
School of the Museum of Fine Arts | 4.58
Carleton College | 3.94
Mills College | 4.00
Bryn Mawr College | 3.87
Rush University | 6.10
Catholic University of America | 3.34
Grinnell College | 3.84
Rice University | 3.34
Antioch University Los Angeles | 6.24
Kansas City Art Institute | 4.28
Colorado College | 3.60
Carnegie Mellon University | 3.04
Pratt Institute | 3.24
Bates College | 3.60
Vanderbilt University | 2.97
Tufts University | 2.92
Mount Holyoke College | 3.27
Hamilton College | 3.34
St. Charles Borromeo Seminary | 5.24
Syracuse University | 2.68
School of the Art Institute of Chicago | 3.06
Warren Wilson College | 3.59
York College | 4.03
Lincoln University | 3.71
University of Texas at Austin | 2.38
Otis College of Art and Design | 3.18
Texas College | 3.30
Lawrence University | 3.05
Whitman College | 2.98
Millsaps College | 3.14
Louisiana State University | 2.26
University of Tulsa | 2.59
University of Dallas | 2.70
Southwestern University | 3.00
Fisk University | 3.42
Macalester College | 2.77
University of the Southwest | 3.49
Willamette University | 2.64

The relative ratio of appearing in Wikipedia is plotted against the U.S. News & World Report rankings. We see that people who attend the top-ranked schools do have a higher likelihood of appearing in Wikipedia.

(Figure: relative ratio of appearing in Wikipedia plotted against U.S. News & World Report ranking)

So this supports going to a top-ranked school. Also notice that after rank 40, the college doesn’t seem to matter for getting into Wikipedia.

Full analysis with gory details

Also, check out my earlier post, which shows that you don’t even need to go to college for certain professions.


Is going to college worth it?

College is expensive. Students are graduating with massive debts that take the rest of their lives to pay off. Is it worth it? Bill Gates and Steve Jobs never graduated from college, so perhaps a college degree isn’t even necessary. But are Bill Gates, Steve Jobs, Mark Zuckerberg, and Joi Ito rarities in this world, or is this a more general trend?

Let’s analyze Wikipedia for some insight into this crucial question. To help with this task, I’ve created a tool that examines 100,000 biographies of notable individuals born between 1930 and 1980. This article summarizes my findings.

I’ve divided the results into 5 categories of notable individuals:

  1. Entertainers/Artists (famous singers, writers, etc.)
  2. Athletes, e.g. NBA and NFL professional athletes
  3. Politicians e.g. senators, presidents, protestors
  4. Business people e.g. Warren Buffett, Steve Jobs
  5. Academic nerds e.g. engineers, computer programmers, etc.

Note: A person in Wikipedia can be in more than one category. On his Wikipedia page, Bill Gates is categorized as being both an “American computer programmer” and as “Businesspeople from Seattle.”

College Education by Occupation

59% of the Americans in Wikipedia have no college information on their Wikipedia biography.
That’s surprisingly large.

Is it because Wikipedia simply lacks biographical entries on college education? Probably not. One would expect that almost all notable academics would have a college degree. 79% of the biographies of academics have higher education in their biographies – so perhaps the remaining 21% have incomplete biographical records. Assuming this is a valid proxy for under-reporting higher education, one might guess that, as a lower bound, 59% – 21% = 38% of Wikipedians don’t have a college education.

Furthermore, it stands to reason that if a college education was relevant to a Wikipedian’s achievements, the editorial community would include it in the biographical record. Therefore, even if these results are biased by an under-reporting of college information, it is still a valid indicator of the relative importance of college education to an individual’s notability.

Finally, the relative percentage of individuals with college education is consistent with expectations for the five occupations:

  • Athletes and entertainers/artists don’t require a college education to be successful (70% of the athletes and 60% of entertainers/artists don’t have college educations).
  • About half of business people went to college, but half did not.

College Rates over Time

Given the social pressure to go to college over the last few decades, one would expect the fraction of people in Wikipedia with a college education to rise over the same time period.

Instead, we find that education rates for people in Wikipedia have remained constant over time, despite a general trend in society toward higher education.

The dashed upward-trending line in the graph above shows how the general population is being convinced to go to college; the flat or downward-trending lines for education rates in Wikipedia show that college education has had little to no bearing on one’s accomplishments. In fact, in some disciplines it seems going to college hampers one’s ability to achieve success.

Business people show a small but significant decline in education over time (p = 0.003). Interestingly, the decline in education for businesspeople starts for those born in the 1960s. This would correspond to being educated in the mid-1980s, which coincides with a recession when college tuition may have been impractical. It also coincides with the development of the World Wide Web, which could have created other opportunities.

Entertainers can become successful at a young age without needing further education.
Just look at the Disney child actors that transition into adult roles.

Successful athletes have been showing increased college education rates over the years. This is probably a result of social pressure on college athletics programs to educate their athletes, in addition to using them to fill stadiums and power a money-making machine. However, attendance may not mean graduation with a Bachelor’s degree, because some athletes choose to leave college and become professional athletes before graduating. Furthermore, athletes have been given passing grades in classes they never attended, so the ‘education’ aspect of college may be missing.

So is going to college worth it?

If you already know what you want to be, and you’ve proven you have a knack for it, then you may be better off continuing what you’re doing and skipping the college debt. Just ask the child actors, professional athletes, or all the young entrepreneurs running successful businesses without college degrees.


Mining Wikipedia paper at ICWSM 2012

Yay! My paper entitled “What Britney Spears and Kobe Bryant Have in Common: Mining Wikipedia for Characteristics of Notable Individuals” was accepted at ICWSM 2012.

The PDF can be downloaded here:
Mining Wikipedia For Characteristics of Notable Individuals.pdf

So what do Britney and Kobe have in common? They’re both successful, and my research shows that having a rare name increases the chance of success.

Wait, you say, Britney is a common name! Not so: when Britney was born in 1981, her name was far down the list of popular names (#758, as a matter of fact). So in her age group, her name was quite rare, and that really distinguished her from other musicians. I remember listening to those albums back then: people always said “Christina Aguilera”, but when you said just “Britney”, everyone knew you were talking about the one-and-only Ms. Spears. Only later, when she gained immense popularity, did her name become common, as parents started naming their daughters “Britney” (the name rose to rank #137 in 2000).

When Britney was becoming a star, her uncommon name helped her. This is not surprising for entertainers, but according to my research, this observation holds for athletes and successful people in general.
And if you don’t have an uncommon name, then my research shows that using a nickname also helps. Think ‘Steve’ Jobs.

I also looked at birth locations. If you’re born in California or New York, you’re 2x more likely to become an entertainer. Not too surprising, given Hollywood and Broadway. If you’re born in the South, there is an increased chance of becoming an athlete.

This isn’t to say that if you have a common name or you weren’t born in these states, there’s no chance you will become famous. It just shows there is an enrichment for these characteristics.

So if you have a common name, try using a nickname!


  • People with rare names are more than 2x as likely to appear in Wikipedia (2.43x for women; 2.30x for men).
  • People with nicknames are also more likely to be in Wikipedia: males with nicknames are 2.39x more likely to appear, while for females it’s a 1.32x increase.
  • Individuals born in New York and California are ~2x more likely to become entertainers, and those born in the South are ~1.5x more likely to become athletes.

There’s a lot of data in Wikipedia, and it can be mined for much, much more. This paper examines a couple of features; more associations can be gleaned in the future.



Names and Birth States Found Frequently in Wikipedia

Oprah Winfrey, a successful person with an uncommon name.
Bill Gates, Microsoft founder and philanthropist. Born as William Gates, but everyone calls him Bill.

What makes a person successful? As parents, we try to make the best choices to help our children become successful and happy.

One of the first things that we decide on when having a baby is the child’s name. Another choice is where to live. Are these relevant in determining a child’s future success?

Wikipedia is full of successful people. I looked at the characteristics of people in Wikipedia to see if they are any different from the average population.

I looked to see if certain names and birthplaces occur in Wikipedia more often than expected.

Analysis of Names in Wikipedia

  • Rare names are enriched in Wikipedia. Names carried by less than 1% of the population are 2x more likely to appear in Wikipedia, regardless of gender.
  • If you were born with a common name, you’re more likely to appear in Wikipedia if you use a nickname rather than the formal name given at birth. For example, “Michael” appears in Wikipedia 42% less often than expected, but its nickname “Mike” appears 9.7x more often than expected.
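The “more often than expected” comparisons above can be read as an observed/expected ratio: a name’s count among Wikipedia biographies divided by the count you would expect from its frequency in the general population. On that reading, Michael’s “42% less than expected” is a ratio of 0.58. The sketch below assumes a single table of population name frequencies; the paper may have stratified by gender or birth year (as in the Britney example), and the function names and toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def name_enrichment(wiki_names: List[str],
                    population_freq: Dict[str, float]) -> Dict[str, float]:
    """Observed / expected count for each name in population_freq.

    population_freq maps a name to the fraction of the general population
    carrying it (hypothetical input). A value > 1 means the name is
    over-represented among Wikipedia biographies.
    """
    if not wiki_names:
        raise ValueError("need at least one Wikipedia name")
    observed = Counter(wiki_names)
    total = len(wiki_names)
    return {name: observed[name] / (freq * total)
            for name, freq in population_freq.items() if freq > 0}


def rare_name_enrichment(wiki_names: List[str],
                         population_freq: Dict[str, float],
                         threshold: float = 0.01) -> float:
    """Enrichment of the whole group of names rarer than `threshold` (1% by default)."""
    rare = {name for name, freq in population_freq.items() if freq < threshold}
    expected_share = sum(population_freq[name] for name in rare)
    if not wiki_names or expected_share == 0:
        raise ValueError("need Wikipedia names and a rare name with nonzero frequency")
    observed_share = sum(1 for name in wiki_names if name in rare) / len(wiki_names)
    return observed_share / expected_share


# Toy data, invented for illustration only.
population_freq = {"Michael": 0.5, "Mike": 0.1, "Oprah": 0.005, "Kobe": 0.005}
wiki_names = ["Michael"] * 40 + ["Mike"] * 30 + ["Oprah", "Kobe"] + ["Unlisted"] * 28

print(name_enrichment(wiki_names, population_freq))
print(rare_name_enrichment(wiki_names, population_freq))
```

With the toy data, Michael comes out under-represented (0.8) while the two rare names together are about 2x enriched. Names missing from the population table are simply skipped.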

Analysis of Birth States in Wikipedia

  • More entertainers/artists are born in California and New York (~2-fold enrichment)
  • More athletes are born in the Southern states (~1.5-fold enrichment)
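The birth-state results can be sketched the same way, under one labeled assumption: the baseline is all biographies with a known U.S. birth state. A state’s enrichment for an occupation is then its share of that occupation’s biographies divided by its share of all biographies. Within that dataset this equals P(occupation | state) / P(occupation), which matches the “~2x more likely to become entertainers” phrasing. The paper may have used census figures as the baseline instead; the function name and records are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def state_enrichment(records: Iterable[Tuple[str, str]],
                     occupation: str) -> Dict[str, float]:
    """Fold enrichment of each birth state for one occupation.

    records: (birth_state, occupation) pairs, one per biography (hypothetical input).
    Returns {state: share of `occupation` born there / share of everyone born there}.
    """
    records = list(records)
    births_all = Counter(state for state, _ in records)
    births_occ = Counter(state for state, occ in records if occ == occupation)
    n_all = len(records)
    n_occ = sum(births_occ.values())
    if n_occ == 0:
        raise ValueError(f"no records with occupation {occupation!r}")
    return {state: (births_occ[state] / n_occ) / (count / n_all)
            for state, count in births_all.items()}


# Toy records, invented for illustration only.
records = ([("CA", "entertainer")] * 20 + [("CA", "athlete")] * 5 +
           [("TX", "entertainer")] * 10 + [("TX", "athlete")] * 25 +
           [("OH", "entertainer")] * 10 + [("OH", "athlete")] * 30)

print(state_enrichment(records, "entertainer"))
print(state_enrichment(records, "athlete"))
```

In the toy records, California holds half of the entertainers but only a quarter of all biographies, so its entertainer enrichment is 2.0.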

Download the source code:

Code for analyzing Wikipedia biographies
