The Three Pillars of Social Reader Relevancy (I)
In Web Search, results are ranked primarily by their freshness, their relevancy (to the search query) and their content quality. Freshness is self-explanatory; relevancy approximates how much of a web site's content has something to do with the user's query; and content quality indicates how "good" the site's information is, based on factors like PageRank, spam scores and so on.
Once we cross into the mobile world, however, these three factors lose much of their usefulness. Text input on mobile devices is largely impractical and traditional web pages don't render well, so discovering and consuming media on mobile devices is generally inferior to the same experience with printed media like newspapers and magazines. While big-name players attempt to tackle the issue simply by bolting on extra features (Google Mobile Voice Search, Google Instant Preview for Mobile, etc.), the underlying problems remain unresolved, because their ranking algorithms are the same as their desktop counterparts. In my opinion, Flipboard is so great because it has found and defined three new pillars of relevancy for mobile content consumption, namely freshness, social and readability, and they work wonderfully.
With this in mind, I put some work into the server components of Cassius. What started as a simple script that turns Tweets into a JSON feed is now a pipeline that also saves documents into a transitional store (MongoDB) and runs a series of quality measurements on them. The extra processing means we won't be able to serve the feed in realtime, but I hope the results will justify the cost.
How well does your article read?
In Zite or Flipboard, it's not uncommon to run into articles whose summary text resembles gibberish (see below). The problem usually comes from raw HTML elements being mistaken for meaningful content, and it is very hard to avoid. I have seen attempts to solve it with NLP and machine learning classification methods, with varying degrees of success. Since those are beyond my capabilities, I opted for a more traditional way of measuring the quality of a piece of writing: its readability metrics. According to Wikipedia, readability refers to “the ease in which text can be read and understood”, and “…various factors to measure readability have been used, such as speed of perception, perceptibility at a distance, perceptibility in peripheral vision, visibility, the reflex blink technique, rate of work, eye movements, and fatigue in reading…”. Tools that measure readability are widely available and are built into many word processors and email clients.

Figure: results of bad scraping (source: corporategeek.info)
In a nutshell, these tools apply statistical formulas to a piece of English writing, and the resulting scores give an impression of how easy it is to understand. The formulas typically break the text into components such as words, sentences and syllables, and count their frequency relative to the length of the text being analyzed. The most common formulas are the Flesch Reading Ease, Flesch-Kincaid Grade Level, Gunning Fog Index, SMOG, Coleman-Liau Index and Automated Readability Index (ARI).
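For reference, the standard published forms of these six formulas are given below, where $W$ is the number of words, $S$ the number of sentences, $Y$ the number of syllables, $C$ the number of words with three or more syllables (roughly Gunning Fog's "complex words" and SMOG's "polysyllables"), and $N$ the number of letters. Flesch Reading Ease is higher for easier text, while the other five approximate a US school grade level.

```latex
\begin{aligned}
\text{Flesch Reading Ease} &= 206.835 - 1.015\,\frac{W}{S} - 84.6\,\frac{Y}{W} \\
\text{Flesch-Kincaid Grade} &= 0.39\,\frac{W}{S} + 11.8\,\frac{Y}{W} - 15.59 \\
\text{Gunning Fog} &= 0.4\left(\frac{W}{S} + 100\,\frac{C}{W}\right) \\
\text{SMOG} &= 1.0430\sqrt{C \cdot \frac{30}{S}} + 3.1291 \\
\text{Coleman-Liau} &= 0.0588\,\bar{L} - 0.296\,\bar{S} - 15.8,
  \qquad \bar{L} = 100\,\frac{N}{W},\quad \bar{S} = 100\,\frac{S}{W} \\
\text{ARI} &= 4.71\,\frac{N}{W} + 0.5\,\frac{W}{S} - 21.43
\end{aligned}
```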
I find blog posts and articles on Flipboard/Zite that are about a page long the most pleasant to read. Content that spans multiple pages is too demanding for casual reading, while short tweets or one-liners aren't worth the two clicks it takes to expand them and shrink them back (yes, really). For simplicity, let's take my reading habits as the standard and use the following thresholds for computation:
- Flesch Reading Ease: 50 (Time magazine has a score of about 52)
- Flesch-Kincaid: 13 (pre-college level)
- Gunning Fog: 12 (texts for a wide audience typically have a Fog index below 12)
- SMOG: 13 (pre-college level)
- Coleman-Liau: 13 (pre-college level)
- ARI: 13 (pre-college level)
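To make these thresholds concrete, here is a rough Python sketch that computes all six scores for a piece of text and checks them against the values above. It is an illustration under several assumptions, not the Cassius server code: syllables are counted with a crude vowel-group heuristic rather than a pronunciation dictionary; every word of three or more syllables counts as "complex" for both Gunning Fog and SMOG (the original definitions exclude some proper nouns and suffixed words); and, since the list gives only the numbers, an article is assumed to pass when its Flesch score is at least 50 and each grade-level score is at most its threshold. The function names are hypothetical.

```python
import math
import re


def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups ('y' included) and drop a
    silent trailing 'e' (but keep the syllable in words ending in 'le')."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return the six readability scores for `text`."""
    sentences = max(len([s for s in re.split(r"[.!?]+", text) if s.strip()]), 1)
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    n_words = max(len(words), 1)
    syllables = [count_syllables(w) for w in words]
    complex_words = sum(1 for s in syllables if s >= 3)
    letters = sum(len(w.replace("'", "")) for w in words)

    wps = n_words / sentences            # words per sentence
    spw = sum(syllables) / n_words       # syllables per word
    L = letters / n_words * 100          # letters per 100 words
    S = sentences / n_words * 100        # sentences per 100 words
    return {
        "flesch": 206.835 - 1.015 * wps - 84.6 * spw,
        "flesch_kincaid": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_words / n_words),
        "smog": 1.0430 * math.sqrt(complex_words * 30 / sentences) + 3.1291,
        "coleman_liau": 0.0588 * L - 0.296 * S - 15.8,
        "ari": 4.71 * letters / n_words + 0.5 * wps - 21.43,
    }


# Thresholds from the list above. Assumed direction: Flesch is a minimum
# (higher = easier), the others are maximum grade levels.
MIN_FLESCH = 50
MAX_GRADE = {"flesch_kincaid": 13, "gunning_fog": 12, "smog": 13,
             "coleman_liau": 13, "ari": 13}


def reads_well(scores: dict) -> bool:
    return scores["flesch"] >= MIN_FLESCH and all(
        scores[k] <= limit for k, limit in MAX_GRADE.items())
```

A scraped navigation bar such as "Home | About | Contact | Subscribe to our newsletter…" comes out as one very long, syllable-heavy "sentence" and fails the Flesch threshold, which is exactly the kind of gibberish summary this check is meant to catch.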
In the next post we’ll continue to explore the three pillars, and look at some test results to see whether the additional aspect of readability would help us create a feed that is better optimized for the user’s final reading experience.
Knitting a page
When I set out to build the prototype, there were many things in the design I considered fundamental, chief among them a template system flexible enough that no re-installs or updates are necessary when a new page layout combination is desired.
References on the topic are plentiful, but surprisingly the most useful one I came across was “Computer Assisted Layout of Newspapers”, a 184-page paper published by MIT in 1977. The paper is a gem to read and goes into detail on how even ad and picture layouts could be assigned automatically on a theoretical newspaper page. I shall definitely return to it for more inspiration, but so far I have based the design of the prototype on Chapter 6, A Symbolic Graphics Language For News Layout.
The diagram below, taken from p. 84 of the paper, says it all. Pages on Flipboard largely use combinations of rows and columns, and the powerful template language described in the paper should be able to cover all the variations effortlessly.

Figure: a simple yet powerful layout language
Note that I cheated a little and defined my version of the template language in JSON, mainly for easier parsing in Objective-C.
Therefore, in my app
P1 || (S1 = S2) is represented as {"columns": [{"type":"P1"}, {"rows":[{"type":"S1"}, {"type":"S2"}]}]},
and
S3 || (S4 = (S5 || S6 || S7)) becomes {"columns":[{"type":"S3"}, {"rows":[{"type":"S4"}, {"columns":[{"type":"S5"}, {"type":"S6"}, {"type":"S7"}]}]}]}.
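Because the JSON form is just a tree of rows and columns, translating the paper's symbolic notation into it is mechanical. Below is a minimal Python sketch of such a translator (Python is used only for illustration; the app itself parses the JSON in Objective-C). It assumes, as in the two examples, that `||` places items side by side as columns and `=` stacks them as rows, and it deliberately refuses to mix the two operators without parentheses rather than guessing at a precedence rule the paper may define differently. The function names are hypothetical.

```python
import json
import re

# Tokens: the column operator "||", the row operator "=", parentheses,
# and slot names such as P1, S3 or TIA.
TOKEN = re.compile(r"\s*(\|\||=|\(|\)|[A-Za-z0-9]+)")


def tokenize(src: str) -> list:
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if not m:
            raise ValueError(f"unexpected character at {pos}: {src[pos]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_layout(src: str) -> dict:
    """Translate e.g. 'P1 || (S1 = S2)' into the JSON-style dict form."""
    tokens = tokenize(src)
    node, pos = _parse_expr(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return node


def _parse_expr(tokens, pos):
    # A sequence of atoms joined by a single operator type.
    items, op = [], None
    node, pos = _parse_atom(tokens, pos)
    items.append(node)
    while pos < len(tokens) and tokens[pos] in ("||", "="):
        if op is not None and tokens[pos] != op:
            raise ValueError("mixing '||' and '=' requires parentheses")
        op = tokens[pos]
        node, pos = _parse_atom(tokens, pos + 1)
        items.append(node)
    if op is None:
        return items[0], pos
    return {"columns" if op == "||" else "rows": items}, pos


def _parse_atom(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        node, pos = _parse_expr(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing ')'")
        return node, pos + 1
    if tok in ("||", "=", ")"):
        raise ValueError(f"unexpected token {tok!r}")
    return {"type": tok}, pos + 1


if __name__ == "__main__":
    print(json.dumps(parse_layout("S3 || (S4 = (S5 || S6 || S7))")))
```

Rejecting `P1 || S1 = S2` outright keeps the sketch honest: the reader of a template never has to remember which operator binds tighter.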
With a structure like this, we can simply parse the JSON into nested arrays (e.g. {"P1", {"S1", "S2"}}), then write classes that traverse the arrays and return suitable UIViews or collections of UIViews. Only two levels of nesting are supported in the code right now.
The UIView generation process itself is just as crude at present. While looping through the array, the code examines the type of each stored value, and if it's a definition like "P1" or "TIA", a helper class creates the corresponding UIView, passing in the article itself plus attributes such as the size of the array for presentation purposes. All of this takes place in the PageLayoutManager class, and a whole lot more work will go into these classes.
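The traversal and type dispatch described above can be sketched in a few lines. This is a Python illustration, not the Objective-C `PageLayoutManager`: each leaf `{"type": ...}` takes the next article from the feed and hands it to a builder registered for that type, together with the number of siblings in its row or column (the "size of the array" hint). The builder functions, the `BUILDERS` registry and the article dictionaries are hypothetical stand-ins for the UIView helper classes, and unlike the current app code the recursion accepts any nesting depth.

```python
from collections import deque


# Hypothetical builders standing in for the UIView helper classes. Each
# receives the article plus the number of siblings sharing its row/column.
def build_tia(article, siblings):
    return {"view": "TIA", "title": article["title"], "siblings": siblings}


def build_p1(article, siblings):
    return {"view": "P1", "title": article["title"], "siblings": siblings}


BUILDERS = {"TIA": build_tia, "P1": build_p1}


def fill(node, articles, siblings=1):
    """Walk a rows/columns template depth-first and return a nested
    structure of built views, consuming one article per leaf slot."""
    if "type" in node:
        builder = BUILDERS.get(node["type"])
        if builder is None:
            raise KeyError(f"no builder for slot type {node['type']!r}")
        if not articles:
            raise ValueError("ran out of articles for this template")
        return builder(articles.popleft(), siblings)
    key = "rows" if "rows" in node else "columns"
    children = node[key]
    return {key: [fill(child, articles, len(children)) for child in children]}


if __name__ == "__main__":
    feed = deque({"title": f"Story {i}"} for i in range(1, 5))
    page = {"rows": [{"type": "TIA"}, {"columns": [{"type": "TIA"}] * 3}]}
    print(fill(page, feed))
```

Keeping the type-to-builder mapping in a dictionary means a new slot type only needs a new entry, not another branch in the traversal.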
I'm hoping the server side will take on more of the work, both in defining templates and in selecting articles. By the time articles and templates arrive at the client app, analysis of word count, images in the article, source authority, social signals and other relevancy factors should already have been taken into account.
Finally, here's the template used to generate the pages shown in the first video. There are four pages altogether: pages 1 and 4 are row-based, and pages 2 and 3 are column-based. These layouts are quite similar to the ones Flipboard uses heavily on its tweet-display pages.
{"pages":[
{"rows":[{"type":"TIA"},{"columns":[{"type":"TIA"},{"type":"TIA"}, {"type":"TIA"}]}]},
{"columns":[{"type":"TIA"},{"rows":[{"type":"TIA"},{"type":"TIA"}]}]},
{"columns":[{"type":"TIA"}]},
{"rows":[{"type":"TIA"}, {"type":"TIA"}]}
]}
The rendering:

- Page 1: row 1 is an article; row 2 is three columns of articles.
- Page 2: column 1 is an article; column 2 is two rows of articles.
- Page 3: one column with one article.
- Page 4: two rows of articles.
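The renderings above follow from splitting each page evenly: rows divide the available height and columns divide the width. Below is a Python sketch of that geometry for the four-page template, assuming equal-sized cells and a portrait page of 768 x 1024 points (the iPad's portrait size); the real app may well weight cells differently, and the `frames` function is hypothetical.

```python
import json

TEMPLATE = """{"pages":[
{"rows":[{"type":"TIA"},{"columns":[{"type":"TIA"},{"type":"TIA"},{"type":"TIA"}]}]},
{"columns":[{"type":"TIA"},{"rows":[{"type":"TIA"},{"type":"TIA"}]}]},
{"columns":[{"type":"TIA"}]},
{"rows":[{"type":"TIA"},{"type":"TIA"}]}
]}"""


def frames(node, x, y, w, h):
    """Return (type, x, y, width, height) for every leaf slot, splitting
    rows vertically and columns horizontally into equal parts."""
    if "type" in node:
        return [(node["type"], x, y, w, h)]
    out = []
    if "rows" in node:
        children = node["rows"]
        step = h / len(children)
        for i, child in enumerate(children):
            out += frames(child, x, y + i * step, w, step)
    else:
        children = node["columns"]
        step = w / len(children)
        for i, child in enumerate(children):
            out += frames(child, x + i * step, y, step, h)
    return out


if __name__ == "__main__":
    pages = json.loads(TEMPLATE)["pages"]
    for n, page in enumerate(pages, start=1):
        print(f"page {n}:", frames(page, 0, 0, 768, 1024))
```

Page 1, for instance, yields one 768 x 512 cell on top and three 256 x 512 cells below it, matching the description of the first rendering.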
Remote or Local?
A colleague pointed out that template definitions must be stored locally in the client app, since the app shouldn't need to fetch a new template from the server whenever the device changes orientation. I hadn't thought about that yet. To me, it makes more sense for the server to pick templates that suit the content being served. I'm not dealing with landscape orientation at all yet.
Extended Reading:
- Computer-Assisted Layout of Newspapers. Reubtures et al. MIT, 1977.
- Optimization of Web Newspaper Layout in Real Time. J. Gonzalez et al. Computer Networks 36.
Cassius is on GitHub
Finally got round to putting together a decent enough client! The code is pretty rough and the app crashes after a while, but at least it's a start, innit?!
Repos:
- https://github.com/kenshin03/camus (server side, Java)
- https://github.com/kenshin03/cassius (iOS client side)
The article pages are generated from a custom template defined in JSON, and the image on the cover page is grabbed from Instagram's API.
The iconic page flip effects were lifted straight from AFKPageFlipper. Thanks again, Marco!
Replicating Flipboard Part IV – Prelude
It’s been forever since the last update. But amidst a trillion other things, this Flipboard study never strayed far from my mind (and the IDE).
More detailed posts on the progress and designs of the prototype will come this week, but right now, I can’t wait to share the very first screenshots of Cloneboard Cassius.
What you see here is a cover page, using a random image from Instagram, transitioning into the first article page, which shows stories from my Twitter feed. The layout was generated dynamically from a custom template defined in JSON.
I'll be putting all the code up on GitHub soon for backup and version control.
jQuery Mobile & Prezi
Amidst the crazily squeezed schedule of the last few weeks, a couple of things fascinated me so much that I couldn't resist getting my hands dirty.
Putting the two together, this is a Prezi demo of a simple prototype I've built with jQuery Mobile. The technical details will be added to this post later; for the moment, just sit back and be dazzled by pretty pictures flying across the screen!
Replicating Flipboard Part III – How Flipboard lays out content
Shifting the focus back to the iPad app, let’s take a look at how Flipboard processes and lays out Facebook and Twitter feeds. Does it really employ any social signal based ranking?
Here's a sample of the Twitter feed I receive, shown in a browser and in Flipboard:
Figure: sample Twitter feed
Figure: the same group of tweets viewed in Flipboard
Alright… what can we deduce from this? A few things become obvious when the same data is shown side by side:
- The ranking of the Twitter articles in Flipboard more or less retains the original sort-by-time order! I don't think any clever hidden social ranking is at play here.
- Some tweets were dropped by Flipboard, presumably because they contain no links. Examples such as #3, #11, #15, #17 and #18 are simply Twitter conversations.
- Tweets that link to sites with images (e.g. #7, #9) seem to be given higher display priority.
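Taken together, these observations suggest a very simple pre-processing step for Twitter. Here is a Python sketch of that guess, not Flipboard's actual code: tweets without links are dropped, chronological order is otherwise preserved, and tweets whose links lead to images are only split out as candidates for the larger, image-friendly slots. The `Tweet` fields `has_link` and `has_image` are hypothetical; in practice they would come from expanding each URL and fetching the linked page.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    id: int
    posted_at: int    # e.g. a Unix timestamp
    has_link: bool    # hypothetical: the tweet contains a URL
    has_image: bool   # hypothetical: the linked page has a usable image


def prepare_twitter_feed(tweets):
    """Drop link-less tweets (conversations), keep newest-first order, and
    split the rest into image-bearing and plain candidates."""
    kept = sorted((t for t in tweets if t.has_link),
                  key=lambda t: t.posted_at, reverse=True)
    featured = [t for t in kept if t.has_image]
    regular = [t for t in kept if not t.has_image]
    return featured, regular
```

Both output lists stay in time order, which is consistent with the observation that Flipboard keeps the original ordering and only nudges image-bearing tweets forward.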
What about Facebook? The typical Facebook feed is a lot more complex, with mixed content ranging from check-ins and uploaded images to Likes and notifications from whatever apps/games a user has added. The prospect of having to analyze so many different types of content is daunting enough, let alone the additional effort of re-ranking them and laying them out in an app.
As above, this is a snapshot of my Facebook News feed (names, faces and updates blurred out – don’t worry people!) in a browser and in Flipboard:
Figure: Facebook News Feed
Figure: Facebook in Flipboard
Immediately we are struck by some much more intriguing findings:
- Flipboard completely ditched the published time of the feed articles and laid them out entirely based on readability attributes (text length, image size, etc.).
- Quite a few Facebook articles were dropped:
  - Article #5, a Facebook Places check-in: due to lack of images?
  - Article #6, an image uploaded from an iPhone: not sure why it wasn't shown in Flipboard.
  - Articles #7, #13: these were "xxx started using xxx app" messages, which contain no links or images and, frankly, nothing interesting.
  - Articles #20, #21: these were messages from Groupon. Since article #14 was also a Groupon article, I guess #20 and #21 were dropped because Flipboard avoids showing too many articles from the same source all at once.
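The link-less app notices and the Groupon case suggest two simple rules: skip items with neither a link nor an image, and cap how many items one source may contribute. Here is a Python sketch of that guess, assuming a fixed cap per source within one batch of items; the function name, the `max_per_source` default of 1 and the item fields are assumptions for illustration, and dropped items are simply discarded here, whereas Flipboard might defer them to later pages.

```python
from collections import Counter


def filter_facebook_items(items, max_per_source=1):
    """Drop items with neither a link nor an image (e.g. 'X started using
    Y app' notices), then keep at most `max_per_source` items per source
    in one batch, preserving the original order."""
    seen = Counter()
    kept = []
    for item in items:
        if not (item.get("link") or item.get("image")):
            continue
        source = item["source"]
        if seen[source] >= max_per_source:
            continue
        seen[source] += 1
        kept.append(item)
    return kept
```

With a cap of 1, a feed containing three Groupon deals keeps only the first one, mirroring what happened to #14, #20 and #21.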
And while we're looking at Facebook articles, we might as well check again whether Facebook Likes contribute to the layout or ranking at all. The verdict: NOPE. If social signals played a major role in ranking, then #12 and #19 should have been placed on page 1 or 2 at least.
- #1 – 1 reply
- #2 – 1 Like
- #3 – 0 Likes (but 82 YouTube Likes)
- #8 – 0 Likes
- #4, #9 – 0 Likes
- #11 – 4 Likes, 7 Comments
- #12 – 1 Reply, 92 Likes
- #14 – 0 Likes
- #16 – 3 Comments
- #19 – 34 Likes
- #17 – 4 Likes
- #18 – 2 Likes
- #22 – 2 Likes
From these observations, I suspect Flipboard runs workflows roughly consisting of the steps below:
Figure: likely processing workflow for Facebook feeds
Figure: likely processing workflow for Twitter feeds
Initially, I found this templates-over-content interpretation hard to accept. It basically says there's no magic: Flipboard merely puts together a random collection of page templates and then fills those templates with the next most suitable article from a content feed.
To settle this suspicion, I opened Flipboard again after a few more articles had appeared in my Facebook feed. Given a largely unchanged set of data, if Flipboard used content-centric ranking criteria, the page layouts should have stayed more or less the same. As the screenshots below show, this was clearly not the case. Notice how differently the same data was laid out:
Figure: original render in Flipboard
Figure: second render of the same Facebook feed
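The templates-over-content interpretation also explains this behaviour. Expressed as a Python sketch (a hypothesis, not Flipboard's implementation): choose a page template at random, then give each slot the earliest remaining article that satisfies it. The slot requirement `needs_image`, the template shapes and the `layout_pages` function are invented for illustration; the `seed` argument only makes the random choice reproducible.

```python
import random

# Hypothetical page templates: each is a list of slots, and a slot may
# require an article with an image (e.g. a large photo cell).
TEMPLATES = [
    [{"needs_image": True}, {"needs_image": False}, {"needs_image": False}],
    [{"needs_image": False}, {"needs_image": False}],
    [{"needs_image": True}],
]


def layout_pages(articles, num_pages, templates=None, seed=None):
    """Templates first, content second: choose a random template per page,
    then give each slot the earliest remaining article that fits it."""
    templates = templates or TEMPLATES
    rng = random.Random(seed)
    remaining = list(articles)  # feed order, copied so the input is untouched
    pages = []
    for _ in range(num_pages):
        page = []
        for slot in rng.choice(templates):
            idx = next((i for i, a in enumerate(remaining)
                        if a["has_image"] or not slot["needs_image"]), None)
            if idx is not None:  # otherwise the slot stays empty
                page.append(remaining.pop(idx)["id"])
        pages.append(page)
    return pages
```

Because the template is picked before the content is considered, running this twice with different seeds on the same feed can produce different pages, just as the two renders of the same Facebook feed differ.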
So that concludes the brief study on Flipboard’s layout algorithm. From this point on I shall start firing up the IDEs and getting my hands dirty on building that prototype. Pretty thrilled by the amount of attention and support this little project seems to be garnering – thanks and please stay tuned!