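Lately I've been getting lazier and lazier about blogging.
Much of the blame probably goes to the rise of microblogging.
Even when I do occasionally feel the urge to say something, posting a microblog is so convenient that I squeeze, squeeze, squeeze, squeeze that urge down to 140 characters or fewer.
And why not? Echofon sits right there in the bottom-right corner of the browser, just one click away.
Why write a blog post, when I'd have to open the blog dashboard and type, type, type until it's long enough to look like an article?
Still, when I compare the scattered chatter on my Twitter with how desolate this blog has become, I can't help feeling that the past stretch of time wasn't very full.
Anyway, at this turn of the year, let me write a review and a look ahead.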
2011 in Review
Mobile development
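In late 2010 (October, to be exact), to enter Google's Android app competition, I spent half a month learning Android development and wrote a small app for sharing business cards. To test it, I also bought a cheap Android phone.
That was probably the beginning of my interest in mobile apps.
In early 2011, bored stiff at home over winter break, I decided to learn iOS development, using the classic Stanford iOS course videos as my introduction. If I remember correctly, I spent Chinese New Year's Eve 2011 watching the Spring Festival Gala and the lectures at the same time. A few episodes in, I was itching to try it myself, and I actually started writing code on the first day of the Lunar New Year.
I was very interested in mobile reading at the time, so I started studying that area. The most important piece was main-content extraction. I read quite a few papers and articles on it, and looked at many existing products, such as Readability, Instapaper, Read It Later, etc.
All day long I thought about how to do content extraction, analyzing and comparing the various algorithms on paper. After a while I picked one and started implementing it. The implementation itself wasn't hard; the hard part was tuning the parameters. I never tested rigorously against a large test set; the parameters were tuned by "feel" and a handful of test pages.
And so my first app, Mage Reader, gradually took shape.
After that, I enrolled in the iOS Developer Program (IDP) and put Mage Reader on the App Store.
Once it was up, I felt exhausted and took a break for a while.
Then I got interested in cocos2d and spent ten days writing a small game: 24 Points.
So far, these two apps have earned close to nothing, at least nothing worth mentioning next to the time and energy I put in.
But I was happy and absorbed while programming, and that made it worth it.
Besides, not every investment pays off in money.
Many of them pay off in knowledge, opportunity, or what you might call luck; these are all synonyms.
Along the way, many friends helped me a lot, including Huahua, ftao, kai, tjj and others. Thanks to all of them.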
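On studying
Shenzhen didn't offer many courses to choose from, and even fewer that interested me. Still, my final GPA turned out to be high, which surprised me.
Maybe I really did study harder than before. Or maybe the teachers were generous with grades. Or both.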
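Distributed systems and NoSQL
After moving back to Beijing over the summer, I started working on my advisor's project.
That led me into Hadoop and NoSQL, which I spent almost the whole semester reading about.
So far it's mostly been reading up; actual hands-on work has only just begun.
From late October to late December I interned at IBM CRL, and it kept me fairly busy.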
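On life
Since returning to Beijing, I've lived in the university's Wanliu apartments, a 20-minute bus ride from campus. Commuting by bus between campus and the dorm every day is a hassle: on average an hour a day is wasted on the road.
PKU has lots of cafeterias and the food is decent; at least you can find plenty of dishes worth eating. Cheap and good, you could say.
I read a few books: Hackers & Painters, On Top of Tides (浪潮之巅), and lately Steve Jobs, in English, which is slow going.
Not long ago I bought a Kindle 3 and have read some books on it too, such as 1Q84 and The Three-Body Problem.
According to my Douban stats, I watched 203 films in 2011!!
Of those two exclamation marks, one is for bragging and the other is for surprise, because at roughly two hours per film it means I spent at least 400 hours on movies.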
On love
(null)
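Looking ahead to 2012
On the whole I'm satisfied with my 2011; I didn't waste much time. The shortcoming is that I made little academic progress and never picked one direction to dig into. So in 2012 I hope to go deep in one direction and be more academic. Ideally, publish a paper or two.
Another focus is the GRE; I hope to improve my English.
If I still have energy to spare, find an internship?
Take some data mining / machine learning courses?
I also need to exercise more; I haven't had a proper workout in a long time.
Read more books.
Spend less time on social networks, especially Renren.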
Friday, February 3, 2012
Wednesday, August 17, 2011
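Mage Reader 2.0 Screenshots
The screenshots for the previous version (v1.0) weren't photoshopped at all; they were all real screenshots.
(Adding help text and the like onto the images doesn't count, of course.)
Looking at them now, they were ugly and crude.
Later I saw someone on the CocoaChina forum say that the icon and screenshots are super important:
besides the icon, you should also put effort into the screenshots, and photoshop whatever can be photoshopped.
So this time I put a bit of effort into the 2.0 screenshots.
Since the page-flip effect seems to be the biggest "highlight," that's the one I photoshopped. (The others are still untouched.)
Here's a look at my superb (read: clumsy) Photoshop skills:
Finally, I hope Mage Reader 2.0 passes review soon.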
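Friday, July 8, 2011
Goodbye, Shenzhen Graduate School! Hello, Yanyuan!
Goodbye, Shenzhen Graduate School: Building A, labs C202 and C203, the University Town library, Mirror Lake, the May Fourth Tower, Building J, the International Conference Center, the lecture theaters in Building D, Shanjingge, Room 303 in Building N, the water dispenser, the water heater, the air conditioner, the 10M internet at 20 yuan a month, the broken bathroom scale, the ramen and knife-cut noodles on the second floor of the PKU cafeteria and the minced pork that turned up in everything, the Tsinghua cafeteria's cabbage with glass noodles, eggplant, cauliflower, potato slices, potato strips, fish, shrimp and all kinds of flatbread, the yogurt at the campus supermarket, Zhuyifang, Daxue Lunyu, the Subway I never once went to, the treadmill in the gym, rented bike No. 24, Tsinghua's long corridor, the parking lot always full of cars on weekends, and the red-and-green flame trees along the road.
Hello, Yanyuan: Science Building No. 1 and its maze of countless labs, the meal card, shower card, network card and campus card, the Kangbosi, Xueyi, Xuewu and Nongyuan cafeterias, the cheap food, the international internet I can't get onto, the library with its rich collection and its catalog search page so crude it moves you to tears, Sanjiaodi, the Centennial Hall, the footbridge at the West Gate, Changchun Xinyuan, Changchunyuan, and the dorm: a dark basement with no air conditioning and no fan, an unventilated bathroom, no cell signal, barely room for four beds and piled high with junk, a shared washroom of a kind I hadn't seen in five or six years, cold-water showers, and the 18-yuan disanxian (stir-fried potato, eggplant and green pepper) at the restaurant by the gate.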
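Saturday, June 11, 2011
[Translation] The Death of the Moth (Virginia Woolf)
This was actually homework for my English class; translating it took me an evening.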
====================================
飞蛾之死
弗吉尼娅·伍尔芙
准确地说,白天飞来飞去的蛾子并不叫做飞蛾。在窗帘的阴影中熟睡的常见的黄蛾总是会让我们大吃一惊,但是这些飞蛾并不会唤起我们对于黑暗的秋叶和绽放的常青藤的愉快的情感。它们是杂交的产物,既不像蝴蝶一样色彩缤纷,也不像它们的同类一般色彩黯然。然而在我面前的这个,有着狭长的干草色的翅子,翅子的边缘还有着颜色相同的流苏一般的纹饰,看上去活得心满意足。这是一个令人愉悦的九月中旬的早晨,温和、宜人,只是风吹得比夏季更加强烈了一些。犁已经在窗户外的田野中留下了刻痕,犁头过处,泥土被压得平平整整,散发着湿润的气息。这种活力的气息从田野和更远的地方滚滚而至,以致令人难以把心思完全放在书本上。白嘴鸦在忙着过它们每年一度的节日:它们绕着树顶飞来飞去,直到看上去仿佛有一张巨大的缀有成千上万个黑结的网撒开在空中。过了一会儿,这张网慢慢罩在树上,直到每一个树枝的枝头都有了一个结。突然这张网仿佛再次被撒到空中一样,然而是绕了一个更大的圈,伴随着很大的喧闹和叫嚷,就像被撒到空中然后慢慢落到树上这个过程是个极为令人兴奋的体验。
激发白嘴鸦、农夫、马匹、甚至光秃的小山丘的这种力量,也让飞蛾从窗玻璃的一边飞到另一边。这让人情不自禁地盯着看。这的确让人感到一种怜惜。那个早上似乎本应是快乐的,但是一只飞蛾对于短暂的生命的热情却让人不禁想到命运的残酷。它快乐地飞到格子的一个角落,在那里没有待到一秒钟,又飞向了另一个角落。接下来除了飞向第三个和第四个角落,它还有什么选择呢?尽管山丘是那么延绵起伏,天空是那么辽阔,房屋上空的炊烟飘得那么远,远处不时响起的海轮的汽笛声如此浪漫,这就是它所能做的事。它做了力所能及的事。看它,在它那脆弱而纤小的身体里就像有一条看不见的纤细而晶莹的充满整个世界的能量细线。当它频繁在玻璃上飞来飞去的时候,我仿佛看见了这条明亮的细线。它微小而无足轻重,但是充满了生命力。
然而,正是因为它的渺小和简单的能量穿过了开着的窗户,在人们的脑海中的狭小而错综复杂的走廊上横冲直撞,人们对它充满了某种惊奇且悲哀的感情。这就像有个人取了一滴生命的珠子,用绒毛和翅膀进行了尽量简单的装饰,然后让它转着圈舞蹈来向我们展示生活的真谛。人们不会忘记这种奇怪的感觉。人们倾向于忘记所有关于生活的东西,只是把它当作坎坷、无法改变、虚浮和苦难的,所以不得不小心谨慎地面对。人们又不禁会想到,如果生命将它生就为另一种模样,那又会怎么样呢?这让人们开始以一种怜悯的态度看待它这些简单的动作。
过了一段时间,它显然开始感到疲倦了,它停到了沐浴在阳光中的窗户边缘上。随着这种奇怪的景象进入尾声,我完全忘记了它。然后,当我抬头的时候,我的目光再次被它吸引过去。它正试图重新开始舞蹈,但是它只能僵硬或者说是笨拙地扇动翅膀飞到窗玻璃的底部。当它试图飞过去的时候却失败了。专注于其他事情的我看了一会儿这些无用的努力,而没有任何想法,只是单纯等着它再次飞起来,就像人们等待一台停了一会儿的机器重新启动起来却没去思考它停止的原因。差不多试了七次之后,它终于从木窗框上滑落下来,挥着翅膀向后跌到了窗台上。它无助的姿势唤醒了我,我终于意识到它遇到了困难。它再也飞不起来了,它的腿徒劳地拨动着。但是当我伸出铅笔想去帮助它翻过身来时,我意识到这种失败和笨拙意味着即将到来的死亡。想到这里,我又放下了手中的铅笔。
它的腿又开始猛烈地拨动了,在我看来就像在与敌人搏斗。我向门外看去,外面怎样了?大概已经到了中午,田地里的劳作停止了。静止与安静代替了之前的喧闹。那些白嘴鸦已经飞走去小溪边喝水了。马儿们站着一动也不动。但是那些力量仍然在那里,充满了外面的每个角落,中立、客观,不纠结于任何特别的事物。某种程度上讲这与眼前的这个有着枯草般的颜色的小飞蛾恰好相反。做任何事情都是徒劳的。人们只能旁观着那些细小的腿做出的离奇的努力来抗拒即将到来的毁灭。这种毁灭的力量可以让整个城市连同大批的人类沉入海底。没有什么能够抗拒死亡。然而在因为筋疲力尽而停了一小会儿后,它的腿又开始动了。这最后的抗拒是如此强烈和疯狂,它终于成功地翻过了身来。人们的同情当然永远会站在生命的一边。当没有人关心或者毫无所知的时候,一个无关紧要的小飞蛾为了维持别人不会珍惜和保留的生命,付出了不可思议的努力去对抗如此巨大的力量。这让人莫名地感动。不知为何,我再一次见证了生命的死亡。我再次举起了铅笔,尽管我知道这是徒劳的。但是甚至在我这样做的时候,死亡的象征再次毫无疑问地显现了出来。它的身体舒展开来,旋即变得僵硬。挣扎结束了。这个渺小的生命逝去了。看着这只死去的飞蛾,如此强大的力量战胜了如此普通的对手,这种胜利却让我感到惊讶。就像几分钟前生命曾经很诡谲一样,现在死亡也变得诡谲了。这只翻过身来的飞蛾终于优雅而毫无怨言地安详逝去了。是的,它好像在说,死亡比我强大多了。
(Original text below)
The Death of the Moth (1942)
VIRGINIA WOOLF
Moths that fly by day are not properly to be called moths; they do not excite that pleasant sense of dark autumn nights and ivy-blossom which the commonest yellow-underwing asleep in the shadow of the curtain never fails to rouse in us. They are hybrid creatures, neither gay like butterflies nor somber like their own species. Nevertheless the present specimen, with his narrow hay-colored wings, fringed with a tassel of the same color, seemed to be content with life. It was a pleasant morning, mid-September, mild, benignant, yet with a keener breath than that of the summer months. The plough was already scoring the field opposite the window, and where the share had been, the earth was pressed flat and gleamed with moisture. Such vigor came rolling in from the fields and the down beyond that it was difficult to keep the eyes strictly turned upon the book. The rooks too were keeping one of their annual festivities; soaring round the tree tops until it looked as if a vast net with thousands of black knots in it had been cast up into the air; which, after a few moments sank slowly down upon the trees until every twig seemed to have a knot at the end of it. Then, suddenly, the net would be thrown into the air again in a wider circle this time, with the utmost clamor and vociferation, as though to be thrown into the air and settle slowly down upon the tree tops were a tremendously exciting experience.
The same energy which inspired the rooks, the ploughmen, the horses, and even, it seemed, the lean bare-backed downs, sent the moth fluttering from side to side of his square of the windowpane. One could not help watching him. One was, indeed, conscious of a queer feeling of pity for him. The possibilities of pleasure seemed that morning so enormous and so various that to have only a moth’s part in life, and a day moth’s at that, appeared a hard fate, and his zest in enjoying his meager opportunities to the full, pathetic. He flew vigorously to one corner of his compartment, and, after waiting there a second, flew across to the other. What remained for him but to fly to a third corner and then to a fourth? That was all he could do, in spite of the size of the downs, the width of the sky, the far-off smoke of houses, and the romantic voice, now and then, of a steamer out at sea. What he could do he did. Watching him, it seemed as if a fiber, very thin but pure, of the enormous energy of the world had been thrust into his frail and diminutive body. As often as he crossed the pane, I could fancy that a thread of vital light became visible. He was little or nothing but life.
Yet, because he was so small, and so simple a form of the energy that was rolling in at the open window and driving its way through so many narrow and intricate corridors in my own brain and in those of other human beings, there was something marvelous as well as pathetic about him. It was as if someone had taken a tiny bead of pure life and decking it as lightly as possible with down and feathers, had set it dancing and zigzagging to show us the true nature of life. Thus displayed one could not get over the strangeness of it. One is apt to forget all about life, seeing it humped and bossed and garnished and cumbered so that it has to move with the greatest circumspection and dignity. Again, the thought of all that life might have been had he been born in any other shape caused one to view his simple activities with a kind of pity.
After a time, tired by his dancing apparently, he settled on the window ledge in the sun, and, the queer spectacle being at an end, I forgot about him. Then, looking up, my eye was caught by him. He was trying to resume his dancing, but seemed either so stiff or so awkward that he could only flutter to the bottom of the windowpane; and when he tried to fly across it he failed. Being intent on other matters I watched these futile attempts for a time without thinking, unconsciously waiting for him to resume his flight, as one waits for a machine, that has stopped momentarily, to start again without considering the reason of its failure. After perhaps a seventh attempt he slipped from the wooden ledge and fell, fluttering his wings, on to his back on the windowsill. The helplessness of his attitude roused me. It flashed upon me that he was in difficulties; he could no longer raise himself; his legs struggled vainly. But, as I stretched out a pencil, meaning to help him to right himself, it came over me that the failure and awkwardness were the approach of death. I laid the pencil down again.
The legs agitated themselves once more. I looked as if for the enemy against which he struggled. I looked out of doors. What had happened there? Presumably it was midday, and work in the fields had stopped. Stillness and quiet had replaced the previous animation. The birds had taken themselves off to feed in the brooks. The horses stood still. Yet the power was there all the same, massed outside, indifferent, impersonal, not attending to anything in particular. Somehow it was opposed to the little hay-colored moth. It was useless to try to do anything. One could only watch the extraordinary efforts made by those tiny legs against an oncoming doom which could, had it chosen, have submerged an entire city, not merely a city, but masses of human beings; nothing, I knew had any chance against death. Nevertheless after a pause of exhaustion the legs fluttered again. It was superb this last protest, and so frantic that he succeeded at last in righting himself. One’s sympathies, of course, were all on the side of life. Also, when there was nobody to care or to know, this gigantic effort on the part of an insignificant little moth, against a power of such magnitude, to retain what no one else valued or desired to keep, moved one strangely. Again, somehow, one saw life, a pure bead. I lifted the pencil again, useless though I knew it to be. But even as I did so, the unmistakable tokens of death showed themselves. The body relaxed, and instantly grew stiff. The struggle was over. The insignificant little creature now knew death. As I looked at the dead moth, this minute wayside triumph of so great a force over so mean an antagonist filled me with wonder. Just as life had been strange a few minutes before, so death was now as strange. The moth having righted himself now lay most decently and uncomplainingly composed. O yes, he seemed to say, death is stronger than I am.
Monday, February 21, 2011
Sunday, January 16, 2011
Thursday, January 6, 2011
Building a Social Reading Recommendation Engine with Apache Mahout and Google Reader Shared Items
I. Crawling Google Reader shared-item data
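Main references:
Data collection from Google Reader (Google Reader的数据收集)
Several feed-reading APIs provided by Google (google提供的几种读取feed的API)
Note that the requests accept the following parameters:
1. Number of items returned
2. Output format (json/xml)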
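The crawler itself was written in Python and is not included here. As a minimal Java sketch, the two parameters could be appended to a feed URL as follows. FeedUrl is a hypothetical helper, the base URL is a placeholder, and the parameter name "output" for the format is an assumption; only "n" is named in this post.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: only "n" comes from the post, "output" is an assumed name.
class FeedUrl {
    // Appends the item count and, if given, the output format to a feed URL.
    static String build(String baseUrl, int n, String output) {
        StringBuilder sb = new StringBuilder(baseUrl);
        // Start a query string, or extend one that is already there
        sb.append(baseUrl.indexOf('?') >= 0 ? '&' : '?');
        sb.append("n=").append(n);
        // Omit the format to get the default (the crawler in this post kept XML)
        if (output != null) {
            sb.append("&output=").append(URLEncoder.encode(output, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }
}
```

For example, build("http://example.com/feed", 20, null) yields "http://example.com/feed?n=20".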
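I wrote a single-threaded crawler in Python, using the default parameters (n=20, xml). It has been running for almost 48 hours and has fetched 30,000+ Chinese-language users so far, plus 20,000+ users who are inactive or not Chinese-language.
To decide whether a user writes in Chinese, I search the returned results for the character "的"; this works for both Simplified and Traditional Chinese. So far it works well.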
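The language check above is essentially a one-line test. A minimal sketch (ChineseUserFilter is a hypothetical name); the character 的 is written as the escape \u7684 so the source compiles regardless of file encoding:

```java
// Hypothetical helper for the heuristic above: a user counts as Chinese-language
// if the fetched text contains the character U+7684 anywhere.
class ChineseUserFilter {
    // Same code point in Simplified and Traditional Chinese
    private static final char MARKER = '\u7684';

    static boolean isChineseUser(String fetchedText) {
        return fetchedText != null && fetchedText.indexOf(MARKER) >= 0;
    }
}
```

One caveat: 的 also appears in Japanese (for example in words ending in 的), so Japanese-language users could pass this filter too.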
The fetched data is currently saved as local XML files.
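II. Data parsing
Scan all the XML files and produce the preference data.
The preference data format is:
userID,itemID
meaning that user userID liked/shared article itemID.
The third field, the preference value, can be omitted,
because here every preference is boolean: either 1 or 0.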
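The parsing code isn't shown in this post. A minimal sketch follows, assuming each saved file is an Atom feed whose entry elements have a direct id child ending in a 64-bit hex item ID (the tag:google.com,2005:reader/item/... form in the test is an assumption about the crawled data). Reading that hex value into a signed long gives IDs of the same shape as the numeric, sometimes negative, IDs in the results later in this post, but this is only one possible mapping. SharedItemsParser is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical parser: turns one user's saved Atom feed into "userID,itemID" lines.
// Assumes each <entry> has a direct <id> child ending in a 16-digit hex item ID.
class SharedItemsParser {
    static List<String> toPreferenceLines(long userId, String atomXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(atomXml.getBytes(StandardCharsets.UTF_8)));
            NodeList entries = doc.getElementsByTagName("entry");
            List<String> lines = new ArrayList<>();
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                // Look only at the entry's own <id>, not ids nested deeper inside it
                for (Node c = entry.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if ("id".equals(c.getNodeName())) {
                        String id = c.getTextContent().trim();
                        String hex = id.substring(id.lastIndexOf('/') + 1);
                        // Read the 64-bit hex value into a signed long (Mahout uses long IDs)
                        long itemId = Long.parseUnsignedLong(hex, 16);
                        lines.add(userId + "," + itemId);
                        break;
                    }
                }
            }
            return lines;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse feed", e);
        }
    }
}
```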
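III. Building the social reading recommendation engine
1. First, build Mahout. Main references:
Recommender Documentation
Building a social recommendation engine with Apache Mahout (基于 Apache Mahout 构建社会化推荐引擎)
BuildingMahout
Recommender First-Timer FAQ
Introduction to Apache Mahout (Apache Mahout 简介)
RecommenderJob in mahout-0.4 returning 1.0 score for each recommendation
2. Then adapt the GroupLens sample; the main change is to the recommender: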
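public MyRecommender(DataModel dataModel) throws TasteException {
  // Log-likelihood similarity compares users only by which items they shared,
  // so it needs no preference values
  UserSimilarity userSimilarity = new LogLikelihoodSimilarity(dataModel);
  // Optional:
  //userSimilarity.setPreferenceInferrer(new AveragingPreferenceInferrer(dataModel));
  // Neighborhood: the 3 users most similar to the target user
  UserNeighborhood neighborhood = new NearestNUserNeighborhood(3, userSimilarity, dataModel);
  // User-based recommender for boolean preferences, wrapped in a cache
  recommender = new CachingRecommender(
      new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, userSimilarity));
}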
Note that because our preferences are boolean, this uses GenericBooleanPrefUserBasedRecommender.
3. The DataModel is easy to adapt: change
super(convertGLFile(ratingsFile));
to
super(ratingsFile);
and that's it.
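4. Build the jar, copy it into taste-web's lib directory, build the war file, and copy that into Tomcat's webapps directory. OK, deployment done.
You also need to raise the JVM's maximum heap size: edit /usr/share/tomcat6/bin/catalina.sh
and add this line near the top: CATALINA_OPTS="-Xmx1024M"
Then restart Tomcat:
#/etc/init.d/tomcat6 restart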
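Step 2 uses LogLikelihoodSimilarity, which needs no preference values and so suits boolean data. Below is a self-contained sketch of the idea: build a 2x2 contingency table from the items two users shared, compute Dunning's log-likelihood ratio over it, and map the result into [0, 1) with 1 - 1/(1 + LLR). I believe Mahout's class follows this general form, but LogLikelihoodSketch is a hypothetical re-implementation, not Mahout's source.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical re-implementation of a log-likelihood-ratio user similarity for
// boolean data (the idea behind LogLikelihoodSimilarity; not Mahout's source).
class LogLikelihoodSketch {
    private static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // "Unnormalized" Shannon entropy: total * H(counts / total)
    private static double entropy(long... counts) {
        long total = 0;
        double sumXLogX = 0.0;
        for (long k : counts) {
            sumXLogX += xLogX(k);
            total += k;
        }
        return xLogX(total) - sumXLogX;
    }

    // Dunning's G^2 statistic for a 2x2 contingency table
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double columnEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding
        return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
    }

    // a, b: items each user shared; numItems: number of distinct items overall
    static double similarity(Set<Long> a, Set<Long> b, long numItems) {
        Set<Long> common = new HashSet<>(a);
        common.retainAll(b);
        long k11 = common.size();              // shared by both users
        long k12 = a.size() - k11;             // shared only by the first user
        long k21 = b.size() - k11;             // shared only by the second user
        long k22 = numItems - k11 - k12 - k21; // shared by neither
        // Map the unbounded ratio into [0, 1)
        return 1.0 - 1.0 / (1.0 + logLikelihoodRatio(k11, k12, k21, k22));
    }
}
```

The ratio measures how far the overlap is from what chance would predict, not its direction, so two users with unusually little overlap can also score above zero.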
IV. Results
OK, now open http://localhost:8080/[my app name]/RecommenderServlet?userID=[some user id] in a browser
and you'll see output (T___T) like this:
0.8012257 9162463033714117388 0.8012257 -4561230713080859140 0.8012257 2660300300542533338 0.8012257 -2449470947652448865 0.8012257 4517199100982238889 0.8012257 8506464146746528189 0.8012257 -3632037840702745266 0.8012257 -8137494627916127284 0.8012257 -4976713633681791837 0.8012257 1080507498851365445
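There's no telling how good the recommendations are, because they're all numeric article IDs.
For now I don't know how to turn this pile of numbers into a recommendation page a human can read,
so all I can do is stare at the numbers in tears...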
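All ten scores in the output above are identical, which may come from how boolean-preference recommenders score items. The sketch below (hypothetical class and method names, not Mahout's code) reads "userID,itemID[,preference]" lines, keeps the n most similar other users, and scores each item the target user hasn't shared by adding up the similarities of the neighbors who shared it. The similarity function is passed in, so the log-likelihood sketch or any other measure can be plugged in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.ToDoubleBiFunction;

// Sketch of user-based recommendation over boolean preferences (not Mahout's code).
class BooleanUserBasedSketch {

    // Reads "userID,itemID[,preference]" lines; the optional third field is ignored.
    static Map<Long, Set<Long>> parse(List<String> lines) {
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) continue;
            String[] fields = line.split(",");
            long user = Long.parseLong(fields[0].trim());
            long item = Long.parseLong(fields[1].trim());
            itemsByUser.computeIfAbsent(user, k -> new HashSet<>()).add(item);
        }
        return itemsByUser;
    }

    // Keeps the n most similar other users, then sums the similarity of every
    // neighbor who shared each item the target user has not shared.
    static Map<Long, Double> score(long user, Map<Long, Set<Long>> itemsByUser, int n,
                                   ToDoubleBiFunction<Set<Long>, Set<Long>> similarity) {
        Set<Long> mine = itemsByUser.getOrDefault(user, Set.of());
        Map<Long, Double> simToUser = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : itemsByUser.entrySet()) {
            if (e.getKey() != user) {
                simToUser.put(e.getKey(), similarity.applyAsDouble(mine, e.getValue()));
            }
        }
        // Most similar users first
        List<Long> neighbors = new ArrayList<>(simToUser.keySet());
        neighbors.sort(Comparator.comparingDouble((Long u) -> simToUser.get(u)).reversed());
        Map<Long, Double> scores = new HashMap<>();
        for (long neighbor : neighbors.subList(0, Math.min(n, neighbors.size()))) {
            double sim = simToUser.get(neighbor);
            for (long item : itemsByUser.get(neighbor)) {
                if (!mine.contains(item)) {
                    scores.merge(item, sim, Double::sum);
                }
            }
        }
        return scores;
    }
}
```

In this sketch an item shared by exactly one neighbor scores exactly that neighbor's similarity. My understanding is that GenericBooleanPrefUserBasedRecommender also sums neighbor similarities (worth checking against its source); if so, ten items all scoring 0.8012257 would just mean they all came from the same neighbor(s).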
V. TODO
1. Generate a web page from the recommendations?
2. Tune the recommender's type/parameters
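For TODO 1, here is a minimal sketch that parses the servlet's plain-text output into (score, item ID) pairs and renders an HTML list. RecommendationPage is a hypothetical name, and so is the titles map (item ID to article title); it would have to be built while parsing the crawled XML, which this post doesn't do yet.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for TODO 1 (not part of taste-web).
class RecommendationPage {
    // One recommended item and its estimated score
    static final class Rec {
        final double score;
        final long itemId;

        Rec(double score, long itemId) {
            this.score = score;
            this.itemId = itemId;
        }
    }

    // Assumes the output alternates "score itemID", as in the sample above.
    static List<Rec> parse(String servletOutput) {
        String[] tokens = servletOutput.trim().split("\\s+");
        List<Rec> recs = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.length; i += 2) {
            recs.add(new Rec(Double.parseDouble(tokens[i]), Long.parseLong(tokens[i + 1])));
        }
        return recs;
    }

    // Renders an ordered list; items without a known title fall back to their ID.
    static String toHtml(List<Rec> recs, Map<Long, String> titles) {
        StringBuilder html = new StringBuilder("<ol>\n");
        for (Rec r : recs) {
            String title = titles.getOrDefault(r.itemId, "item " + r.itemId);
            html.append("  <li>").append(escapeHtml(title))
                .append(" (").append(r.score).append(")</li>\n");
        }
        return html.append("</ol>\n").toString();
    }

    // Minimal escaping for text placed inside HTML elements
    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```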