2006/4/16 The Meaning of Statistics, Humorous Edition (repost)
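My first encounter with statistics was back in primary school. We were very young, but the mathematical models we met were remarkably advanced: skipping addition, subtraction, multiplication, division, square roots and natural logarithms, we went straight to an operation resembling integral calculus, in which countless tiny quantities accumulate into one enormous quantity. The core of the model was: "If everyone saves one kilowatt-hour of electricity, then..." or "If only everyone saves one cent, then...". This is an all-conquering, unstoppable mathematical tool. In short, faced with a problem the size of a pot lid, we could at once produce a solution the size of two pot lids; faced with a difficulty as heavy as a mountain, we could at once conjure a response as heavy as three mountains. And the procedure was so crisp and forceful: roughly size up the problem, divide it by several hundred million people, apportion it to each person, say "if", and the problem promptly melts like ice and vanishes like smoke, very nearly reaching the realm of the Ferghana horse in Du Fu's lines: "Wherever it heads, no distance is too vast; truly one may entrust it with life and death. With a steed as swift as this, one can range ten thousand li at will." Looking back even now, that optimism, surveying the land, lifting heavy things as if they were light, winning victories a thousand li away amid talk and laughter, is still refreshing.
After I started working, I met statistics most often when writing work summaries. As everyone knows, the work summary is an affirmative genre whose mainstream ideology is optimism, and statistical data are the pillars that keep that optimism standing. At this stage the statistical models advanced to the level of nonlinear mathematics and chaos theory. For example: in year one there were 10 billion in non-performing assets; they were reduced by 3 billion, a gratifying result, leaving 12 billion. In year two they were reduced by 4 billion, a notable achievement, leaving 17 billion. In year three they were reduced by 5 billion, a brilliant accomplishment, leaving 19 billion. In year four they were reduced by a full 7 billion, an astonishing feat, leaving a mere 22 billion. The system looks like unpredictable chaos, but in fact it has a strange attractor that fixes a recurring pattern, one of rolling ever forward and growing ever larger.
Broadly speaking, the natural world follows accounting principles: conservation of mass, of energy, of momentum, of angular momentum. Whether the clouds are in the sky or the water is in the jar, everything ends up on the two sides of the balance sheet; "streamside flowers and the spirit of Zen: face to face, words are forgotten." Human society, by contrast, prefers statistical principles. In 1969 the first Nobel Prize in economics went to two fellows fond of statistics: Ragnar Frisch (Norway), founder of econometrics, and Jan Tinbergen (Netherlands), founder of macroeconometric modelling. The British statesman Disraeli said: "There are three kinds of lies: lies, damned lies, and statistics." Put another way, there are three stirring things in the world: publicity, deadly publicity, and statistics.
Of course, human society also follows accounting principles locally. Take the proverbial ten vices: eating, drinking, whoring, gambling and smoking; cheating, swindling, abducting, defrauding and stealing. Plainly the first five are uses of funds and the last five are sources of funds. Those who often indulge in the first are bound to practise the second, and those who practise the second can hardly avoid the first: every source has a use and every use has a source, and the two must balance. Learning is learning; the truth was worked out long ago and waits at the end of the road.
To the layman, statistics in full armour, like any other specialised discipline, is draped from head to toe in jargon and complicated equations. A single macroeconometric model may involve tens of thousands of variables, and standard deviations, medians, normal curves, discrete distributions, coefficients of variation, critical regions and other concepts, formulas and charts are as confused as talk in one's sleep. But when it takes off its gear and steps naked into the bath, it is always simple. For the authorities who hold the power of discourse, statistics is merely the simple process of obtaining a few final, simple numbers that support their argument. A widely applicable theorem is at work here: the closer anything gets to its core, the simpler it is. "Ordinary believers are always more devout than cardinals; that is the secret of the existence of every religion," says the flyleaf of a biography of Andropov.
Statistics has three great properties, which can be summed up in three sentences:
Practicality: data can prove anything, except the truth.
Richness: statistics are like a bikini; what they reveal is enticing, but what they conceal is what is really vital.
Fairness: in God we trust; all others must bring data.
For any event there is only one truth but thousands upon thousands of untruths, so the field in which statistics can display its prowess is vast indeed. Wilde paid women a similar compliment: women are very shrewd about a great many things; nothing can be hidden from them except the obvious. (This praise is second only to Mr Churchill's tribute to women being better than men at keeping secrets: they are so loyal that, fearing one person could not keep the secret alone, they recruit many companions to help guard it.) Statistics and women thus share many traits, and given the bikini property as well, statistics is all the more feminine.
Democratic politics runs on counting heads, which is a statistical problem. The rules are simple: as many heads, so many rights; as long as each has one head, friends with one hand, two hands or three hands all have the same rights. Autocratic politics runs on chopping heads, which is also a statistical problem. One of the central strategies of Shang Yang's reforms was head-chopping statistics; his Statute on Military Ranks says: "Whoever takes the head of one armoured soldier is awarded one grade of rank, one qing more of land, and nine mu more of homestead." One enemy head for one grade of rank, cash on delivery. It is as if a section chief, by taking one head, were promoted to deputy division chief, with 100 mu of land and a 1,713-square-metre residence thrown in. It was as stimulating as heroin, so the broad masses, eyes blazing, grabbed their weapons and rushed to the front, and in due course "the six kings were finished and all within the four seas was one". Some thrifty souls feel the reward was a bit generous and that the exchange rate could be adjusted to several heads per rank. The spirit of running a frugal household for our King of Qin is commendable, but statistics tells us that heads are finite: chopping too few is bad, and chopping too many is bad as well. If heads are chopped much faster than heads grow, a social crisis is likely. The Aztecs of Mesoamerica were rather innocent in temperament: they did not crave power, wealth or sex, and their sole pastime was chopping off other people's heads as offerings to the gods. Their logic was a perfect circle: only by waging war could they obtain people to sacrifice, and only by sacrificing people could they wage war successfully. Many Aztec towns at the time are said to have had heads piled up by the hundreds of thousands, so evidently everyone worked very diligently, and so in the end everyone was worn out; a few hundred Spaniards arrived and flattened the whole nation. That is the lesson of chopping too many heads. Lu Xun wrote in verse: "Once grown rich, the face changes; the heads chopped grow ever more. Then suddenly out of office again: Namo Amitabha." A bit of head-chopping now, a bit of Buddha-chanting then, neglecting neither revolution nor production: that is the best choice. Take Ashoka of India's Maurya dynasty: although he chopped the heads of ninety-odd siblings on taking office and of more than a hundred thousand Kalingans in his last campaign, he then turned to chanting to the Buddha, whereupon the country became stable and prosperous and his own name was passed down through the ages. A fine model to learn from.
Another recurring problem in statistical work is that systematic bias sometimes has to be corrected, and the bias can be large. The film American Pie expounds the rule of one third and three times: if a boy says he has been intimate with three girlfriends, don't believe it; the real number is one third of that, namely one. If a girl says she has been intimate with only two boyfriends, don't believe that either; the real number is three times that, namely six. Ignore this systematic bias and the statistics will point in exactly the wrong direction. I once saw a supposedly highly authoritative statistical report, published in a proper national-level journal, claiming on the basis of a survey that men have far more sexual partners than women, which supposedly shows that men are more such-and-such. That is the absurd result of ignoring the rule above. With only two sexes, and with homosexuality affecting men and women roughly equally, can anyone tell me what those extra partners of the men are?
Voltaire said that common sense is something between cleverness and stupidity. Statistics, needed at every moment of daily life, naturally also lies between cleverness and stupidity, and from time to time leaves us bewildered as to which is the stupid one, the world or us.
Statistics helps cultivate our pessimism and heighten our sense of crisis. For example, a bacterium weighs one ten-thousandth of a gram. Suppose it divides under ideal conditions, once every 15 minutes, that is, 96 times a day. After a day and a half its descendants would outweigh the Earth; after two days they would outweigh the Sun. Considering that in reality there is far more than one bacterium around us, with millions, tens of millions, hundreds of millions of them drifting everywhere, one realises that the calculus-like model of our primary school days was in fact as conservative as an over-cautious old man.
Statistics also helps cultivate our optimism and strengthen our love of life. For example, you have 2 parents, 4 grandparents and 8 great-grandparents. Suppose all your forebears married and had children at one generation every 25 years; going back 1,600 years (roughly the Eastern Jin period) gives 64 generations, at which point you would have 2^64, about 1.85×10^19, ancestors. Obviously there were nowhere near that many people then, so your ancestors must overlap; that is, distant relatives intermarried and everything got mixed together. In other words, everyone alive then, and by extension everyone alive now, has an extremely high probability, high enough to be virtually certain, of being your relative. One Earth, one family; the world in great harmony.
Incidentally, if someone curses you by threatening to defile your ancestors back to the eighteenth generation, do not be angry, do not be sad, do not be anxious; trust statistics, trust science. According to the statistics, in the eighteenth generation alone you have 262,144 ancestors, and generations 1 through 18 total 524,286. Even if your adversary spares no effort and gets through three a day, it would take nearly 480 years, close to five centuries. To finish within 40 years would require an average of 36 a day, at which rate he would never live to see 30. In short, in theory he cannot do it, so you can relax and be optimistic.
Statistics can strengthen our capacity to accept history. Lawrence Krauss, in his book Atom (《一颗原子的时空之旅》), offers a very entertaining calculation. When Caesar was assassinated, he drew one last deep breath before dying. Each breath we normally take contains about 6×10^22 oxygen atoms; suppose Caesar's last breath, drawn with all his strength, was four times the usual size, so that it contained about 24×10^22 oxygen atoms. The whole atmosphere contains about 4×10^43 oxygen atoms in total, which means that on average every 10^22 oxygen atoms in the atmosphere include 5 that Caesar took in with his last breath. If our lung capacity is unchanged and each breath still takes in 6×10^22 oxygen atoms, then each breath every one of us takes right now contains, on average, 3 oxygen atoms from Caesar's last breath. QED: we have all indirectly taken part in great history.
Statistics can likewise strengthen our capacity to choose in the present. Larikov of the former Soviet Union tracked 15,000 survey subjects. Preliminary statistics showed that 70% to 80% of them married for love, 15% to 20% married because everyone else did, and 3% to 10% married for personal advantage. Further statistics showed that of those who married for love, one hundred percent did not feel happy; of those who married for advantage, 70% felt unhappy; and of those who drifted into marriage because others were marrying, 55% were unhappy. QED: if you want a happy marriage, as long as the marriage is not based on love, you still have hope. If you remain so depraved as to insist on marrying only for love, you have brought it on yourself. So be it, so be it.
-- ※ Source: BBS 未名空间站 (MITBBS) http://mitbbs.com [FROM: 24.136.]
Top Statisticians: J. Berger
A big name at Duke, whom I will soon get to see in person: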
Jim Berger received a Ph.D. in mathematics from Cornell University in 1974. He was a faculty member in the Department of Statistics at Purdue University until 1997, at which time he moved to the Institute of Statistics and Decision Sciences at Duke University, where he is currently the Arts and Sciences Professor of Statistics. He is also Director of the national Statistical and Applied Mathematical Sciences Institute.
Berger was president of the Institute of Mathematical Statistics from 1995-1996, chair of the Section on Bayesian Statistical Science of the American Statistical Association in 1995, and president of the International Society for Bayesian Analysis during 2004. He has been involved with numerous editorial activities, including co-editorship of the Annals of Statistics during the period 1998-2000, and has organized or participated in the organization of over 30 conferences.
Among the awards and honors Berger has received are Guggenheim and Sloan Fellowships, the COPSS President's Award in 1985, the Sigma Xi Research Award at Purdue University for contribution of the year to science in 1993, election as foreign member of the Spanish Real Academia de Ciencias in 2002, election to the USA National Academy of Sciences in 2003, and award of an honorary Doctor of Science degree from Purdue University in 2004.
Berger's research has primarily been in Bayesian statistics, foundations of statistics, statistical decision theory, simulation, model selection, and various interdisciplinary areas of science and industry. He has supervised 30 Ph.D. dissertations, published over 140 articles and has written or edited 13 books or special volumes.
2006/4/15 Top Statistician: George E.P. Box
Honor: Fellow of the Royal Society
George Edward Pelham Box, born 18 October 1919 in England, was one of the most influential statisticians of the 20th century and a pioneer in the areas of quality control, time series analysis, design of experiments and Bayesian inference.
Box was originally trained as a chemist, and he worked on biochemical experiments on the effect of poison gases on small animals for the British Army during World War II. He needed statistical advice to analyze the results of his experiments, but could not find a statistician who could give him guidance so he taught himself statistics from available texts. After the war, he enrolled at University College London and obtained a bachelor's degree in mathematics and statistics. He received a Ph.D. from the University of London in 1953.
From 1948 to 1956, George worked as a statistician for Imperial Chemical Industries (ICI). While at ICI, he took a leave of absence for a year and served as a visiting professor at the University of North Carolina at Chapel Hill. He later went to Princeton University where he served as Director of the Statistical Research Group.
In 1960, Box went to the University of Wisconsin in Madison to create the Department of Statistics. He served as President of the Institute of Mathematical Statistics in 1979, was appointed Vilas Research Professor of Statistics (the highest honor accorded to any faculty member at the University of Wisconsin) in 1980, and became Emeritus Professor in 1992 at the University of Wisconsin.
Throughout his career, George Box has written numerous research papers and published many books. One of his most important contributions to the field of experimental design was his book, Statistics for Experimenters. Today, his name is associated with important results in statistics such as Box-Jenkins models, Box-Cox transformations, Box-Behnken designs and numerous others.
2005/12/30 Exam recollection: Time Series Analysis
Name: Time Series Analysis
Date: Nov 29,2005
Instructor:Minqian Liu
Content:
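I.
1. Stationarity and invertibility conditions for ARMA(p,q)
2. Cut-off and tail-off behaviour of the autocorrelation and partial autocorrelation functions of AR(p) and MA(q)
3. Features of the correlation structure of ARIMA models
4. How to judge whether the partial autocorrelation function cuts off
5. The Pandit-Wu modelling approach
II.
Weak stationarity of the MA(q) model, and its autocorrelation and autocovariance functions
III.
(1 + 0.1B)X_t = (1 - 0.6B + 0.08B^2)ε_t
1. Find the coefficients {G_j} of the transfer form X_t = Σ_j G_j ε_{t-j}
2. Find the coefficients {I_j} of the inverted form ε_t = X_t - Σ_j I_j X_{t-j}
IV.
Weekly data from a shop, with a seasonal cycle and a linearly increasing trend
1) Are the data stationary? Why? How would you model them?
2) How should the data be differenced to make them stationary?
V.
Five observations X_1, ..., X_5 are given
1) Estimate the autocorrelations ρ_1 and ρ_2
2) Assuming the data come from an AR(2) model, estimate α_1 and α_2 using the Yule-Walker equations
3) Forecast X_5(2), the two-step-ahead forecast from origin 5
VI.
ARMA(2,2)
Explain how to generate a sample of N observations from this model, and how to fit this ARMA model
VII.
(1 - αB)(1 - B)Z_t = ε_t: derive the optimal linear forecast
2005/12/28 Efron's original article: A life in the random universe
Original text: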
From AMSTAT NEWS, Dec 2004
Life in a Random Universe ASA President - Bradley Efron
December in Palo Alto brings with it two notable natural phenomena: the days get short and it starts to rain a lot. These are both "scientific facts" but they involve quite different kinds of science. Shorter days exemplify hard-edged science, precise and so predictable that you can sell an almanac saying exactly how short each day will be, down to the nearest second. The almanacs try to predict rainfall, too, but they're not nearly so successful. Rainfall is a famously random phenomenon, as centuries of unhappy farmers can testify. (My father's almanac went further, predicting good or bad fishing weather for each day, indicated by a full fish icon, an empty fish, or a half fish for borderline days.)
Hard-edged science still dominates public perceptions, but the attention of modern scientists has swung heavily toward rainfall-like subjects, the kind where random behavior plays a major role. A cartoon history of western thought might recognize three eras: an unpredictable pre-scientific world ruled by willful gods and magic; the precise clockwork universe of Newton and Laplace; and the modern scientific perspective of an understandable world, but one where predictability is tempered by a heavy dose of randomness.
Deterministic Newtonian science is majestic, and the basis of modern science, too, but a few hundred years of it pretty much exhausted nature's storehouse of precisely predictable events. Subjects like biology, medicine, and economics require a more flexible scientific worldview, the kind we statisticians are trained to understand.
These thoughts were very much in my mind at Phystat2003, a conference of particle physicists and statisticians held at the Stanford Linear Accelerator Center last year. It's at least slightly amazing to me that the physicists, who were the convening force, were eager to confer with us. One can't imagine Phystat1903, back when the physics world disdained statistics. "If your experiment needs statistics, you ought to have done a better experiment," in Lord Rutherford's words. (It may be a mistake to ever call a scientist "Lord.")
Rutherford lived in a rich man's world of scientific experimentation, in which nature generously provided boatloads of data, enough for the law of large numbers to squelch any noise. Nature has gotten more tight-fisted with modern physicists. They are asking harder questions, ones in which the data is thin on the ground and where efficient inference becomes a necessity. In short, they have started playing in our ballpark. The question of greatest interest at Phystat2003 concerned the mass of the neutrino, a famously elusive particle that is much lighter than an electron, and may weigh almost nothing at all. Heroic experiments, involving house-sized vats of cleaning fluid in abandoned mine shafts, yielded only a few dozen or a few hundred neutrinos. This left lots of room for experimental noise, and in fact the best unbiased estimate of neutrino mass turned out to be negative. Mass itself can't be negative, of course. Given a negative estimate, the physicists wished to establish a statistical upper bound for the true mass, the smaller the better from the point of view of further experimentation. As a result the particle physics literature now contains a healthy debate on Bayesian versus frequentist ways of setting the bound. The current favorite is the "Feldman-Cousins" method, developed by two prominent physicists, a likelihood-ratio-based system of one-sided confidence intervals. It took enormously long for the statistical point of view to develop. Two thousand years separate Aristotelian logic from Bayes theorem, its natural probabilistic extension. Another 150 years went past before Fisher, Neyman, and other frequentists developed a statistical theory satisfactory for general scientific use in situations where Bayes's theorem is difficult to apply. The truth is that statistical reasoning does not come naturally to the human brain. 
We are cause-and-effect thinkers, ideal perhaps for avoiding the jaws of the sabre-toothed tiger, but less effective in dealing with partial correlation or regression to the mean.
Once it caught on, though, statistical reasoning proved to be a scientific success story. Starting from just about zero in 1900, statistics spread from one field to another, becoming the dominant mode of quantitative thinking in literally dozens of fields, from agriculture, education, and psychology to medicine, biology, and economics, with the hard sciences knocking on our door now. A graph of statistics during the 20th century shows a steadily rising curve of activity and influence. Statisticians, a naturally modest bunch, tend to think of their field as a small one, but it is a discipline with a long arm, reaching into almost every area of science and social science these days.
Our curve has taken a bend upwards in the 21st century. A new generation of scientific devices, typified by microarrays, produce data on a gargantuan scale, with millions of data points and thousands of parameters to consider at the same time. These experiments are "deeply statistical." Common sense, and even good scientific intuition, won't do the job by themselves. Careful statistical reasoning is the only way to see through the haze of randomness to the structure underneath. Massive data collection, in astronomy, psychology, biology, medicine, and commerce, is a fact of 21st century science, and a good reason to buy statistics futures if they are ever offered on the NASDAQ.
Several years ago I served a term as associate dean for science, a mouse training to be a rat as the old saying goes, though I never graduated to rat status. It was an interesting job that gave me a chance to see what life was like for our fellow scientists. Most of the stories were happy ones, with lots of good work and scientific progress in view, but there were serious problems, too.
Biologists and chemists work in big expensive teams these days, putting senior scientists on a constant treadmill of grant requests and project supervision. Physics seems to have an overpopulation problem, with too many smart people chasing too little data. (Perhaps that is why the physics stories I see in Scientific American have taken on a science fiction aspect: "You may be able to travel backwards in time through worm holes.") Mathematics has become an inward-looking field, very successful in solving the problems it sets for itself, but dangerously cut off from the larger world of science.
My dean experiences led me to write down a list of three conditions for a healthy scientific discipline:
1. An outside demand for answers in the discipline's chosen area.
2. Some evidence of past success in answering such questions.
3. An ongoing production of useful new ideas.
This list reflects the importance of both the inside and the outside of a scientific discipline. The inside part, the internal development of the field along new directions, is what makes a field fun to work in. But without outside demands for answers, demands that test new ideas in the fire of genuine applications, the fun can turn solipsistic, drifting into "angels on the head of a pin" territory. This was my concern about mathematics.
Statistics has its own problems. It still has junior status in the science world, with less history and a less clearly defined subject area than the traditional heavyweights. All in all, though, my dean time made me happy to return to statistics, a smaller field but one on the way up, not overpopulated, not driven by major funding or equipment needs, with lots of interesting problems to work on, and a healthy (almost overly healthy) outside demand for answers.
I find the microarray story particularly encouraging for statistics. The first fact is that the biologists did come to us for answers to the inference problems raised by their avalanche of microarray data. This is our payoff for being helpful colleagues in the past, doing all those ANOVAs, t-tests, and randomized clinical trials that have become a standard part of biomedical research. And indeed we seem to be helping again, providing a solid set of new analytic tools for microarray experiments. The benefit goes both ways. Microarrays are helping out inside our field too, raising difficult new problems in large-scale simultaneous inference, stimulating a new burst of methodology and theory, and refocussing our attention on underdeveloped areas like empirical Bayes. Ken Alder's 2002 book, The Measure of All Things, brilliantly relates the story of the meter, one ten-millionth the distance from the equator to the pole, and how its length was determined in post-revolutionary France. Most of the book concerns the difficulties of the "savants" in carrying out their arduous astronomical-geographical measurements. One savant, Pierre Mechain, couldn't quite reconcile his readings, and wound up fudging the answers, driving himself to near-madness and death. Near the conclusion of Measure, Alder suddenly springs his main point, forgiving Mechain as laboring under an obsolete, overly precise notion of scientific reality: "Approach the world instead through the veil of uncertainty and science would never be the same. And nor would savants. During the course of the next century science learned to manage uncertainty. The field of statistics that would one day emerge from the insights to Legendre, Laplace, and Gauss would transform the physical sciences, inspire the biological sciences, and give birth to the social sciences. In the process 'savants' became 'scientists.'" Right on, Ken! 
Alder's new world of science has been a long time emerging but there is no doubt that 21st century scientists are committed to the statistical point of view. This puts the pressure on us, the statisticians, to fulfill our end of the bargain. We have been up to the task in the past and I suspect we will succeed again, though it may take a couple more Fishers, Neymans, and Walds to do the trick.
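2005/12/27 An absolute classic on thinking about statistics as a discipline
I came across this article on the Nankai statistics forum, where my senior schoolmate Ross had posted it. I have read it many times, and together with my own studies it has deepened my understanding of how statistics has developed and where it is heading. The author, Efron, is one of the great statisticians, a leading figure in contemporary statistics and the inventor of the bootstrap.
I have translated it into Chinese so that more people might read it; even reading just a few lines gives some sense of what statistics is about.
Translation (into Chinese):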
随机世界中的生命
ASA 主席 Bradley Efron
Palo Alto的十二月带来了两个显著的现象:白昼变短和降水变多。它们都是科学事实,但却设计截然不同的科学。白昼变短是硬科学的一个例子,精确可以预见,所以你可以出版一本历书告诉人们白昼会变得多短,甚至可以精确到秒。历书也试图预测降水,可是它们很难成功。降水是一个典型的随机现象,几个世纪以来不高兴的农民们也可以证实这一点。(我父亲的历书更加先进一点,用整个鱼图标,空鱼图标,半个鱼图标标记他预测的捕鱼的好坏天气。)
硬科学仍然统治着公众的认知,但是当代科学家的注意力已经很大的转移到了类似降水这样一类随机行为在其中扮演主要角色的事件上了。 一个对西方思想史的动画描述会承认三个时代:一个不可预知的,被任性的神和巫术统治的前科学世界;一个牛顿和拉普拉斯奠定的像时钟一样精确的世界;一个当代科学视角下的可以被人类理解的世界,但是这也是一个预见性被大量随机性制约着的世界。 确定性的牛顿科学是伟大的,也是当代科学的基础,但是它几百年的历史已经严重地耗尽了自然的大仓库中的精确可预见的事件。 像生物学,医学,经济学这样的研究对象需要更加灵活的科学视角,这个视角就是我们统计学家被训练去理解的。
以上的想法是我去年(2003年) 年参加在Stanford线性加速中心召开的粒子物理学家和统计学家会议上想到的。 我有点吃惊,召集这次回忆的物理学家是那么急切的想和我们讨论。 人们很难想像一个物理统计会议1903,那时候物理世界是看不起统计学的。 “如果你的实验学要统计方法,那么你最好在做一个更好的实验。” 卢瑟福勋爵说。(把一个科学家称为勋爵可能就是个错误。) 卢瑟福生活地时代,人们做很多科学实验。 在这些实验中,自然慷慨地提供海量的数据,这些数据多的足够用大数律来消除噪声。 然而,对当代物理学家,自然变得越来越吝啬。这些物理学家不断提出更难的问题,对这些问题,原始数据很少很少,因此需要有效的统计推断。 一句话,物理学家们玩进了我们(统计学家)的操场。 在2003物理统计会议上,最引起兴趣地问题涉及微中子的质量。微中子是一类著名的逃逸例子,比电子还轻,轻的甚至没有一点质量。 宏伟的实验涉及用一个房子大的桶和废弃的矿轴里的清洁液体,这样的实验仅仅能产生几十个或者几百个微中子。 而且,这样的实验产生很多的实验噪声[误差],事实上,微中子质量的最佳无偏估计出现的是负值。 当然,质量本身不可能是负的。 给出了一个负的估计,物理学家希望找到微中子质量统计估计上界,从进一步实验的角度考虑,这个上界越小越好。 对于确定这个上界的方法,当前粒子物理文献中有用贝耶思还是用频率方法的健康争论。 当前最流行的方法是“Feldman-Cousins" 法。 这种方法是由两个杰出的物理学家提出的考虑单边置信区间的基于似然比的系统。 统计视角的发展经历了极长极长的时间。 两千年的历史把亚历士多德式逻辑从他的自然的概率形式的推广-贝耶思理论中分离开来。另一个150年过去了,才有Fisher,Neyman和其他一些频率学家发展了一套可以广泛科学使用的统计理论,这套理论很好的适用于Bayes理论难以应用的情况。 事实上,对于人类大脑来说,统计推理并不很自然。 我们人类是采用因果思维方式的。 这种思维方式或许很适合避免剑齿虎的伤害,但是对于处理偏相关或者均值回归却不有效。 一旦被统计思想抓住了机会,它就被证明是一种成功的科学。 从20世纪初页的零起点开始,统计从一个领域传播到另一个,在许多领域,从农学,教育学,心理学,到医学,生物学和经济学,统计成为占统治地位的量化思维方式,使硬科学在敲打着统计的大门。 一幅对20世纪统计学发展状况的图表显示了统计学活力和影响力的稳定的增长曲线。 本性上谦逊的统计学家倾向于把他们的学科当成很小的一个,但是这是一个有着很长胳膊的学科,这只胳膊伸展到了今天几乎所有的自然科学和社会科学的研究领域。 我们的发展曲线在21世纪有一个向上的弯曲。 一个以微阵列为代表的科学设备的新时代,能产生海量的百万数量级的数据点和成千上万需要同时考虑的参数。这些实验更加“统计化”。 常识甚至好的科学直觉不能独立的处理这些数据了。 细致的统计推理成了能透过随机迷雾看到背后结构的唯一方法。 在天文学,心理学,生物学,医学和商业等领域的大量数据采集成为了21世纪科学的实情,也是提高了人们对统计未来的信心,如果NASDAQ有统计股票的话,这更是一个好的购买统计股票的理由。 几年前,我担任了一期科学院的副院长。 这是个很有趣的工作,使我有机会能够看看我周围的其他科学家的状况。 大部分情况是使人欣喜的,有很多好的研究工作和可以预见的科学进步,但同时也存在着一些严重的问题。 生物学家和化学家在开销极大的研究小组里工作,迫使资深科学家从事着单调的科研经费申请和项目监督工作。 物理学看上去有过多的人在从事研究,然而实际情况是太多的聪明人在追逐太少的数据。(也许这是为什么我在《科学美国》杂志上看到物理学报道总是以科幻小说的形式出现的原因,比如:“你可能通过虫洞穿梭时空回到过去。”) 数学正在变成一个内敛的领域,它非常成功地解决了它自己给自己提出的问题,但是正在危险地把它从更广阔的科学世界中割裂开来。 我的院长经历引导我列出了以下一个健康学科发展必须满足的三个条件:
1 对该学科研究领域课题解答的外部需求
2 一些证据来表明能够成功解答这些问题。 3 不断产生的有用的新思想 以上列表同时反映了该学科内部和外部的重要性。 朝着新方向的学科内部发展使得该学科成为一门充满趣味的值得研究的学科。 但是如果没有对解答的外部需求,没有对新思想的真正的应用性考验,这种趣味会变的为我独尊式的,漂流到一个很小很小的领域。这也我对数学科学的看法。
当然,统计学也有它自己的不足。它仍然是科学世界中一门年轻的学科,相比其他传统的重要学科,统计历史很短,也没有清晰严格的研究对象的界定。 总而言之,我的院长经历让我很高兴能够回到统计界,这里是一个更小的领域,但是正处上升阶段,既没有太多的人从事研究,也不会受到经费和设备的束缚,充满有趣的研究问题,和对问题解答的健康(几乎是过于健康)的外部需求。 我发现微阵列技术特别需要和鼓励统计发展。 第一个事实是生物学家总是向我们寻求帮助来解决微阵列雪崩般的数据产生的推断问题。 这个是对我们(统计学家)过去成为他们(生物学家)有帮助的同僚的回报,我们提出的ANOVA(方差分析),t检验,随机临床试验等现在成为了标准医学研究的一部分。并且,事实上,我们还在不断的帮助他们,提供对微阵列数据一套可靠的的新的分析工具。 好处是双方面的。 微阵列技术也正在从外部帮助统计学科的内部发展,它提出了对于高数量级数据同时推断新的困难问题,刺激了新的方法和理论,使我们的注意力重新转移到一些未充分发展的研究领域如经验Bayes。
Ken Alder 在他2002年的著作《万物的度量》中,非常聪明的叙述了“米的故事”,故事提到了一“米”是赤道到极点距离的千万分之一和法国革命后如何测定这个度量。 这本书的大部分涉及了专家们费劲力气决定天文和地理上度量的困难。 一个叫Pierre Mechain的专家因为不能和他的知识调和,最后捏造了一个答案,几乎发疯致死。 在这本书的最后,Alder谅解了Mechain这样的费力的在过时又过度精确的科学现实中的徒劳探索,然后突然提出了他的主题:
“对世界的探索要透过不确定性的面纱,科学决不是再像以前那样了,专家们也和以前不同了。 下个世纪,科学要学会驾驭不确定性。 统计学科终会有一天将从Legendre, Laplace, 和Gauss的视角中浮现出来,终会改变自然科学,激发生命科学和给社会社学以新生。 在这个过程中,‘专家们’将成为‘科学家’。”
Ken说得太正确了! Alder的新科学世界很早前就开始浮现了,但是,毫无疑问,21世纪的科学是离不开统计视角的。 这就给我们统计学家施加了压力,我们要去完成我们的使命。 在过去,我们已经取得了成功, 我相信,我们还会再次成功,即使这需要更多的Fishers, Neymans, 和Walds来获得这样的成功。
2005/12/24 Exam recollection: Applied Stochastic Processes
Name: Applied Stochastic Processes
Test Date : July 2005
Instructor: Chunsheng Zhang
Content:
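1. N(t) is a Poisson process with rate λ.
1) Find E[N(t)N(s+t)]
2) Find E[N(t+s) | N(t)]
2. The interarrival times X_n of a renewal process have density f(x) = λ exp(-λ(x - μ)) for x > μ. Find P{N(t) >= k}.
3. Decomposition of a Markov chain from its transition matrix
1) Find N, C and the C_k
2) Find the period t of each C_k
3) Find lim p_ii(n)
4. B(t) is a Wiener process and X(t) = t*B(1/t). Prove:
1) X(s) and X(s+t) - X(s) are independent
2) X(t) is also a Wiener process (hint: for normal distributions, independence is equivalent to being uncorrelated)
5. Prove a martingale property: Tsinghua textbook by Lin, page 162, example 1, X(t) and Y(t)
6. Prove a martingale property: Tsinghua textbook by Lin, page 164, 4.8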
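As a hedged sketch of the standard arguments behind questions 1 and 4 above (not a record of the expected exam answer), the following derivation assumes N(t) is a Poisson process with rate λ, so that increments over disjoint intervals are independent, and that B(t) is a standard Wiener process with Cov(B(u), B(v)) = min(u, v):

```latex
\begin{align*}
E[N(t)N(t+s)] &= E[N(t)^2] + E\bigl[N(t)\,\bigl(N(t+s)-N(t)\bigr)\bigr] \\
              &= (\lambda t + \lambda^2 t^2) + \lambda t \cdot \lambda s
               = \lambda t + \lambda^2 t\,(t+s), \\
E[N(t+s) \mid N(t)] &= N(t) + \lambda s, \\
\operatorname{Cov}\bigl(X(s), X(t)\bigr) &= s\,t\,\operatorname{Cov}\bigl(B(1/s), B(1/t)\bigr)
               = s\,t \min(1/s,\, 1/t) = \min(s, t), \qquad s, t > 0.
\end{align*}
```

The last line shows that X(t) = tB(1/t) has the covariance function of a Wiener process; since X is Gaussian, uncorrelated increments are then independent, which is the point of the hint in question 4.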
Exam recollection: Functional Analysis
Name: Functional Analysis
Test Date: Jan, 2005
Instructor : Risheng Wang
Content:
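1. X is a normed space. Prove that X is separable <=> S_X = {x : ||x|| = 1} is separable.
2. Let X be a normed space and {x_n} a sequence in X such that {f(x_n)} converges for every f in X*. Prove that there is a constant M with |f(x_n)| <= M||f|| for every f in X* and every n.
3. X is a normed space and G is a subset of X. For every f in X* there exists {x_n} in G with ||x_n|| < 1 and |f(x_n)| >= ||f|| - 1/n. Prove that the closure of span(G) equals X.
4. On C[0,1] define ||x|| = sup(|x(t)| + |x'(t)|), the supremum taken over t in [0,1].
1) Prove that C[0,1] is a Banach space under this norm.
2) If f(x) = x'(1), prove that f belongs to (C[0,1])* and find ||f||.
5. Give an example, or prove the existence, of a sequence {x_n(t)} in L2[0,1] with ||x_n|| = 1 that converges weakly to 0 (here ||x|| = sqrt(integrate(|x(t)|^2, 0, 1))).
6. Give an example of a Banach space that is separable but not reflexive, and one that is reflexive but not separable.
7. Let T: X -> Y be surjective, and suppose there exist alpha, beta > 0 with alpha*||x|| <= ||Tx|| <= beta*||x|| for every x in X. Prove that alpha*||y*|| <= ||T*(y*)|| <= beta*||y*|| for every y* in Y*.
8. H is a Hilbert space and M is a subspace. Prove that M is closed <=> M equals the orthogonal complement of its orthogonal complement.
Exam recollection: Nonparametric Statistics
Name: Nonparametric Statistics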
Test Date: Dec 13,2005
Instructor: Qiaozheng Zhang
Content:
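1. Given a data set, test whether the median equals M0 using the sign test, and compute the p-value.
2. X1, ..., Xn come from F(x) and Y1, ..., Ym come from G(y). Under the null hypothesis H0: F = G, the ranks of the Y's in the combined sample are R1, ..., Rm. Prove that under H0 the statistic max(R1, ..., Rm) is distribution-free.
3. Compute the Pearson, Spearman and Kendall correlation coefficients for two data sets, and explain why the three coefficients differ.
4. Decide whether two data sets come from the same population (computing the Kolmogorov-Smirnov statistic is enough).
5. Carry out the Wilcoxon rank-sum test on two data sets: eye-movement frequencies of healthy children and of sick children.
6. Explain the difference between two designs (one completely randomized, the other a complete block design), and apply the appropriate test to each: the Kruskal-Wallis test and the Friedman test.
7. X1, ..., Xn are iid from F(x) and p = P{X1 > 0}. For the parameter θ = p(1-p):
a) find the U-statistic for θ
b) find Var(U)
c) find the asymptotic variance of U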
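Questions 1 and 7a lend themselves to a short numerical sketch. The code below is an illustration under stated assumptions rather than the instructor's solution: it computes an exact two-sided sign-test p-value with ties at M0 dropped, and the U-statistic for θ = p(1-p) built from the symmetric kernel h(x, y) = 1/2 when exactly one of x, y is positive and 0 otherwise. The function names and the sample data are hypothetical.

```python
from itertools import combinations
from math import comb


def sign_test_pvalue(data, m0):
    """Exact two-sided sign test of H0: median == m0 (ties at m0 are dropped)."""
    plus = sum(1 for x in data if x > m0)
    minus = sum(1 for x in data if x < m0)
    n = plus + minus
    k = min(plus, minus)
    # P(S <= k) for S ~ Binomial(n, 1/2); doubled for the two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def u_statistic(data):
    """U-statistic for theta = p(1-p), averaging the kernel over all pairs."""
    n = len(data)
    total = 0.0
    for x, y in combinations(data, 2):
        # Kernel: 1/2 if exactly one of the pair is positive, else 0.
        total += 0.5 * ((x > 0) != (y > 0))
    return total / comb(n, 2)


def u_closed_form(data):
    """Same U-statistic via the count k of positive observations."""
    n = len(data)
    k = sum(1 for x in data if x > 0)
    return k * (n - k) / (n * (n - 1))


if __name__ == "__main__":
    sample = [1.2, -0.5, 0.3, -2.0, 0.7]  # hypothetical data
    print(sign_test_pvalue(sample, 0.0))
    print(u_statistic(sample), u_closed_form(sample))
```

The closed form follows because only mixed-sign pairs contribute, and there are k(n-k) of them among the n(n-1)/2 pairs.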
2005/12/23 In Praise of Bayes
#1 In praise of Bayes (an article about Bayesian statistics)
Bayesianism is a controversial but increasingly popular approach to statistics that offers many benefits-although not everyone is persuaded of its validity
IT IS not often that a man born 300 years ago suddenly springs back to life. But that is what has happened to the Reverend Thomas Bayes, an 18th-century Presbyterian minister and mathematician-in spirit, at least, if not in body. Over the past decade the value of a statistical method outlined by Bayes in a paper first published in 1763 has become increasingly apparent and has resulted in a blossoming of "Bayesian" methods in scientific fields ranging from archaeology to computing. Bayes's fans have restored his tomb and posted pictures of it on the Internet, and a celebratory bash is planned for next year to mark the 300th anniversary of his birth. There is even a Bayes songbook-though, since Bayesians are an academic bunch, it is available only in the obscure file formats that are used for scientific papers.
Proponents of the Bayesian approach argue that it has many advantages over traditional, "frequentist" statistical methods. Expressing scientific results in Bayesian terms, they suggest, makes them easier to understand and makes borderline or inconclusive results less prone to misinterpretation. Bayesians claim that their methods could make clinical trials of drugs faster and fairer, and computers easier to use. There are even suggestions that Bayes's ideas could prompt a re-evaluation of fundamental scientific concepts of evidence and causality. Not bad for an old dead white male.
Previous convictions
The essence of the Bayesian approach is to provide a mathematical rule explaining how you should change your existing beliefs in the light of new evidence. In other words, it allows scientists to combine new data with their existing knowledge or expertise.
The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun will rise again or not. He assigns equal prior probabilities to both possible outcomes, and represents this by placing one white and one black marble into a bag. The following day, when the sun rises, the child places another white marble in the bag. The probability that a marble plucked randomly from the bag will be white (ie, the child's degree of belief in future sunrises) has thus gone from a half to two-thirds. After sunrise the next day, the child adds another white marble, and the probability (and thus the degree of belief) goes from two-thirds to three-quarters. And so on. Gradually, the initial belief that the sun is just as likely as not to rise each morning is modified to become a near-certainty that the sun will always rise.
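The marble bookkeeping above can be written out in a few lines. This is a minimal sketch, assuming the bag starts with one white and one black marble as in the article; the function name is hypothetical. It is the same arithmetic as Laplace's rule of succession, (n + 1) / (n + 2) after n sunrises.

```python
from fractions import Fraction


def belief_after_sunrises(n_sunrises, white=1, black=1):
    """Chance of drawing a white marble after n observed sunrises."""
    # One white marble is added to the bag for every sunrise observed.
    white += n_sunrises
    return Fraction(white, white + black)


if __name__ == "__main__":
    for day in range(4):
        print(day, belief_after_sunrises(day))
```

Exact fractions are used so the half, two-thirds and three-quarters of the article show up without rounding.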
In a Bayesian analysis, in other words, a set of observations should be seen as something that changes opinion, rather than as a means of determining ultimate truth. In the case of a drug trial, for example, it is possible to evaluate and compare the degree to which a sceptic and an enthusiast would be convinced by a particular set of results. Only if the sceptic can be convinced should a drug be licensed for use.
This is far more subtle than the traditional way of presenting results, in which an outcome is deemed statistically significant only if there is a better than 95% chance that it could not have occurred by chance. The problem, according to Robert Matthews, a mathematician at Aston University in Birmingham, is that medical researchers have failed to understand that subtlety. In a paper to be published shortly in the Journal of Statistical Planning and Inference, he sets out to demystify the Bayesian approach, and explains how to apply it after the event to existing data.
Patients in clinical trials will soon benefit. Bayesian methods offer the possibility of modifying a trial while it is being conducted, something that is impossible with traditional statistics. Andy Grieve and his colleagues at Pfizer, a drug firm, are intending to do just that.
Traditionally, dose-allocation trials-in which the aim is to establish the most effective dose of a new drug-involve giving different groups of patients different doses and evaluating the results once the trial has finished. This is fine from a statistical point of view, but unfair on those patients who turn out to have been given non-optimal doses. Rather than analysing the results at the end of a trial, Dr Grieve's method will evaluate patients' responses during it, and adjust the doses accordingly. The advantage of this over the traditional approach is that it maximises the medical benefit to all participants. It also means that fewer people are needed to conduct a trial, because participants on non-optimal doses can have those doses changed to increase the amount of data collected near the optimal dose.
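To make the idea of adjusting allocation during a trial concrete, here is a toy sketch. It is not Dr Grieve's method, whose details the article does not give; it swaps in Thompson sampling with Beta(1, 1) priors on a yes/no response, and the response rates, function name and seed are all hypothetical.

```python
import random


def adaptive_dose_trial(true_response, n_patients, seed=0):
    """Toy adaptive allocation: each patient gets the dose that looks best
    in one random draw from the current Beta posteriors (Thompson sampling)."""
    rng = random.Random(seed)
    k = len(true_response)
    successes = [0] * k
    failures = [0] * k
    for _ in range(n_patients):
        # Sample a plausible response rate for every dose from its posterior.
        draws = [rng.betavariate(1 + successes[d], 1 + failures[d])
                 for d in range(k)]
        dose = max(range(k), key=lambda d: draws[d])
        # Simulate the patient's response using the (hypothetical) true rate.
        if rng.random() < true_response[dose]:
            successes[dose] += 1
        else:
            failures[dose] += 1
    return successes, failures


if __name__ == "__main__":
    s, f = adaptive_dose_trial([0.2, 0.5, 0.8], n_patients=200)
    print([a + b for a, b in zip(s, f)])  # patients allocated to each dose
```

As evidence accumulates, doses with poor observed response are drawn less often, which is the sense in which allocation shifts toward the better doses while the trial is still running.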
Pfizer is intending to conduct a trial using this new method, and the plan is to re-analyse the data once it is completed in ways that will satisfy both Bayesians and non-Bayesians. This kind of parallel approach is likely to become increasingly common, at least until Bayesianism has been more widely accepted.
Bayesian methods can also be used to decide between several competing hypotheses, by seeing which is most consistent with the available data. This idea was recently used to determine the date of construction of "Seahenge", an ancient timber circle found off the coast of Norfolk, in eastern England. Results from tree-ring dating were inconclusive, suggesting such divergent dates as 2019BC, 2050BC and 2454BC. So six samples from the monument's central stump were radiocarbon-dated, and the results were used to evaluate the three tree-ring possibilities via Bayesian analysis. The evidence was overwhelmingly in favour of 2050BC, and was inconsistent with either of the other two tree-ring dates.
Decision-making using Bayesian methods has many applications in software, as well. Perhaps the best-known example is Microsoft's Office Assistant, which appears as a somewhat irritating anthropomorphic paper-clip that tries to help the user. When a user calls up the assistant, Bayesian methods are used to analyse recent actions in order to try to work out what the user is attempting to do, with this calculation constantly being modified in the light of new actions. (Unfortunately, a non-Bayesian approach is used to decide when to make the paper-clip pop up on its own, adding to the annoyance of many users.) According to Eric Horvitz, a statistician in Microsoft's research division, future products will try to determine users' intentions more broadly, so as to speed things up. Having worked out which link on a web article a user is most likely to click on, for example, the computer could fetch the corresponding article in advance, so that it appears more quickly.
Bayes is still, however, the focus of much controversy. Larry Wasserman, a statistician at Carnegie Mellon University, in Pittsburgh, says that although Bayesianism is becoming more acceptable, it is no panacea, and when used indiscriminately it becomes "more a religion than a science". Perhaps the grandest claims made for Bayesian methods are those of Judea Pearl, a computer scientist at the University of California, Los Angeles. Dr Pearl has suggested that by analysing scientific data using a Bayesian approach it may be possible to distinguish between correlation (in which two phenomena, such as smoking and lung cancer, occur together) and causation (in which one actually causes the other). This kind of claim makes many scientists, including many Bayesians, throw up their hands in horror. Evidently there is life in the old reverend yet.
Reasons to become A Statistician:
Deviation is considered normal...
We feel complete and sufficient...
We are "mean" lovers...
We are right 95% of the time...
We never have to say we are certain...
We are honestly significantly different...
I have read the vignette written by James O. Berger about Bayesian analysis in Statistics in the 21st Century. Both of these articles are in praise of Bayes. From my point of view, statistics itself is more or less subjective. What model to construct or which method to apply depends heavily on your own experience and knowledge. Unlike the mathematical sciences, statistical methods cannot be proved right or wrong in a rigorous way. Everything in statistics is related to one's previous experience. Therefore, the Bayesian language may be the language of statistics.
2005/11/30 Dreadful December
Tomorrow is December.
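Things that must be done by now: current status
Dec 8 or 9: final defense for the "Hundred Projects" (百项工程) program: nothing prepared
Dec 12: nonparametric statistics exam, with all assignments due beforehand: none done, book unread
Dec 15: Berkeley deadline: PS unfinished, online application incomplete, recommendation letters not collected
Dec 18: College Chinese final exam: nothing read
End of Dec: mathematical statistics course ends, term paper due: skipping class continuously, book unread
Looks like the days ahead will not be easy to get through.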