Linux Evolution Reveals Origins of Curious Mathematical Phenomenon

December 1, 2008 | By Lisa Zyga | Feature
When the Zipf curve is plotted on a log-log scale, it appears as a straight line with a slope of -1. This graph shows that four Debian Linux releases each follow Zipf’s law: Woody (orange), Sarge (green), Etch (blue) and Lenny (black). Credit: T. Maillart, et al.

(PhysOrg.com) -- Zipf’s law is a testament to the order in our world, showing that the same patterns emerge in a wide variety of situations. The linguist George Kingsley Zipf first proposed the law in 1949, when he noticed that the distribution of words in a newspaper, book, or other literary article always followed the same pattern.

Zipf counted how many times each word appeared, and found that the probability of the occurrence of words starts high and tapers off. Specifically, the most frequent word occurs about twice as often as the second most frequent word, which occurs about twice as often as the fourth most frequent word, and so on. Mathematically, this means that the frequency of any word is inversely proportional to its rank. When the Zipf curve is plotted on a log-log scale, it appears as a straight line with a slope of -1.
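
The inverse-rank relationship can be checked numerically. The sketch below is only an illustration: the function names and the top count of 60,000 are assumptions chosen for the example, not values from the study. It builds an idealized Zipf distribution and measures the slope of the log-log rank-frequency line.

```python
import math

def zipf_frequencies(n_ranks, top_count):
    """Ideal Zipf counts: the count at rank r is top_count / r."""
    return [top_count / r for r in range(1, n_ranks + 1)]

def loglog_slope(values):
    """Least-squares slope of log(value) against log(rank), ranks start at 1."""
    xs = [math.log(r) for r in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

# Hypothetical example: 1,000 ranks, most frequent item counted 60,000 times.
freqs = zipf_frequencies(1000, 60000.0)
ratio_1_to_2 = freqs[0] / freqs[1]   # rank 1 versus rank 2
ratio_2_to_4 = freqs[1] / freqs[3]   # rank 2 versus rank 4
slope = loglog_slope(freqs)
print(ratio_1_to_2, ratio_2_to_4, round(slope, 6))
```

Both ratios come out as 2, and the fitted slope is -1 up to floating-point rounding. Real word counts would scatter around this line rather than sit exactly on it.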

Since Zipf’s discovery, researchers have found that the power law describes many other natural and human phenomena, including the distribution of cities ranked by their population, the distribution of corporate wealth, and Internet traffic characteristics.

When analyzing systems that follow Zipf’s law, researchers usually assume certain mechanisms to be responsible for this patterned behavior. However, no one has ever empirically demonstrated that these assumed mechanisms are indeed the origin of Zipf’s law.

Now, a team of researchers from ETH Zürich (the Swiss Federal Institute of Technology Zürich) in Switzerland has confirmed that these assumed mechanisms – such as scale-free, proportional growth rates – are at the origin of Zipf’s law. The researchers used four orders of magnitude of data detailing the evolution of open source software applications created for a Linux operating system to confirm the assumption.

The team studied Debian Linux, a free operating system continuously being developed by more than 1,000 volunteers from around the world. Developers create software packages, such as text editors or music players, that are added to the system. Beginning with 474 packages in 1996, Debian Linux has expanded to include more than 18,000 packages today. The packages form an intricate network, with some packages having greater connectivity than others, as defined by how many other packages depend on a given package.

“Open source offers a unique opportunity provided by the high completeness of data concerning open source (thanks to the disclosure policy of the open source terms of license),” lead author Thomas Maillart of ETH Zürich told PhysOrg.com. “Debian Linux allowed us to retrieve exhaustive information from several years ago. Many other complex systems are not so well ‘documented.’”

As the researchers explain, the Linux network is constantly changing: new packages enter, some disappear, and others gain or lose connectivity. Yet throughout the 12 years, the distribution of packages, as ranked by their number of incoming links from other packages, has followed Zipf’s law, with a few very popular packages having much greater connectivity than most.
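
As a rough illustration of the ranking described above, connectivity can be computed by counting, for each package, how many other packages depend on it. The dependency map below is toy data with made-up package names, not taken from Debian, and the helper names are assumptions for this sketch.

```python
from collections import Counter

# Toy dependency map (hypothetical, not real Debian data):
# each key is a package, each value lists the packages it depends on.
depends_on = {
    "text-editor": ["base-lib", "gui-toolkit"],
    "music-player": ["base-lib", "gui-toolkit", "audio-lib"],
    "gui-toolkit": ["base-lib"],
    "audio-lib": ["base-lib"],
    "base-lib": [],
}

def incoming_links(dep_map):
    """Count, for every package, how many other packages depend on it."""
    counts = Counter({name: 0 for name in dep_map})
    for package, deps in dep_map.items():
        for dep in set(deps):       # ignore duplicate declarations
            if dep != package:      # ignore self-dependencies
                counts[dep] += 1
    return counts

def rank_by_connectivity(dep_map):
    """Return (package, in-links) pairs, most depended-upon first."""
    counts = incoming_links(dep_map)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_by_connectivity(depends_on)
for rank, (name, links) in enumerate(ranking, start=1):
    print(rank, name, links)
```

In this toy network a single base library collects most of the incoming links, which is the kind of skew the rank-ordered distribution captures. Five packages are far too few to test Zipf's law itself.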

While many previous models of Zipf’s law start with the assumption that the set of entities (e.g. packages) appeared at the same time, the Swiss researchers tracked the time evolution of package connectivity in the Linux network since 1996. This perspective enabled them to test for the specific growth characteristics of the Linux network that lead to the emergence of Zipf’s law.

Using the data, they showed that the growth rates of connectivities between packages are proportional to the degree of connectivity between packages. In addition, they showed empirically that the average growth rate of the total number of links to a given package over a time interval is proportional to that time interval. Further, the variability of the total number of links to a given package increases proportionally to the square-root of time, providing a crucial test of the mechanism of stochastic proportional growth of connectivity between packages. Altogether, these characteristics are responsible for the universal distribution pattern of Zipf’s law.
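
The two scaling tests can be illustrated with a simple simulation. The sketch below is not the researchers' method or data. It assumes a geometric random walk as a stand-in for stochastic proportional growth, measures growth in logarithmic terms, and uses made-up values for the drift, volatility, and number of packages.

```python
import random
import statistics

def simulate_log_growth(n_packages, n_steps, drift, volatility, seed=0):
    """Simulate proportional growth: each step multiplies a package's
    connectivity by exp(g), with g drawn from Normal(drift, volatility).
    Returns, per package, the cumulative log-growth after each step."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_packages):
        total = 0.0
        path = [0.0]
        for _ in range(n_steps):
            total += rng.gauss(drift, volatility)
            path.append(total)
        paths.append(path)
    return paths

def growth_stats(paths, window):
    """Mean and standard deviation of log-growth over `window` steps."""
    growth = [path[window] - path[0] for path in paths]
    return statistics.fmean(growth), statistics.stdev(growth)

# Hypothetical parameters, chosen only for illustration.
paths = simulate_log_growth(n_packages=20000, n_steps=16,
                            drift=0.1, volatility=0.2)
stats = {w: growth_stats(paths, w) for w in (1, 4, 16)}
for w, (mean, std) in stats.items():
    print(f"window={w:2d}  mean={mean:.3f}  std={std:.3f}")
```

Under these assumptions, quadrupling the time window should roughly quadruple the mean growth while only doubling its spread, which is the linear-versus-square-root signature described above.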

“We show that the distribution of connectivity of new entrants is also a power law with an exponent much bigger than 1, confirming that the proportional growth mechanism is solely responsible for the Zipf's law,” Maillart said.

He explained that, while Linux data allowed the researchers to confirm the origins of Zipf’s law, their results bring up more questions.

“Linux Debian gave us the opportunity to verify the ‘proportional mechanism,’ thanks to an important dataset and a huge investigation potential,” Maillart said. “All changes (evolution) in open source software are freely available and therefore can be tracked in detail. However, model verification has brought one answer and many resulting questions we intend to give an answer to. We think particularly of mechanisms of success/failure of projects in relation with their management.

“Remember that we still do not clearly understand the reasons of the success of the open source, since it's free and based on altruist contributions by programmers,” he said. “Additionally, one can bet that further research in this direction (open source and proportional growth) may raise useful questions for other systems (cities, economy, etc.) that would bring new insights to explain their evolution.”

More information: T. Maillart, D. Sornette, S. Spaeth, and G. von Krogh. “Empirical Tests of Zipf’s Law Mechanism in Open Source Linux Distribution.” Physical Review Letters 101, 218701 (2008).

Copyright 2008 PhysOrg.com.
All rights reserved. This material may not be published, broadcast, rewritten or redistributed in whole or part without the express written permission of PhysOrg.com.


brane
Dec 01, 2008

Rank: 2.6 / 5 (5)
agreed
mattytheory
Dec 01, 2008

Rank: 3 / 5 (3)
Had to look up Zipf's law. The first part of the article was interesting. But, I agree.. the last part was yawn.
FredG
Dec 01, 2008

Rank: 3 / 5 (3)
Would not this only be true if the links were random? However, since 2003, corporations have taken over Linux development and have full time paid engineers doing the development.

So the packages are not random.
fleem
Dec 02, 2008

Rank: 2.6 / 5 (5)
Fleem's law: 10% of all so-called science articles (and grant recipients) will attempt to make something blatantly obvious and mundane seem mysterious and complex.
SmartK8
Dec 02, 2008

Rank: 3 / 5 (4)
Fleem: Agreed. Those are hot candidates for the Ig Nobel Prize 2009. Good luck guys.
Going
Dec 02, 2008

Rank: 3 / 5 (2)
I wonder Zipf’s law also applies to the number of species evolved over time in a given environment.
theophys
Dec 02, 2008

Rank: 4 / 5 (4)
Seems more like an add for Linux than anything else. These guys just want funding to play around with programming. "further research in this direction...may raise useful questions for other systems (cities, economy, etc.) that would bring new insights to explain their evolution.”

What a bunch of ballux. If you want to know how economic theories or cities, pick up a history book.
Yoknapatawpha
Dec 02, 2008

Rank: 4.3 / 5 (3)
I quote:

the growth rates of connectivities between packages are proportional to the degree of connectivity between packages

AND

the variability of the total number of links to a given package increases proportionally to the square-root of time, providing a crucial test of the mechanism of stochastic proportional growth of connectivity between packages

AND

they showed empirically that the average growth rate of the total number of links to a given package over a time interval is proportional to that time interval.

NOW MY QUESTIONS IS:
How could this article NOT point out that it should be the root inverse proportion of the mean??

And those of you that were bashing this article... if you read this far, can you still not get it?

sheesh...
Yok

A_Paradox
Dec 03, 2008

Rank: 3 / 5 (2)
a/
Would not this only be true if the links were random? However, since 2003, corporations have taken over Linux development and have full time paid engineers doing the development.

So the packages are not random.


Maybe it depends what you mean by random. In the evolution of linux code packages there might be some "true" randomness to the origin of a particular idea or strategy but once it is instantiated its evolution may be largely determined by 'dialectical' interaction with the rest of its world, just like biological species, etc, but will be mostly unpredictable due to the non-linear progression of these recursive interactions.

b/ I am not sure what Yok is saying about "root inverse proportion of the mean", but I am neither mathematician nor scientist. I agree with Yok though that the bashers must be sleeping through their lives, not to be entranced by yet another demonstration of the amazing depths of emergent order manifest by evolutionary processes.

Mark
trimleyman
Dec 03, 2008

Rank: 3 / 5 (4)
linux is opensource so unlike apple and microsoft stands on its merits alone. apple and microsoft spend billions advertising each other's failings, perceived or real. we who use linux understand that it is not perfect but is at least headed in the right direction and is still developed primarily by the community. a huge industry has built around the failings of microsoft's os in particular. this employs large numbers in the Silicon Valley and elsewhere.
kwilco
Dec 06, 2008

Rank: 3 / 5 (1)
Fleem's law: 10% of all so-called science articles (and grant recipients) will attempt to make something blatantly obvious and mundane seem mysterious and complex.


Although the article is about linux, the underlying natural law, "Zipfs law, a testament to the order in our world, showing that the same patterns emerge in a wide variety of situations," is potentially profound. Evidently, humans are driven by forces beyond our comprehension. Beyond psychology and into the physical world.
kwilco
Dec 06, 2008

Rank: 5 / 5 (1)
P.S. Check out Stephen Wolfram's cellular automata:

http://www.maa.or...ram.html
corymp
Dec 07, 2008

Rank: 5 / 5 (1)
this reminds me of the double slit experiment... if this document was read by one of those volunteers and they realized that they were contributing to this, would you think the whole pattern would change completely?
tigger
Dec 07, 2008

Rank: 5 / 5 (1)
Linux bites the big one... hey, let's recompile the kernel because we want to install a web cam. Uggghhhh, Linux fan boys LIKE the fact that they have to compile the kernel because they think it makes them software engineers... when the reality is they are generally social outcasts trying to elevate themselves through delusion.

Sigh... anyway, Zipf’s law, great, yeah.
Ashibayai
Dec 07, 2008

Rank: 5 / 5 (1)
Sounds like a case of e^x to me.
deatopmg
Dec 07, 2008

Rank: 5 / 5 (1)
Fleem's law: 10% of all so-called science articles (and grant recipients) will attempt to make something blatantly obvious and mundane seem mysterious and complex.


ONLY 10%!