<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	
	xmlns:georss="http://www.georss.org/georss"
	xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
	>

<channel>
	<title>piper Archives - Pietari Heino&#039;s personal website</title>
	<atom:link href="https://www.extreg.com/blog/tag/piper/feed/" rel="self" type="application/rss+xml" />
	<link>https://www.extreg.com</link>
	<description></description>
	<lastBuildDate>Wed, 22 Feb 2017 05:49:49 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=5.6.7</generator>
<site xmlns="com-wordpress:feed-additions:1">99365322</site>	<item>
		<title>Google&#8217;s ultra-large-scale monolithic source code repository</title>
		<link>https://www.extreg.com/blog/2017/02/googles-ultra-large-scale-monolithic-source-code-repository/</link>
					<comments>https://www.extreg.com/blog/2017/02/googles-ultra-large-scale-monolithic-source-code-repository/#respond</comments>
		
		<dc:creator><![CDATA[Pietari]]></dc:creator>
		<pubDate>Wed, 15 Feb 2017 08:42:57 +0000</pubDate>
				<category><![CDATA[Fascinating engineering]]></category>
		<category><![CDATA[atk]]></category>
		<category><![CDATA[google]]></category>
		<category><![CDATA[google clients in the cloud]]></category>
		<category><![CDATA[google piper]]></category>
		<category><![CDATA[google source-code]]></category>
		<category><![CDATA[google tools]]></category>
		<category><![CDATA[google workflow]]></category>
		<category><![CDATA[piper]]></category>
		<category><![CDATA[trunk-based development]]></category>
		<category><![CDATA[version control]]></category>
		<guid isPermaLink="false">https://extreg.com/?p=142</guid>

					<description><![CDATA[<p>Why Google Stores Billions of Lines of Code in a Single Repository is an excellent paper by Rachel Potvin and Josh Levenberg. They both work at Google, Rachel being an engineering manager and Josh a software engineer. Their writing, published in the Communications of the ACM in July 2016, may be found here. They provide a fascinating ... <span class="more"><a class="more-link" href="https://www.extreg.com/blog/2017/02/googles-ultra-large-scale-monolithic-source-code-repository/">[Read more...]</a></span></p>
<p>The post <a rel="nofollow" href="https://www.extreg.com/blog/2017/02/googles-ultra-large-scale-monolithic-source-code-repository/">Google&#8217;s ultra-large-scale monolithic source code repository</a> appeared first on <a rel="nofollow" href="https://www.extreg.com">Pietari Heino&#039;s personal website</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p><em>Why Google Stores Billions of Lines of Code in a Single Repository</em> is an <strong>excellent</strong> paper by Rachel Potvin and Josh Levenberg. They both work at Google, Rachel being an engineering manager and Josh a software engineer. Their writing, published in the Communications of the ACM in July 2016, may be found <a href="http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext">here</a>. They provide a fascinating deep dive into how Google handles source code in a single monolithic repository: trunk-based development, the Google workflow, all the Google-built tooling, the pros and cons, and an analysis of using a single repo at ultra-large scale.</p>
<p>I <strong>really</strong> hope you read the paper; it&#8217;s wicked fascinating software engineering. I liked the piece so much that I decided to read it again and write some notes and summaries of the topics it touches on. Scroll through them below, and read the paper (which is <strong>vastly more informative</strong> than anything you can find here!).</p>
<p><span id="more-142"></span></p>
<p><strong>Super-shortly:</strong><br />
Pros: unified versioning, extensive code sharing, simplified dependency management, atomic changes, large-scale refactoring, collaboration across teams, flexible code ownership, code visibility<br />
Cons: having to create *and* scale tools for development and execution, and to maintain code health (also potential codebase complexity)</p>
<p><strong>The repo:</strong><br />
~1 billion files<br />
~35 million commits<br />
~85 TB of data<br />
~2 billion lines of code<br />
~9 million source files</p>
<p>2014: ~15 million lines of code changed in ~250,000 files every week; 25,000 users and an average of 500,000 queries per second.<br />
Note: most of the traffic comes from Google&#8217;s automated build and test systems</p>
<p>Compare to the Linux kernel: ~15 million lines of code in ~40,000 files</p>
<p><strong>Google Piper design</strong><br />
&#8211; stores a single large repository<br />
&#8211; implemented on top of standard Google infra, namely Spanner<br />
&#8211; distributed across 10 data centers<br />
&#8211; Paxos for replica consistency<br />
&#8211; Google infra and private networks cut the latency and deliver needed speed<br />
&#8211; Google originally used a massive Perforce instance with custom-built caching and other infra for over 10 years</p>
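<p>To make the Paxos bullet a bit more concrete, here is a tiny Python sketch of only the majority-quorum rule that Paxos-style replication builds on. It is my own illustration, not Piper&#8217;s or Spanner&#8217;s code, and it leaves out everything that makes Paxos an actual protocol (ballots, proposers, acceptors, learners). The replica names are made up.</p>

```python
# Illustrative sketch only: the majority-quorum rule behind Paxos-style replication.
# Not Piper's code and not a full Paxos protocol (no ballots, proposers or learners).


def majority(num_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return num_replicas // 2 + 1


def is_committed(acks: set, replicas: set) -> bool:
    """Treat a write as durable once a majority of the replicas acknowledged it."""
    return len(acks.intersection(replicas)) >= majority(len(replicas))


def majorities_overlap(num_replicas: int) -> bool:
    """Any two majorities share at least one replica, so they cannot silently disagree."""
    return 2 * majority(num_replicas) > num_replicas


# Ten replicas, one per data center (hypothetical names).
replicas = {f"dc{i}" for i in range(10)}
print(majority(len(replicas)))                               # 6
print(is_committed({f"dc{i}" for i in range(6)}, replicas))  # True
print(is_committed({f"dc{i}" for i in range(5)}, replicas))  # False
print(majorities_overlap(len(replicas)))                     # True
```

<p>The overlap property is the point: because every two majorities of the ten replicas share at least one member, losing up to four data centers never loses an acknowledged write.</p>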
<p><strong>Piper security</strong><br />
&#8211; supports file-level access control lists<br />
&#8211; most of the stuff seen by everyone, anything may be hidden if needed<br />
&#8211; read/write logs; owner can see who viewed, when, and what<br />
&#8211; purging of accidentally committed critical secrets<br />
&#8211; for instance, business-critical secrets like algorithms might not be available to everyone (but over 99% of all version-controlled stuff is seen by all full-time Googlers)</p>
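<p>A toy Python model of file-level ACLs with a read log and purging. The class and method names are hypothetical; it only illustrates the ideas in the bullets, not Piper&#8217;s actual API.</p>

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ToyRepo:
    """Hypothetical model of file-level ACLs plus a read log; not Piper's API."""

    files: dict = field(default_factory=dict)     # path -> contents
    acls: dict = field(default_factory=dict)      # path -> set of users allowed to read it
    read_log: list = field(default_factory=list)  # (UTC timestamp, user, path)

    def can_read(self, user: str, path: str) -> bool:
        # No ACL means the file is visible to everyone, the common case.
        allowed = self.acls.get(path)
        return allowed is None or user in allowed

    def read(self, user: str, path: str) -> str:
        if not self.can_read(user, path):
            raise PermissionError(f"{user} may not read {path}")
        # Log who viewed what and when, so owners can audit access.
        self.read_log.append((datetime.now(timezone.utc), user, path))
        return self.files[path]

    def purge(self, path: str) -> None:
        # Drop an accidentally committed secret. This toy keeps no history;
        # a real VCS would also have to scrub it from every past revision.
        self.files.pop(path, None)
        self.acls.pop(path, None)


repo = ToyRepo(
    files={"search/ranking_weights.cc": "secret", "base/strings.h": "public"},
    acls={"search/ranking_weights.cc": {"alice"}},
)
print(repo.read("bob", "base/strings.h"))                 # public
print(repo.can_read("bob", "search/ranking_weights.cc"))  # False
```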
<p><strong>Piper workflow</strong><br />
&#8211; create a local copy, store files in the developer&#8217;s workspace<br />
&#8211; &#8211; this is like working copy in Subversion, local clone in Git, or client in Perforce<br />
&#8211; pull updates from Piper<br />
&#8211; share the workspace as a snapshot for other devs to review<br />
&#8211; commit *only* after code-review</p>
<p><strong>Clients in the Cloud, or CitC</strong><br />
&#8211; cloud-based storage backend + Linux-only FUSE fs<br />
&#8211; Piper workspaces seen as directories in the fs<br />
&#8211; supports the usual Unix tools<br />
&#8211; local changes laid on top<br />
&#8211; browsing, searching, editing any files in the Piper repo<br />
&#8211; only edited files stored locally<br />
&#8211; avg workspace has &lt;10 files while still showing everything in the Piper repo<br />
&#8211; *all writes* stored automatically, can be tagged, named, and rolled back</p>
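<p>To picture how a CitC workspace can show the whole repo while storing only a handful of files, here is a toy overlay in Python. It is a sketch under my own assumptions (in-memory dicts instead of a FUSE filesystem and a cloud backend, made-up class and method names), not how CitC is implemented.</p>

```python
class OverlayWorkspace:
    """Toy model of a CitC-style workspace (hypothetical, not Google's code).

    Reads fall through to an immutable repository snapshot; only edited files
    are stored in the workspace, and every write is kept as a snapshot that
    can be tagged and rolled back to.
    """

    def __init__(self, base: dict):
        self.base = base      # path -> contents at the synced revision
        self.local = {}       # path -> contents, edited files only
        self.history = [{}]   # snapshot of `local` after each write
        self.tags = {}        # tag name -> index into history

    def read(self, path: str) -> str:
        # Local edits are laid on top of the repository view.
        if path in self.local:
            return self.local[path]
        return self.base[path]

    def write(self, path: str, contents: str) -> None:
        self.local[path] = contents
        self.history.append(dict(self.local))  # every write is recorded automatically

    def tag(self, name: str) -> None:
        self.tags[name] = len(self.history) - 1

    def rollback(self, name: str) -> None:
        self.local = dict(self.history[self.tags[name]])
        self.history.append(dict(self.local))  # the rollback itself is recorded too


ws = OverlayWorkspace({"a.py": "v1", "b.py": "v1"})
ws.write("a.py", "v2")
ws.tag("reviewed")
ws.write("a.py", "v3")
print(ws.read("a.py"), ws.read("b.py"), len(ws.local))  # v3 v1 1
ws.rollback("reviewed")
print(ws.read("a.py"))  # v2
```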
<p><strong>Trunk-based development</strong><br />
&#8211; vast majority of Piper users work on &#8220;head&#8221;, &#8220;trunk&#8221;, or &#8220;mainline&#8221;, that is the most recent version of everything<br />
&#8211; all commits in there<br />
&#8211; all changes seen by everyone using Piper after every commit (remember: commits only after code-review)<br />
&#8211; branching is very, very rare except for releases<br />
&#8211; releases usually a snapshot of the trunk + cherry-picks from it<br />
&#8211; no dev branches, no feature branches, no nothing<br />
&#8211; feature-development through the use of feature-flags in code<br />
&#8211; feature-flags controlled by conf. files, no need for new binaries<br />
&#8211; feature-flags typically used in project-specific code, not libraries<br />
&#8211; easy to experiment with a small number of users</p>
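<p>A minimal Python sketch of a config-driven feature flag, to show how one trunk can carry both the old and the new code path. The JSON format, the flag name, and the percentage-rollout scheme are assumptions for illustration, not Google&#8217;s actual flag system.</p>

```python
import hashlib
import json

# Hypothetical flag file contents. Only the idea (flags read from configuration,
# not compiled in) comes from the notes; the layout and names are made up.
FLAGS_JSON = '{"new_ranking": {"enabled": true, "rollout_percent": 5}}'


def load_flags(text: str) -> dict:
    """Parse the configuration; changing it needs no new binary."""
    return json.loads(text)


def is_enabled(flags: dict, name: str, user_id: str) -> bool:
    """Return True if the named flag is on for this user."""
    flag = flags.get(name)
    if not flag or not flag.get("enabled", False):
        return False
    # Hash flag name + user id into a stable bucket 0..99, so the same user
    # keeps seeing the same variant while the rollout percentage is unchanged.
    digest = hashlib.sha256(f"{name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < flag.get("rollout_percent", 100)


def rank(results: list, flags: dict, user_id: str) -> list:
    # Both code paths live on trunk; the flag decides which one runs.
    if is_enabled(flags, "new_ranking", user_id):
        return sorted(results)
    return list(results)


flags = load_flags(FLAGS_JSON)
share = sum(is_enabled(flags, "new_ranking", f"user{i}") for i in range(10_000)) / 10_000
print(f"{share:.1%} of users see the new ranking")  # should be close to 5%
```

<p>Bumping <code>rollout_percent</code> in the config widens the experiment without touching the code, which is what makes small-audience experiments cheap.</p>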
<p><strong>Code review</strong><br />
&#8211; nothing is committed without a code review<br />
&#8211; the committer can enable a flag for auto-commit if the review passes<br />
&#8211; the reviewers have tools for viewing and adjusting the code easily anywhere in the Piper repo (tools are named Critique and CodeSearch)<br />
&#8211; commits have to be accepted by directory owners<br />
&#8211; remember: the whole Piper repo is available to anyone -&gt; anyone can propose changes to any piece of code anywhere, but the owners of directories have to accept them<br />
&#8211; directory owners are the people most familiar with the code/project/library in question</p>
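<p>Here&#8217;s a rough Python sketch of the owner-approval rule. The assumptions are mine: owners are modelled as a plain mapping from directory to usernames, and a file inherits the owners of its nearest enclosing directory that declares any. This is not necessarily how Google resolves owners.</p>

```python
from pathlib import PurePosixPath


def owners_for(path: str, owners: dict) -> set:
    """Owners of the nearest enclosing directory that declares any (assumed rule)."""
    directory = PurePosixPath(path).parent
    # Walk upwards: "search/ranking" -> "search" -> "." (the repo root).
    for candidate in [directory, *directory.parents]:
        key = str(candidate)
        if key in owners:
            return owners[key]
    return set()


def change_is_approved(changed_files: list, approvals: set, owners: dict) -> bool:
    """A change may be committed only if every touched file has an approving owner."""
    return all(owners_for(f, owners).intersection(approvals) for f in changed_files)


owners = {".": {"root-admin"}, "search": {"alice"}, "search/ranking": {"bob"}}
print(owners_for("search/ranking/scorer.cc", owners))                    # {'bob'}
print(change_is_approved(["search/ranking/scorer.cc"], {"bob"}, owners))  # True
print(change_is_approved(["search/ranking/scorer.cc", "search/index.cc"], {"bob"}, owners))  # False
```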
<p><strong>Commit-infra &amp; refactoring</strong><br />
&#8211; automatic rebuild of all dependencies, testing<br />
&#8211; automatic rollback in case of widespread breakage<br />
&#8211; vast and customizable pre-submit testing and analysis, runs before anything is committed<br />
&#8211; static analysis system called Tricorder<br />
&#8211; &#8211; provides data on code quality, test coverage, test results<br />
&#8211; &#8211; provides automatic suggestions for fixes with one-click applying<br />
&#8211; &#8211; triggered after all changes and periodically<br />
&#8211; &#8211; used to ensure codebase health<br />
&#8211; a set of devs periodically digs through Piper directories to refactor code and keep it healthy<br />
&#8211; large backwards-compatible changes first, removing unused paths second<br />
&#8211; a tool called Rosie supports that by splitting the large patches made by the devs into smaller patches that are individually reviewed by the directory owners</p>
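<p>A Rosie-style split can be sketched in a few lines of Python: group the files of one big change by the nearest directory that has owners, so each group can go to that directory&#8217;s owners for review. The grouping rule and all names are my assumptions, not Rosie&#8217;s actual logic.</p>

```python
from collections import defaultdict
from pathlib import PurePosixPath


def nearest_owned_dir(path: str, owners: dict) -> str:
    """Closest enclosing directory with declared owners, falling back to the root."""
    directory = PurePosixPath(path).parent
    for candidate in [directory, *directory.parents]:
        if str(candidate) in owners:
            return str(candidate)
    return "."


def split_patch(changed_files: dict, owners: dict) -> dict:
    """Group one large change into per-owner-directory sub-changes (hypothetical)."""
    shards = defaultdict(dict)
    for path, new_contents in changed_files.items():
        shards[nearest_owned_dir(path, owners)][path] = new_contents
    return dict(shards)


owners = {"maps": {"carol"}, "search": {"alice"}, "search/ranking": {"bob"}}
patch = {
    "maps/tiles.cc": "...",
    "search/index.cc": "...",
    "search/ranking/a.cc": "...",
    "search/ranking/b.cc": "...",
}
for directory, files in sorted(split_patch(patch, owners).items()):
    print(directory, owners[directory], sorted(files))
```

<p>Splitting like this only works because each shard is safe to commit on its own, which is why the large backwards-compatible changes come first and removing the unused paths comes second.</p>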
<p><strong>Analysis</strong><br />
<strong>Advantages</strong><br />
&#8211; unified versioning, one source of truth<br />
&#8211; extensive code-sharing and reuse<br />
&#8211; simplified dependency management<br />
&#8211; atomic changes<br />
&#8211; large-scale refactoring<br />
&#8211; collaboration across teams<br />
&#8211; flexible team boundaries, code ownership and visibility, implicit namespacing<br />
&#8211; all code depends on other code directly<br />
&#8211; the diamond-dependency problem is gone<br />
&#8211; atomic changes enable renaming variables or API calls across hundreds of thousands of files in a single commit, without breaking builds or tests<br />
&#8211; engineers don&#8217;t depend on specific versions -&gt; no need to update them<br />
&#8211; all files uniquely identified<br />
&#8211; a good example:<br />
&#8211; &#8211; the Google compiler team can run regression etc. tests nightly on all affected code and validate new versions<br />
&#8211; &#8211; code can be refactored to support new versions of compilers before shipping them<br />
&#8211; &#8211; ~20 compiler releases a year<br />
&#8211; &#8211; compilers can be tuned to use best possible default settings</p>
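<p>The diamond-dependency point is easiest to see with a toy example. The Python below is my own illustration with made-up package names: in the multi-repo case A and B were built against different versions of lib, so an app that links both is stuck; in the single-repo case everything builds against head, so there is only one version to pick.</p>

```python
def find_version_conflicts(requirements: dict) -> dict:
    """Map each dependency pinned to more than one version to its sorted versions.

    `requirements` maps a package to {dependency: version it was built against}.
    Simplification: the whole graph is checked, not just what one binary links in.
    """
    pinned = {}
    for deps in requirements.values():
        for dep, version in deps.items():
            pinned.setdefault(dep, set()).add(version)
    return {dep: sorted(v) for dep, v in pinned.items() if len(v) > 1}


# Separate repos: A and B each pin their own version of lib, so the diamond conflicts.
multi_repo = {"app": {"A": "1.4", "B": "3.2"}, "A": {"lib": "1.0"}, "B": {"lib": "2.0"}}
# Single repo at head: everyone builds against the one current version of lib.
monorepo = {"app": {"A": "HEAD", "B": "HEAD"}, "A": {"lib": "HEAD"}, "B": {"lib": "HEAD"}}

print(find_version_conflicts(multi_repo))  # {'lib': ['1.0', '2.0']}
print(find_version_conflicts(monorepo))    # {}
```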
<p><strong>Drawbacks, trade-offs etc.</strong><br />
&#8211; tooling investment is HUGE<br />
&#8211; couldn&#8217;t be used without all the special support-systems<br />
&#8211; codebase complexity<br />
&#8211; unnecessary dependencies<br />
&#8211; discoverability difficulties<br />
&#8211; effort in code health<br />
&#8211; sometimes hard to explore code<br />
&#8211; the usual suspects like grep unusable from time to time<br />
&#8211; too easy to add dependencies -&gt; unused dependencies<br />
&#8211; lack of will to write documentation if everyone can look up the APIs themselves<br />
&#8211; depending on more than just the API because you see how the code works</p>
<p><strong>Alternatives</strong><br />
&#8211; DVCSs have grown in popularity and use -&gt; moving has been investigated<br />
&#8211; moving to a DVCS (e.g. Git) would require splitting into thousands of repos<br />
&#8211; &#8211; Android is hosted on Git and spans north of 800 repos<br />
&#8211; currently available DVCSs don&#8217;t provide the needed security controls<br />
&#8211; investigating whether Mercurial could be made to support Google scale</p>
<p>Check out the wonderful paper <a href="http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext">here</a>.</p>
<p>The post <a rel="nofollow" href="https://www.extreg.com/blog/2017/02/googles-ultra-large-scale-monolithic-source-code-repository/">Google&#8217;s ultra-large-scale monolithic source code repository</a> appeared first on <a rel="nofollow" href="https://www.extreg.com">Pietari Heino&#039;s personal website</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://www.extreg.com/blog/2017/02/googles-ultra-large-scale-monolithic-source-code-repository/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
		<post-id xmlns="com-wordpress:feed-additions:1">142</post-id>	</item>
	</channel>
</rss>
