---
layout: report
year: "2021"
month: "07"
title: "Reproducible Builds in July 2021"
draft: false
date: 2021-08-05 15:11:12
---

[![]({{ "/images/reports/2021-07/reproducible-builds.png#right" | relative_url }})](https://reproducible-builds.org/)

**Welcome to the latest report from the [Reproducible Builds](https://reproducible-builds.org) project.** In this post, we round up the important things that happened in the world of reproducible builds in July 2021. As always, if you are interested in contributing to the project, please visit the [*Contribute*]({{ "/contribute/" | relative_url }}) page on our website.

[![]({{ "/images/reports/2021-07/lastmilepy.png#right" | relative_url }})](https://2021.esec-fse.org/details/fse-2021-papers/61/LastPyMile-Identifying-the-Discrepancy-between-Sources-and-Packages)

On Friday 27th August, Duc Ly Vu, Fabio Massacci, Ivan Pashchenko, Henrik Plate and Antonino Sabetta will present a paper at the [ACM Foundations of Software Engineering](https://2021.esec-fse.org/) (ESEC/FSE) conference. The abstract of the paper, titled [**LastPyMile: Identifying the Discrepancy between Sources and Packages**](https://2021.esec-fse.org/details/fse-2021-papers/61/LastPyMile-Identifying-the-Discrepancy-between-Sources-and-Packages), mentions that:

> Our empirical assessment of 2,438 popular packages in [PyPI](https://pypi.org/) with an analysis of around 10M lines of code shows several differences in the wild: modifications cannot be just attributed to malicious injections. Yet, scanning again all and whole 'most likely good but modified' packages is hard to manage for FOSS downstream users. We propose a methodology, LastPyMile, for identifying the differences between build artifacts of software packages and the respective source code repository. [[...](https://2021.esec-fse.org/details/fse-2021-papers/61/LastPyMile-identifying-the-discrepancy-between-sources-and-packages)]

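
To make the idea of comparing a build artifact against its source repository more concrete, here is a minimal Python sketch that hashes every file in two directory trees and reports what was added or modified in the artifact. This is not the LastPyMile methodology itself, only a simple file-level comparison; the function names `file_digests` and `compare_trees` are hypothetical, and the sketch assumes the package and the source checkout have already been unpacked into two directories.

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def compare_trees(source_dir, artifact_dir):
    """Report files in the artifact that are absent from, or differ from, the source."""
    source = file_digests(source_dir)
    artifact = file_digests(artifact_dir)
    # Files shipped in the artifact that do not exist in the source tree.
    added = sorted(name for name in artifact if name not in source)
    # Files present in both trees whose contents differ.
    modified = sorted(
        name for name in artifact if name in source and artifact[name] != source[name]
    )
    return {"added": added, "modified": modified}
```

A file-level comparison like this only flags *that* something differs; deciding whether a difference is benign (for example, generated metadata) or suspicious still needs further inspection.
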
[![]({{ "/images/reports/2021-07/arstechnica.jpg#right" | relative_url }})](https://arstechnica.com/gadgets/2021/07/malicious-pypi-packages-caught-stealing-developer-data-and-injecting-code/)

Last month, we linked to [Ars Technica](https://arstechnica.com/)'s report that counterfeit packages on [PyPI](https://pypi.org/), the official Python package repository, [contained secret code that installed cryptomining software on infected machines](https://arstechnica.com/gadgets/2021/06/counterfeit-pypi-packages-with-5000-downloads-installed-cryptominers/). This month, however, Dan Goodin reported on another PyPI malware issue: in [**Software downloaded 30,000 times from PyPI ransacked developers' machines**](https://arstechnica.com/gadgets/2021/07/malicious-pypi-packages-caught-stealing-developer-data-and-injecting-code/), Dan writes about a number of malicious payloads (such as [Discord](https://discord.com/) token and credit card 'stealers') that appear to have targeted programmers' computers. ([Another source](https://jfrog.com/blog/malicious-pypi-packages-stealing-credit-cards-injecting-code/).)

Joshua Lock posted to the [VMware Open Source blog](https://blogs.vmware.com/opensource/) the first part of a two-part security-related series. In the post, titled [**First Steps for Securing the Software Supply Chain**](https://blogs.vmware.com/opensource/2021/07/27/first-steps-for-securing-the-software-supply-chain-part-1/), Joshua mentions:

> The Reproducible Builds project develops tools, documentation, standards and patches for upstream open source projects that enable the production of bit-for-bit identical builds given the same inputs. This is no small feat, as many things influence the output of a build. The project's major initial innovation was recognizing that the time at which a build runs is embedded into multiple artifacts produced during that build. It defined a standard way of fixing time for a build, called [`SOURCE_DATE_EPOCH`](https://reproducible-builds.org/specs/source-date-epoch/), that more and more projects are adopting, and which removes a major source of non-deterministic output.

Joshua also mentions our sister [Bootstrappable Builds](https://bootstrappable.org/) project, as well as a number of other reproducible-adjacent tools such as the [Bazel](https://bazel.build/) build system.

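
As a brief illustration of the `SOURCE_DATE_EPOCH` convention mentioned in the quote above, the following Python sketch shows how a build tool might choose the timestamp it embeds in its output. Only the environment variable name comes from the specification; the function name `build_timestamp` and its `environ` parameter are hypothetical and exist purely for this example.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the timestamp to embed in build output, as a UTC datetime.

    If SOURCE_DATE_EPOCH is set, use it (seconds since the Unix epoch);
    otherwise fall back to the current time, which is not reproducible.
    """
    if environ is None:
        environ = os.environ
    epoch = int(environ.get("SOURCE_DATE_EPOCH", time.time()))
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


if __name__ == "__main__":
    # Prints the same value on every run when SOURCE_DATE_EPOCH is exported.
    print(build_timestamp().strftime("%Y-%m-%d %H:%M:%S UTC"))
```

Formatting the result in UTC, rather than the local timezone, avoids reintroducing variation between build machines.
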
[![]({{ "/images/reports/2021-07/vrojmYRHxwY.jpg#right" | relative_url }})](https://www.youtube.com/watch?v=vrojmYRHxwY)

Touching on Bazel, Gaspare Vitta recently presented at [Conf42 Python](https://www.conf42.com) 2021 on [**Reproducible Builds with Bazel**](https://www.youtube.com/watch?v=vrojmYRHxwY). In the abstract for his talk, Gaspare writes:

> If you run two builds with the same source code and the same commit but on two different machines, do you expect to get the same result? Well, in most cases you will not! In this talk, we'll identify sources of non-determinism in most build processes and look at how Bazel can be used to create reproducible, hermetic builds. We'll then create a reproducible Flask application that can be built with Bazel so that the Python interpreter and all dependencies are hermetical.

Lastly, it was noticed that Manuel Pöll's thesis at the [Johannes Kepler University](https://www.jku.at/) in Linz, Austria is now available online. Called [**An Investigation Into Reproducible Builds for AOSP**](https://www.digidow.eu/publications/2020-poell-bachelorthesis/Poell_2020_BachelorThesis_SOAP.pdf) (PDF), Manuel's thesis touches on techniques to achieve deterministic builds in AOSP, more usually known as Google's [Android](https://source.android.com/).

### Community updates

[![]({{ "/images/reports/2021-07/ircmeeting.png#right" | relative_url }})](http://meetbot.debian.net/reproducible-builds/2021/reproducible-builds.2021-07-27-15.00.html)

We held a productive meeting on IRC this month ([original announcement](https://lists.reproducible-builds.org/pipermail/rb-general/2021-July/002300.html)), which ran for just short of two hours. A [full set of notes](http://meetbot.debian.net/reproducible-builds/2021/reproducible-builds.2021-07-27-15.00.html) from the meeting is available.

Chris Lamb updated the [main Reproducible Builds website and documentation](https://reproducible-builds.org/) this month, including migrating the old 'history' page from the Debian wiki [[...](https://salsa.debian.org/reproducible-builds/reproducible-website/commit/1b5838f)] and making the emphasis on 2020 less prominent on the events page [[...](https://salsa.debian.org/reproducible-builds/reproducible-website/commit/0a66019)], in addition to many other changes. Also, Holger Levsen added [MirageOS](https://mirage.io/) to our [projects page]({{ "/who/" | relative_url }}) [[...](https://salsa.debian.org/reproducible-builds/reproducible-website/commit/2d8d0f0)][[...](https://salsa.debian.org/reproducible-builds/reproducible-website/commit/e7cb0dc)] and Tobias Stoeckmann noted that the `#archlinux-reproducible` IRC channel has moved to the [libera.chat](https://libera.chat) network [[...](https://salsa.debian.org/reproducible-builds/reproducible-website/commit/199dc01)].

A number of members of the Reproducible Builds team are in the process of building an 'ecosystem map' in order to better understand the relationships between projects in and around reproducible builds. This month, Chris Lamb [posted a request to our mailing list](https://lists.reproducible-builds.org/pipermail/rb-general/2021-July/002302.html) to solicit input from the wider community.

### Software development

#### [*diffoscope*](https://diffoscope.org)

[![]({{ "/images/reports/2021-07/diffoscope.svg#right" | relative_url }})](https://diffoscope.org)

[*diffoscope*](https://diffoscope.org) is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, [Chris Lamb](https://chris-lamb.co.uk) made a number of changes, including releasing [version 178](https://diffoscope.org/news/diffoscope-178-released/) and [version 179](https://diffoscope.org/news/diffoscope-179-released/), as well as the following:

* Ensure that various [LLVM](https://llvm.org/) tools are installed, even when testing whether a macOS binary has no differences compared to *itself*. ([#270](https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/270))
* Rewrite how we calculate the 'fuzzy hash' of a file to make the control flow cleaner. [[...](https://salsa.debian.org/reproducible-builds/diffoscope/commit/15590583)][[...](https://salsa.debian.org/reproducible-builds/diffoscope/commit/2201a325)]
* Don't traceback when encountering a broken symlink within a directory. ([#269](https://salsa.debian.org/reproducible-builds/diffoscope/-/issues/269))
* Update some copyright years. [[...](https://salsa.debian.org/reproducible-builds/diffoscope/commit/1f480e07)]

In addition, Edward Betts updated the [*try.diffoscope.org*](https://try.diffoscope.org/) service to add an HTML `alt` attribute to an image. [[...](https://salsa.debian.org/reproducible-builds/try.diffoscope.org/commit/2348d26)]

#### Debian

[![]({{ "/images/reports/2021-07/debian.png#right" | relative_url }})](https://debian.org/)

Roland Clobus sent a second status update on his [progress towards fully-reproducible 'Live' ISO images](https://lists.debian.org/debian-live/2021/07/msg00009.html). Amongst many other things, Roland mentions that all major configurations are now built on a daily basis and only the [Cinnamon](https://en.wikipedia.org/wiki/Cinnamon_(desktop_environment)) image is not reproducible. However, [*diffoscope*](https://diffoscope.org/) has issues when comparing the results; work is in progress to address this ([#991059](https://bugs.debian.org/991059)).

2 reviews of Debian packages were added, 50 were updated and 33 were removed this month, adding to [our knowledge about identified issues](https://tests.reproducible-builds.org/debian/index_issues.html). Three issue types were also updated: `nondeterminism_in_autolex_bin` is now fixed in Debian *bullseye* [[...](https://salsa.debian.org/reproducible-builds/reproducible-notes/commit/441492f5)], a new `test_suite_logs` issue was added [[...](https://salsa.debian.org/reproducible-builds/reproducible-notes/commit/46454347)] and the description for the `records_build_flags` issue was updated [[...](https://salsa.debian.org/reproducible-builds/reproducible-notes/commit/08b8e365)].

Helmut Grohne and Johannes Schauer Marin Rodrigues reported Debian bug [#990712](https://bugs.debian.org/990712): "While working on `DPKG_ROOT` reproducibility, we observed that the [`dpkg`] trigger database differs for the foreign and native case". [[...](https://bugs.debian.org/990712)]

Chris Lamb modified the [Lintian](https://lintian.debian.org/) static analyser for Debian packages to check for Python tracebacks in manual pages. These are usually caused by failing `help2man` calls and, crucially, cause reproducibility issues as the traceback includes absolute path names [[...](https://salsa.debian.org/lintian/lintian/commit/86da641bbb0945746ae14f3078b8d1824d46ea03)].

Lastly, Holger filed Debian bug [#991285](https://bugs.debian.org/991285) to 'unblock' version `1.12-0.1` of *strip-nondeterminism* in order to ensure that this version ended up in the upcoming release of Debian *bullseye*.

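
The kind of check described above for Python tracebacks in manual pages can be approximated in a few lines. The following is a minimal Python sketch, not Lintian's actual implementation; the names `TRACEBACK_RE` and `contains_python_traceback` are hypothetical, and the sketch assumes the manual page has already been decompressed and read as text.

```python
import re

# Marker line that CPython prints at the start of every traceback.
TRACEBACK_RE = re.compile(r"^\s*Traceback \(most recent call last\):", re.MULTILINE)


def contains_python_traceback(manpage_text):
    """Return True if the manual page text appears to embed a Python traceback."""
    return TRACEBACK_RE.search(manpage_text) is not None
```

For example, a page generated by a failing `help2man` call might contain a `File "/build/..."` line directly after the marker, which is where the build-path-dependent content comes from.
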
#### Mobile development

It was noticed that from August 2021, Android ['app bundles'](https://developer.android.com/guide/app-bundle) will become mandatory for the Google Play Store. This will result in smaller file sizes and other advantages for the end-user, yet it will require app developers to push equivalent 'APK' versions of their apps to other non-Play Store channels as well. It will also mean that developers will need to supply Google with their app signing keys. The introduction of [code transparency for app bundles](https://developer.android.com/guide/app-bundle/code-transparency) does add an *optional* code signing and verification mechanism (using a separate signing key held solely by the app developer). Unfortunately, code transparency files are not verified at install time (only manual verification is currently possible) and only guarantee the integrity of DEX and native code files (meaning interpreted code and assets could still have been modified). Further information can be found in the announcements on the [Android Authority](https://www.androidauthority.com/android-apks-sunset-1636829/) and [XDA Developers](https://www.xda-developers.com/google-play-billing-v3-app-bundle-requirement-2021/) sites.

In addition, the [Jiten Japanese Dictionary](https://f-droid.org/packages/dev.obfusk.jiten/) and [Bitcoin Wallet](https://f-droid.org/en/packages/de.schildbach.wallet/) applications on the [F-Droid](https://f-droid.org) application store are now reproducible using [signatures in metadata](https://f-droid.org/docs/Reproducible_Builds/).

Lastly, it was noticed that the [Android library bug affecting *NewPipe*](https://github.com/TeamNewPipe/NewPipe/issues/6486) also affects the [Swiss Covid Certificate](https://github.com/admin-ch/CovidCertificate-App-Android/issues/206#issuecomment-887616373) app.

#### Other distributions

[![]({{ "/images/reports/2021-07/archlinux.png#right" | relative_url }})](https://archlinux.org/)

Jelle van der Waa published a blog post detailing the [recent progress of reproducibility-related issues in Arch Linux](https://vdwaa.nl/arch-repro-july-2021.html), including issues with compressed manual pages as well as embedded build dates and hostnames. *kpcyrd* also [posted a monthly report](https://vulns.xyz/2021/07/monthly-report/) mentioning reproducibility-related issues in Arch, in addition to documenting his progress towards reproducible [Alpine](https://alpinelinux.org/) Linux on the [Raspberry Pi](https://www.raspberrypi.org/).

Finally, Bernhard M. Wiedemann posted his [monthly reproducible builds status report](https://lists.opensuse.org/archives/list/factory@lists.opensuse.org/message/RTRKIE6QJ7YSV7JCLB7DIGWBCXCGHVHB/) for openSUSE.

#### Upstream patches

The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

* Bernhard M. Wiedemann:

    * [`containerd`](https://build.opensuse.org/request/show/904627) (parallelism and Golang `buildid` issues)
    * [`curl`](https://github.com/curl/curl/pull/7512) (build fails in 2030)
    * [`duplicity`](https://gitlab.com/duplicity/duplicity/-/merge_requests/63) (date-related issue)
    * [`google-guest-agent`](https://build.opensuse.org/request/show/904623) (parallelism and Golang `buildid` issues)
    * [`google-osconfig-agent`](https://build.opensuse.org/request/show/904624) (parallelism and Golang `buildid` issues)
    * [`guile-git`](https://build.opensuse.org/request/show/904617) (parallelism / Guile)
    * [`latex2html`](https://bugzilla.opensuse.org/show_bug.cgi?id=1188918) (report PID-nondeterminism)
    * [`lxd`](https://build.opensuse.org/request/show/904621) (parallelism and Golang `buildid` issues)
    * [`monitoring-plugins`](https://build.opensuse.org/request/show/903362) (drop date from `gettextize`)
    * [`perl-Web-Machine`](https://build.opensuse.org/request/show/909033) (build failure in 2036)
    * [`starlette`](https://github.com/encode/starlette/issues/1255) (report build failure in single-core VM)
    * [`sudoku-sensei`](https://build.opensuse.org/request/show/907272) ([ASLR](https://en.wikipedia.org/wiki/Address_space_layout_randomization)-issue via toolchain component: [reported upstream](https://sourceforge.net/p/sudoku-sensei/bugs/2/))
    * [`watchdog`](https://github.com/gorakhargosh/watchdog/issues/823) (report build failure in single-core VM)

* Jelle van der Waa:

    * [`percona-toolkit`](https://github.com/percona/percona-toolkit/pull/499) (date)
    * [`skaffold`](https://github.com/GoogleContainerTools/skaffold/pull/6238) (date)

* Nilesh Patra:

    * [#990768](https://bugs.debian.org/990768) filed against [`tcode`](https://tracker.debian.org/pkg/tcode).
    * [#990795](https://bugs.debian.org/990795) filed against [`libident`](https://tracker.debian.org/pkg/libident).

* Richard Purdie:

    * [`python-setuptools`](https://github.com/pypa/setuptools): Sort the output of `glob.glob` as it inherits the nondeterministic ordering of `os.listdir` and the underlying filesystem. [[...](https://github.com/pypa/setuptools/commit/5a0404fa3875a069f7a6436f508116e852909cf2)]

* Vagrant Cascadian:

    * [#990339](https://bugs.debian.org/990339) previously filed against [`matplotlib`](https://tracker.debian.org/pkg/matplotlib) (now [submitted upstream](https://github.com/matplotlib/matplotlib/pull/20608)).
    * [#990839](https://bugs.debian.org/990839) filed against [`opentest4j`](https://tracker.debian.org/pkg/opentest4j).
    * [#990840](https://bugs.debian.org/990840) filed against [`apiguardian`](https://tracker.debian.org/pkg/apiguardian).
    * [#990843](https://bugs.debian.org/990843) and [#990844](https://bugs.debian.org/990844) filed against [`libtheora`](https://tracker.debian.org/pkg/libtheora).
    * [#990858](https://bugs.debian.org/990858) filed against [`dask`](https://tracker.debian.org/pkg/dask).
    * [#990862](https://bugs.debian.org/990862) filed against [`infinipath-psm`](https://tracker.debian.org/pkg/infinipath-psm).
    * [#990910](https://bugs.debian.org/990910) filed against [`p7zip`](https://tracker.debian.org/pkg/p7zip).
    * [#990912](https://bugs.debian.org/990912) filed against [`perl-tk`](https://tracker.debian.org/pkg/perl-tk).
    * [#990914](https://bugs.debian.org/990914) filed against [`lcov`](https://tracker.debian.org/pkg/lcov).
    * [#990952](https://bugs.debian.org/990952), [#990953](https://bugs.debian.org/990953) and [#990969](https://bugs.debian.org/990969) filed against [`lxml`](https://tracker.debian.org/pkg/lxml).
    * [#990999](https://bugs.debian.org/990999) filed against [`biber`](https://tracker.debian.org/pkg/biber).
    * [#991001](https://bugs.debian.org/991001) and [#991002](https://bugs.debian.org/991002) filed against [`automake1.11`](https://tracker.debian.org/pkg/automake1.11).
    * [#991020](https://bugs.debian.org/991020) filed against [`gcc-mingw-w64`](https://tracker.debian.org/pkg/gcc-mingw-w64).
    * [#991104](https://bugs.debian.org/991104) and [#991106](https://bugs.debian.org/991106) filed against [`antlr`](https://tracker.debian.org/pkg/antlr).
    * [#991177](https://bugs.debian.org/991177) filed against [`libdebian-installer`](https://tracker.debian.org/pkg/libdebian-installer).
    * [#991180](https://bugs.debian.org/991180) filed against [`xaw3d`](https://tracker.debian.org/pkg/xaw3d).
    * [#991181](https://bugs.debian.org/991181) filed against [`cmocka`](https://tracker.debian.org/pkg/cmocka).

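
One of the patches listed above sorts the output of `glob.glob`, because its ordering follows the underlying directory listing and can therefore vary between filesystems and machines. The general pattern is shown in the following minimal Python sketch; the helper name `find_sources` and its default pattern are hypothetical and are not taken from the setuptools change itself.

```python
import glob
import os


def find_sources(directory, pattern="*.py"):
    """Return matching paths in a stable, sorted order.

    glob.glob() returns results in the order produced by the underlying
    directory listing, which can differ between filesystems and machines.
    """
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

Sorting at the point where the file list is produced means every later step (archiving, code generation, manifest writing) sees the same order regardless of where the build runs.
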
#### Testing framework

[![]({{ "/images/reports/2021-07/testframework.png#right" | relative_url }})](https://tests.reproducible-builds.org/)

Reproducible Builds runs a [Jenkins](https://jenkins.io/)-based testing framework that powers [`tests.reproducible-builds.org`](https://tests.reproducible-builds.org). The following changes were made this month:

* Alexander Couzens:

    * Correct OpenWRT-related log artifacts in a failure case. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/29cfbe18)]

* Holger Levsen:

    * Create a [new view of Debian Live jobs](https://jenkins.debian.net/view/live/) maintained by Roland Clobus.
    * Randomize the start time of the Debian Live image building. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/f23bdc5f)]
    * Only run the Debian 'rebuilder prototype' on demand; it has mostly served its purpose. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/fbfabdc3)][[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/57494ca1)]
    * Detect [*diffoscope*](https://diffoscope.org/) failures in the health check. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/d0d9293d)][[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/b6bd74aa)]
    * Build packages with less parallelism on the `i386` architecture to reduce load. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/4fa74bde)][[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/be7a86fd)]
    * Improve output of reproducible [OpenWrt](https://openwrt.org/)-related jobs. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/d6ad4701)]
    * Note that a node is low on disk space in the health check, so remind us to remove old kernels. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/c0577fd7)]
    * Add retired `armhf` architecture nodes to our definition of 'zombies'. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/936eba60)]

* Mattia Rizzolo:

    * Share the same [Apache web server](https://httpd.apache.org/) settings between `debian` and `debian_live_build` artifacts. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/93a6d1ac)]

* Roland Clobus:

    * Build all Debian 'live' images. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/51fde8f5)]
    * Allow [*diffoscope*](https://diffoscope.org/) to run for longer as the image is currently not reproducible. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/4305455c)]

* Vagrant Cascadian:

    * Default to using a [tmpfs](https://en.wikipedia.org/wiki/Tmpfs)-backed `/tmp` directory for [schroots](https://wiki.debian.org/Schroot). [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/385b4128)]
    * Retire most `armhf` architecture nodes with only 2GB of RAM. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/123dfab8)]
    * Match `armhf` nodes named `ff*` in the `common-functions` script. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/5df489b7)]
    * Update number of `armhf` boards used for reproducible builds in the documentation. [[...](https://salsa.debian.org/qa/jenkins.debian.net/commit/8c21bd61)]


If you are interested in contributing to the Reproducible Builds project, please visit our [*Contribute*](https://reproducible-builds.org/contribute/) page on our website. You can also get in touch with us via:

 * IRC: `#reproducible-builds` on `irc.oftc.net`.

 * Twitter ([@ReproBuilds](https://twitter.com/ReproBuilds)) and Mastodon ([@reproducible_builds@fosstodon.org](https://fosstodon.org/@reproducible_builds)).

 * Reddit: [/r/ReproducibleBuilds](https://reddit.com/r/reproduciblebuilds)

 * Mailing list: [`rb-general@lists.reproducible-builds.org`](https://lists.reproducible-builds.org/listinfo/rb-general)