Over the past five years, our bug bounty program has become an important part of improving our security posture, as it is now for many large tech companies. Transparency and defending the rights of legitimate researchers are cornerstones of the progress we’ve made, and the world is safer for it. To those outside of the security community, it may seem counterintuitive that you can make your platform safer by encouraging security researchers to attack you, but that’s exactly the value that these programs deliver. This process of discovering and remediating bugs is key to our maintaining a highly secure organization and increasingly hardened product surfaces.
Many moons ago, I was working at the New York Times and created a library called Store, which was “a Java library for effortless, reactive data loading.” We built Store using RxJava and patterns adopted from Guava’s Cache implementation. Today’s app users expect data updates to flow in and out of the UI without having to do things like pulling to refresh or navigating back and forth between screens. Reactive front ends led me to think of how we can have declarative data stores with simple APIs that abstract complex features like multi-request throttling and disk caching that are needed in modern mobile applications.
The Dropbox Traffic team is charged with innovating our application networking stack to improve the experience for every one of our users—over half a billion of them. This article describes our work with NS1 to optimize our intelligent DNS-based global load balancing for corner cases that we uncovered while improving our point of presence (PoP) selection automation for our edge network. By co-developing the platform capabilities with NS1 to handle these outliers, we deliver positive Dropbox experiences to more users, more consistently.
Spoiler alert: BBRv2 is slower than BBRv1, but that’s a good thing.
BBRv1 Congestion Control
Three years have passed since “Bottleneck Bandwidth and Round-trip” (BBR) congestion control was released. It is now considered production-ready and has been added to Linux, FreeBSD, and Chrome (as part of QUIC). In our 2017 blog post, “Optimizing web servers for high throughput and low latency,” we evaluated BBRv1 congestion control on our edge network, and it showed awesome results.
Since then, BBRv1 has been deployed to the Dropbox Edge Network, and we have grown accustomed to some of its downsides.
Dropbox server-side software lives in a large monorepo. One lesson we’ve learned scaling the monorepo is to minimize the number of global operations that act on the repository as a whole. Years ago, it was reasonable to run our entire test corpus on every commit to the repository. This scheme became untenable as we added more tests. One obvious inefficiency is running tests that can’t possibly be affected by a particular change.
We addressed this problem with the help of our build system. Code in our monorepo is built and tested exclusively with Bazel.
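To make the idea concrete, here is a minimal sketch of change-based test selection over a dependency graph. It is not Dropbox's implementation: the target names, the `DEPS` map, and the `affected_tests` helper are all hypothetical, and the graph is hard-coded where a real setup would obtain it from the build system (Bazel's query language offers a reverse-dependency function, `rdeps`, for this kind of question).

```python
from collections import defaultdict, deque


def affected_tests(deps, tests, changed):
    """Return the tests that transitively depend on any changed target.

    deps:    maps each target to the targets it depends on directly
    tests:   the set of targets that are tests
    changed: the targets touched by a commit
    """
    # Invert the graph: for each target, record its direct dependents.
    rdeps = defaultdict(set)
    for target, direct_deps in deps.items():
        for dep in direct_deps:
            rdeps[dep].add(target)

    # Breadth-first walk from the changed targets along reverse edges.
    seen = set(changed)
    queue = deque(changed)
    while queue:
        node = queue.popleft()
        for dependent in rdeps[node]:
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)

    # Only tests reachable from a changed target need to run.
    return sorted(seen & set(tests))


# Hypothetical targets, for illustration only.
DEPS = {
    "//lib:core": [],
    "//lib:auth": ["//lib:core"],
    "//lib:billing": [],
    "//app:server": ["//lib:auth", "//lib:billing"],
    "//lib:core_test": ["//lib:core"],
    "//lib:auth_test": ["//lib:auth"],
    "//lib:billing_test": ["//lib:billing"],
    "//app:server_test": ["//app:server"],
}
TESTS = {t for t in DEPS if t.endswith("_test")}

if __name__ == "__main__":
    # A change to billing skips the core and auth tests entirely.
    print(affected_tests(DEPS, TESTS, ["//lib:billing"]))
```

In this toy graph, a change to `//lib:billing` selects only the billing and server tests, while a change to `//lib:core` selects every test except the billing one. The cost of selection scales with the size of the affected subgraph, not with the size of the whole test corpus.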
At Dropbox, we use monitoring and alerting to gain insight into what our server applications are doing and how they’re performing, and to notify engineers when systems aren’t behaving as expected. Our monitoring systems operate across 1,000 machines, with 60,000 alerts evaluated continuously. In 2018, we reinvented our monitoring and alerting processes, obviating the need for manual recovery and repair. We boosted query speed by up to 10,000x in some cases by caching more than 200,000 queries. This work has improved both user experience and trust in the platform, allowing our engineers to monitor, debug, and ship improvements with higher velocity and efficiency.
One of the biggest challenges of the mobile developer community at Dropbox in 2018 was our custom build system. It was slow, hard to use, and didn’t support some use cases that were outside the scope of the original design. After four months of work by our Mobile Platform team, we were able to replace our unicorn implementation with something much more modern and easier to maintain.
In our new build system, we wanted to improve on several things that our previous build system had been hindering:
- Make it easy to create new modules
- Allow developers to easily modify the build files
- Improve build times for local development
- Use industry-standard approaches and tooling