description: cryptographic paradigm involving uninterrupted protection of data traveling between two communicating parties
50 results
The Art of Invisibility: The World's Most Famous Hacker Teaches You How to Be Safe in the Age of Big Brother and Big Data
by Kevin Mitnick, Mikko Hypponen and Robert Vamosi
Published 14 Feb 2017
When you encrypt a message—an e-mail, text, or phone call—use end-to-end encryption. That means your message stays unreadable until it reaches its intended recipient. With end-to-end encryption, only you and your recipient have the keys to decode the message. Not the telecommunications carrier, website owner, or app developer—the parties that law enforcement or government will ask to turn over information about you. How do you know whether the encryption service you are using is end-to-end encryption? Do a Google search for “end-to-end encryption voice call.” If the app or service doesn’t use end-to-end encryption, then choose another.
…
Like the Tor browser, the app anonymizes your IP address, which means that messages are difficult to trace (however, please note that, like with the Tor browser, exit nodes are not by default under your control; see here). Instant messages are encrypted using end-to-end encryption. Like Tor, the app is a little difficult for the first-time user, but eventually it should work to provide truly private text messages.[24] There are also commercial apps that provide end-to-end encryption. The only caveat is that their software is proprietary, and without independent review their security and integrity cannot be confirmed. Silent Phone offers end-to-end encryption text messaging. It does, however, log some data, but only to improve its services. The encryption keys are stored on the device.
…
The most important takeaways are: first, be aware of all the ways that someone can identify you even if you undertake some but not all of the precautions I’ve described. And if you do undertake all these precautions, know that you need to perform due diligence every time you use your anonymous accounts. No exceptions. It’s also worth reiterating that end-to-end encryption—keeping your message unreadable and secure until it reaches the recipient as opposed to simply encrypting it—is very important. End-to-end encryption can be used for other purposes, such as encrypted phone calls and instant messaging, which we’ll discuss in the next two chapters.
CHAPTER THREE
Wiretapping 101
You spend countless hours on your cell phone every day, chatting, texting, surfing the Internet.
Snowden's Box: Trust in the Age of Surveillance
by Jessica Bruder and Dale Maharidge
Published 29 Mar 2020
As of this writing, one of the easiest options is ProtonMail, a free, open-source provider based in Switzerland with end-to-end encryption. It doesn’t offer all the bells and whistles found on commercial services like Gmail, Outlook, and YahooMail, but we’ve both found it tremendously useful as a secure, use-as-needed secondary account. (When we started writing the magazine article that led to this book, we convinced our Harper’s editor, James Marcus, to get a ProtonMail account too.) The service does have drawbacks: its end-to-end encryption only works if you’re emailing another ProtonMail user. The main benefit is that your messages remain encrypted on ProtonMail’s server.
…
Ranking 11 Technology Companies on Encryption and Human Rights,” Amnesty’s researchers found that many major firms are failing to provide their users with a reasonable standard of security. Their report noted that “only three of the companies assessed — Apple, LINE, Viber Media — apply end-to-end encryption as a default to all of their IM services. Of these, none are fully transparent about the system of encryption they are using.” The researchers also learned that some companies don’t practice what they preach. “For example, Microsoft has a clear stated commitment to human rights, but is not applying any form of end-to-end encryption on its Skype service,” they wrote. Other organizations are leading the charge to protect the rights of minorities and targeted groups.
…
“By forming temporary alliances with Google, Facebook and [others], privacy activists gave them a PR platform to show the world that they are also concerned about their users’ privacy — rather than to actively challenging their business practices and informing the public about the essential role they have played — voluntarily or not — in the NSA’s regime,” wrote German media scholar Till Wäscher. In 2016, the popular messaging service WhatsApp began using end-to-end encryption to protect users’ communications. The same year, tensions rose as Apple defied a federal court order to help the FBI break into an iPhone belonging to one of the perpetrators of a San Bernardino mass shooting that killed fourteen people. “The same engineers who built strong encryption into the iPhone to protect our users would, ironically, be ordered to weaken those protections and make our users less safe,” said Apple CEO Tim Cook, who later made “privacy is a human right” his public mantra.
Applied Cryptography: Protocols, Algorithms, and Source Code in C
by Bruce Schneier
Published 10 Nov 1993
Disadvantages: Data is exposed in the intermediate nodes. Figure 10.2 End-to-end encryption. Building end-to-end encryption equipment is difficult. Each particular communications system has its own protocols. Sometimes the interfaces between the levels are not well-defined, making the task even more difficult. If encryption takes place at a high layer of the communications architecture, like the applications layer or the presentation layer, then it can be independent of the type of communication network used. It is still end-to-end encryption, but the encryption implementation does not have to bother about line codes, synchronization between modems, physical interfaces, and so forth.
…
The major disadvantage of end-to-end encryption is that it allows traffic analysis. Traffic analysis is the analysis of encrypted messages: where they come from, where they go to, how long they are, when they are sent, how frequent or infrequent they are, whether they coincide with outside events like meetings, and more. A lot of good information is buried in that data, and a cryptanalyst will want to get his hands on it. Table 10.3 presents the positive and negative aspects of end-to-end encryption. Combining the Two Table 10.4, primarily from [1244], compares link-by-link and end-to-end encryption. Combining the two, while most expensive, is the most effective way of securing a network.
…
Encryption of each physical link makes any analysis of the routing information impossible, while end-to-end encryption reduces the threat of unencrypted data at the various nodes in the network. Key management for the two schemes can be completely separate: The network managers can take care of encryption at the physical level, while the individual users have responsibility for end-to-end encryption.
Table 10.3 End-to-End Encryption: Advantages and Disadvantages
Advantages: Higher secrecy level.
Disadvantages: Requires a more complex key-management system. Traffic analysis is possible, since routing information is not encrypted.
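To make the traffic-analysis point concrete, here is a minimal Python sketch. The `Envelope` record, its field names, the sample data, and the `summarize_traffic` helper are all invented for illustration and do not come from the book. The payload stays opaque throughout, yet the unencrypted routing information and the ciphertext lengths still show who talks to whom, how often, and how much.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """What an eavesdropper sees for one end-to-end encrypted message."""
    sender: str        # routing information, not encrypted
    recipient: str     # routing information, not encrypted
    sent_at: int       # timestamp in seconds (hypothetical unit)
    ciphertext: bytes  # opaque payload; never decrypted here


def summarize_traffic(envelopes):
    """Return {(sender, recipient): (message_count, total_bytes)}.

    Only metadata and ciphertext length are used.
    """
    counts = Counter()
    sizes = Counter()
    for env in envelopes:
        pair = (env.sender, env.recipient)
        counts[pair] += 1
        sizes[pair] += len(env.ciphertext)
    return {pair: (counts[pair], sizes[pair]) for pair in counts}


# Made-up observations: three encrypted messages seen on the wire.
observed = [
    Envelope("alice", "bob", 100, b"\x8f" * 48),
    Envelope("alice", "bob", 160, b"\x13" * 512),
    Envelope("carol", "bob", 170, b"\xa0" * 32),
]
summary = summarize_traffic(observed)
```

Link-by-link encryption, as the excerpt notes, hides the sender and recipient fields from an observer on the wire, which is why combining the two schemes closes this gap.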
Click Here to Kill Everybody: Security and Survival in a Hyper-Connected World
by Bruce Schneier
Published 3 Sep 2018
I evaluated the standard in 1999 and concluded that its unnecessary complexity had a “devastating effect” on security. Today, end-to-end encryption still isn’t ubiquitous on the Internet, although it’s getting better. A second example: in the secret government-only standards process for digital cellular encryption, many believe that the NSA ensured that algorithms used to encrypt voice traffic between the handset and the tower are easily breakable, and that there is no end-to-end encryption between the two communicating parties. The result is that your cell phone conversations can easily be monitored. Both of these were probably part of NSA’s BULLRUN program, whose aim was to weaken public security standards.
…
ENCRYPT AS MUCH AS POSSIBLE
Governments should have the goal of encrypting as much of the Internet+ as possible. There are many facets to this. One: we need end-to-end encryption for communications. This means that all communications should be encrypted from the sender’s device to the receiver’s device, and that no one in the middle should be able to read that communication. This is the encryption used by many messaging apps, like iMessage, WhatsApp, and Signal. This is how encryption in your browser works. In some cases, true end-to-end encryption isn’t desirable. Most of us want Google to be able to read our e-mail, because that’s how it sorts it into folders and deletes spam.
…
For instance, they could block all sorts of things: spam, child pornography, Internet attacks, and so on. All of these things, however, currently require ISPs to engage in bulk surveillance of Internet traffic, and they won’t work if the traffic is encrypted. And given the choice, we are much more secure if Internet traffic is end-to-end encrypted. I’ll talk more about that in Chapter 9.
SECURE OUR CRITICAL INFRASTRUCTURE
In 2008, unidentified hackers broke into the Baku-Tbilisi-Ceyhan oil pipeline in Turkey. They gained access to the pipeline’s control system and increased the pressure of the crude oil flowing inside, causing the pipe to explode.
System Error: Where Big Tech Went Wrong and How We Can Reboot
by Rob Reich, Mehran Sahami and Jeremy M. Weinstein
Published 6 Sep 2021
In the world of digital communication, technologists have created much more powerful and secure techniques for encryption based on complex mathematics. So important is this technology to the field of computing that in recent years numerous cryptographers have received the A. M. Turing Award for their advancement of encryption technology. WhatsApp transmits messages using a system known as end-to-end encryption: the content of a message is encrypted on the sender’s device and decrypted only when it reaches the receiver’s device. As a result, no one except the sender and the receiver can read the message—not WhatsApp, not the internet service provider, not anyone who might be trying to tap the communications network—as the message travels over the internet and through WhatsApp’s servers while encrypted.
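The flow described above (encrypt on the sender's device, decrypt only on the receiver's, with the server relaying bytes it cannot read) can be sketched in a few lines of Python. This is a toy under loudly stated assumptions and not how WhatsApp or any real messenger works. The Diffie-Hellman parameters are far too small, the SHA-256 counter-mode keystream is a stand-in for a real cipher, and there is no authentication or integrity check, so a man-in-the-middle could swap the public values. All function names are hypothetical.

```python
import hashlib
import secrets

# Toy parameters, NOT secure: real systems use vetted libraries and
# standardized groups or curves.
P = 2**127 - 1  # a Mersenne prime, far too small for real use
G = 3


def keypair():
    """Return (private, public) for toy Diffie-Hellman."""
    private = secrets.randbelow(P - 3) + 2
    return private, pow(G, private, P)


def shared_key(my_private, their_public):
    """Derive a 32-byte key from the Diffie-Hellman shared secret."""
    secret = pow(their_public, my_private, P)
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()


def xor_stream(key, nonce, data):
    """Encrypt or decrypt by XOR with a SHA-256 counter-mode keystream."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        block = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def relay(ciphertext):
    """Stand-in for the provider's server: forwards bytes it cannot read."""
    return ciphertext


# Each endpoint keeps its private value on its own device and only
# publishes the public value.
alice_priv, alice_pub = keypair()
bob_priv, bob_pub = keypair()

nonce = secrets.token_bytes(12)
message = b"meet at noon"
sent = xor_stream(shared_key(alice_priv, bob_pub), nonce, message)
received = xor_stream(shared_key(bob_priv, alice_pub), nonce, relay(sent))
```

The relay only ever handles `sent`, the public values, and the nonce; the key is derived separately on each endpoint and never travels over the network.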
…
Within a few years, both Acton and Koum would leave Facebook citing concerns over user privacy and the way Facebook was monetizing WhatsApp’s user base. In turn, in 2019, Mark Zuckerberg, still grappling with the privacy concerns heightened as a result of the Cambridge Analytica scandal, announced an intention to provide end-to-end encryption on all Facebook messaging services, including Facebook Messenger and Instagram Direct. If you care about privacy, this all sounds appealing unless you are the head of the FBI, trying to track down terrorist sympathizers who are plotting an attack in a major US city or a human rights campaigner in India who has discovered that political gangs are using encrypted communication technologies to organize anti-Muslim violence in advance of an election.
…
Just to underscore the enormity of this challenge: After the siege of the US Capitol and the deplatforming of President Trump, downloads of end-to-end encrypted applications exploded. As the plotters of the insurrection decamped for smaller but completely private messaging platforms, the task of tracking and disrupting the activities of domestic terrorists became far more difficult. How should we weigh the value of privacy against other important benefits?
The Wires of War: Technology and the Global Struggle for Power
by Jacob Helberg
Published 11 Oct 2021
“Senator, we run ads,” Zuckerberg patiently explained.[54] In that 2020 hearing, Congressman Greg Steube, last seen telling Bezos he was on mute, demanded to know why his campaign emails were winding up in Gmail’s spam folder.[55] Remember, these are the same legislators who are supposed to be regulating the tech industry. This ignorance comes at a cost. In the midst of the debate over Apple decrypting its iPhones after the San Bernardino attack, Senators Dianne Feinstein and Richard Burr proposed a bill that would have effectively outlawed end-to-end encryption. Kevin Bankston, the director of the Open Technology Institute at the New America foundation, pronounced it “easily the most ludicrous, dangerous, technically illiterate proposal I’ve ever seen” in nearly two decades of working in tech policy.[56] While Congress once had in-house experts—an Office of Technology Assessment analyzing scientific legislation much like the Congressional Budget Office analyzes the financial ramifications of bills—that expertise was eliminated in 1995 as part of House Speaker Newt Gingrich’s shrinking of government.[57] In the Gray War, ignorance isn’t bliss.
…
By April 2020, that number had spiked to 300 million.[125] As Zoom struggled to manage this explosive growth, the company came under fire for a range of privacy and security lapses. Malicious hackers hijacked virtual classrooms to display pornography and swastikas across students’ screens, a practice now known as Zoombombing. Closer scrutiny revealed that the end-to-end encryption Zoom advertised was a misnomer, allowing Zoom itself to access unencrypted video and audio content from a call.[126] Zoom even admitted that it had “mistakenly” routed non-China traffic through some of its Chinese servers.[127] Most egregiously, in June 2020, several hundred American and Chinese activists were commemorating the 31st anniversary of the Tiananmen Square massacre when their Zoom videoconference suddenly cut out.[128] The glitch, it turned out, was not technical but ideological.
…
What happens, for example, when a user in the United States converses across borders with a user in China? Or when a Chinese citizen based in Vietnam uses Zoom for academic purposes and discusses the Tiananmen massacre in Mandarin? It is also unclear how Zoom intends to continue complying with China’s local laws and government requests if it fulfills its recent promise of making end-to-end encryption available for all users globally. Then there are the legal obstacles. Under international law, litigants can use the legal discovery process to compel a foreign corporation to produce information through its domestic affiliates. This means that the Chinese government or any Chinese entity could compel Zoom’s U.S.
Talk Is Cheap: Switching to Internet Telephones
by James E. Gaskin
Published 15 Mar 2005
The manual intervention part happens at the beginning when your recipient accepts the transfer. After that, it putters slowly along without needing attention. Two good features make this bearable. First, the file, just like a Skype conversation, is encrypted. This may be the easiest way to send files with end-to-end encryption on the Internet. Second, the transfer will complete as long as both partners leave the transfer windows open. Even if one partner gets disconnected or turns off their computer, the transfer will pick up where it left off when both parties are online again.
Figure 6-12. A Skype relayed file transfer in slow motion
The transfer window, as you can see in Figure 6-12, includes icons to initiate a phone conversation or chat session.
…
Sneak away to the coffee shop to read your favorite comics (www.comics.com) so the people in the surrounding cubicles won't hear you laugh? The Teleo service finds your laptop. More business friendly, Teleo allows companies to private-label their product. Since many companies have thousands of mobile or remote workers relying heavily on their laptops, that's an attractive offer. The end-to-end encryption provided by Teleo, similar to the encryption with Skype, makes an even stronger sales pitch for some industries and paranoid managers. Told you bits are bits, and Internet Telephony will turn voice streams into data managed just like any other stream. So pay the $2.95 or $4.95 per month depending on minutes for Teleo and have your cell phone and laptop nag you wherever you are.
The Hype Machine: How Social Media Disrupts Our Elections, Our Economy, and Our Health--And How We Must Adapt
by Sinan Aral
Published 14 Sep 2020
Microsoft estimated that 64 percent of Indians encountered fake news online ahead of Indian elections in 2019. In India, where 52 percent of people report getting news from WhatsApp, private messaging is a particularly insidious breeding ground for fake news, because people use private groups with end-to-end encryption, making it difficult to monitor or counteract the spread of falsity. In the Philippines, the spread of misinformation propagated to discredit Maria Ressa, the Filipino-American journalist working to expose corruption and a Time Person of the Year in 2018, was vast and swift. Similar to the Russian influence operation in Crimea, the misinformation campaign against Ressa mirrored the charges that were eventually brought against her in court.
…
A month after my interview, Mark Zuckerberg began his keynote at Facebook’s annual developer conference F8 by arguing that “the future is private,” explaining that “a private social platform will be even more important to our lives than our digital town squares” and outlining an about-face from “connecting the world” to “a privacy-focused vision for social networking.” Facebook announced it would be unifying its messaging services, from WhatsApp to Messenger to Instagram, and embracing private, secure, end-to-end encryption. “In the history of Facebook, there have been four major versions of the product so far and this is the fifth,” Zuckerberg said. “So we’re calling this FB 5.” Facebook was locking down and going private. Two shocks to the system forced this shift. First, Cambridge Analytica highlighted the dangers of freely sharing the Hype Machine’s private data for behavioral targeting, election manipulation, and the broader threat to democracy.
…
Do we want to entrust them with that responsibility? In October 2019, U.S. attorney general William Barr, U.K. secretary of state for the Home Department Priti Patel, and Australian minister of home affairs Peter Dutton sent a letter to Mark Zuckerberg asking him to halt plans for end-to-end encryption and requesting backdoor access for their governments to root out criminals on Facebook. A few days later FBI director Christopher Wray condemned the encryption plan, calling it “a dream come true for predators and child pornographers,” adding that making Facebook private and encrypted would produce “a lawless space created not by the American people or their representatives but by the owners of one big company.”
Docker in Action
by Jeff Nickoloff and Stephen Kuenzli
Published 10 Dec 2019
These networks are described in the docker-compose.yml application descriptor as follows:

    version: '3.7'
    networks:
      public:
        driver: overlay
        driver_opts:
          encrypted: 'true'
      private:
        driver: overlay
        driver_opts:
          encrypted: 'true'
        attachable: true

Note: The value true is quoted for driver_opts because Docker requires a string or number. The value of true is unquoted for attachable because Docker requires a boolean.

Both networks are defined by adding named entries to a top-level networks key in the application descriptor. The team that built this application has an end-to-end encryption requirement. The application definition satisfies a large portion of that requirement by encrypting all of the traffic on the networks used by the application. The only remaining work is to secure communications on the service’s published port by using TLS. Section 13.3 explains why applications should secure the published ports, and chapter 12’s greetings application showed one way to do this.
…
The ingress network’s only responsibility is to route traffic from external clients connected to ports published by Docker services within the cluster. This network is managed by Swarm, and only Swarm can attach containers to the ingress network. You should be aware that the default configuration of the ingress network is not encrypted. If your application needs end-to-end encryption, all services that publish ports should terminate their connections with TLS. TLS certificates can be stored as Docker secrets and retrieved by services on startup, just as we have demonstrated with passwords in this chapter and TLS certificates in chapter 12. Next, we will explore how Docker helps services discover and connect to each other on a shared network. 13.3.3.
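As a rough illustration of terminating TLS inside a service with certificates held as Docker secrets, the following Python sketch builds a server-side TLS context and loads the certificate and key from secret files. It is a sketch under stated assumptions and not the book's greetings application. The secret names `tls_cert` and `tls_key` and all helper names are hypothetical. The sketch assumes a Linux container, where Swarm mounts secrets under `/run/secrets` by default.

```python
import ssl
from pathlib import Path

# Default mount point for Swarm secrets in Linux containers.
SECRETS_DIR = Path("/run/secrets")


def secret_path(name, root=SECRETS_DIR):
    """Return the file path for a named Docker secret."""
    return Path(root) / name


def make_server_context():
    """Create a server-side TLS context that refuses anything below TLS 1.2."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def load_certificate(context, cert_name="tls_cert", key_name="tls_key"):
    """Load the certificate chain and private key from secret files.

    The secret names are hypothetical; use whatever names the stack defines.
    """
    context.load_cert_chain(
        certfile=str(secret_path(cert_name)),
        keyfile=str(secret_path(key_name)),
    )
    return context


context = make_server_context()
# Inside a container with the secrets mounted, one would then call:
# load_certificate(context)
```

The resulting context can be passed to any server that accepts an `ssl.SSLContext`, so the published port speaks TLS even though the ingress network itself is not encrypted.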
Reset
by Ronald J. Deibert
Published 14 Aug 2020
Under the reign of prime minister Narendra Modi and his Hindu nationalist right-wing party, the BJP, the country has rapidly descended into authoritarianism.[282] Disinformation and misinformation run rampant through social media, sparking frequent outbreaks of mob violence and ethnic animosity — the most recent of which was horrific sectarian violence unleashed principally by Hindu mobs against Muslims in northeast Delhi in February and March 2020. WhatsApp groups, which are end-to-end encrypted, making it more difficult for the platform to moderate content, are especially prone to these spikes of panic- or violence-inducing disinformation. Like those in many other countries, Indian authorities have rolled out technologically enhanced surveillance and other information controls.
…
It may seem laughable to think we could “slow down” this hypermedia environment, but there’s no practical reason why not. Consider WhatsApp, whose platform’s “group chat” function was widely associated with inflaming ethnic animosity, inciting violence, and spreading disinformation around the world. Since its messages are end-to-end encrypted, which limits the platform’s ability to screen actual content, WhatsApp took steps to limit the size of groups and the number of times content could be forwarded to others — a technical tweak a group of Brazilian researchers explicitly described as introducing friction into the application.[442] Although the measures were taken by the company itself, there is no reason such technical tweaks could not be mandated by law.
…
Perhaps the most obvious but easy to overlook is encryption. By turning private communications into gibberish for all but those who hold the key, robust and peer-reviewed encryption provides a major check on the ability of outside forces to monitor what we do. That not only keeps us safe, it also protects against the abuse of power. End-to-end encryption messaging applications, such as those now standard on Signal and WhatsApp, will be essential to protecting privacy and human rights. Encryption of data at rest can help protect users against cybercrime, and provide another form of recessed restraint to prevent unauthorized access and use. Other technologies also provide recessed power and restraint on the abuse of power: circumvention technologies allow us to bypass internet censorship, while tools like Tor give users the means to anonymously browse the web.
Twitter and Tear Gas: The Power and Fragility of Networked Protest
by Zeynep Tufekci
Published 14 May 2017
Censorship is often thought of as simply blocking individuals from accessing information, but information is experienced and disseminated collectively and socially. Information travels in social networks; hence circumvention of censorship is also a collective, networked undertaking. Many users are networked in friendship and social groups, especially on easy, widely used chat applications that are encrypted end-to-end, such as WhatsApp. End-to-end encryption means that even Facebook (WhatsApp’s owner) cannot read the content of the communication. News travels far and wide in such groups, and this type of communication does not require that every person practice active circumvention. All that is needed is one person to circumvent censorship in accessing the information and then share the information on a network.
Facebook: The Inside Story
by
Steven Levy
Published 25 Feb 2020
Messenger was already well along on that path. According to Acton, “The question Mark kept raising was, If we have end-to-end encryption, are we leaving money on the table?” The problem wasn’t that the actual messages between businesses and customers would be hampered, but that Facebook itself would not be able to scan the messages to see what was in them and use the information to make a better user experience, or even to serve users better ads or add-on services. “People were questioning end-to-end encryption in terms of its business value,” says Acton. WhatsApp kept its encryption. But the conflicts about making money from WhatsApp became increasingly heated.
…
So it was entirely consistent with that mission that WhatsApp pursue a scheme to encrypt all its messages by default. Co-founder Brian Acton especially felt WhatsApp’s users should be able to communicate in a way where government eavesdroppers could never access the secrets they shared with friends, family, and business associates. In the summer of 2013, Acton began working on an end-to-end encryption model for WhatsApp. Creating a cryptosystem to protect the communications of more than a billion people, and withstand the attacks of everyone from wizardly hackers to sophisticated state intelligence agencies, is the ultimate don’t-try-this-at-home enterprise. It was a blessing when Acton connected with Moxie Marlinspike, a crypto-activist and master cryptographer who believed that encryption was core to freedom in the digital age.
…
Facebook could be slapped with fines, or, if it turned out that one of those encrypted messages involved the planning of a murderous attack, the company could be shunned, or worse. The purchase deal still had not closed when Acton informed Zuckerberg—he pointedly did not ask permission—that WhatsApp was pursuing end-to-end, and the CEO took it in with his typical inscrutable form of assent. “We were like, Mark, we are building end-to-end encryption,” says Acton. “He’s like, Okay, okay fine, you guys go ahead and do that, I don’t care.” Actually, Zuckerberg had done a considerable amount of thinking on the subject. He had been outraged in 2014 when Facebook learned, via Edward Snowden’s leaks, that the US government was snatching its communications from Facebook’s data centers.
Cult of the Dead Cow: How the Original Hacking Supergroup Might Just Save the World
by
Joseph Menn
Published 3 Jun 2019
It had realized the NSA was the enemy after Snowden documents showed the agency had been breaking into its networks overseas, where it did not need court approval. Google moved to encrypt far more deeply, even if it maintained the ability to recover all users’ emails. The two companies also fought against proposed government-mandated back doors and bans on end-to-end encryption, which by 2018 were popping up around the globe. There was still fighting to be done inside the big companies. But leading lights in the encryption fight were also spending more time helping the start-ups. Others were beginning to think more about the meaning of free speech when the immediate problem in many countries was not the inability to speak but the propensity to get drowned out by manufactured voices directed by governments and big economic forces.
…
One offshoot from those meetings, led by Slack engineer and Jake Appelbaum victim Leigh Honeywell, created a public “Never Again” pledge to oppose immoral conduct and go public if necessary, which has been signed by more than 2,800 employees. Among other things, the signatories promised to advocate against retaining data that could be used for ethnic or religious targeting and advocate for deploying end-to-end encryption. The Solidarity meetings raised money for immigrants’ lawyers and coordinated volunteer coding projects. As the 2018 midterms approached, confronted with billionaires on the other end of the spectrum spending untraceable “dark money” to push right-wing candidates, Ceglowski fought back with what he called “dork money,” funding a slate of progressive candidates around the country in districts he thought he could flip.
The Digital Party: Political Organisation and Online Democracy
by
Paolo Gerbaudo
Published 19 Jul 2018
Voting operations are supported by a software called Nvotes, previously known as Agora Voting, which self-describes as ‘a project for open source, cryptographically secure voting’, developed by the company Agora Voting, which offers a ‘block-chain-based digital voting solution for governments and organizations’.250 Significantly, some of the Agora Voting team, involving David Ruescas, Eduardo Robles and Lucas Cervera, hailed from hacker groups that were involved in the 15-M. The programme is written in Shell, with some components in Ruby on Rails, Python and Javascript. The main claim of this system is that it is tamper-proof, as it is based on mix-net and end-to-end encryption of voting operations, ensuring strong security. This system has been used several times for a number of purposes, including online primaries, election to party positions (e.g. secretary general) and internal referenda on policy and strategy. Agora voting services include the external verification of online votes, a function that has only been rarely used by the Five Star Movement.
…
More generally, some people, including the German Chaos Computer Club and free software activist Richard Stallman, question the very desirability of online democracy because it does not guarantee the same anonymity that is available with physical ballots. Although the more radical cypherpunk-oriented people will continue to distrust digital democracy, end-to-end encryption systems, such as the one used by Agora Voting, provide a reasonable degree of security against these risks. To ensure more security, some of these formations have also introduced two-step verification, whereby in order to vote, users have to enter a one-time password sent via a short message service (SMS), in the same way used for double-step verification on Gmail, Twitter or Telegram.
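The two-step verification described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not the scheme any of these parties actually runs: the function names `issue_otp` and `verify_otp`, the six-digit format, and the five-minute validity window are all assumptions, and the SMS delivery step is left out entirely.

```python
import hmac
import secrets
import time

OTP_TTL_SECONDS = 300  # assumed validity window; not taken from the source


def issue_otp(now=None):
    """Create a random six-digit code and the time at which it expires."""
    now = time.time() if now is None else now
    code = f"{secrets.randbelow(10**6):06d}"  # cryptographically strong randomness
    return code, now + OTP_TTL_SECONDS


def verify_otp(submitted, issued_code, expires_at, now=None):
    """Accept the code only if it is unexpired and matches exactly."""
    now = time.time() if now is None else now
    if now > expires_at:
        return False
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(submitted.encode(), issued_code.encode())
```

A real deployment would also need to limit the number of guesses per code and invalidate a code after its first successful use, neither of which is shown here.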
Dark Mirror: Edward Snowden and the Surveillance State
by
Barton Gellman
Published 20 May 2020
Sanger and Nicole Perlroth, “Internet Giants Erect Barriers to Spy Agencies,” New York Times, June 6, 2014, www.nytimes.com/2014/06/07/technology/internet-giants-erect-barriers-to-spy-agencies.html. End-to-End: Source code for the encryption library, which has yet to be released in final form, is at https://github.com/google/end-to-end. Google’s announcement may be found at “Making End-to-End Encryption Easier to Use,” Google Security Blog, June 3, 2014, https://security.googleblog.com/2014/06/making-end-to-end-encryption-easier-to.html. comment embedded in the source code: Brittany A. Roston, “Google Takes a Dig at NSA with Easter Egg,” SlashGear, June 4, 2014, www.slashgear.com/google-takes-a-dig-at-nsa-with-easter-egg-04332176/. “He’s already said”: See “NSA Speaks Out on Snowden, Spying,” CBS News, December 15, 2013, transcript at https://cbsn.ws/2P4ZkfI.
Future Politics: Living Together in a World Transformed by Tech
by
Jamie Susskind
Published 3 Sep 2018
‘Increasingly,’ as William Mitchell put it, ‘we can do unto others at a distance, and they can do unto us.’61 Cryptography offers some protection. The political function of cryptographic methods will be one of the most important political issues of the digital lifeworld. Interestingly, the recent trend has been toward greater encryption of digital platforms. To take a well-known example, the messages you send on WhatsApp are now ‘end-to-end’ encrypted, meaning they can’t easily be intercepted or inspected either by WhatsApp or any other third party. Increased encryption by social media and news platforms has made it harder for authoritarian regimes to selectively filter the flow of information. Previously, they could remove individual ‘accounts, web pages, and stories’.62 Increasingly, encryption has forced them to choose between blocking the entire platform or none of it.
…
For every plucky dissident or journalist who uses cryptography as a shield against tyranny, there’s a terrorist organization, human-trafficking syndicate, drug cartel, or fraudster who uses it to conceal criminality. End-to-end encryption naturally causes concern in the intelligence community because it makes it harder for state agencies to detect terrorist plotting. It’s plainly not in the interests of liberty for dangerous groups to flourish unmolested. And there’s a risk that private encryption will encourage states to develop their own ‘home-grown platforms’ that can be more easily controlled.
…
Iran has developed its own version of YouTube, and Turkey is making its own Turkish search engine and email platform.64 A recent study by Harvard’s Berkman Klein Center for Internet and Society predicted that encryption probably won’t become a ubiquitous feature of technology in the future, mainly because businesses themselves will want to be able to retain easy access to the platforms that we use, partly for commercial reasons (that is, data harvesting) and partly because excessive encryption can make it harder to detect and correct problems.65 Even the powerful can’t agree on the politics of encryption. In late 2017, the British government indicated that it might crack down on end-to-end encryption, while, in complete contrast, the European Parliament was mulling a prohibition on member states securing ‘backdoor access’ to encrypted technologies.66 To preserve our liberty there will have to be a balance: individuals, corporations, and governments all have competing and overlapping priorities, but at their heart should be freedom of thought, freedom of action, and freedom of community.
News and How to Use It: What to Believe in a Fake News World
by
Alan Rusbridger
Published 26 Nov 2020
If the state can find good reason to eavesdrop on a confessional, or legislator’s surgery, or medical consultation or legal conference, it will. And if it can follow a journalist around to see who they’re talking to, then why not? Why not trawl through oceans of metadata to see who’s been speaking to whom? All kinds of countermeasures are imagined or trialled. Secure drop boxes for the safe depositing of secrets. End-to-end encryption for messages. Apps that train your phone to detect intrusion. Secure email keys. Reporters are turned into cryptographers. Many of these new shields would foil most hackers, if not a determinedly curious intelligence agency. But sources are undoubtedly discouraged. Why risk everything to help good information triumph over bad?
…
But those days are gone. A nosy government or intelligence agency or police officer ought, in theory, to have to follow due process. But nearly all our actions now leave digital fingerprints, and it has become harder and harder for any journalist to talk to a source in a truly confidential way. End-to-end encryption helps – but perhaps not if the spooks or the cops are determinedly involved. The camera on your laptop? The location tracking apps on your phone? Your phone itself? How difficult is it to triangulate the phones and phone masts near you in that supposedly discreet greasy spoon where you agreed to meet your hush-hush source?
Surveillance Valley: The Rise of the Military-Digital Complex
by
Yasha Levine
Published 6 Feb 2018
“Now that ARPA has demonstrated the feasibility of distributed network operation, the Defense Communications Agency has ordered three of ARPA’s Interface Message Processors so that a prototype ARPANET operation can be established between three of the WWMCCS computers. When this is completed and the network operation has been proven as operationally effective, the WWMCCS network will be expanded to worldwide operation,” he said. Ibid. 57. ARPA began developing end-to-end encryption over the ARPANET in 1976. As a report to the Senate explains, this was superior because it did not require physically securing the lines and allowed a general-purpose defense network to be used for both classified and nonclassified tasks. “Department of Defense Appropriations for FY78 Part 5: Research, Development, Test, and Evaluation,” Hearings, Subcommittee of the Comm. on Appropriations, 95th Cong., 1st sess.
…
Edward Snowden (@Snowden), Twitter post, September 21, 2016, 6:50 a.m., https://twitter.com/snowden/status/778592275144314884; Edward Snowden (@Snowden), Twitter post, November 2, 2015, 2:46 p.m., https://twitter.com/snowden/status/661313394906161152. 130. Edward Snowden (@Snowden), Twitter post, October 12, 2015, 5:05 p.m., https://twitter.com/snowden/status/653723172953583617?lang=en. 131. moxie, “Open Whisper Systems Partners with Google on End-to-End Encryption for Allo,” Open Whisper Systems (blog), May 18, 2016, https://whispersystems.org/blog/allo/. Google also entered into a partnership with the Open Technology Fund on a privacy project called Simply Secure, which promised to make privacy and encryption easy to use for even the most technically incompetent user. 132. 2014 Annual Report (Washington, DC: Open Technology Fund, 2014), https://www.opentech.fund/sites/default/files/attachments/otf_fy2014_annualreport.pdf.
Money, Real Quick: The Story of M-PESA
by
Tonny K. Omwansa
,
Nicholas P. Sullivan
and
The Guardian
Published 28 Feb 2012
When customers wanted their money, they went to an agent and cashed out. The Central Bank had also engaged Consult Hyperion, an IT consultancy firm, to conduct an operational risk audit, according to a case study by the Alliance for Financial Inclusion, an international group of Central Bankers and financial regulators. Consult Hyperion tested the end-to-end encryption of the SIM card functionality, which held all of the confidential customer data; reviewed the use of hardware security modules at the M-PESA servers; and ensured that all business processes had embedded security procedures, including live backup. The consultants also checked to ensure that all of the M-PESA systems allowed for comprehensive reporting and management so every transaction could be monitored, individually and en masse.
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by
Martin Kleppmann
Published 16 Mar 2017
A similar argument applies with encryption [55]: the password on your home WiFi network protects against people snooping your WiFi traffic, but not against attackers elsewhere on the internet; TLS/SSL between your client and the server protects against network attackers, but not against compromises of the server. Only end-to-end encryption and authentication can protect against all of these things. Although the low-level features (TCP duplicate suppression, Ethernet checksums, WiFi encryption) cannot provide the desired end-to-end features by themselves, they are still useful, since they reduce the probability of problems at the higher levels.
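The distinction drawn in this excerpt can be illustrated with a small sketch in which two endpoints share keys that the relay in the middle never sees. Everything here is an assumption made for illustration: the names `seal`, `open_sealed`, `honest_relay`, and `tampering_relay` are hypothetical, and the cipher is a toy built from HMAC-SHA256 in the standard library, so it should not be used to protect real data. A production system would use a vetted authenticated-encryption scheme and a proper key exchange.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: concatenated HMAC-SHA256(key, nonce || counter) blocks."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return bytes(out[:length])


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    """Sender endpoint: encrypt, then authenticate (encrypt-then-MAC)."""
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    """Receiver endpoint: verify the tag first, then decrypt."""
    if len(blob) < 48:  # 16-byte nonce + 32-byte tag
        raise ValueError("message too short")
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message was modified in transit")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


def honest_relay(blob: bytes) -> bytes:
    """Stands in for a server that forwards bytes without holding any key."""
    return blob


def tampering_relay(blob: bytes) -> bytes:
    """Stands in for a compromised server that flips one bit in transit."""
    flipped = bytearray(blob)
    flipped[20] ^= 0x01
    return bytes(flipped)
```

Because only the endpoints hold `enc_key` and `mac_key`, the honest relay learns nothing beyond the message length, and the tampering relay's change is rejected when the receiver checks the tag. Link-level protections such as WiFi encryption or TLS to the server would sit underneath this and would not change either outcome.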
…
Merkle: “A Digital Signature Based on a Conventional Encryption Function,” at CRYPTO ’87, August 1987. doi:10.1007/3-540-48184-2_32 [75] Ben Laurie: “Certificate Transparency,” ACM Queue, volume 12, number 8, pages 10-19, August 2014. doi:10.1145/2668152.2668154 [76] Mark D. Ryan: “Enhanced Certificate Transparency and End-to-End Encrypted Mail,” at Network and Distributed System Security Symposium (NDSS), February 2014. doi:10.14722/ndss.2014.23379 [77] “Software Engineering Code of Ethics and Professional Practice,” Association for Computing Machinery, acm.org, 1999. [78] François Chollet: “Software development is starting to involve important ethical choices,” twitter.com, October 30, 2016
Data and Goliath: The Hidden Battles to Collect Your Data and Control Your World
by
Bruce Schneier
Published 2 Mar 2015
a chat encryption program: Nikita Borisov, Ian Goldberg, and Eric Brewer (28 Oct 2004), “Off-the-record communication, or, Why not to use PGP,” ACM Workshop on Privacy in the Electronic Society (WPES’04), Washington, D.C., https://otr.cypherpunks.ca/otr-wpes.pdf. Google is now offering encrypted e-mail: Stephan Somogyi (3 Jun 2014), “Making end-to-end encryption easier to use,” Google Online Security Blog, http://googleonlinesecurity.blogspot.com/2014/06/making-end-to-end-encryption-easier-to.html. TLS—formerly SSL—is a protocol: Tim Dierks and Eric Rescorla (17 Apr 2014), “The Transport Layer Security (TLS) Protocol Version 1.3,” Internet Engineering Task Force Trust, Network Working Group, http://tools.ietf.org/html/draft-ietf-tls-rfc5246-bis-00.
Ours to Hack and to Own: The Rise of Platform Cooperativism, a New Vision for the Future of Work and a Fairer Internet
by
Trebor Scholz
and
Nathan Schneider
Published 14 Aug 2017
Which approaches to privacy design are appropriate for the different practices? This is the big one. Within it, consider the three approaches that follow. The privacy-as-confidentiality approach offers publicly vetted techniques that will allow platforms to minimize data collection and avoid single points of failure. Offering members end-to-end encrypted private messaging, for example, provides the co-op a protection against coercion or unreasonable search-and-seizure requests from law enforcement. Techniques based on encryption and secure protocols may be indispensable for guaranteeing that platform administrators cannot single-handedly compromise sensitive information related to practices such as voting, anonymous participation, and reputation systems.
Whistleblower: My Journey to Silicon Valley and Fight for Justice at Uber
by
Susan Fowler
Published 18 Feb 2020
I kept my mouth shut. “No,” I answered, shaking my head. * * * — In the weeks after my meeting with Holder and Albarrán, one by one my old friends from Uber stopped talking to me. One of them told me later that Uber had found out she was talking to me—even though we’d been careful to use end-to-end encryption and send messages that self-destructed—and she was afraid they were going to retaliate against her for it. I had difficulty connecting with many of my friends outside Uber, too. Shalon was one of the few people who understood, who didn’t look at me like I was crazy when I tried to explain things that were going on, like being followed or having my social media accounts hacked.
This Is How They Tell Me the World Ends: The Cyberweapons Arms Race
by
Nicole Perlroth
Published 9 Feb 2021
He went on 60 Minutes and, in an hour-long speech at the Brookings Institution, told an audience that the “post-Snowden pendulum” had “swung too far,” noting that Apple’s new encryption “threatens to lead us to a very dark place.” Comey was essentially making the same arguments the White House had two decades earlier after a programmer named Phil Zimmermann released end-to-end encryption software to the masses. Zimmermann’s Pretty Good Privacy (PGP) software made it far easier for people to communicate over end-to-end encryption, which scrambles messages in such a way that they could only be deciphered by the sender and recipient. Fearing that PGP would make surveillance impossible, the Clinton administration proposed a “Clipper Chip,” a backdoor for law enforcement and security agencies.
Uncanny Valley: A Memoir
by
Anna Wiener
Published 14 Jan 2020
“Email is about as secure as a postcard,” he’d remind me, as we wandered between families at the farmers market in Fort Greene Park. “You don’t expect your mailman to read it, but he could.” I had listened patiently as he tried to teach me about cryptocurrencies and the promise of the blockchain, the shortcomings of two-factor authentication, the necessity of end-to-end encryption, the inevitability of data breaches. The romance didn’t last, but in its wake we had fallen into a rhythm of exchanging insecure emails on niche topics, like 1980s interface design, binary code, and public-domain art, and occasionally meeting for chaste, geriatric cultural activities. The concert hall was a quarter full.
Culture Warlords: My Journey Into the Dark Web of White Supremacy
by
Talia Lavin
Published 14 Jul 2020
I utilized a fake phone number generated by an app to obscure my own identity still further. A Southern Poverty Law Center report, published on June 27, 2019, revealed that the neo-Nazi website the Daily Stormer had warned its fans in August 2018 that the “SPLC Is Monitoring You” on the chat app Discord, and lauded Telegram’s end-to-end encryption as an alternative for white nationalists. The SPLC report added that Telegram posed particular dangers, compared to message boards like 8chan: On the app, “extremists can connect in channels that post publicly facing propaganda and then organize privately on the same app by using its encrypted chat feature, where plans to commit acts of terror can go undetected by law enforcement agencies.”
The Dark Net
by
Jamie Bartlett
Published 20 Aug 2014
Jitsi is a free, secure, open-source voice, video-conferencing and instant-messaging application which started as a student project at the University of Strasbourg. Jabber, another instant-messaging service, is encrypted with industry-standard Secure Sockets Layer, run by volunteers and physically hosted in a secure data centre. Phil Zimmermann is currently working on a project called Darkmail, an automatically end-to-end encrypted email service. Today there are hundreds of people like Amir and Miguel working on ingenious ways of keeping online secrets or preventing censorship, often in their own time, and frequently crowdfunded by users sympathetic to the cause. One is Smári McCarthy. Smári is unashamedly geeky: a computer whizz and founding member of the radical Icelandic Pirate Party.
There's a War Going on but No One Can See It
by
Huib Modderkolk
Published 1 Sep 2021
We bought a couple of refurbished second-hand machines, to be connected to the internet only in emergencies. Either we sent emails encrypted, or not at all. And we chatted over secure channels, which meant sticking a USB drive in the computer, launching a separate operating system and then opening a browser to shield our unique IP addresses. From there, we used a chat program with end-to-end encryption. This, we hoped, would let us get closer to the right people. * From the Netherlands we took the autobahn to Hamburg, where Der Spiegel occupies a vast, shimmering edifice on a branch of the Norderelbe. The German newspaper had been working with Laura Poitras, an American documentary filmmaker.
@War: The Rise of the Military-Internet Complex
by
Shane Harris
Published 14 Sep 2014
Google, like many other large companies that were frequent targets of hackers, had its own sources of threat intelligence from private security companies—such as Endgame, which sells zero day information—and had begun its own intelligence-gathering operations on hackers in China. But the company was also using other tactics, such as implementing stronger encryption for its users, and moving toward a “secure sockets layer” service that would set end-to-end encryption by default for everyone logged in to their Google account. Threat signatures alone “don’t work anymore,” Schmidt said. “The threats don’t just come where the NSA points its sensors.” Hackers were constantly changing their techniques and looking for new points of entry. They knew that the government was monitoring them—that’s why they changed up their tactics.
Black Code: Inside the Battle for Cyberspace
by
Ronald J. Deibert
Published 13 May 2013
From that moment on, their chats were intercepted, as were those with whom they were communicating, and uploaded to a server in China, presumably to be shared with Chinese security services. The interception directly contravened Skype’s explicit terms of service, which promised state-of-the-art “end-to-end encryption,” allowing it to be widely promoted as a secure tool for dissidents and others at risk. The scandalous tale was covered by John Markoff in the New York Times, and Skype later apologized. A few years later, however, University of New Mexico researchers found the exact same content-filtering and interception system was still in place on TOM-Skype.
Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems
by
Martin Kleppmann
Published 17 Apr 2017
Merkle: “A Digital Signature Based on a Conventional Encryption Function,” at CRYPTO ’87, August 1987. doi:10.1007/3-540-48184-2_32 [75] Ben Laurie: “Certificate Transparency,” ACM Queue, volume 12, number 8, pages 10-19, August 2014. doi:10.1145/2668152.2668154 [76] Mark D. Ryan: “Enhanced Certificate Transparency and End-to-End Encrypted Mail,” at Network and Distributed System Security Symposium (NDSS), February 2014. doi:10.14722/ndss.2014.23379 [77] “Software Engineering Code of Ethics and Professional Practice,” Association for Computing Machinery, acm.org, 1999. [78] François Chollet: “Software development is starting to involve important ethical choices,” twitter.com, October 30, 2016. [79] Igor Perisic: “Making Hard Choices: The Quest for Ethics in Machine Learning,” engineering.linkedin.com, November 2016. [80] John Naughton: “Algorithm Writers Need a Code of Conduct,” theguardian.com, December 6, 2015. [81] Logan Kugler: “What Happens When Big Data Blunders?
…
The opposite of bounded. Index A aborts (transactions), 222, 224 in two-phase commit, 356 performance of optimistic concurrency control, 266 retrying aborted transactions, 231 abstraction, 21, 27, 222, 266, 321 access path (in network model), 37, 60 accidental complexity, removing, 21 accountability, 535 ACID properties (transactions), 90, 223 atomicity, 223, 228 consistency, 224, 529 durability, 226 isolation, 225, 228 acknowledgements (messaging), 445 active/active replication (see multi-leader replication) active/passive replication (see leader-based replication) ActiveMQ (messaging), 137, 444 distributed transaction support, 361 ActiveRecord (object-relational mapper), 30, 232 actor model, 138 (see also message-passing) comparison to Pregel model, 425 comparison to stream processing, 468 Advanced Message Queuing Protocol (see AMQP) aerospace systems, 6, 10, 305, 372 aggregation data cubes and materialized views, 101 in batch processes, 406 in stream processes, 466 aggregation pipeline query language, 48 Agile, 22 minimizing irreversibility, 414, 497 moving faster with confidence, 532 Unix philosophy, 394 agreement, 365 (see also consensus) Airflow (workflow scheduler), 402 Ajax, 131 Akka (actor framework), 139 algorithms algorithm correctness, 308 B-trees, 79-83 for distributed systems, 306 hash indexes, 72-75 mergesort, 76, 402, 405 red-black trees, 78 SSTables and LSM-trees, 76-79 all-to-all replication topologies, 175 AllegroGraph (database), 50 ALTER TABLE statement (SQL), 40, 111 Amazon Dynamo (database), 177 Amazon Web Services (AWS), 8 Kinesis Streams (messaging), 448 network reliability, 279 postmortems, 9 RedShift (database), 93 S3 (object storage), 398 checking data integrity, 530 amplification of bias, 534 of failures, 364, 495 of tail latency, 16, 207 write amplification, 84 AMQP (Advanced Message Queuing Protocol), 444 (see also messaging systems) comparison to log-based messaging, 448, 451 message ordering, 446
analytics, 90 comparison to transaction processing, 91 data warehousing (see data warehousing) parallel query execution in MPP databases, 415 predictive (see predictive analytics) relation to batch processing, 411 schemas for, 93-95 snapshot isolation for queries, 238 stream analytics, 466 using MapReduce, analysis of user activity events (example), 404 anti-caching (in-memory databases), 89 anti-entropy, 178 Apache ActiveMQ (see ActiveMQ) Apache Avro (see Avro) Apache Beam (see Beam) Apache BookKeeper (see BookKeeper) Apache Cassandra (see Cassandra) Apache CouchDB (see CouchDB) Apache Curator (see Curator) Apache Drill (see Drill) Apache Flink (see Flink) Apache Giraph (see Giraph) Apache Hadoop (see Hadoop) Apache HAWQ (see HAWQ) Apache HBase (see HBase) Apache Helix (see Helix) Apache Hive (see Hive) Apache Impala (see Impala) Apache Jena (see Jena) Apache Kafka (see Kafka) Apache Lucene (see Lucene) Apache MADlib (see MADlib) Apache Mahout (see Mahout) Apache Oozie (see Oozie) Apache Parquet (see Parquet) Apache Qpid (see Qpid) Apache Samza (see Samza) Apache Solr (see Solr) Apache Spark (see Spark) Apache Storm (see Storm) Apache Tajo (see Tajo) Apache Tez (see Tez) Apache Thrift (see Thrift) Apache ZooKeeper (see ZooKeeper) Apama (stream analytics), 466 append-only B-trees, 82, 242 append-only files (see logs) Application Programming Interfaces (APIs), 5, 27 for batch processing, 403 for change streams, 456 for distributed transactions, 361 for graph processing, 425 for services, 131-136 (see also services) evolvability, 136 RESTful, 133 SOAP, 133 application state (see state) approximate search (see similarity search) archival storage, data from databases, 131 arcs (see edges) arithmetic mean, 14 ASCII text, 119, 395 ASN.1 (schema language), 127 asynchronous networks, 278, 553 comparison to synchronous networks, 284 formal model, 307 asynchronous replication, 154, 553 conflict detection, 172 data loss on failover, 157 reads from asynchronous
follower, 162 Asynchronous Transfer Mode (ATM), 285 atomic broadcast (see total order broadcast) atomic clocks (caesium clocks), 294, 295 (see also clocks) atomicity (concurrency), 553 atomic increment-and-get, 351 compare-and-set, 245, 327 (see also compare-and-set operations) replicated operations, 246 write operations, 243 atomicity (transactions), 223, 228, 553 atomic commit, 353 avoiding, 523, 528 blocking and nonblocking, 359 in stream processing, 360, 477 maintaining derived data, 453 for multi-object transactions, 229 for single-object writes, 230 auditability, 528-533 designing for, 531 self-auditing systems, 530 through immutability, 460 tools for auditable data systems, 532 availability, 8 (see also fault tolerance) in CAP theorem, 337 in service level agreements (SLAs), 15 Avro (data format), 122-127 code generation, 127 dynamically generated schemas, 126 object container files, 125, 131, 414 reader determining writer’s schema, 125 schema evolution, 123 use in Hadoop, 414 awk (Unix tool), 391 AWS (see Amazon Web Services) Azure (see Microsoft) B B-trees (indexes), 79-83 append-only/copy-on-write variants, 82, 242 branching factor, 81 comparison to LSM-trees, 83-85 crash recovery, 82 growing by splitting a page, 81 optimizations, 82 similarity to dynamic partitioning, 212 backpressure, 441, 553 in TCP, 282 backups database snapshot for replication, 156 integrity of, 530 snapshot isolation for, 238 use for ETL processes, 405 backward compatibility, 112 BASE, contrast to ACID, 223 bash shell (Unix), 70, 395, 503 batch processing, 28, 389-431, 553 combining with stream processing lambda architecture, 497 unifying technologies, 498 comparison to MPP databases, 414-418 comparison to stream processing, 464 comparison to Unix, 413-414 dataflow engines, 421-423 fault tolerance, 406, 414, 422, 442 for data integration, 494-498 graphs and iterative processing, 424-426 high-level APIs and languages, 403, 426-429 log-based messaging and, 451 maintaining derived 
state, 495 MapReduce and distributed filesystems, 397-413 (see also MapReduce) measuring performance, 13, 390 outputs, 411-413 key-value stores, 412 search indexes, 411 using Unix tools (example), 391-394 Bayou (database), 522 Beam (dataflow library), 498 bias, 534 big ball of mud, 20 Bigtable data model, 41, 99 binary data encodings, 115-128 Avro, 122-127 MessagePack, 116-117 Thrift and Protocol Buffers, 117-121 binary encoding based on schemas, 127 by network drivers, 128 binary strings, lack of support in JSON and XML, 114 BinaryProtocol encoding (Thrift), 118 Bitcask (storage engine), 72 crash recovery, 74 Bitcoin (cryptocurrency), 532 Byzantine fault tolerance, 305 concurrency bugs in exchanges, 233 bitmap indexes, 97 blockchains, 532 Byzantine fault tolerance, 305 blocking atomic commit, 359 Bloom (programming language), 504 Bloom filter (algorithm), 79, 466 BookKeeper (replicated log), 372 Bottled Water (change data capture), 455 bounded datasets, 430, 439, 553 (see also batch processing) bounded delays, 553 in networks, 285 process pauses, 298 broadcast hash joins, 409 brokerless messaging, 442 Brubeck (metrics aggregator), 442 BTM (transaction coordinator), 356 bulk synchronous parallel (BSP) model, 425 bursty network traffic patterns, 285 business data processing, 28, 90, 390 byte sequence, encoding data in, 112 Byzantine faults, 304-306, 307, 553 Byzantine fault-tolerant systems, 305, 532 Byzantine Generals Problem, 304 consensus algorithms and, 366 C caches, 89, 553 and materialized views, 101 as derived data, 386, 499-504 database as cache of transaction log, 460 in CPUs, 99, 338, 428 invalidation and maintenance, 452, 467 linearizability, 324 CAP theorem, 336-338, 554 Cascading (batch processing), 419, 427 hash joins, 409 workflows, 403 cascading failures, 9, 214, 281 Cascalog (batch processing), 60 Cassandra (database) column-family data model, 41, 99 compaction strategy, 79 compound primary key, 204 gossip protocol, 216 hash
partitioning, 203-205 last-write-wins conflict resolution, 186, 292 leaderless replication, 177 linearizability, lack of, 335 log-structured storage, 78 multi-datacenter support, 184 partitioning scheme, 213 secondary indexes, 207 sloppy quorums, 184 cat (Unix tool), 391 causal context, 191 (see also causal dependencies) causal dependencies, 186-191 capturing, 191, 342, 494, 514 by total ordering, 493 causal ordering, 339 in transactions, 262 sending message to friends (example), 494 causality, 554 causal ordering, 339-343 linearizability and, 342 total order consistent with, 344, 345 consistency with, 344-347 consistent snapshots, 340 happens-before relationship, 186 in serializable transactions, 262-265 mismatch with clocks, 292 ordering events to capture, 493 violations of, 165, 176, 292, 340 with synchronized clocks, 294 CEP (see complex event processing) certificate transparency, 532 chain replication, 155 linearizable reads, 351 change data capture, 160, 454 API support for change streams, 456 comparison to event sourcing, 457 implementing, 454 initial snapshot, 455 log compaction, 456 changelogs, 460 change data capture, 454 for operator state, 479 generating with triggers, 455 in stream joins, 474 log compaction, 456 maintaining derived state, 452 Chaos Monkey, 7, 280 checkpointing in batch processors, 422, 426 in high-performance computing, 275 in stream processors, 477, 523 chronicle data model, 458 circuit-switched networks, 284 circular buffers, 450 circular replication topologies, 175 clickstream data, analysis of, 404 clients calling services, 131 pushing state changes to, 512 request routing, 214 stateful and offline-capable, 170, 511 clocks, 287-299 atomic (caesium) clocks, 294, 295 confidence interval, 293-295 for global snapshots, 294 logical (see logical clocks) skew, 291-294, 334 slewing, 289 synchronization and accuracy, 289-291 synchronization using GPS, 287, 290, 294, 295 time-of-day versus monotonic clocks, 288 timestamping
events, 471 cloud computing, 146, 275 need for service discovery, 372 network glitches, 279 shared resources, 284 single-machine reliability, 8 Cloudera Impala (see Impala) clustered indexes, 86 CODASYL model, 36 (see also network model) code generation with Avro, 127 with Thrift and Protocol Buffers, 118 with WSDL, 133 collaborative editing multi-leader replication and, 170 column families (Bigtable), 41, 99 column-oriented storage, 95-101 column compression, 97 distinction between column families and, 99 in batch processors, 428 Parquet, 96, 131, 414 sort order in, 99-100 vectorized processing, 99, 428 writing to, 101 comma-separated values (see CSV) command query responsibility segregation (CQRS), 462 commands (event sourcing), 459 commits (transactions), 222 atomic commit, 354-355 (see also atomicity; transactions) read committed isolation, 234 three-phase commit (3PC), 359 two-phase commit (2PC), 355-359 commutative operations, 246 compaction of changelogs, 456 (see also log compaction) for stream operator state, 479 of log-structured storage, 73 issues with, 84 size-tiered and leveled approaches, 79 CompactProtocol encoding (Thrift), 119 compare-and-set operations, 245, 327 implementing locks, 370 implementing uniqueness constraints, 331 implementing with total order broadcast, 350 relation to consensus, 335, 350, 352, 374 relation to transactions, 230 compatibility, 112, 128 calling services, 136 properties of encoding formats, 139 using databases, 129-131 using message-passing, 138 compensating transactions, 355, 461, 526 complex event processing (CEP), 465 complexity distilling in theoretical models, 310 hiding using abstraction, 27 of software systems, managing, 20 composing data systems (see unbundling databases) compute-intensive applications, 3, 275 concatenated indexes, 87 in Cassandra, 204 Concord (stream processor), 466 concurrency actor programming model, 138, 468 (see also message-passing) bugs from weak transaction isolation, 233 conflict
resolution, 171, 174 detecting concurrent writes, 184-191 dual writes, problems with, 453 happens-before relationship, 186 in replicated systems, 161-191, 324-338 lost updates, 243 multi-version concurrency control (MVCC), 239 optimistic concurrency control, 261 ordering of operations, 326, 341 reducing, through event logs, 351, 462, 507 time and relativity, 187 transaction isolation, 225 write skew (transaction isolation), 246-251 conflict-free replicated datatypes (CRDTs), 174 conflicts conflict detection, 172 causal dependencies, 186, 342 in consensus algorithms, 368 in leaderless replication, 184 in log-based systems, 351, 521 in nonlinearizable systems, 343 in serializable snapshot isolation (SSI), 264 in two-phase commit, 357, 364 conflict resolution automatic conflict resolution, 174 by aborting transactions, 261 by apologizing, 527 convergence, 172-174 in leaderless systems, 190 last write wins (LWW), 186, 292 using atomic operations, 246 using custom logic, 173 determining what is a conflict, 174, 522 in multi-leader replication, 171-175 avoiding conflicts, 172 lost updates, 242-246 materializing, 251 relation to operation ordering, 339 write skew (transaction isolation), 246-251 congestion (networks) avoidance, 282 limiting accuracy of clocks, 293 queueing delays, 282 consensus, 321, 364-375, 554 algorithms, 366-368 preventing split brain, 367 safety and liveness properties, 365 using linearizable operations, 351 cost of, 369 distributed transactions, 352-375 in practice, 360-364 two-phase commit, 354-359 XA transactions, 361-364 impossibility of, 353 membership and coordination services, 370-373 relation to compare-and-set, 335, 350, 352, 374 relation to replication, 155, 349 relation to uniqueness constraints, 521 consistency, 224, 524 across different databases, 157, 452, 462, 492 causal, 339-348, 493 consistent prefix reads, 165-167 consistent snapshots, 156, 237-242, 294, 455, 500 (see also snapshots) crash recovery, 82
enforcing constraints (see constraints) eventual, 162, 322 (see also eventual consistency) in ACID transactions, 224, 529 in CAP theorem, 337 linearizability, 324-338 meanings of, 224 monotonic reads, 164-165 of secondary indexes, 231, 241, 354, 491, 500 ordering guarantees, 339-352 read-after-write, 162-164 sequential, 351 strong (see linearizability) timeliness and integrity, 524 using quorums, 181, 334 consistent hashing, 204 consistent prefix reads, 165 constraints (databases), 225, 248 asynchronously checked, 526 coordination avoidance, 527 ensuring idempotence, 519 in log-based systems, 521-524 across multiple partitions, 522 in two-phase commit, 355, 357 relation to consensus, 374, 521 relation to event ordering, 347 requiring linearizability, 330 Consul (service discovery), 372 consumers (message streams), 137, 440 backpressure, 441 consumer offsets in logs, 449 failures, 445, 449 fan-out, 11, 445, 448 load balancing, 444, 448 not keeping up with producers, 441, 450, 502 context switches, 14, 297 convergence (conflict resolution), 172-174, 322 coordination avoidance, 527 cross-datacenter, 168, 493 cross-partition ordering, 256, 294, 348, 523 services, 330, 370-373 coordinator (in 2PC), 356 failure, 358 in XA transactions, 361-364 recovery, 363 copy-on-write (B-trees), 82, 242 CORBA (Common Object Request Broker Architecture), 134 correctness, 6 auditability, 528-533 Byzantine fault tolerance, 305, 532 dealing with partial failures, 274 in log-based systems, 521-524 of algorithm within system model, 308 of compensating transactions, 355 of consensus, 368 of derived data, 497, 531 of immutable data, 461 of personal data, 535, 540 of time, 176, 289-295 of transactions, 225, 515, 529 timeliness and integrity, 524-528 corruption of data detecting, 519, 530-533 due to pathological memory access, 529 due to radiation, 305 due to split brain, 158, 302 due to weak transaction isolation, 233 formalization in consensus, 366 integrity as absence of, 524 network 
packets, 306 on disks, 227 preventing using write-ahead logs, 82 recovering from, 414, 460 Couchbase (database) durability, 89 hash partitioning, 203-204, 211 rebalancing, 213 request routing, 216 CouchDB (database) B-tree storage, 242 change feed, 456 document data model, 31 join support, 34 MapReduce support, 46, 400 replication, 170, 173 covering indexes, 86 CPUs cache coherence and memory barriers, 338 caching and pipelining, 99, 428 increasing parallelism, 43 CRDTs (see conflict-free replicated datatypes) CREATE INDEX statement (SQL), 85, 500 credit rating agencies, 535 Crunch (batch processing), 419, 427 hash joins, 409 sharded joins, 408 workflows, 403 cryptography defense against attackers, 306 end-to-end encryption and authentication, 519, 543 proving integrity of data, 532 CSS (Cascading Style Sheets), 44 CSV (comma-separated values), 70, 114, 396 Curator (ZooKeeper recipes), 330, 371 curl (Unix tool), 135, 397 cursor stability, 243 Cypher (query language), 52 comparison to SPARQL, 59 D data corruption (see corruption of data) data cubes, 102 data formats (see encoding) data integration, 490-498, 543 batch and stream processing, 494-498 lambda architecture, 497 maintaining derived state, 495 reprocessing data, 496 unifying, 498 by unbundling databases, 499-515 comparison to federated databases, 501 combining tools by deriving data, 490-494 derived data versus distributed transactions, 492 limits of total ordering, 493 ordering events to capture causality, 493 reasoning about dataflows, 491 need for, 385 data lakes, 415 data locality (see locality) data models, 27-64 graph-like models, 49-63 Datalog language, 60-63 property graphs, 50 RDF and triple-stores, 55-59 query languages, 42-48 relational model versus document model, 28-42 data protection regulations, 542 data systems, 3 about, 4 concerns when designing, 5 future of, 489-544 correctness, constraints, and integrity, 515-533 data integration, 490-498 unbundling databases, 499-515
heterogeneous, keeping in sync, 452 maintainability, 18-22 possible faults in, 221 reliability, 6-10 hardware faults, 7 human errors, 9 importance of, 10 software errors, 8 scalability, 10-18 unreliable clocks, 287-299 data warehousing, 91-95, 554 comparison to data lakes, 415 ETL (extract-transform-load), 92, 416, 452 keeping data systems in sync, 452 schema design, 93 slowly changing dimension (SCD), 476 data-intensive applications, 3 database triggers (see triggers) database-internal distributed transactions, 360, 364, 477 databases archival storage, 131 comparison of message brokers to, 443 dataflow through, 129 end-to-end argument for, 519-520 checking integrity, 531 inside-out, 504 (see also unbundling databases) output from batch workflows, 412 relation to event streams, 451-464 (see also changelogs) API support for change streams, 456, 506 change data capture, 454-457 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 unbundling, 499-515 composing data storage technologies, 499-504 designing applications around dataflow, 504-509 observing derived state, 509-515 datacenters geographically distributed, 145, 164, 278, 493 multi-tenancy and shared resources, 284 network architecture, 276 network faults, 279 replication across multiple, 169 leaderless replication, 184 multi-leader replication, 168, 335 dataflow, 128-139, 504-509 correctness of dataflow systems, 525 differential, 504 message-passing, 136-139 reasoning about, 491 through databases, 129 through services, 131-136 dataflow engines, 421-423 comparison to stream processing, 464 directed acyclic graphs (DAG), 424 partitioning, approach to, 429 support for declarative queries, 427 Datalog (query language), 60-63 datatypes binary strings in XML and JSON, 114 conflict-free, 174 in Avro encodings, 122 in Thrift and Protocol Buffers, 121 numbers in XML and JSON, 114 Datomic (database) B-tree storage, 242 data model, 50, 57 Datalog query language, 60
excision (deleting data), 463 languages for transactions, 255 serial execution of transactions, 253 deadlocks detection, in two-phase commit (2PC), 364 in two-phase locking (2PL), 258 Debezium (change data capture), 455 declarative languages, 42, 554 Bloom, 504 CSS and XSL, 44 Cypher, 52 Datalog, 60 for batch processing, 427 recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 delays bounded network delays, 285 bounded process pauses, 298 unbounded network delays, 282 unbounded process pauses, 296 deleting data, 463 denormalization (data representation), 34, 554 costs, 39 in derived data systems, 386 materialized views, 101 updating derived data, 228, 231, 490 versus normalization, 462 derived data, 386, 439, 554 from change data capture, 454 in event sourcing, 458-458 maintaining derived state through logs, 452-457, 459-463 observing, by subscribing to streams, 512 outputs of batch and stream processing, 495 through application code, 505 versus distributed transactions, 492 deterministic operations, 255, 274, 554 accidental nondeterminism, 423 and fault tolerance, 423, 426 and idempotence, 478, 492 computing derived data, 495, 526, 531 in state machine replication, 349, 452, 458 joins, 476 DevOps, 394 differential dataflow, 504 dimension tables, 94 dimensional modeling (see star schemas) directed acyclic graphs (DAGs), 424 dirty reads (transaction isolation), 234 dirty writes (transaction isolation), 235 discrimination, 534 disks (see hard disks) distributed actor frameworks, 138 distributed filesystems, 398-399 decoupling from query engines, 417 indiscriminately dumping data into, 415 use by MapReduce, 402 distributed systems, 273-312, 554 Byzantine faults, 304-306 cloud versus supercomputing, 275 detecting network faults, 280 faults and partial failures, 274-277 formalization of consensus, 365 impossibility results, 338, 353 issues with failover, 157 limitations of distributed transactions, 363 multi-datacenter, 169, 335 network problems, 277-286 
quorums, relying on, 301 reasons for using, 145, 151 synchronized clocks, relying on, 291-295 system models, 306-310 use of clocks and time, 287 distributed transactions (see transactions) Django (web framework), 232 DNS (Domain Name System), 216, 372 Docker (container manager), 506 document data model, 30-42 comparison to relational model, 38-42 document references, 38, 403 document-oriented databases, 31 many-to-many relationships and joins, 36 multi-object transactions, need for, 231 versus relational model convergence of models, 41 data locality, 41 document-partitioned indexes, 206, 217, 411 domain-driven design (DDD), 457 DRBD (Distributed Replicated Block Device), 153 drift (clocks), 289 Drill (query engine), 93 Druid (database), 461 Dryad (dataflow engine), 421 dual writes, problems with, 452, 507 duplicates, suppression of, 517 (see also idempotence) using a unique ID, 518, 522 durability (transactions), 226, 554 duration (time), 287 measurement with monotonic clocks, 288 dynamic partitioning, 212 dynamically typed languages analogy to schema-on-read, 40 code generation and, 127 Dynamo-style databases (see leaderless replication) E edges (in graphs), 49, 403 property graph model, 50 edit distance (full-text search), 88 effectively-once semantics, 476, 516 (see also exactly-once semantics) preservation of integrity, 525 elastic systems, 17 Elasticsearch (search server) document-partitioned indexes, 207 partition rebalancing, 211 percolator (stream search), 467 usage example, 4 use of Lucene, 79 ElephantDB (database), 413 Elm (programming language), 504, 512 encodings (data formats), 111-128 Avro, 122-127 binary variants of JSON and XML, 115 compatibility, 112 calling services, 136 using databases, 129-131 using message-passing, 138 defined, 113 JSON, XML, and CSV, 114 language-specific formats, 113 merits of schemas, 127 representations of data, 112 Thrift and Protocol Buffers, 117-121 end-to-end argument, 277, 519-520 checking integrity, 531
publish/subscribe streams, 512 enrichment (stream), 473 Enterprise JavaBeans (EJB), 134 entities (see vertices) epoch (consensus algorithms), 368 epoch (Unix timestamps), 288 equi-joins, 403 erasure coding (error correction), 398 Erlang OTP (actor framework), 139 error handling for network faults, 280 in transactions, 231 error-correcting codes, 277, 398 Esper (CEP engine), 466 etcd (coordination service), 370-373 linearizable operations, 333 locks and leader election, 330 quorum reads, 351 service discovery, 372 use of Raft algorithm, 349, 353 Ethereum (blockchain), 532 Ethernet (networks), 276, 278, 285 packet checksums, 306, 519 Etherpad (collaborative editor), 170 ethics, 533-543 code of ethics and professional practice, 533 legislation and self-regulation, 542 predictive analytics, 533-536 amplifying bias, 534 feedback loops, 536 privacy and tracking, 536-543 consent and freedom of choice, 538 data as assets and power, 540 meaning of privacy, 539 surveillance, 537 respect, dignity, and agency, 543, 544 unintended consequences, 533, 536 ETL (extract-transform-load), 92, 405, 452, 554 use of Hadoop for, 416 event sourcing, 457-459 commands and events, 459 comparison to change data capture, 457 comparison to lambda architecture, 497 deriving current state from event log, 458 immutability and auditability, 459, 531 large, reliable data systems, 519, 526 Event Store (database), 458 event streams (see streams) events, 440 deciding on total order of, 493 deriving views from event log, 461 difference to commands, 459 event time versus processing time, 469, 477, 498 immutable, advantages of, 460, 531 ordering to capture causality, 493 reads as, 513 stragglers, 470, 498 timestamp of, in stream processing, 471 EventSource (browser API), 512 eventual consistency, 152, 162, 308, 322 (see also conflicts) and perpetual inconsistency, 525 evolvability, 21, 111 calling services, 136 graph-structured data, 52 of databases, 40, 129-131, 461, 497 of message-passing,
138 reprocessing data, 496, 498 schema evolution in Avro, 123 schema evolution in Thrift and Protocol Buffers, 120 schema-on-read, 39, 111, 128 exactly-once semantics, 360, 476, 516 parity with batch processors, 498 preservation of integrity, 525 exclusive mode (locks), 258 eXtended Architecture transactions (see XA transactions) extract-transform-load (see ETL) F Facebook Presto (query engine), 93 React, Flux, and Redux (user interface libraries), 512 social graphs, 49 Wormhole (change data capture), 455 fact tables, 93 failover, 157, 554 (see also leader-based replication) in leaderless replication, absence of, 178 leader election, 301, 348, 352 potential problems, 157 failures amplification by distributed transactions, 364, 495 failure detection, 280 automatic rebalancing causing cascading failures, 214 perfect failure detectors, 359 timeouts and unbounded delays, 282, 284 using ZooKeeper, 371 faults versus, 7 partial failures in distributed systems, 275-277, 310 fan-out (messaging systems), 11, 445 fault tolerance, 6-10, 555 abstractions for, 321 formalization in consensus, 365-369 use of replication, 367 human fault tolerance, 414 in batch processing, 406, 414, 422, 425 in log-based systems, 520, 524-526 in stream processing, 476-479 atomic commit, 477 idempotence, 478 maintaining derived state, 495 microbatching and checkpointing, 477 rebuilding state after a failure, 478 of distributed transactions, 362-364 transaction atomicity, 223, 354-361 faults, 6 Byzantine faults, 304-306 failures versus, 7 handled by transactions, 221 handling in supercomputers and cloud computing, 275 hardware, 7 in batch processing versus distributed databases, 417 in distributed systems, 274-277 introducing deliberately, 7, 280 network faults, 279-281 asymmetric faults, 300 detecting, 280 tolerance of, in multi-leader replication, 169 software errors, 8 tolerating (see fault tolerance) federated databases, 501 fence (CPU instruction), 338 fencing (preventing split brain), 158,
302-304 generating fencing tokens, 349, 370 properties of fencing tokens, 308 stream processors writing to databases, 478, 517 Fibre Channel (networks), 398 field tags (Thrift and Protocol Buffers), 119-121 file descriptors (Unix), 395 financial data, 460 Firebase (database), 456 Flink (processing framework), 421-423 dataflow APIs, 427 fault tolerance, 422, 477, 479 Gelly API (graph processing), 425 integration of batch and stream processing, 495, 498 machine learning, 428 query optimizer, 427 stream processing, 466 flow control, 282, 441, 555 FLP result (on consensus), 353 FlumeJava (dataflow library), 403, 427 followers, 152, 555 (see also leader-based replication) foreign keys, 38, 403 forward compatibility, 112 forward decay (algorithm), 16 Fossil (version control system), 463 shunning (deleting data), 463 FoundationDB (database) serializable transactions, 261, 265, 364 fractal trees, 83 full table scans, 403 full-text search, 555 and fuzzy indexes, 88 building search indexes, 411 Lucene storage engine, 79 functional reactive programming (FRP), 504 functional requirements, 22 futures (asynchronous operations), 135 fuzzy search (see similarity search) G garbage collection immutability and, 463 process pauses for, 14, 296-299, 301 (see also process pauses) genome analysis, 63, 429 geographically distributed datacenters, 145, 164, 278, 493 geospatial indexes, 87 Giraph (graph processing), 425 Git (version control system), 174, 342, 463 GitHub, postmortems, 157, 158, 309 global indexes (see term-partitioned indexes) GlusterFS (distributed filesystem), 398 GNU Coreutils (Linux), 394 GoldenGate (change data capture), 161, 170, 455 (see also Oracle) Google Bigtable (database) data model (see Bigtable data model) partitioning scheme, 199, 202 storage layout, 78 Chubby (lock service), 370 Cloud Dataflow (stream processor), 466, 477, 498 (see also Beam) Cloud Pub/Sub (messaging), 444, 448 Docs (collaborative editor), 170 Dremel (query engine), 93, 96
FlumeJava (dataflow library), 403, 427 GFS (distributed file system), 398 gRPC (RPC framework), 135 MapReduce (batch processing), 390 (see also MapReduce) building search indexes, 411 task preemption, 418 Pregel (graph processing), 425 Spanner (see Spanner) TrueTime (clock API), 294 gossip protocol, 216 government use of data, 541 GPS (Global Positioning System) use for clock synchronization, 287, 290, 294, 295 GraphChi (graph processing), 426 graphs, 555 as data models, 49-63 example of graph-structured data, 49 property graphs, 50 RDF and triple-stores, 55-59 versus the network model, 60 processing and analysis, 424-426 fault tolerance, 425 Pregel processing model, 425 query languages Cypher, 52 Datalog, 60-63 recursive SQL queries, 53 SPARQL, 59-59 Gremlin (graph query language), 50 grep (Unix tool), 392 GROUP BY clause (SQL), 406 grouping records in MapReduce, 406 handling skew, 407 H Hadoop (data infrastructure) comparison to distributed databases, 390 comparison to MPP databases, 414-418 comparison to Unix, 413-414, 499 diverse processing models in ecosystem, 417 HDFS distributed filesystem (see HDFS) higher-level tools, 403 join algorithms, 403-410 (see also MapReduce) MapReduce (see MapReduce) YARN (see YARN) happens-before relationship, 340 capturing, 187 concurrency and, 186 hard disks access patterns, 84 detecting corruption, 519, 530 faults in, 7, 227 sequential write throughput, 75, 450 hardware faults, 7 hash indexes, 72-75 broadcast hash joins, 409 partitioned hash joins, 409 hash partitioning, 203-205, 217 consistent hashing, 204 problems with hash mod N, 210 range queries, 204 suitable hash functions, 203 with fixed number of partitions, 210 HAWQ (database), 428 HBase (database) bug due to lack of fencing, 302 bulk loading, 413 column-family data model, 41, 99 dynamic partitioning, 212 key-range partitioning, 202 log-structured storage, 78 request routing, 216 size-tiered compaction, 79 use of HDFS, 417 use of ZooKeeper, 370 HDFS
(Hadoop Distributed File System), 398-399 (see also distributed filesystems) checking data integrity, 530 decoupling from query engines, 417 indiscriminately dumping data into, 415 metadata about datasets, 410 NameNode, 398 use by Flink, 479 use by HBase, 212 use by MapReduce, 402 HdrHistogram (numerical library), 16 head (Unix tool), 392 head vertex (property graphs), 51 head-of-line blocking, 15 heap files (databases), 86 Helix (cluster manager), 216 heterogeneous distributed transactions, 360, 364 heuristic decisions (in 2PC), 363 Hibernate (object-relational mapper), 30 hierarchical model, 36 high availability (see fault tolerance) high-frequency trading, 290, 299 high-performance computing (HPC), 275 hinted handoff, 183 histograms, 16 Hive (query engine), 419, 427 for data warehouses, 93 HCatalog and metastore, 410 map-side joins, 409 query optimizer, 427 skewed joins, 408 workflows, 403 Hollerith machines, 390 hopping windows (stream processing), 472 (see also windows) horizontal scaling (see scaling out) HornetQ (messaging), 137, 444 distributed transaction support, 361 hot spots, 201 due to celebrities, 205 for time-series data, 203 in batch processing, 407 relieving, 205 hot standbys (see leader-based replication) HTTP, use in APIs (see services) human errors, 9, 279, 414 HyperDex (database), 88 HyperLogLog (algorithm), 466 I I/O operations, waiting for, 297 IBM DB2 (database) distributed transaction support, 361 recursive query support, 54 serializable isolation, 242, 257 XML and JSON support, 30, 42 electromechanical card-sorting machines, 390 IMS (database), 36 imperative query APIs, 46 InfoSphere Streams (CEP engine), 466 MQ (messaging), 444 distributed transaction support, 361 System R (database), 222 WebSphere (messaging), 137 idempotence, 134, 478, 555 by giving operations unique IDs, 518, 522 idempotent operations, 517 immutability advantages of, 460, 531 deriving state from event log, 459-464 for crash recovery, 75 in B-trees, 82, 242
in event sourcing, 457 inputs to Unix commands, 397 limitations of, 463 Impala (query engine) for data warehouses, 93 hash joins, 409 native code generation, 428 use of HDFS, 417 impedance mismatch, 29 imperative languages, 42 setting element styles (example), 45 in doubt (transaction status), 358 holding locks, 362 orphaned transactions, 363 in-memory databases, 88 durability, 227 serial transaction execution, 253 incidents cascading failures, 9 crashes due to leap seconds, 290 data corruption and financial losses due to concurrency bugs, 233 data corruption on hard disks, 227 data loss due to last-write-wins, 173, 292 data on disks unreadable, 309 deleted items reappearing, 174 disclosure of sensitive data due to primary key reuse, 157 errors in transaction serializability, 529 gigabit network interface with 1 Kb/s throughput, 311 network faults, 279 network interface dropping only inbound packets, 279 network partitions and whole-datacenter failures, 275 poor handling of network faults, 280 sending message to ex-partner, 494 sharks biting undersea cables, 279 split brain due to 1-minute packet delay, 158, 279 vibrations in server rack, 14 violation of uniqueness constraint, 529 indexes, 71, 555 and snapshot isolation, 241 as derived data, 386, 499-504 B-trees, 79-83 building in batch processes, 411 clustered, 86 comparison of B-trees and LSM-trees, 83-85 concatenated, 87 covering (with included columns), 86 creating, 500 full-text search, 88 geospatial, 87 hash, 72-75 index-range locking, 260 multi-column, 87 partitioning and secondary indexes, 206-209, 217 secondary, 85 (see also secondary indexes) problems with dual writes, 452, 491 SSTables and LSM-trees, 76-79 updating when data changes, 452, 467 Industrial Revolution, 541 InfiniBand (networks), 285 InfiniteGraph (database), 50 InnoDB (storage engine) clustered index on primary key, 86 not preventing lost updates, 245 preventing write skew, 248, 257 serializable isolation, 257 snapshot isolation
support, 239 inside-out databases, 504 (see also unbundling databases) integrating different data systems (see data integration) integrity, 524 coordination-avoiding data systems, 528 correctness of dataflow systems, 525 in consensus formalization, 365 integrity checks, 530 (see also auditing) end-to-end, 519, 531 use of snapshot isolation, 238 maintaining despite software bugs, 529 Interface Definition Language (IDL), 117, 122 intermediate state, materialization of, 420-423 internet services, systems for implementing, 275 invariants, 225 (see also constraints) inversion of control, 396 IP (Internet Protocol) unreliability of, 277 ISDN (Integrated Services Digital Network), 284 isolation (in transactions), 225, 228, 555 correctness and, 515 for single-object writes, 230 serializability, 251-266 actual serial execution, 252-256 serializable snapshot isolation (SSI), 261-266 two-phase locking (2PL), 257-261 violating, 228 weak isolation levels, 233-251 preventing lost updates, 242-246 read committed, 234-237 snapshot isolation, 237-242 iterative processing, 424-426 J Java Database Connectivity (JDBC) distributed transaction support, 361 network drivers, 128 Java Enterprise Edition (EE), 134, 356, 361 Java Message Service (JMS), 444 (see also messaging systems) comparison to log-based messaging, 448, 451 distributed transaction support, 361 message ordering, 446 Java Transaction API (JTA), 355, 361 Java Virtual Machine (JVM) bytecode generation, 428 garbage collection pauses, 296 process reuse in batch processors, 422 JavaScript in MapReduce querying, 46 setting element styles (example), 45 use in advanced queries, 48 Jena (RDF framework), 57 Jepsen (fault tolerance testing), 515 jitter (network delay), 284 joins, 555 by index lookup, 403 expressing as relational operators, 427 in relational and document databases, 34 MapReduce map-side joins, 408-410 broadcast hash joins, 409 merge joins, 410 partitioned hash joins, 409 MapReduce reduce-side joins, 403-408 handling 
skew, 407 sort-merge joins, 405 parallel execution of, 415 secondary indexes and, 85 stream joins, 472-476 stream-stream join, 473 stream-table join, 473 table-table join, 474 time-dependence of, 475 support in document databases, 42 JOTM (transaction coordinator), 356 JSON Avro schema representation, 122 binary variants, 115 for application data, issues with, 114 in relational databases, 30, 42 representing a résumé (example), 31 Juttle (query language), 504 K k-nearest neighbors, 429 Kafka (messaging), 137, 448 Kafka Connect (database integration), 457, 461 Kafka Streams (stream processor), 466, 467 fault tolerance, 479 leader-based replication, 153 log compaction, 456, 467 message offsets, 447, 478 request routing, 216 transaction support, 477 usage example, 4 Ketama (partitioning library), 213 key-value stores, 70 as batch process output, 412 hash indexes, 72-75 in-memory, 89 partitioning, 201-205 by hash of key, 203, 217 by key range, 202, 217 dynamic partitioning, 212 skew and hot spots, 205 Kryo (Java), 113 Kubernetes (cluster manager), 418, 506 L lambda architecture, 497 Lamport timestamps, 345 Large Hadron Collider (LHC), 64 last write wins (LWW), 173, 334 discarding concurrent writes, 186 problems with, 292 prone to lost updates, 246 late binding, 396 latency instability under two-phase locking, 259 network latency and resource utilization, 286 response time versus, 14 tail latency, 15, 207 leader-based replication, 152-161 (see also replication) failover, 157, 301 handling node outages, 156 implementation of replication logs change data capture, 454-457 (see also changelogs) statement-based, 158 trigger-based replication, 161 write-ahead log (WAL) shipping, 159 linearizability of operations, 333 locking and leader election, 330 log sequence number, 156, 449 read-scaling architecture, 161 relation to consensus, 367 setting up new followers, 155 synchronous versus asynchronous, 153-155 leaderless replication, 177-191 (see also replication)
detecting concurrent writes, 184-191 capturing happens-before relationship, 187 happens-before relationship and concurrency, 186 last write wins, 186 merging concurrently written values, 190 version vectors, 191 multi-datacenter, 184 quorums, 179-182 consistency limitations, 181-183, 334 sloppy quorums and hinted handoff, 183 read repair and anti-entropy, 178 leap seconds, 8, 290 in time-of-day clocks, 288 leases, 295 implementation with ZooKeeper, 370 need for fencing, 302 ledgers, 460 distributed ledger technologies, 532 legacy systems, maintenance of, 18 less (Unix tool), 397 LevelDB (storage engine), 78 leveled compaction, 79 Levenshtein automata, 88 limping (partial failure), 311 linearizability, 324-338, 555 cost of, 335-338 CAP theorem, 336 memory on multi-core CPUs, 338 definition, 325-329 implementing with total order broadcast, 350 in ZooKeeper, 370 of derived data systems, 492, 524 avoiding coordination, 527 of different replication methods, 332-335 using quorums, 334 relying on, 330-332 constraints and uniqueness, 330 cross-channel timing dependencies, 331 locking and leader election, 330 stronger than causal consistency, 342 using to implement total order broadcast, 351 versus serializability, 329 LinkedIn Azkaban (workflow scheduler), 402 Databus (change data capture), 161, 455 Espresso (database), 31, 126, 130, 153, 216 Helix (cluster manager) (see Helix) profile (example), 30 reference to company entity (example), 34 Rest.li (RPC framework), 135 Voldemort (database) (see Voldemort) Linux, leap second bug, 8, 290 liveness properties, 308 LMDB (storage engine), 82, 242 load approaches to coping with, 17 describing, 11 load testing, 16 load balancing (messaging), 444 local indexes (see document-partitioned indexes) locality (data access), 32, 41, 555 in batch processing, 400, 405, 421 in stateful clients, 170, 511 in stream processing, 474, 478, 508, 522 location transparency, 134 in the actor model, 138 locks, 556 deadlock, 258
distributed locking, 301-304, 330 fencing tokens, 303 implementation with ZooKeeper, 370 relation to consensus, 374 for transaction isolation in snapshot isolation, 239 in two-phase locking (2PL), 257-261 making operations atomic, 243 performance, 258 preventing dirty writes, 236 preventing phantoms with index-range locks, 260, 265 read locks (shared mode), 236, 258 shared mode and exclusive mode, 258 in two-phase commit (2PC) deadlock detection, 364 in-doubt transactions holding locks, 362 materializing conflicts with, 251 preventing lost updates by explicit locking, 244 log sequence number, 156, 449 logic programming languages, 504 logical clocks, 293, 343, 494 for read-after-write consistency, 164 logical logs, 160 logs (data structure), 71, 556 advantages of immutability, 460 compaction, 73, 79, 456, 460 for stream operator state, 479 creating using total order broadcast, 349 implementing uniqueness constraints, 522 log-based messaging, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 disk space usage, 450 replaying old messages, 451, 496, 498 slow consumers, 450 using logs for message storage, 447 log-structured storage, 71-79 log-structured merge tree (see LSM-trees) replication, 152, 158-161 change data capture, 454-457 (see also changelogs) coordination with snapshot, 156 logical (row-based) replication, 160 statement-based replication, 158 trigger-based replication, 161 write-ahead log (WAL) shipping, 159 scalability limits, 493 loose coupling, 396, 419, 502 lost updates (see updates) LSM-trees (indexes), 78-79 comparison to B-trees, 83-85 Lucene (storage engine), 79 building indexes in batch processes, 411 similarity search, 88 Luigi (workflow scheduler), 402 LWW (see last write wins) M machine learning ethical considerations, 534 (see also ethics) iterative processing, 424 models derived from training data, 505 statistical and numerical algorithms, 428 MADlib (machine learning toolkit), 428 magic scaling sauce, 18 Mahout
(machine learning toolkit), 428 maintainability, 18-22, 489 defined, 23 design principles for software systems, 19 evolvability (see evolvability) operability, 19 simplicity and managing complexity, 20 many-to-many relationships in document model versus relational model, 39 modeling as graphs, 49 many-to-one and many-to-many relationships, 33-36 many-to-one relationships, 34 MapReduce (batch processing), 390, 399-400 accessing external services within job, 404, 412 comparison to distributed databases designing for frequent faults, 417 diversity of processing models, 416 diversity of storage, 415 comparison to stream processing, 464 comparison to Unix, 413-414 disadvantages and limitations of, 419 fault tolerance, 406, 414, 422 higher-level tools, 403, 426 implementation in Hadoop, 400-403 the shuffle, 402 implementation in MongoDB, 46-48 machine learning, 428 map-side processing, 408-410 broadcast hash joins, 409 merge joins, 410 partitioned hash joins, 409 mapper and reducer functions, 399 materialization of intermediate state, 419-423 output of batch workflows, 411-413 building search indexes, 411 key-value stores, 412 reduce-side processing, 403-408 analysis of user activity events (example), 404 grouping records by same key, 406 handling skew, 407 sort-merge joins, 405 workflows, 402 marshalling (see encoding) massively parallel processing (MPP), 216 comparison to composing storage technologies, 502 comparison to Hadoop, 414-418, 428 master-master replication (see multi-leader replication) master-slave replication (see leader-based replication) materialization, 556 aggregate values, 101 conflicts, 251 intermediate state (batch processing), 420-423 materialized views, 101 as derived data, 386, 499-504 maintaining, using stream processing, 467, 475 Maven (Java build tool), 428 Maxwell (change data capture), 455 mean, 14 media monitoring, 467 median, 14 meeting room booking (example), 249, 259, 521 membership services, 372 Memcached
(caching server), 4, 89 memory in-memory databases, 88 durability, 227 serial transaction execution, 253 in-memory representation of data, 112 random bit-flips in, 529 use by indexes, 72, 77 memory barrier (CPU instruction), 338 MemSQL (database) in-memory storage, 89 read committed isolation, 236 memtable (in LSM-trees), 78 Mercurial (version control system), 463 merge joins, MapReduce map-side, 410 mergeable persistent data structures, 174 merging sorted files, 76, 402, 405 Merkle trees, 532 Mesos (cluster manager), 418, 506 message brokers (see messaging systems) message-passing, 136-139 advantages over direct RPC, 137 distributed actor frameworks, 138 evolvability, 138 MessagePack (encoding format), 116 messages exactly-once semantics, 360, 476 loss of, 442 using total order broadcast, 348 messaging systems, 440-451 (see also streams) backpressure, buffering, or dropping messages, 441 brokerless messaging, 442 event logs, 446-451 comparison to traditional messaging, 448, 451 consumer offsets, 449 replaying old messages, 451, 496, 498 slow consumers, 450 message brokers, 443-446 acknowledgements and redelivery, 445 comparison to event logs, 448, 451 multiple consumers of same topic, 444 reliability, 442 uniqueness in log-based messaging, 522 Meteor (web framework), 456 microbatching, 477, 495 microservices, 132 (see also services) causal dependencies across services, 493 loose coupling, 502 relation to batch/stream processors, 389, 508 Microsoft Azure Service Bus (messaging), 444 Azure Storage, 155, 398 Azure Stream Analytics, 466 DCOM (Distributed Component Object Model), 134 MSDTC (transaction coordinator), 356 Orleans (see Orleans) SQL Server (see SQL Server) migrating (rewriting) data, 40, 130, 461, 497 modulus operator (%), 210 MongoDB (database) aggregation pipeline, 48 atomic operations, 243 BSON, 41 document data model, 31 hash partitioning (sharding), 203-204 key-range partitioning, 202 lack of join support, 34, 42 leader-based replication, 153
MapReduce support, 46, 400 oplog parsing, 455, 456 partition splitting, 212 request routing, 216 secondary indexes, 207 Mongoriver (change data capture), 455 monitoring, 10, 19 monotonic clocks, 288 monotonic reads, 164 MPP (see massively parallel processing) MSMQ (messaging), 361 multi-column indexes, 87 multi-leader replication, 168-177 (see also replication) handling write conflicts, 171 conflict avoidance, 172 converging toward a consistent state, 172 custom conflict resolution logic, 173 determining what is a conflict, 174 linearizability, lack of, 333 replication topologies, 175-177 use cases, 168 clients with offline operation, 170 collaborative editing, 170 multi-datacenter replication, 168, 335 multi-object transactions, 228 need for, 231 Multi-Paxos (total order broadcast), 367 multi-table index cluster tables (Oracle), 41 multi-tenancy, 284 multi-version concurrency control (MVCC), 239, 266 detecting stale MVCC reads, 263 indexes and snapshot isolation, 241 mutual exclusion, 261 (see also locks) MySQL (database) binlog coordinates, 156 binlog parsing for change data capture, 455 circular replication topology, 175 consistent snapshots, 156 distributed transaction support, 361 InnoDB storage engine (see InnoDB) JSON support, 30, 42 leader-based replication, 153 performance of XA transactions, 360 row-based replication, 160 schema changes in, 40 snapshot isolation support, 242 (see also InnoDB) statement-based replication, 159 Tungsten Replicator (multi-leader replication), 170 conflict detection, 177 N nanomsg (messaging library), 442 Narayana (transaction coordinator), 356 NATS (messaging), 137 near-real-time (nearline) processing, 390 (see also stream processing) Neo4j (database) Cypher query language, 52 graph data model, 50 Nephele (dataflow engine), 421 netcat (Unix tool), 397 Netflix Chaos Monkey, 7, 280 Network Attached Storage (NAS), 146, 398 network model, 36 graph databases versus, 60 imperative query APIs, 46 Network Time Protocol
(see NTP) networks congestion and queueing, 282 datacenter network topologies, 276 faults (see faults) linearizability and network delays, 338 network partitions, 279, 337 timeouts and unbounded delays, 281 next-key locking, 260 nodes (in graphs) (see vertices) nodes (processes), 556 handling outages in leader-based replication, 156 system models for failure, 307 noisy neighbors, 284 nonblocking atomic commit, 359 nondeterministic operations accidental nondeterminism, 423 partial failures in distributed systems, 275 nonfunctional requirements, 22 nonrepeatable reads, 238 (see also read skew) normalization (data representation), 33, 556 executing joins, 39, 42, 403 foreign key references, 231 in systems of record, 386 versus denormalization, 462 NoSQL, 29, 499 transactions and, 223 Notation3 (N3), 56 npm (package manager), 428 NTP (Network Time Protocol), 287 accuracy, 289, 293 adjustments to monotonic clocks, 289 multiple server addresses, 306 numbers, in XML and JSON encodings, 114 O object-relational mapping (ORM) frameworks, 30 error handling and aborted transactions, 232 unsafe read-modify-write cycle code, 244 object-relational mismatch, 29 observer pattern, 506 offline systems, 390 (see also batch processing) stateful, offline-capable clients, 170, 511 offline-first applications, 511 offsets consumer offsets in partitioned logs, 449 messages in partitioned logs, 447 OLAP (online analytic processing), 91, 556 data cubes, 102 OLTP (online transaction processing), 90, 556 analytics queries versus, 411 workload characteristics, 253 one-to-many relationships, 30 JSON representation, 32 online systems, 389 (see also services) Oozie (workflow scheduler), 402 OpenAPI (service definition format), 133 OpenStack Nova (cloud infrastructure) use of ZooKeeper, 370 Swift (object storage), 398 operability, 19 operating systems versus databases, 499 operation identifiers, 518, 522 operational transformation, 174 operators, 421 flow of data between, 424 in stream
processing, 464 optimistic concurrency control, 261 Oracle (database) distributed transaction support, 361 GoldenGate (change data capture), 161, 170, 455 lack of serializability, 226 leader-based replication, 153 multi-table index cluster tables, 41 not preventing write skew, 248 partitioned indexes, 209 PL/SQL language, 255 preventing lost updates, 245 read committed isolation, 236 Real Application Clusters (RAC), 330 recursive query support, 54 snapshot isolation support, 239, 242 TimesTen (in-memory database), 89 WAL-based replication, 160 XML support, 30 ordering, 339-352 by sequence numbers, 343-348 causal ordering, 339-343 partial order, 341 limits of total ordering, 493 total order broadcast, 348-352 Orleans (actor framework), 139 outliers (response time), 14 Oz (programming language), 504 P package managers, 428, 505 packet switching, 285 packets corruption of, 306 sending via UDP, 442 PageRank (algorithm), 49, 424 paging (see virtual memory) ParAccel (database), 93 parallel databases (see massively parallel processing) parallel execution of graph analysis algorithms, 426 queries in MPP databases, 216 Parquet (data format), 96, 131 (see also column-oriented storage) use in Hadoop, 414 partial failures, 275, 310 limping, 311 partial order, 341 partitioning, 199-218, 556 and replication, 200 in batch processing, 429 multi-partition operations, 514 enforcing constraints, 522 secondary index maintenance, 495 of key-value data, 201-205 by key range, 202 skew and hot spots, 205 rebalancing partitions, 209-214 automatic or manual rebalancing, 213 problems with hash mod N, 210 using dynamic partitioning, 212 using fixed number of partitions, 210 using N partitions per node, 212 replication and, 147 request routing, 214-216 secondary indexes, 206-209 document-based partitioning, 206 term-based partitioning, 208 serial execution of transactions and, 255 Paxos (consensus algorithm), 366 ballot number, 368 Multi-Paxos (total order broadcast), 367 percentiles, 14,
556 calculating efficiently, 16 importance of high percentiles, 16 use in service level agreements (SLAs), 15 Percona XtraBackup (MySQL tool), 156 performance describing, 13 of distributed transactions, 360 of in-memory databases, 89 of linearizability, 338 of multi-leader replication, 169 perpetual inconsistency, 525 pessimistic concurrency control, 261 phantoms (transaction isolation), 250 materializing conflicts, 251 preventing, in serializability, 259 physical clocks (see clocks) pickle (Python), 113 Pig (dataflow language), 419, 427 replicated joins, 409 skewed joins, 407 workflows, 403 Pinball (workflow scheduler), 402 pipelined execution, 423 in Unix, 394 point in time, 287 polyglot persistence, 29 polystores, 501 PostgreSQL (database) BDR (multi-leader replication), 170 causal ordering of writes, 177 Bottled Water (change data capture), 455 Bucardo (trigger-based replication), 161, 173 distributed transaction support, 361 foreign data wrappers, 501 full text search support, 490 leader-based replication, 153 log sequence number, 156 MVCC implementation, 239, 241 PL/pgSQL language, 255 PostGIS geospatial indexes, 87 preventing lost updates, 245 preventing write skew, 248, 261 read committed isolation, 236 recursive query support, 54 representing graphs, 51 serializable snapshot isolation (SSI), 261 snapshot isolation support, 239, 242 WAL-based replication, 160 XML and JSON support, 30, 42 pre-splitting, 212 Precision Time Protocol (PTP), 290 predicate locks, 259 predictive analytics, 533-536 amplifying bias, 534 ethics of (see ethics) feedback loops, 536 preemption of datacenter resources, 418 of threads, 298 Pregel processing model, 425 primary keys, 85, 556 compound primary key (Cassandra), 204 primary-secondary replication (see leader-based replication) privacy, 536-543 consent and freedom of choice, 538 data as assets and power, 540 deleting data, 463 ethical considerations (see ethics) legislation and self-regulation, 542 meaning of, 539
surveillance, 537 tracking behavioral data, 536 probabilistic algorithms, 16, 466 process pauses, 295-299 processing time (of events), 469 producers (message streams), 440 programming languages dataflow languages, 504 for stored procedures, 255 functional reactive programming (FRP), 504 logic programming, 504 Prolog (language), 61 (see also Datalog) promises (asynchronous operations), 135 property graphs, 50 Cypher query language, 52 Protocol Buffers (data format), 117-121 field tags and schema evolution, 120 provenance of data, 531 publish/subscribe model, 441 publishers (message streams), 440 punch card tabulating machines, 390 580 | Index pure functions, 48 putting computation near data, 400 Q Qpid (messaging), 444 quality of service (QoS), 285 Quantcast File System (distributed filesystem), 398 query languages, 42-48 aggregation pipeline, 48 CSS and XSL, 44 Cypher, 52 Datalog, 60 Juttle, 504 MapReduce querying, 46-48 recursive SQL queries, 53 relational algebra and SQL, 42 SPARQL, 59 query optimizers, 37, 427 queueing delays (networks), 282 head-of-line blocking, 15 latency and response time, 14 queues (messaging), 137 quorums, 179-182, 556 for leaderless replication, 179 in consensus algorithms, 368 limitations of consistency, 181-183, 334 making decisions in distributed systems, 301 monitoring staleness, 182 multi-datacenter replication, 184 relying on durability, 309 sloppy quorums and hinted handoff, 183 R R-trees (indexes), 87 RabbitMQ (messaging), 137, 444 leader-based replication, 153 race conditions, 225 (see also concurrency) avoiding with linearizability, 331 caused by dual writes, 452 dirty writes, 235 in counter increments, 235 lost updates, 242-246 preventing with event logs, 462, 507 preventing with serializable isolation, 252 write skew, 246-251 Raft (consensus algorithm), 366 sensitivity to network problems, 369 term number, 368 use in etcd, 353 RAID (Redundant Array of Independent Disks), 7, 398 railways, schema migration on, 496 RAMCloud 
(in-memory storage), 89 ranking algorithms, 424 RDF (Resource Description Framework), 57 querying with SPARQL, 59 RDMA (Remote Direct Memory Access), 276 read committed isolation level, 234-237 implementing, 236 multi-version concurrency control (MVCC), 239 no dirty reads, 234 no dirty writes, 235 read path (derived data), 509 read repair (leaderless replication), 178 for linearizability, 335 read replicas (see leader-based replication) read skew (transaction isolation), 238, 266 as violation of causality, 340 read-after-write consistency, 163, 524 cross-device, 164 read-modify-write cycle, 243 read-scaling architecture, 161 reads as events, 513 real-time collaborative editing, 170 near-real-time processing, 390 (see also stream processing) publish/subscribe dataflow, 513 response time guarantees, 298 time-of-day clocks, 288 rebalancing partitions, 209-214, 556 (see also partitioning) automatic or manual rebalancing, 213 dynamic partitioning, 212 fixed number of partitions, 210 fixed number of partitions per node, 212 problems with hash mod N, 210 recency guarantee, 324 recommendation engines batch process outputs, 412 batch workflows, 403, 420 iterative processing, 424 statistical and numerical algorithms, 428 records, 399 events in stream processing, 440 recursive common table expressions (SQL), 54 redelivery (messaging), 445 Redis (database) atomic operations, 243 durability, 89 Lua scripting, 255 single-threaded execution, 253 usage example, 4 redundancy hardware components, 7 of derived data, 386 (see also derived data) Reed–Solomon codes (error correction), 398 refactoring, 22 (see also evolvability) regions (partitioning), 199 register (data structure), 325 relational data model, 28-42 comparison to document model, 38-42 graph queries in SQL, 53 in-memory databases with, 89 many-to-one and many-to-many relation‐ ships, 33 multi-object transactions, need for, 231 NoSQL as alternative to, 29 object-relational mismatch, 29 relational algebra and SQL, 42 versus 
document model convergence of models, 41 data locality, 41 relational databases eventual consistency, 162 history, 28 leader-based replication, 153 logical logs, 160 philosophy compared to Unix, 499, 501 schema changes, 40, 111, 130 statement-based replication, 158 use of B-tree indexes, 80 relationships (see edges) reliability, 6-10, 489 building a reliable system from unreliable components, 276 defined, 6, 22 hardware faults, 7 human errors, 9 importance of, 10 of messaging systems, 442 Index | 581 software errors, 8 Remote Method Invocation (Java RMI), 134 remote procedure calls (RPCs), 134-136 (see also services) based on futures, 135 data encoding and evolution, 136 issues with, 134 using Avro, 126, 135 using Thrift, 135 versus message brokers, 137 repeatable reads (transaction isolation), 242 replicas, 152 replication, 151-193, 556 and durability, 227 chain replication, 155 conflict resolution and, 246 consistency properties, 161-167 consistent prefix reads, 165 monotonic reads, 164 reading your own writes, 162 in distributed filesystems, 398 leaderless, 177-191 detecting concurrent writes, 184-191 limitations of quorum consistency, 181-183, 334 sloppy quorums and hinted handoff, 183 monitoring staleness, 182 multi-leader, 168-177 across multiple datacenters, 168, 335 handling write conflicts, 171-175 replication topologies, 175-177 partitioning and, 147, 200 reasons for using, 145, 151 single-leader, 152-161 failover, 157 implementation of replication logs, 158-161 relation to consensus, 367 setting up new followers, 155 synchronous versus asynchronous, 153-155 state machine replication, 349, 452 using erasure coding, 398 with heterogeneous data systems, 453 replication logs (see logs) reprocessing data, 496, 498 (see also evolvability) from log-based messaging, 451 request routing, 214-216 582 | Index approaches to, 214 parallel query execution, 216 resilient systems, 6 (see also fault tolerance) response time as performance metric for services, 13, 389 
guarantees on, 298 latency versus, 14 mean and percentiles, 14 user experience, 15 responsibility and accountability, 535 REST (Representational State Transfer), 133 (see also services) RethinkDB (database) document data model, 31 dynamic partitioning, 212 join support, 34, 42 key-range partitioning, 202 leader-based replication, 153 subscribing to changes, 456 Riak (database) Bitcask storage engine, 72 CRDTs, 174, 191 dotted version vectors, 191 gossip protocol, 216 hash partitioning, 203-204, 211 last-write-wins conflict resolution, 186 leaderless replication, 177 LevelDB storage engine, 78 linearizability, lack of, 335 multi-datacenter support, 184 preventing lost updates across replicas, 246 rebalancing, 213 search feature, 209 secondary indexes, 207 siblings (concurrently written values), 190 sloppy quorums, 184 ring buffers, 450 Ripple (cryptocurrency), 532 rockets, 10, 36, 305 RocksDB (storage engine), 78 leveled compaction, 79 rollbacks (transactions), 222 rolling upgrades, 8, 112 routing (see request routing) row-oriented storage, 96 row-based replication, 160 rowhammer (memory corruption), 529 RPCs (see remote procedure calls) Rubygems (package manager), 428 rules (Datalog), 61 S safety and liveness properties, 308 in consensus algorithms, 366 in transactions, 222 sagas (see compensating transactions) Samza (stream processor), 466, 467 fault tolerance, 479 streaming SQL support, 466 sandboxes, 9 SAP HANA (database), 93 scalability, 10-18, 489 approaches for coping with load, 17 defined, 22 describing load, 11 describing performance, 13 partitioning and, 199 replication and, 161 scaling up versus scaling out, 146 scaling out, 17, 146 (see also shared-nothing architecture) scaling up, 17, 146 scatter/gather approach, querying partitioned databases, 207 SCD (slowly changing dimension), 476 schema-on-read, 39 comparison to evolvable schema, 128 in distributed filesystems, 415 schema-on-write, 39 schemaless databases (see schema-on-read) schemas, 557 Avro, 
122-127 reader determining writer’s schema, 125 schema evolution, 123 dynamically generated, 126 evolution of, 496 affecting application code, 111 compatibility checking, 126 in databases, 129-131 in message-passing, 138 in service calls, 136 flexibility in document model, 39 for analytics, 93-95 for JSON and XML, 115 merits of, 127 schema migration on railways, 496 Thrift and Protocol Buffers, 117-121 schema evolution, 120 traditional approach to design, fallacy in, 462 searches building search indexes in batch processes, 411 k-nearest neighbors, 429 on streams, 467 partitioned secondary indexes, 206 secondaries (see leader-based replication) secondary indexes, 85, 557 partitioning, 206-209, 217 document-partitioned, 206 index maintenance, 495 term-partitioned, 208 problems with dual writes, 452, 491 updating, transaction isolation and, 231 secondary sorts, 405 sed (Unix tool), 392 self-describing files, 127 self-joins, 480 self-validating systems, 530 semantic web, 57 semi-synchronous replication, 154 sequence number ordering, 343-348 generators, 294, 344 insufficiency for enforcing constraints, 347 Lamport timestamps, 345 use of timestamps, 291, 295, 345 sequential consistency, 351 serializability, 225, 233, 251-266, 557 linearizability versus, 329 pessimistic versus optimistic concurrency control, 261 serial execution, 252-256 partitioning, 255 using stored procedures, 253, 349 serializable snapshot isolation (SSI), 261-266 detecting stale MVCC reads, 263 detecting writes that affect prior reads, 264 distributed execution, 265, 364 performance of SSI, 265 preventing write skew, 262-265 two-phase locking (2PL), 257-261 index-range locks, 260 performance, 258 Serializable (Java), 113 Index | 583 serialization, 113 (see also encoding) service discovery, 135, 214, 372 using DNS, 216, 372 service level agreements (SLAs), 15 service-oriented architecture (SOA), 132 (see also services) services, 131-136 microservices, 132 causal dependencies across services, 493 loose 
coupling, 502 relation to batch/stream processors, 389, 508 remote procedure calls (RPCs), 134-136 issues with, 134 similarity to databases, 132 web services, 132, 135 session windows (stream processing), 472 (see also windows) sessionization, 407 sharding (see partitioning) shared mode (locks), 258 shared-disk architecture, 146, 398 shared-memory architecture, 146 shared-nothing architecture, 17, 146-147, 557 (see also replication) distributed filesystems, 398 (see also distributed filesystems) partitioning, 199 use of network, 277 sharks biting undersea cables, 279 counting (example), 46-48 finding (example), 42 website about (example), 44 shredding (in relational model), 38 siblings (concurrent values), 190, 246 (see also conflicts) similarity search edit distance, 88 genome data, 63 k-nearest neighbors, 429 single-leader replication (see leader-based rep‐ lication) single-threaded execution, 243, 252 in batch processing, 406, 421, 426 in stream processing, 448, 463, 522 size-tiered compaction, 79 skew, 557 584 | Index clock skew, 291-294, 334 in transaction isolation read skew, 238, 266 write skew, 246-251, 262-265 (see also write skew) meanings of, 238 unbalanced workload, 201 compensating for, 205 due to celebrities, 205 for time-series data, 203 in batch processing, 407 slaves (see leader-based replication) sliding windows (stream processing), 472 (see also windows) sloppy quorums, 183 (see also quorums) lack of linearizability, 334 slowly changing dimension (data warehouses), 476 smearing (leap seconds adjustments), 290 snapshots (databases) causal consistency, 340 computing derived data, 500 in change data capture, 455 serializable snapshot isolation (SSI), 261-266, 329 setting up a new replica, 156 snapshot isolation and repeatable read, 237-242 implementing with MVCC, 239 indexes and MVCC, 241 visibility rules, 240 synchronized clocks for global snapshots, 294 snowflake schemas, 95 SOAP, 133 (see also services) evolvability, 136 software bugs, 8 
maintaining integrity, 529 solid state drives (SSDs) access patterns, 84 detecting corruption, 519, 530 faults in, 227 sequential write throughput, 75 Solr (search server) building indexes in batch processes, 411 document-partitioned indexes, 207 request routing, 216 usage example, 4 use of Lucene, 79 sort (Unix tool), 392, 394, 395 sort-merge joins (MapReduce), 405 Sorted String Tables (see SSTables) sorting sort order in column storage, 99 source of truth (see systems of record) Spanner (database) data locality, 41 snapshot isolation using clocks, 295 TrueTime API, 294 Spark (processing framework), 421-423 bytecode generation, 428 dataflow APIs, 427 fault tolerance, 422 for data warehouses, 93 GraphX API (graph processing), 425 machine learning, 428 query optimizer, 427 Spark Streaming, 466 microbatching, 477 stream processing on top of batch process‐ ing, 495 SPARQL (query language), 59 spatial algorithms, 429 split brain, 158, 557 in consensus algorithms, 352, 367 preventing, 322, 333 using fencing tokens to avoid, 302-304 spreadsheets, dataflow programming capabili‐ ties, 504 SQL (Structured Query Language), 21, 28, 43 advantages and limitations of, 416 distributed query execution, 48 graph queries in, 53 isolation levels standard, issues with, 242 query execution on Hadoop, 416 résumé (example), 30 SQL injection vulnerability, 305 SQL on Hadoop, 93 statement-based replication, 158 stored procedures, 255 SQL Server (database) data warehousing support, 93 distributed transaction support, 361 leader-based replication, 153 preventing lost updates, 245 preventing write skew, 248, 257 read committed isolation, 236 recursive query support, 54 serializable isolation, 257 snapshot isolation support, 239 T-SQL language, 255 XML support, 30 SQLstream (stream analytics), 466 SSDs (see solid state drives) SSTables (storage format), 76-79 advantages over hash indexes, 76 concatenated index, 204 constructing and maintaining, 78 making LSM-Tree from, 78 staleness (old data), 
162 cross-channel timing dependencies, 331 in leaderless databases, 178 in multi-version concurrency control, 263 monitoring for, 182 of client state, 512 versus linearizability, 324 versus timeliness, 524 standbys (see leader-based replication) star replication topologies, 175 star schemas, 93-95 similarity to event sourcing, 458 Star Wars analogy (event time versus process‐ ing time), 469 state derived from log of immutable events, 459 deriving current state from the event log, 458 interplay between state changes and appli‐ cation code, 507 maintaining derived state, 495 maintenance by stream processor in streamstream joins, 473 observing derived state, 509-515 rebuilding after stream processor failure, 478 separation of application code and, 505 state machine replication, 349, 452 statement-based replication, 158 statically typed languages analogy to schema-on-write, 40 code generation and, 127 statistical and numerical algorithms, 428 StatsD (metrics aggregator), 442 stdin, stdout, 395, 396 Stellar (cryptocurrency), 532 Index | 585 stock market feeds, 442 STONITH (Shoot The Other Node In The Head), 158 stop-the-world (see garbage collection) storage composing data storage technologies, 499-504 diversity of, in MapReduce, 415 Storage Area Network (SAN), 146, 398 storage engines, 69-104 column-oriented, 95-101 column compression, 97-99 defined, 96 distinction between column families and, 99 Parquet, 96, 131 sort order in, 99-100 writing to, 101 comparing requirements for transaction processing and analytics, 90-96 in-memory storage, 88 durability, 227 row-oriented, 70-90 B-trees, 79-83 comparing B-trees and LSM-trees, 83-85 defined, 96 log-structured, 72-79 stored procedures, 161, 253-255, 557 and total order broadcast, 349 pros and cons of, 255 similarity to stream processors, 505 Storm (stream processor), 466 distributed RPC, 468, 514 Trident state handling, 478 straggler events, 470, 498 stream processing, 464-481, 557 accessing external services within job, 
474, 477, 478, 517 combining with batch processing lambda architecture, 497 unifying technologies, 498 comparison to batch processing, 464 complex event processing (CEP), 465 fault tolerance, 476-479 atomic commit, 477 idempotence, 478 microbatching and checkpointing, 477 rebuilding state after a failure, 478 for data integration, 494-498 586 | Index maintaining derived state, 495 maintenance of materialized views, 467 messaging systems (see messaging systems) reasoning about time, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 types of windows, 472 relation to databases (see streams) relation to services, 508 search on streams, 467 single-threaded execution, 448, 463 stream analytics, 466 stream joins, 472-476 stream-stream join, 473 stream-table join, 473 table-table join, 474 time-dependence of, 475 streams, 440-451 end-to-end, pushing events to clients, 512 messaging systems (see messaging systems) processing (see stream processing) relation to databases, 451-464 (see also changelogs) API support for change streams, 456 change data capture, 454-457 derivative of state by time, 460 event sourcing, 457-459 keeping systems in sync, 452-453 philosophy of immutable events, 459-464 topics, 440 strict serializability, 329 strong consistency (see linearizability) strong one-copy serializability, 329 subjects, predicates, and objects (in triplestores), 55 subscribers (message streams), 440 (see also consumers) supercomputers, 275 surveillance, 537 (see also privacy) Swagger (service definition format), 133 swapping to disk (see virtual memory) synchronous networks, 285, 557 comparison to asynchronous networks, 284 formal model, 307 synchronous replication, 154, 557 chain replication, 155 conflict detection, 172 system models, 300, 306-310 assumptions in, 528 correctness of algorithms, 308 mapping to the real world, 309 safety and liveness, 308 systems of record, 386, 557 change data capture, 454, 491 treating event log as, 460 
systems thinking, 536 T t-digest (algorithm), 16 table-table joins, 474 Tableau (data visualization software), 416 tail (Unix tool), 447 tail vertex (property graphs), 51 Tajo (query engine), 93 Tandem NonStop SQL (database), 200 TCP (Transmission Control Protocol), 277 comparison to circuit switching, 285 comparison to UDP, 283 connection failures, 280 flow control, 282, 441 packet checksums, 306, 519, 529 reliability and duplicate suppression, 517 retransmission timeouts, 284 use for transaction sessions, 229 telemetry (see monitoring) Teradata (database), 93, 200 term-partitioned indexes, 208, 217 termination (consensus), 365 Terrapin (database), 413 Tez (dataflow engine), 421-423 fault tolerance, 422 support by higher-level tools, 427 thrashing (out of memory), 297 threads (concurrency) actor model, 138, 468 (see also message-passing) atomic operations, 223 background threads, 73, 85 execution pauses, 286, 296-298 memory barriers, 338 preemption, 298 single (see single-threaded execution) three-phase commit, 359 Thrift (data format), 117-121 BinaryProtocol, 118 CompactProtocol, 119 field tags and schema evolution, 120 throughput, 13, 390 TIBCO, 137 Enterprise Message Service, 444 StreamBase (stream analytics), 466 time concurrency and, 187 cross-channel timing dependencies, 331 in distributed systems, 287-299 (see also clocks) clock synchronization and accuracy, 289 relying on synchronized clocks, 291-295 process pauses, 295-299 reasoning about, in stream processors, 468-472 event time versus processing time, 469, 477, 498 knowing when window is ready, 470 timestamp of events, 471 types of windows, 472 system models for distributed systems, 307 time-dependence in stream joins, 475 time-of-day clocks, 288 timeliness, 524 coordination-avoiding data systems, 528 correctness of dataflow systems, 525 timeouts, 279, 557 dynamic configuration of, 284 for failover, 158 length of, 281 timestamps, 343 assigning to events in stream processing, 471 for read-after-write 
consistency, 163 for transaction ordering, 295 insufficiency for enforcing constraints, 347 key range partitioning by, 203 Lamport, 345 logical, 494 ordering events, 291, 345 Titan (database), 50 tombstones, 74, 191, 456 topics (messaging), 137, 440 total order, 341, 557 limits of, 493 sequence numbers or timestamps, 344 total order broadcast, 348-352, 493, 522 consensus algorithms and, 366-368 Index | 587 implementation in ZooKeeper and etcd, 370 implementing with linearizable storage, 351 using, 349 using to implement linearizable storage, 350 tracking behavioral data, 536 (see also privacy) transaction coordinator (see coordinator) transaction manager (see coordinator) transaction processing, 28, 90-95 comparison to analytics, 91 comparison to data warehousing, 93 transactions, 221-267, 558 ACID properties of, 223 atomicity, 223 consistency, 224 durability, 226 isolation, 225 compensating (see compensating transac‐ tions) concept of, 222 distributed transactions, 352-364 avoiding, 492, 502, 521-528 failure amplification, 364, 495 in doubt/uncertain status, 358, 362 two-phase commit, 354-359 use of, 360-361 XA transactions, 361-364 OLTP versus analytics queries, 411 purpose of, 222 serializability, 251-266 actual serial execution, 252-256 pessimistic versus optimistic concur‐ rency control, 261 serializable snapshot isolation (SSI), 261-266 two-phase locking (2PL), 257-261 single-object and multi-object, 228-232 handling errors and aborts, 231 need for multi-object transactions, 231 single-object writes, 230 snapshot isolation (see snapshots) weak isolation levels, 233-251 preventing lost updates, 242-246 read committed, 234-238 transitive closure (graph algorithm), 424 trie (data structure), 88 triggers (databases), 161, 441 implementing change data capture, 455 implementing replication, 161 588 | Index triple-stores, 55-59 SPARQL query language, 59 tumbling windows (stream processing), 472 (see also windows) in microbatching, 477 tuple spaces (programming 
model), 507 Turtle (RDF data format), 56 Twitter constructing home timelines (example), 11, 462, 474, 511 DistributedLog (event log), 448 Finagle (RPC framework), 135 Snowflake (sequence number generator), 294 Summingbird (processing library), 497 two-phase commit (2PC), 353, 355-359, 558 confusion with two-phase locking, 356 coordinator failure, 358 coordinator recovery, 363 how it works, 357 issues in practice, 363 performance cost, 360 transactions holding locks, 362 two-phase locking (2PL), 257-261, 329, 558 confusion with two-phase commit, 356 index-range locks, 260 performance of, 258 type checking, dynamic versus static, 40 U UDP (User Datagram Protocol) comparison to TCP, 283 multicast, 442 unbounded datasets, 439, 558 (see also streams) unbounded delays, 558 in networks, 282 process pauses, 296 unbundling databases, 499-515 composing data storage technologies, 499-504 federation versus unbundling, 501 need for high-level language, 503 designing applications around dataflow, 504-509 observing derived state, 509-515 materialized views and caching, 510 multi-partition data processing, 514 pushing state changes to clients, 512 uncertain (transaction status) (see in doubt) uniform consensus, 365 (see also consensus) uniform interfaces, 395 union type (in Avro), 125 uniq (Unix tool), 392 uniqueness constraints asynchronously checked, 526 requiring consensus, 521 requiring linearizability, 330 uniqueness in log-based messaging, 522 Unix philosophy, 394-397 command-line batch processing, 391-394 Unix pipes versus dataflow engines, 423 comparison to Hadoop, 413-414 comparison to relational databases, 499, 501 comparison to stream processing, 464 composability and uniform interfaces, 395 loose coupling, 396 pipes, 394 relation to Hadoop, 499 UPDATE statement (SQL), 40 updates preventing lost updates, 242-246 atomic write operations, 243 automatically detecting lost updates, 245 compare-and-set operations, 245 conflict resolution and replication, 246 using explicit 
locking, 244 preventing write skew, 246-251 V validity (consensus), 365 vBuckets (partitioning), 199 vector clocks, 191 (see also version vectors) vectorized processing, 99, 428 verification, 528-533 avoiding blind trust, 530 culture of, 530 designing for auditability, 531 end-to-end integrity checks, 531 tools for auditable data systems, 532 version control systems, reliance on immutable data, 463 version vectors, 177, 191 capturing causal dependencies, 343 versus vector clocks, 191 Vertica (database), 93 handling writes, 101 replicas using different sort orders, 100 vertical scaling (see scaling up) vertices (in graphs), 49 property graph model, 50 Viewstamped Replication (consensus algo‐ rithm), 366 view number, 368 virtual machines, 146 (see also cloud computing) context switches, 297 network performance, 282 noisy neighbors, 284 reliability in cloud services, 8 virtualized clocks in, 290 virtual memory process pauses due to page faults, 14, 297 versus memory management by databases, 89 VisiCalc (spreadsheets), 504 vnodes (partitioning), 199 Voice over IP (VoIP), 283 Voldemort (database) building read-only stores in batch processes, 413 hash partitioning, 203-204, 211 leaderless replication, 177 multi-datacenter support, 184 rebalancing, 213 reliance on read repair, 179 sloppy quorums, 184 VoltDB (database) cross-partition serializability, 256 deterministic stored procedures, 255 in-memory storage, 89 output streams, 456 secondary indexes, 207 serial execution of transactions, 253 statement-based replication, 159, 479 transactions in stream processing, 477 W WAL (write-ahead log), 82 web services (see services) Web Services Description Language (WSDL), 133 webhooks, 443 webMethods (messaging), 137 WebSocket (protocol), 512 Index | 589 windows (stream processing), 466, 468-472 infinite windows for changelogs, 467, 474 knowing when all events have arrived, 470 stream joins within a window, 473 types of windows, 472 winners (conflict resolution), 173 WITH 
RECURSIVE syntax (SQL), 54 workflows (MapReduce), 402 outputs, 411-414 key-value stores, 412 search indexes, 411 with map-side joins, 410 working set, 393 write amplification, 84 write path (derived data), 509 write skew (transaction isolation), 246-251 characterizing, 246-251, 262 examples of, 247, 249 materializing conflicts, 251 occurrence in practice, 529 phantoms, 250 preventing in snapshot isolation, 262-265 in two-phase locking, 259-261 options for, 248 write-ahead log (WAL), 82, 159 writes (database) atomic write operations, 243 detecting writes affecting prior reads, 264 preventing dirty writes with read committed, 235 WS-* framework, 133 (see also services) WS-AtomicTransaction (2PC), 355 X XA transactions, 355, 361-364 heuristic decisions, 363 limitations of, 363 xargs (Unix tool), 392, 396 XML binary variants, 115 encoding RDF data, 57 for application data, issues with, 114 in relational databases, 30, 41 XSL/XPath, 45 Y Yahoo!
The End of Secrecy: The Rise and Fall of WikiLeaks
by
The "Guardian"
,
David Leigh
and
Luke Harding
Published 1 Feb 2011
A few weeks later, in August 2007, a Swedish Tor expert, Dan Egerstad, told Wired magazine that he had confirmed it was possible to harvest documents, email contents, user names and passwords for various diplomats and organisations by operating a volunteer Tor “exit” node. This was the final server at the edge of the Tor system through which documents without end-to-end encryption were bounced before emerging. The magazine reported that Egerstad “found accounts belonging to the foreign ministry of Iran, the UK’s visa office in Nepal and the Defence Research and Development Organisation in India’s Ministry of Defence. In addition, Egerstad was able to read correspondence belonging to the Indian ambassador to China, various politicians in Hong Kong, workers in the Dalai Lama’s liaison office and several human rights groups in Hong Kong.
Hacking Exposed: Network Security Secrets and Solutions
by
Stuart McClure
,
Joel Scambray
and
George Kurtz
Published 15 Feb 2001
Unfortunately, the first version runs only on Windows, but the technical underpinnings look sound enough to provide a central point from which to scan a network for promiscuous mode interfaces. In addition to AntiSniff, sentinel (http://www.packetfactory.net/Projects/Sentinel/) can be run from a UNIX system and has advanced network-based promiscuous mode detection features. Encryption (SSH, IPSec) The long-term solution to network eavesdropping is encryption. Only if end-to-end encryption is employed can near-complete confidence in the integrity of communication be achieved. Encryption key length should be determined based on the amount of time the data remains sensitive—shorter encryption key lengths (40 bits) are permissible for encrypting data streams that contain rapidly outdated data and will also boost performance.
No Is Not Enough: Resisting Trump’s Shock Politics and Winning the World We Need
by
Naomi Klein
Published 12 Jun 2017
In the immediate aftermath of the Westminster terror attacks in London in March 2017, when a driver plowed into a crowd of pedestrians, deliberately killing four people and injuring dozens more, the Conservative government wasted no time declaring that any expectation of privacy in digital communications was now a threat to national security. Home Secretary Amber Rudd went on the BBC and declared the end-to-end encryption provided by programs like WhatsApp to be “completely unacceptable.” And she said that they were meeting with the large tech firms “to ask them to work with us” on providing backdoor access to these platforms. In France in 2015, after the coordinated attacks in Paris that killed 130 people, the government of François Hollande declared a “state of emergency” that banned political protests.
An Ugly Truth: Inside Facebook's Battle for Domination
by
Sheera Frenkel
and
Cecilia Kang
Published 12 Jul 2021
Ahuja’s team searched email and phone records of Facebook employees to see who might have been in contact with Nuñez. The group could easily view messages written on Facebook, but few employees were likely to have been naïve enough to message a member of the press from their own Facebook pages. They might, however, have used WhatsApp. Those messages were end-to-end encrypted, so the investigations team could not see the contents of messages, but they could retrieve data, such as which two numbers had messaged each other and when the correspondence had taken place. The team had other tools as well. Through the location permissions on phone apps, Ahuja and her team had a fairly accurate record of where Facebook employees traveled.
Digital Empires: The Global Battle to Regulate Technology
by
Anu Bradford
Published 25 Sep 2023
Alan Rozenshtein—a law professor who formerly served in the US Department of Justice National Security Division—has argued that tech companies have both financial and ideological incentives to resist government surveillance.158 They often choose minimal compliance and aggressive litigation to resist government requests for user data. These companies also use product design such as end-to-end encryption or offshore data storage as a way to make surveillance harder, in addition to rallying public opinion against surveillance. In its 2022 Comprehensive Cyber Review, the US Department of Justice affirmed the claim that powerful tech companies have on several occasions resisted government requests for assistance.159 When the Obama administration sought US tech companies’ collaboration on cyber defense, the Chamber of Commerce and other business associations blocked the proposed cybersecurity bill, criticizing it for being too interventionist and hence “un-American” in that it would have mandated information sharing about cyber hacks or even joint defense strategies that are “antithetical to free-market capitalist ideals.”160 In another example, Apple engaged in a high-profile court fight with the FBI in 2016 when the company challenged a court order compelling it to help unlock the iPhone of one of the perpetrators involved in the 2015 terrorist attack in San Bernardino, California, that left fourteen people dead and twenty-two seriously injured.161 While condemning the terrorist attacks, Apple CEO Tim Cook published an open letter on the company’s website defending Apple’s decision not to unlock the phone and “hack our own users,” describing the FBI’s request as “undermin[ing] the very freedoms and liberty our government is meant to protect.”162 These are examples of tech companies’ practices that constrain, as opposed to simply enable, government’s surveillance programs.
…
The killing followed harassment on social media, which helped the perpetrator to identify and to target Paty in an act of domestic terrorism.150 Critics have also pointed out how these platforms have been deployed to recruit terrorists or to plot terrorist attacks. For example, Meta’s WhatsApp platform was used to plan the London Bridge Terror Attack that took place in June 2017 and that led to the killing of four people in Westminster.151 The attack was planned between three perpetrators on a WhatsApp group across several weeks. WhatsApp uses end-to-end encryption to hide the users’ messages even from the company itself. Amber Rudd, the UK Home Secretary at the time, called upon technology firms to open up encrypted data for high-risk individuals or for messages that trigger a terrorism alert.152 Still today, the police are unable to access the group chat.153 Even when the platforms try, they often fail to adequately remove the harmful content.
No Filter: The Inside Story of Instagram
by
Sarah Frier
Published 13 Apr 2020
In early 2014 he had a sushi dinner with Jan Koum, WhatsApp’s CEO, at Nihon Whisky Lounge in San Francisco. Systrom helped reassure him that Facebook was a good partner, unlikely to ruin what made WhatsApp special. Koum was notoriously untrusting, after growing up under surveillance by the USSR in Ukraine. He built an app that was end-to-end encrypted, so the records of what people were saying to each other weren’t readable by anyone—not the police, and not even his company. He promised his users “no ads, no games, no gimmicks,” just a simple tool they could pay $1 a year to use. It would be a stretch to join Facebook, where surveillance of users powered the advertising engine.
The Great Firewall of China
by
James Griffiths
Published 15 Jan 2018
Glenn Greenwald, who helped break the NSA spying story, almost missed out on it due to his lack of a secure email account and initial unwillingness to use the clunky PGP encryption method to communicate with Snowden.39 Telegram, along with a similar app, Signal, which was endorsed by Snowden himself, helped make encrypted communications easy and straightforward.40 Increased competition from these apps in turn forced larger tech companies to adopt similar security protocols, with Facebook-owned WhatsApp and Microsoft-owned Skype both adopting end-to-end encryption for fear of losing market share. Encryption doesn’t only hamper spies; it can also help bypass filtering, making DPI impossible and forcing censors to either block all traffic from the app or allow it through. China swiftly banned Telegram, with an editorial in the People’s Daily saying it had been used by human rights lawyers to coordinate “attacks on the party and government”.41 Chinese dissidents weren’t the only ones using the platform, however, and if the timing of the Snowden revelations helped Telegram, another geopolitical development proved almost fatal.
Super Pumped: The Battle for Uber
by
Mike Isaac
Published 2 Sep 2019
One Lyft executive grew so paranoid about being followed by Uber that he walked out onto his porch, lifted both middle fingers in the air and waved them around, sending a message to the spies he was absolutely sure were watching. Internal communications within SSG were carried out over an enterprise version of an app called Wickr. Because of its architecture, Wickr end-to-end encrypted every message, meaning only the sender and recipient would be able to read them. All messages were automatically deleted after a certain period of time, undermining any future legal discovery. Craig Clark and Sullivan, both licensed lawyers, would often designate documents as attorney-client privileged, another safeguard against potential legal threats.
The Chaos Machine: The Inside Story of How Social Media Rewired Our Minds and Our World
by
Max Fisher
Published 5 Sep 2022
The Facebook-owned messaging app enables rapid-fire communication, akin to group text messaging for hundreds of people at once, with some viral-friendly twists. Users can forward content from one group to another, enabling posts to spread exponentially. A large WhatsApp group can resemble a mishmash of Facebook, Twitter, and YouTube, filled by viral content copied in from all three. WhatsApp sells itself especially on privacy: end-to-end encryption keeps out prying authorities. There are no fact-checkers or moderators. The digital researchers joined some of the groups. It wasn’t difficult; group names were posted on Facebook hate pages, which operated as openly as newspapers. In one viral WhatsApp video, a man dressed as a monk yelled, “The knife at home is no longer to cut jackfruit.
The Future of the Internet: And How to Stop It
by
Jonathan Zittrain
Published 27 May 2009
id_article=15013#2 (last visited May 15, 2007) (providing an overview of different technologies that can be used to avoid censorship); Anonymizer: Free Web Proxy, Free Anonymizers and the List of Web Anonymizers List, http://www.freeproxy.ru/en/free_proxy/cgi-proxy.htm (last visited May 15, 2007). For some skepticism that users can circumvent network neutrality restrictions, see William H. Lehr et al., Scenarios for the Network Neutrality Arms Race, 1 INT’L J. COMMC’NS 607 (2007) (describing “technical and non-technical countermeasures” ranging from letter-writing campaigns to end-to-end encryption that prevents an ISP from discerning the activity in which a user is engaging). 21. See Skype, http://skype.com (last visited May 15, 2007); Wikipedia, Skype, http://en.wikipedia.org/wiki/Skype (as of May 15, 2007, 17:45 GMT). 22. Notably, the Nintendo Wii has been configured in this manner.
Active Measures: The Secret History of Disinformation and Political Warfare
by
Thomas Rid
Journalists and opinion leaders were now more willing than ever to embrace anonymous leaks without spending too much time on checking their provenance or veracity. By mid-2014, major magazines and newspapers, including The New Yorker and The Guardian, were competing with activist websites and encouraging anonymous submissions by mail or dedicated end-to-end encrypted submission portals with fortified anonymity.10 Yet the leaks could also be a problem for journalists, especially Snowden’s material. It was often exceedingly difficult to assess leaked documents on their own merits, and checking secret facts was sometimes impossible. Even the most dogged and well-connected investigative journalist would have a hard time telling whether a specific leak was the outcome of an active measure or of genuine whistle-blowing.
Spies, Lies, and Algorithms: The History and Future of American Intelligence
by
Amy B. Zegart
Published 6 Nov 2021
After Edward Snowden illegally revealed these highly classified programs in 2013, Apple, Google, Yahoo, and other companies raced to develop “strong crypto.”137 Companies saw both commercial and privacy benefits to better encryption.138 New iPhones and Android devices suddenly had default encryption linked to passwords. Yahoo encrypted messages in Yahoo mail. Instant messaging apps like WhatsApp and Signal began using end-to-end encryption.139 By 2015, when Syed Rizwan Farook and his wife shot and killed fourteen people in San Bernardino, the FBI couldn’t get into his phone. It was an iPhone 5C locked with a passcode that would delete all locally stored data after ten unsuccessful attempts to unlock it.140 The FBI demanded that Apple write new software to undermine its own security features and unlock Farook’s phone.
The Quiet Coup: Neoliberalism and the Looting of America
by
Mehrsa Baradaran
Published 7 May 2024
The explosion of cryptocurrencies and the so-called blockchain revolution began with the launch of bitcoin, which began as a reaction to the bank bailouts. In 2009, Satoshi Nakamoto—a pseudonym for the actual creator—laid out the code and logic of bitcoin. Bitcoin was a new form of currency, “mined” by computer algorithm through end-to-end encryption that would be a fixed unit of exchange existing on the blockchain. Embedded in the genesis block of code upon which all bitcoin could be “mined” was a single line of text: “The Times 03/Jan/2009 Chancellor on brink of second bailout for banks.” Nakamoto’s pitch was that the supply of bitcoin, unlike the U.S. dollar, would be finite; bitcoin’s code makes it impossible to “mine” more bitcoin than what was initially written into the code.
The Debian Administrator's Handbook, Debian Wheezy From Discovery to Mastery
by
Raphaël Hertzog
and
Roland Mas
Published 24 Dec 2013
This system is more centered around the concept of channels, the name of which starts with a hash sign #. Each channel is usually targeted at a specific topic and any number of people can join a channel to discuss it (but users can still have one-to-one private conversations if needed). The IRC protocol is older, and does not allow end-to-end encryption of the messages; it is still possible to encrypt the communications between the users and the server by tunneling the IRC protocol inside SSL. IRC clients are a bit more complex, and they usually provide many features that are of limited use in a corporate environment. For instance, channel “operators” are users endowed with the ability to kick other users from a channel, or even ban them permanently, when the normal discussion is disrupted.
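The client-to-server tunneling described in this excerpt can be sketched with Python's standard library. This is a minimal illustration, not taken from the handbook: the helper names `irc_line` and `open_irc_tls` are hypothetical, and port 6697 is assumed as the conventional IRC-over-TLS port. The server still sees every message in the clear after decryption, so this protects only the hop between user and server.

```python
import socket
import ssl


def irc_line(command, *params):
    """Format one IRC protocol line (hypothetical helper for this sketch).

    The last parameter is sent as a "trailing" parameter, prefixed with a
    colon, so that it may contain spaces.
    """
    parts = [command, *params[:-1]]
    if params:
        parts.append(":" + params[-1])
    return (" ".join(parts) + "\r\n").encode("utf-8")


def open_irc_tls(host, port=6697):
    """Open a TCP connection to an IRC server and wrap it in TLS.

    Assumption: the server accepts TLS directly on this port.
    The certificate is checked against the system trust store.
    """
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=10)
    return context.wrap_socket(raw, server_hostname=host)
```

A client would call `open_irc_tls("irc.example.org")` (a placeholder hostname) and then write lines such as `irc_line("PRIVMSG", "#debian", "hello world")` to the returned socket.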
Thank You for Being Late: An Optimist's Guide to Thriving in the Age of Accelerations
by
Thomas L. Friedman
Published 22 Nov 2016
CNN.com reported on December 17, 2015, in the wake of the Paris jihadist suicide attacks that “investigators of the Paris attacks have found evidence they believe shows some of the terrorists used encrypted apps to hide plotting for the attacks … Among the apps officials found used by the terrorists were WhatsApp and Telegram, both of which boast of end-to-end encryption that protects the privacy of their users and are difficult to decrypt.” And then there was the celebrated case in April 2016, when the FBI demanded that Apple give it the keys to a cyberlocker in the iPhone used by Syed Rizwan Farook, the gunman in the December 2, 2015, shooting in San Bernardino, California, that killed fourteen people.
The Art of UNIX Programming
by
Eric S. Raymond
Published 22 Sep 2003
A project website provides access to standards and open-source implementations in several languages. BEEP has features to support both client-server and peer-to-peer modes. The authors designed the BEEP protocol and support library so that picking the right options abstracts away messy issues like data encoding, flow control, congestion-handling, support of end-to-end encryption, and assembling a large response composed of multiple transmissions. Internally, BEEP peers exchange sequences of self-describing binary packets not unlike chunk types in PNG. The design is tuned more for economy and less for transparency than the classical Internet protocols or HTTP, and might be a better choice when data volumes are large.
Building Secure and Reliable Systems: Best Practices for Designing, Implementing, and Maintaining Systems
by
Heather Adkins
,
Betsy Beyer
,
Paul Blankinship
,
Ana Oprea
,
Piotr Lewandowski
and
Adam Stubblefield
Published 29 Mar 2020
Features that are no longer safe to use should be explicitly disabled. For example, adding an offline mode to an online collaboration application can preserve core functionality despite temporary loss of online storage, the ability to show updates from others, or integration with chat features. In a chat application with end-to-end encryption, users might occasionally change their encryption key used for protecting communications. Such an application would keep all previous communications accessible, because their authenticity is not affected by this change. In contrast, an example of a poor design would be a situation where the entire GUI becomes unresponsive because one of its RPCs to a backend has timed out.
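One way to picture why older messages can stay accessible after a key change is a key ring that retires keys from signing but keeps them for verification. The sketch below is an assumption-laden illustration, not the design from the book: the `KeyRing` class is hypothetical, and it uses HMAC with symmetric keys from Python's standard library as a stand-in for the asymmetric identity keys a real end-to-end encrypted chat application would use.

```python
import hashlib
import hmac
import secrets


class KeyRing:
    """Hypothetical key ring: signs with the newest key, verifies with any."""

    def __init__(self):
        self._keys = {}
        self.current_id = None
        self.rotate()

    def rotate(self):
        """Generate a new key and make it current; old keys are retained."""
        key_id = len(self._keys) + 1
        self._keys[key_id] = secrets.token_bytes(32)
        self.current_id = key_id
        return key_id

    def sign(self, message):
        """Return (key_id, tag) so the verifier knows which key was used."""
        key = self._keys[self.current_id]
        tag = hmac.new(key, message, hashlib.sha256).digest()
        return self.current_id, tag

    def verify(self, key_id, message, tag):
        """Check a tag against the key that was current when it was made."""
        key = self._keys.get(key_id)
        if key is None:
            return False
        expected = hmac.new(key, message, hashlib.sha256).digest()
        return hmac.compare_digest(expected, tag)
```

Because each tag carries the identifier of the key that produced it, a message signed before `rotate()` still verifies afterwards, while new messages use the new key.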
Exim: The Mail Transfer Agent
by
Philip Hazel
Published 7 Jul 2001
Encrypted SMTP Connections: RFC 2487 defines how SMTP connections can be set up so that the data that passes between two hosts is encrypted in transit. Once a connection is established, the client issues a STARTTLS command. If the server accepts this, they negotiate an encryption mechanism to be used for all subsequent data transfers. Note that this provides security only when data is in transit between two hosts. It does not provide end-to-end encryption from the original sender to the final mailbox. Support for Transport Layer Security (TLS), otherwise known as Secure Sockets Layer (SSL), is implemented in Exim by making use of the OpenSSL library. There is no cryptographic code in the Exim distribution itself.
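The STARTTLS exchange described above can be sketched from the client side with Python's `smtplib`. This is an illustration under stated assumptions rather than anything from the Exim book: the function names `upgrade_to_tls` and `send_over_tls` are hypothetical, and the host, port, and addresses are supplied by the caller. As the excerpt notes, this protects only the single hop between client and server.

```python
import smtplib
import ssl


def upgrade_to_tls(client):
    """Ask an already-connected SMTP client to switch to TLS.

    Returns True if the session was upgraded, False if the server
    does not advertise the STARTTLS extension.
    """
    client.ehlo()  # learn which extensions the server offers
    if not client.has_extn("starttls"):
        return False
    client.starttls(context=ssl.create_default_context())
    client.ehlo()  # re-identify over the now-encrypted channel
    return True


def send_over_tls(host, port, sender, recipient, message):
    """Send one message, refusing to fall back to plaintext."""
    with smtplib.SMTP(host, port) as client:
        if not upgrade_to_tls(client):
            raise RuntimeError("server does not offer STARTTLS")
        client.sendmail(sender, recipient, message)
```

Refusing to send when STARTTLS is missing is a design choice made for this sketch; many mail systems instead fall back to an unencrypted connection.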
The Codebreakers: The Comprehensive History of Secret Communication From Ancient Times to the Internet
by
David Kahn
Published 1 Feb 1963
This is part of a more basic Air Force aim of a communications complex that will “provide full protection for information flowing within Air Force communications channels, including the exclusion of unauthorized entry into the systems. This goal will be approached, first, by providing COMSEC protection to each of the individual communications networks and later by providing total end-to-end encryption throughout the complex.” The Air Force drive toward total end-to-end encipherment carries with it a tendency toward a single all-purpose cipher, for such encipherment can most easily and most safely be applied by such a cipher. A single all-purpose cipher, simple enough for the lowest echelon, secure enough for the highest, variable enough to nullify the danger of capture or compromise, would eliminate or reduce many of the problems produced by the present multiplicity of systems—the need sometimes to reencipher a message in a system the ultimate recipient holds, the difficulties of storing, distributing, and accounting for half a dozen different sets of ciphers instead of for just one.