Hey, it’s HighScalability time!


The top 10,000 most spoken words in English represented by a point in hundreds of dimensions where the distance and direction between points encodes the relationship between words. (roadmaps)

Do you like this sort of Stuff? Without your support on Patreon this kind of Stuff can’t happen. You are that important to the fate of the intelligent world.

Know someone who wants to understand the cloud? I wrote Explain the Cloud Like I’m 10 just for them. On Amazon it has 100 (!!!) mostly 5 star reviews. Here’s a recent authentic unfaked review:

Number Stuff: 

  • 1/3: of well-designed features at Microsoft deliver value.
  • 93: of the 100 most watched shows in 2019 were live sports events. Twenty years ago that number was more like 20-25.
  • 14,500: containers in Riot-operated regions alone. 
  • <20: grams of CO2 emitted by watching 30 minutes of Netflix, which is like driving a car 160 meters. 
  • <1%: blockchain adoption rate.
  • 30-60 billion: records loaded by NASDAQ every night.
  • 500%: increase in requests to Google from state and federal law enforcement authorities. 
  • 20%: higher CTR for personalized results. CTR was 74% higher when the evidence that a search was personal was strong and personalized results were moved to the top of the list. 
  • 2.7K: number of websites in 1997, when the Lycos search engine indexed 54k documents.
  • $10k: enterprise companies are spending an average of more than $10,000 per employee per year on software. 
  • 1: number of starships Elon Musk wants to build a week.
  • 2 exaflops: El Capitan supercomputer will use AMD CPUs & GPUs. 
  • 14 million: Uber trips per day.
  • 84%: of Cloud Native Computing Foundation survey respondents use containers in production, up 15%. 78% use k8s. Over 10% of users have more than 50 production clusters. 
  • 170,000: years it takes for a photon to travel from the sun’s core to earth.
  • $252 million: donated to UC Berkeley to build a…datacenter…called the Data Hub.
  • $3.92 million: average cost of a cloud breach for a company.
  • 1.7 million: lines of code in the old Facebook Messenger. Now it’s 360,000.
  • ~2 million: QuickBooks searches handled per day on Elasticsearch. ~50 million business transactions. 10 stateless application servers fronted by a load balancer, fronted by a gateway.

Quote Stuff:

  • spectramax: Just curious – what’s wrong with buying a large instance (24 cores) and running it for < 10,000 users? Kubernetes feels like an insane complexity that doesn’t need to be taken on and managed. You’re gonna spend more time managing Kubernetes than writing actual software. Also, it feels like if something goes wrong in prod with your cluster – you’re gonna need external help to get you back on your feet. If you’re not going to build the next Facebook, why would you need so much complexity?
  • zie: tldr; If you can’t afford 6+ full-time people to babysit k8s, you shouldn’t be using it.
  • @ryan: People: Wes just threw >200k requests at a @Begin AWS app in a few mins; responded in ms, never missed a beat. (We had no idea it was coming and didn’t optimize.) Just something to think about when anyone talks about whether serverless apps are fast, scalable, or easy to create.
  • @amandaksilver: An asynchronous callback is the programming equivalent of when a toaster bell goes off. It means the toast is ready for your attention. When you’re ready, you can stop what you were doing and butter the toast.
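The toaster analogy maps onto callback-style code directly. A tiny sketch of the pattern (the function names are invented, and the "work" runs inline here rather than on a real event loop, just to keep it self-contained):

```python
def make_toast(slices, on_done):
    """Start a job and ring the "bell" by invoking the callback when it
    finishes. In real async code the work would run in the background
    (an event loop, a thread, an I/O completion); inline here for brevity."""
    toast = [f"toasted {s}" for s in slices]
    on_done(toast)  # the bell goes off; the caller decides when to butter

buttered = []
make_toast(["rye", "wheat"],
           on_done=lambda toast: buttered.extend(f"{t} + butter" for t in toast))
```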
  • Elon Musk: If you have a high production rate, you have a high iteration rate. For pretty much any technology whatsoever, the progress is a function of how many iterations do you have, and how much progress do you make between each iteration. If you have a high production rate then you have many iterations. You can make progress from one to the next…I’ll probably be long dead before Mars becomes self-sustaining, but I’d like to at least be around to see a bunch of ships land on Mars.
  • @swyx: My three key realizations of #serverless: – Sooner or later you’re going to scale horizontally. – It’s easier & cheaper & more reliable to horizontally scale small fn’s (“cattle”) than big (“pets”) – But CAP theorem! so a low latency, constant-time store like @DynamoDB is impt.
  • Vanalli: Almost All Crypto Projects Are Complete Bullshit and Will Fail
  • Sid: Azure and Digital Ocean don’t charge for the [Managed Kubernetes] compute resources used for the control plane, making AKS and DO the cheapest for running many, smaller clusters. For running fewer, larger clusters GKE is the most affordable option. Also, running on spot/preemptible/low-priority nodes or long-term committed nodes makes a massive impact across all of the platforms.
  • Nicholas A. Christakis: Skinner’s objective in this work of fiction was not to advocate for these practices (though many of the practices in Walden Two were similar to extant communes’), but rather to advance a triumphalist belief that behavioral science could be collectively applied by ordinary people themselves to improve their lives. As Frazier observes: “The main thing is, we encourage our people to view every habit and custom with an eye to possible improvement. A constantly experimental attitude toward everything—that’s all we need.” 
  • @mims: Natural gas power plant is using 14 megawatts a day to mine bitcoin in upstate New York. That’s enough power for 11,000 homes. Is this really the best use of fossil fuels that were probably fracked out of the ground?
  • @martinkl: In 1665, the University of Cambridge temporarily closed due to the bubonic plague. Isaac Newton had to work from home, and he used this time to develop calculus and the theory of gravity.
  • @etherealmind: Big claims. I’m very interested. “In a conference yesterday, Elon Musk said SpaceX’s Starlink satellite broadband will have latency below 20 milliseconds”
  • Cosmo: “I called Netflix and it was so easy,” he chuckles. “They said, ‘What’s your name?’ and I said, ‘Todd [Redacted],’” gave them his e-mail, “and they said, ‘Alright, your password is 12345,’ and I was signed in. I saw the last four digits of his credit card. That’s when I filled out the Windows Live password-reset form, which just required the first name and last name of the credit card holder, the last four digits, and the expiration date.”
  • Daniel Lemire: What is clear, however, is that creating a thread may cost thousands of CPU cycles. If you have a cheap function that requires only hundreds of cycles, it is almost surely wasteful to create a thread to execute it. The overhead alone is going to set you back.
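Lemire's point is easy to demonstrate empirically: compare calling a trivial function directly against spawning a fresh thread for each call. A sketch (absolute numbers vary by machine, and Python adds its own interpreter overhead on top of the OS thread-creation cost, so the gap here is even larger than the raw cycle counts suggest):

```python
import threading
import time

def tiny_task():
    pass  # a "cheap function": at most a few hundred cycles of real work

def run_in_new_thread(fn):
    # Pay the full thread create/start/join cost for one tiny call.
    t = threading.Thread(target=fn)
    t.start()
    t.join()

N = 200

t0 = time.perf_counter()
for _ in range(N):
    tiny_task()
direct = time.perf_counter() - t0  # direct calls: microseconds total

t0 = time.perf_counter()
for _ in range(N):
    run_in_new_thread(tiny_task)
threaded = time.perf_counter() - t0  # thread-per-call: orders of magnitude slower
```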
  • @arungupta: 84% are using containers in production, 78% are using #Kubernetes in production, Amazon EKS is leading the pack, says the latest @CloudNativeFdn  survey results 
  • kmod: I worked on the design of Dropbox’s exabyte-scale storage system, and from that experience I can say that these numbers are all extremely optimistic, even with their “you can do it cheaper if you only target 95% uptime” caveat. Networking is much more expensive, labor is much more expensive, space is much more expensive, depreciation is faster than they say, etc etc. I don’t think the authors have ever done any actual hardware provisioning before. I didn’t read all their math but I expect their final result to be off by a factor of 2-5x. Hard drives are a surprisingly low percentage of the cost of a storage system.
  • 3Blue1Brown: If people are sufficiently worried, then there’s a lot less to worry about. But if no one is worried – that’s when you should worry
  • Xiao: My company started to use an on prem version of K8s shortly after I was hired and it’s been challenging but rewarding too. I think we hit the sweet spot with our company size. We have tons of people managing it, but my company is too short term thinking, so we don’t have as much tooling as devs may like. 
  • Catalin Cimpanu: According to Microsoft, during a recent 58-day investigation, its engineers tracked one single Necurs-infected computer sending out more than 3.8 million emails to more than 40.6 million victims. The emails usually carry malware-laced attachments, but Necurs is also used to spread pump-and-dump stock scams, fake pharmaceutical spam email and “Russian bride” dating scams.
  • @benedictevans: Zoom, Shopify and Stripe are three fun examples of companies that were obviously impossible because the ‘tech giants can easily expand into new areas and squash competition’. And Zoom wasn’t even doing anything ‘new’. Combined value of $123bn.
  • @podcastnotes: All physical technology follows a similar trend. Over time, its size drops from Large —> Medium, Medium —> Small, Small —> Invisible. Take AirPods, which are now in the “small” category; one day they’ll be invisible. Notes from @BrianNorgard’s chat with @APompliano
  • Carsten Murawski: A lot of problems we face in life, be it business, finance, including logistics, container ship loading, aircraft loading — these are all knapsack problems. From a practical perspective, the knapsack problem is ubiquitous in everyday life.
  • Noah Stephens-Davidowitz: We think you could cover the entire Earth with processors and run them until the heat death of the universe and still fail to solve relatively small instances of appropriate versions of these [knapsack] problems
  • moystard: My company has always been highly dependent on the Amazon ecosystem. When we made the move to a microservice architecture three years ago, we opted for Amazon ECS as it was the simplest way to achieve container orchestration. With new business constraints, we have to migrate from Amazon to a different cloud provider, breaking away from the Amazon ecosystem, including ECS. We are still a relatively small company and cannot afford to spend months on the infrastructure instead of focusing on delivering on the business front.
  • @JoshuaKerievsky: We dropped storypointing, velocity calculations and fixed-length time boxes way back in 2007. By 2010, we stopped teaching clients that stuff too. It’s been options, slices, flow, storymaps and forecasting ever since.
  • @gregpmiller: The CIA couldn’t stand NSA “dithering” and dismal tradecraft. The NSA thought the CIA was reckless, more likely to “ask forgiveness than permission” to ratchet up risk. The German BND marveled at how much time both sides spent in a bureaucratic pissing match.
  • @_jesus_rafael: 1. Stop hiring SRE’s to fix CI/CD. 2. Stop telling SRE’s that performance, on-call, reliability work are not a priority. 3. Stop asking SRE candidates to do coding, algorithms, and math on white-board in front of ppl. 4. Stop thinking you’re Google, check yourself!
  • @cloudfront: We recently made changes that reduced the time to ~5 min consistently. We’re not done yet as this continues to be a priority for us this year. When we say a config is deployed we reference p100 metrics; however, most edge locations are updated in seconds.
  • alfalfacat: Seems like lots of people are confused about ARM vs Apple. Smartphone-class chips are very different from server-class chips. A server-class chip can be thought of as a small datacenter: there are a bunch of independent cores each with their own small slice of memory, linked into a network. This network connects cores with other cores, memory controllers, PCIe, etc. Much of the complexity has to do with hooking all that stuff together.
  • wwkeyboard: Yeah, this is the kind of non-value-added activity that just begs to be outsourced to specialists. I have a friend who works in a bakery. I learned the other day that they outsource a crucial activity to a contractor: handling their cleaning cloths. Every day, a guy comes to pick up a couple of garbage bags full of dirty cleaning cloths, then drops off the same number of bags full of clean ones. This is crucial: one day the guy was late, and the bakery staff had trouble keeping the bakery clean. The owner lived upstairs and used his own washing machine as a backup, but it couldn’t handle the load. But here’s the thing: while the bakery needs this service, it doesn’t need it to differentiate itself. As long as the cloths are there, it can keep on running. If the guy stops cleaning cloths, he can be trivially replaced with another provider, with minimal impact on the bakery. After all, people don’t buy bread because of how the dirty cloths are handled. They buy bread because the bread is good. The bakery should never outsource its bread making. But the cleaning of dirty cloths? Yes, absolutely. To get back to Kubernetes and virtualization: what does anyone hope to gain by doing it themselves? Maybe regulation requires it. Maybe there is some special need. I am not saying it is never useful. But for many people, the answer is often: not much. Most customers will not care. They are there for their tasty bread, a.k.a. getting their problem solved. I would be tempted to go as far as saying that maybe you should outsource one level higher and not even worry about Kubernetes at all: services like Heroku or Amazon Beanstalk handle the scaling and a lot of other concerns for you with a much simpler model. But at that point, you are tying yourself to a provider, and that comes with its own set of problems… I guess it depends.
  • @Obdurodon: If your storage system didn’t suck so much, you might not need so many layers of caches.
  • @zachtratar: It seems Google Cloud doesn’t want individual developers on their platform. Huge mistake. With the new “Cluster Management Fee”, maintaining the smallest Kubernetes cluster possible costs around $300 per month.
  • Metcalfe: My mom used to say if you can’t say something nice about someone you should just hold your tongue. But Bachrach was a not nice person. Or is for that matter; he may still be around for all I know. He was sort of a grouchy physicist on another floor in another lab, and he read my memo. And this is a physicist. The phrase you didn’t catch in his now famous memo — this memo’s posted all over the world because everyone thinks it’s a hoot that he wrote this memo — he said the problem with Ethernet was that it was not quantum noise limited, which is a very physicist thing to say. Of course it’s not quantum noise limited! That wasn’t the point. But he resented the fact that we were not using every shred of bandwidth that was capable. I told you how we decided on the clock speed. There were gigabits per second of capacity on that cable, and we were using a pitiful few megabits of it. But he was concerned that it wasn’t… And he did a nasty thing, and it was sort of an early lesson in nastiness, which is he wrote a memo. He didn’t come see me. He didn’t say, “I just read your memo and I have a few questions about it”. He didn’t do that. He sent a memo to my boss’s boss accusing me of writing something that wasn’t new and it wasn’t quantum [noise limited], completely making an idiot of himself. But it didn’t feel that way at the time. It felt kind of nasty. The next thing I know I’m in my boss’s boss’s office having to explain how this guy’s an idiot. Which he is, or was. To put it more kindly, he was from a different universe. I was in a computer science universe, he was in a physics universe. We were solving different problems and he should have minded his own business. Or at least come talk to me before sending a memo to my boss’s boss, so maybe I could save him the embarrassment. There are many places where I go where that memo is on the wall, pinned to some cubicle — “the old Bachrach memo” — because it’s just such a great example. 
Now Bachrach, I ran into him a decade or two after that unfortunate memo, whereupon he was letting it be known that he’d helped invent Ethernet. How he got there I don’t quite know, but it had something to do with this memo being really instrumental in fixing some of the early problems with Ethernet, which is a complete hallucination.
  • ncmncm: Parallelism is specifically the stuff that actually does happen completely independently on all processing units, that actually goes Nx as fast on N units (clock depression aside). Concurrency refers to the overhead of coordinating activity of those units, that keeps you from getting your Nx. It is overhead on top of any actually serial parts of the computation, which Amdahl’s law addresses. In other words: Parallelism giveth, and concurrency taketh away.
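ncmncm's "giveth and taketh away" can be put in numbers with Amdahl's law, which bounds the speedup even before concurrency overhead is counted. A quick sketch (the 95%/16-unit figures are just illustrative):

```python
def amdahl_speedup(parallel_fraction, n):
    """Ideal speedup on n units when only a fraction of the work
    parallelizes. Amdahl's law covers the serial part; real systems
    lose still more to coordination (ncmncm's "concurrency")."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)

# Even 95%-parallel work caps out well below 16x on 16 units:
speedup_16 = amdahl_speedup(0.95, 16)  # about 9.1x, not 16x
```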
  • AzN1337c0d3r: There’s also the problem of Turbo Boost. My laptop’s 9980HK will boost to ~4.5 GHz when only loaded to a single core. However, when I load up all 8 cores, it might only sustain ~3.5 GHz. Therefore the 8 cores might not actually result in the work being completed 8 times as fast, only 6.2x (8*[3.5/4.5]) real-time due to the lowered clock rate of each individual core
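The 6.2x figure falls straight out of a one-line model. A sketch using the quoted clocks (and ignoring memory contention and other effects, which usually cost even more):

```python
def throttled_speedup(cores, boost_clock_ghz, all_core_clock_ghz):
    """Effective speedup when loading every core drops the clock from
    the single-core boost frequency to the all-core sustained one."""
    return cores * (all_core_clock_ghz / boost_clock_ghz)

# The 9980HK numbers from the comment: 8 cores, 4.5 GHz boost, 3.5 GHz all-core.
speedup = throttled_speedup(8, 4.5, 3.5)  # roughly 6.2x, not 8x
```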
  • londons_explore: Every time you see a claim like this [connecting satellites to ordinary smartphones], remember the Shannon–Hartley theorem. If you want to get a decent bandwidth in aggregate for your mobile network, you’re going to need to consider: * Maximize transmit power: which is hard because phones are battery powered, and power is limited by law. Satellites are also battery powered. * Maximize bandwidth: also limited by rules and regulations. * Maximize directionality: if you have physically large antennas, you can direct your transmit power towards the receiver to make best use of it. * Minimize distance: received signal power goes down with the square of the distance. Double the distance, and you get one quarter of the power, all other things being the same. This proposal uses tiny existing antennas in phones, which means it must be making other tradeoffs. My guess is it seriously sacrifices the total throughput.
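Both constraints in that comment are one-liners. A sketch with illustrative parameter values (note that it is the received power, and hence the achievable SNR inside the log, that falls with the square of distance):

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_linear):
    """Shannon–Hartley: the hard ceiling on bits/s for a channel of a
    given bandwidth and signal-to-noise ratio."""
    return bandwidth_hz * math.log2(1 + snr_linear)

def received_power_ratio(distance_ratio):
    """Free-space path loss: double the distance, quarter the power."""
    return 1.0 / distance_ratio ** 2

quartered = received_power_ratio(2.0)    # doubling distance -> 1/4 power
cap = shannon_capacity_bps(20e6, 100.0)  # e.g. 20 MHz at 20 dB SNR
```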
  • Kenneth O. Stanley: Instead of judging every activity for its potential to succeed, we should judge our projects for their potential to spawn more projects. If we really behave as treasure hunters and stepping stone collectors, then the only important thing about a stepping stone is that it leads to more stepping stones, period. The worst stepping stone is one that leads nowhere beyond itself, no matter how nice it may feel to stand upon it for the moment. As treasure hunters, our interest is in collecting more stepping stones, not in reaching a particular destination. The more stepping stones we find, the more opportunities there are to depart to somewhere greater.
  • Brian Martin: In summary, there are many sources of telemetry to understand our systems performance. Sampling resolution is very important, otherwise you’re going to miss these small things that actually do matter. In-process summarization can reduce your cost of aggregation and storage. Instead of taking 60 times the amount of data to store secondly time series, you can just export maybe five percentiles. The savings becomes even more apparent when you are sampling at even higher rates, you might want to sample every hundred millisecond, or 10 times per second, or something like that. You might want to sample even faster than that for certain things. Really, it all goes back to what is the smallest thing I want to be able to capture. As you start multiplying how much data you need within a second, the savings just becomes even more. 
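The arithmetic behind the savings, sketched with assumed intervals (secondly raw samples versus five percentiles exported once a minute, as in the talk):

```python
def raw_points_per_hour(sample_interval_s):
    """Raw points kept per series per hour at a given sampling interval."""
    return 3600 // sample_interval_s

def summarized_points_per_hour(percentiles=5, export_interval_s=60):
    """In-process summarization: export a handful of percentiles per
    interval instead of every raw sample."""
    return (3600 // export_interval_s) * percentiles

raw_secondly = raw_points_per_hour(1)       # 3600 points/hour/series
summarized = summarized_points_per_hour()   # 300 points/hour/series
savings = raw_secondly / summarized         # 12x here, and it grows
# At 10 samples/second the same 5-percentile export saves 120x.
```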
  • PragmaticPulp: I worked for a company that was forced to go all-remote due to problems with our office space. I thought it would prove to management that remote work was a net win for everyone. I expected both productivity and employee happiness to increase. I was wrong on all counts. The problem is that you can’t suddenly transition people to WFH and expect them to figure it out right away. This doesn’t resonate with 20-something developers who spend much of their home time glued to a computer in a quiet room, but it will resonate with the parents who have kids at home. In our case, productivity fell immensely in certain teams and departments. People scrambled to find quiet space at home where they could hide from young children. Others struggled to work efficiently with purely digital communication. A surprising number of people couldn’t handle being productive without the watchful eye of their manager. Some people tried to take vacations and respond with one-liner emails from their phones. Some of us were more productive, but it wasn’t anywhere near as uniform as I expected. I’m concerned that this sudden, forced WFH environment will hurt perceptions of remote work more than it helps.
  • Summer Child: Other, less blissful moments are also seared into my memory. I remember listening to Google leadership tell a crowded café of people that the most important problems to solve centered around increasing ad revenue or improving the developer experience at Google. Sitting in the endless traffic between San Francisco and Mountain View, in the fourth of a line of 10 Google-branded buses. Aching with loneliness as I looked around a building of 300 people and found that the only people who looked like me were cleaners and café workers.
  • tracker1: Moore’s law is about transistor density, which has already slowed way past what Moore’s Law (more like Moore’s Observation) noted, a doubling of density around every ~18 months (give or take 6)… Stacking may have allowed the trend to continue, but it doesn’t seem to work well. What chiplets offer is effectively more compact compute units designed to work together. What was effectively a component design for multi-CPU systems a couple decades ago happens in a single socket package, with some better design considerations. This allows for an optimization for loss: if you have a huge single chip, there’s more chance you lose a whole chip in a wafer, meaning higher manufacturing loss. With smaller chiplets, you may lose one chiplet, but the other 4-6 around that lost one still work. This means better yields out of each wafer. The other advantage is that you can mix different manufacturing nodes, such as compute or graphics chiplets being latest and greatest with memory and bus interfaces on a prior generation. Again, this allows for greater production and reduced loss.
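The yield argument can be put in numbers with the classic Poisson die-yield model (the defect density below is made up for illustration):

```python
import math

def poisson_yield(area_mm2, defects_per_mm2):
    """Classic Poisson yield model: the fraction of dies that come out
    of the wafer with zero defects."""
    return math.exp(-area_mm2 * defects_per_mm2)

D = 0.001  # defects per mm^2 -- an assumed, illustrative defect density

big_die = poisson_yield(600, D)  # one monolithic 600 mm^2 die: ~55% yield
chiplet = poisson_yield(100, D)  # one 100 mm^2 chiplet: ~90% yield
# A package may need several good chiplets, but bad ones are discarded
# individually, so far less silicon is thrown away per defect.
```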
  • Rowland Manthorpe: Across London, the unregulated short-term rental industry has combined with the Airbnb boom to create a crisis. As Airbnb undertakes a review of all hosts and listings on its platform, the question it will be faced with is this: of its seven million listings, how many are run by companies that use fraudulent accounts, post fake listings and outsource the management of properties to call centres in the Philippines? In London alone, data scraped from Airbnb by City Hall shows that just one per cent of the capital’s hosts were behind 15 per cent of Airbnb listings. Such figures suggest the systemisation of Airbnb listings – think high-yield investment opportunity rather than “unique, authentic places to stay” – has become widespread.
  • Rob Walker: Under Armour misread the rise of the so-called athleisure trend and put too much focus on performance. In short, this maker of gear for authentic athletes may have been better off catering more to the poseurs and couch potatoes.
  • Timothy Morgan: And most importantly, this automated design assistance could radically drop the cost of creating new chips. These costs are going up exponentially, and data we have seen (thanks to IT industry luminary and Arista Networks chairman and chief technology officer Andy Bechtolsheim), an advanced chip design using 16 nanometer processes cost an average of $106.3 million, shifting to 10 nanometers pushed that up to $174.4 million, and the move to 7 nanometers costs $297.8 million, with projections for 5 nanometer chips to be on the order of $542.2 million. Nearly half of that cost has been – and continues to be – for software. So we know where to target some of those costs, and machine learning can help.
  • jandrewrogers: Physical storage density per server is driving this. You can fit upward of a petabyte of physical storage in a server and there are many applications where this makes sense. RAM is still on the order of a terabyte, and expensive. Everything else follows from trying to use all of this physical storage effectively. Needless to say, when working with this much storage you aren’t installing a filesystem on it. When working with storage this dense, write performance matters immensely if you can’t wait (literally) a year to index your data model. Scaling write throughput on indexing structures has a problem of write sparsity: given a fixed amount of cache, there is a scaling point where virtually every single insert requires at least one page fault. Indexing structures like B+Trees that consume large amounts of space will not only push the data pages out of cache, the index itself may no longer fit into cache at high densities. 
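The write-sparsity point can be sketched with a toy cache model: for uniformly random inserts, the chance that the leaf page you touch is already cached is just the cached fraction of the index. All sizes below are made up:

```python
def page_fault_probability(cache_pages, index_pages):
    """For uniformly random inserts, the chance a touched leaf page is
    not already cached. Once the index dwarfs the cache, nearly every
    insert pays at least one page fault."""
    return max(0.0, 1.0 - cache_pages / index_pages)

# Illustrative: index slightly larger than cache vs. 50x larger than cache.
p_small = page_fault_probability(cache_pages=1_000_000, index_pages=1_100_000)
p_dense = page_fault_probability(cache_pages=1_000_000, index_pages=50_000_000)
# p_small ~ 9% of inserts fault; p_dense ~ 98% of inserts fault.
```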

Useful Stuff: 

  • How Rockets are Made Smarter Every Day 231 (part 2). Just fascinating in general, but there are a few points of special interest for us:
    • The quality of your tools determines the kind of designs you can make. With finite element analysis from the 90s you could only create designs of a certain type. For example, a Delta booster rocket panel has an isogrid (triangle) shaped structure. The Vulcan rocket uses orthogrids (squares) that are not symmetric because the engineering tools are better. The new design takes half the time to manufacture and is stronger.
    • Building a rocket combines both machinery with human craftsmanship. Machines do a lot of the work, but people do a surprising amount, especially the finer detail work. This reinforces a point we’ve talked about in the past. Jobs are a bundle of tasks. People are still good at a great many tasks and the future will be a cooperation between machine and people, not people being completely replaced by machines.
    • Rockets have a factor of safety of only about 1.1 (a 10% margin) because they can’t afford the weight. This surprised me. I was expecting a much higher multiple, because, you know, space. Buildings commonly use a factor of safety of 2.0 for each structural member. Pressure vessels use 3.5 to 4. Structural steel work in bridges uses 5 to 7. But once you realize every 7 pounds in the booster plate costs a pound in the spacecraft, it makes sense.
    • In software we could use more friction stir welding. It’s a lighter and stronger form of weld. We do a lot of conventional welding where we tack things together, add a lot of filler, melt it, and the result is joints with different properties than the original materials.
    • Use different technologies for different purposes. On a booster rocket remember you can afford 7kg of inert mass (aluminum) before it costs you 1kg on the upper stages. On the upper stages it’s 1-1, so they switch to a higher performance technology (stainless steel). 
    • Fly it before you fly it. Try new stuff on old systems to verify that it works. One of the strategies for a successful first flight on Vulcan is to test the Vulcan parts that can fly on Atlas on Atlas. Almost everything will fly on Atlas before it flies on Vulcan.
    • Don’t attach your brain to your ass lest it fall off.
    • Software in the guidance system can be upgraded 60 seconds before launch. This is to deal with changing wind conditions. They measure the wind before launch. Run it through simulation. Recertify the trajectory. Load it on the rocket. And away they go. They are just changing parameters, not code. Certification is through a hardware in the loop simulation laboratory.
    • Once the rocket is flying it can adjust its own parameters on the fly. It measures its own performance, and if for some reason it doesn’t use as much propellant as expected, it can switch to a different set of parameters that allow it to, say, extend the mission. 
    • Competimate. It’s a small industry. It’s not unusual to have a competitor in your supply chain or buying services from you. For example, they buy engines from Blue Origin, which reduces the cost of the engine because it can be produced at greater scale.
    • Space is now contested. The last time we had a new warfighting domain was a hundred years ago when air was added. They are investing in capabilities for Space Force.
    • Also, The Computer that Controlled the Saturn V
  • What happens if you remake the Turing test into a world-wide game show? Sci-Fi Short Film “Who Among Us”. Creepy and disturbing? Absolutely. As a possibly simulated human raised on way too many of these types of shows, I did figure out the right answer in the first few seconds. 
  • We’re not far away from AIs writing complete scripts and books (@23:30). What does this mean for writers? Joanna Penn thinks it’s the end of scale for human authors. Human writers have entered the long tail of their profession. Why? Human authors simply can’t compete with the massive, zero marginal cost, and personalized output of AIs. Humans will have to double down on their humanness. Do things only humans can do. Go hyper-local, hyper-experience, hyper-personal. Find in-person streams of income where people connect and can relate to each other directly. 
  • Great insight from Riot Games: what matters is how your entire product works. Nobody cares if it’s composed of services; it must still function well as a whole. Part VI: Products, Not Services
    • Riot Games moved to microservices. They got the main benefit of microservices in that they were rarely blocked on anything that they couldn’t resolve on their own. Then they started noticing some worrying trends. 
    • The QA and load testing environments were getting less stable every month, which cost time and energy that could have been spent on creating player value.
    • Microservices created a risky feedback loop. Development teams made microservices, deployed them, operated them, and were held accountable for their performance. This meant they optimized logs, metrics, and processes for themselves and in general rarely considered their services as something that someone without development context or even engineering ability would have to reason with. 
    • As developers kept creating more and more microservices, operating the holistic product became very difficult and resulted in an increasing number of failures. 
    • On top of that, the fluid team structure at Riot was leaving some microservices with unclear ownership. This made knowing who to contact while triaging difficult, resulting in many incorrectly attributed pages. 
    • Partner regions ops teams became overwhelmed by the increasing number of heterogeneous microservices, deployment processes, and organizational changes.
    • Allowing unaligned streams of changes into a distributed system will eventually lead to preventable incidents.
    • Failures start to happen when teams are looking to coordinate across their boundaries, where dependencies require bundling a release with multiple changes. 
    • The solution: Given that the previous attempts failed to produce the desired outcome, we decided to eliminate partial state manipulation by creating an opinionated declarative specification that captures the entirety of a distributed product – an environment. An environment contains all the declarative metadata required to fully specify, deploy, configure, run, and operate a set of distributed microservices collectively representing a product and is holistically and immutably versioned.
    • This approach worked because it led to consistent, versioned releases of bundled containers, their dependencies, how they should be interacting with one another, and all the supporting metadata that was required to launch and operate an entire game. The immutability led to deterministic deployments and predictable operations. 
    • The future: Riot is already a multi-cloud company, leveraging our own data centers, AWS and partner clouds, but we rely on statically designed topologies. We’d like to be able to describe expected and acceptable latencies between services, and have the tooling optimize for which underlying regions and lower-level PaaS services would meet those needs. This will lead to some services being co-located in the same rack, host, or cloud region vs. allowing them to be distributed across others. This is highly relevant for us due to the performance characteristics of game servers and supporting services.
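Riot doesn't publish its spec format in this excerpt, but the shape of such an "environment" can be sketched as data. Every field name below is invented for illustration; only the idea (one immutable, holistically versioned document describing a whole distributed product) comes from the post:

```python
import json

# Hypothetical sketch of a holistically versioned environment spec.
# None of these field names come from Riot; they illustrate the idea of
# one document fully specifying a distributed product.
environment = {
    "name": "game-backend",
    "version": "2020.08.1",  # the whole product versions together
    "services": {
        "matchmaking": {"image": "matchmaking:4.2.0", "replicas": 3},
        "chat":        {"image": "chat:1.9.7",        "replicas": 2},
    },
    "dependencies": {"chat": ["matchmaking"]},  # inter-service wiring
    "config": {"region": "NA", "log_level": "info"},
}

def frozen_release(env):
    """Immutability in miniature: a release is the whole spec serialized
    deterministically, never mutated after the fact."""
    return json.dumps(env, sort_keys=True)

release_a = frozen_release(environment)
release_b = frozen_release(environment)  # same spec -> identical artifact
```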
  • NASA investigation finds 61 corrective actions for Boeing after failed Starliner spacecraft mission. Obvious question: what kind of acceptance tests was NASA running? They sound completely inadequate. You weren’t already reviewing code? Didn’t you have simulators? When you’ve contracted with another company this is always an excellent policy:
    • NASA said it will embed more software experts into Boeing’s Starliner team, to try to help correct the issues found. 
    • By combining teams you turn development into a joint quest instead of creating an adversarial relationship. That does wonders for honesty and communication. And if the contracting company doesn’t want your people working with them then you know they are trying to screw you. 
  • Facebook rebuilt the architecture and rewrote the entire codebase based on four principles: use the OS, reuse the UI, leverage the SQLite database, and push to the server. Project LightSpeed: Rewriting the Messenger codebase for a faster, smaller, and simpler messaging app
    • Reduced core Messenger code by 84 percent, from more than 1.7M lines to 360,000. 2x faster startup time. 1/4 the size (from a peak of 130MB). 
    • They did not make these reductions by pruning the feature set as many do these days. They used the native OS wherever possible, reusing the UI with dynamic templates powered by SQLite, using SQLite as a universal system, and building a server broker to operate as a universal gateway between Messenger and its server features.
    • In today’s Messenger, the contact list is a single dynamic template. We are able to change how the screen looks without any extra code. Every time someone loads the screen — to send a message to a group, read a new message, etc. — the app has to talk to the database to load the appropriate names, photos, etc. Instead of having the app store 40 screen designs, the database now holds instructions for how to display different building blocks depending on the various sub-features being loaded. This single contact list screen is extensible to support a large number of features, such as contact management, group creation, user search, messaging security, stories security, sharing, story sharing, and much more.
    • To build our universal system, we took an idea from the desktop world. Rather than managing dozens of independent features and having each pull information and build its own cache on the app, we leveraged the SQLite database as a universal system to support all the features. Now, rather than supporting one system to update which friends are active now, another to update changes in profile pictures in your contact list, and another to retrieve the messages you receive, requests for data from the database are self-contained. All the caching, filtering, transactions, and queries are all done in SQLite. The UI merely reflects the tables in the database. We extended SQLite with the capability of stored procedures, allowing Messenger feature developers to write portable, database-oriented business logic, and finally, we built a platform (MSYS) to orchestrate all access to the database, including queued changes, deferred or retriable tasks, and for data sync support.
    • For anything that doesn’t fit into one of the categories above, we push it to the server instead. We had to build new server infrastructure to support the presence of MSYS’s single integrated data and sync layer on the client. In today’s Messenger, we have a universal flexible sync system that allows the server to define and implement business and sync logic and ensures that all interactions between the client and the server are uniform.
    • We set budgets per feature and tasked our engineers with following the architectural principles above to stick to those budgets. We also built a system that allows us to understand how much binary weight each feature is bringing in. We hold engineers accountable for hitting their budgets as part of feature acceptance criteria. Completing features on time is important, but hitting quality targets (including but not limited to binary size budgets) is even more important.
    • @butz: The loop of app building: 1. Build app with native framework. 2. Re-build app with some alternative framework which is slower, but lets you deploy single code on all platforms. 3. Re-build app with native framework, because it is faster. 4. GOTO 2
    • reaperducer: They make it sound like the clouds parted and a ray of sunshine struck the programmers at Facebook and they realized that they didn’t have to ride the framework-go-round to hell. These are supposed to be the smartest people in SV, but from the outside, it strikes me more as a “No shit, Sherlock” moment.
    • ardit33: Spotify had about 500k, lines of code in 2015, (not including external libraries). There were about 13 features at the beginning, (think, playlists, radio, shared ui libraries, etc…). Each feature was about 10-20k lines of code in total. So, about 2-3 features equal a small standalone iOS app. When I left, there were more than 80 features in the app. Some were essential, many were a/b tests, and some were probably just legacy/waiting to be removed. 80 features * 15k lines of code = 1.2mil loc, easily, and they are the equivalent of shipping 20 small apps standalone….What are these features and why is just a stupid music app so large? : video, podcasts/shows, genius, etc.. etc.. etc…. To users just looks like another tab but almost every one of these major features could be its own standalone app for size and complexity.
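The "dynamic template" idea above can be made concrete with the standard library's sqlite3. This is my reconstruction of the concept, not Facebook's schema: layout instructions live in a table, and the UI merely reflects a query result, so changing a screen is an INSERT rather than a code change.

```python
import sqlite3

# Sketch of the dynamic-template idea from Project LightSpeed (my own
# reconstruction): the database stores which building blocks a screen
# shows and what query fills each block; the UI just renders rows.

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, active INTEGER);
CREATE TABLE screen_blocks (      -- layout lives in data, not code
    screen TEXT, position INTEGER, block TEXT, query TEXT);
""")
db.executemany("INSERT INTO contacts VALUES (?,?,?)",
               [(1, "Ana", 1), (2, "Bo", 0), (3, "Cy", 1)])
db.executemany("INSERT INTO screen_blocks VALUES (?,?,?,?)", [
    ("contact_list", 1, "header", None),
    ("contact_list", 2, "active_now",
     "SELECT name FROM contacts WHERE active=1 ORDER BY name"),
    ("contact_list", 3, "all_contacts",
     "SELECT name FROM contacts ORDER BY name"),
])

def render(screen):
    """The 'UI': walk the blocks for a screen, fill each from its query."""
    out = []
    for block, query in db.execute(
            "SELECT block, query FROM screen_blocks "
            "WHERE screen=? ORDER BY position", (screen,)):
        rows = [r[0] for r in db.execute(query)] if query else []
        out.append((block, rows))
    return out
```

One render loop now serves contact management, group creation, search, and so on; each sub-feature is just more rows in `screen_blocks`.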
  • Netflix moved an application from one platform to another platform that had containers and saw a 3x performance improvement. Wow, that would mean 3x fewer servers would be needed. Or does it? Our intrepid Detective Inspector Brendan Gregg is on the case. LISA19 – Linux Systems Performance. It’s a tour of six important areas: observability tools, methodologies, benchmarking, profiling, tracing, and tuning. Brendan used the tools to uncover many clues, which was really the point of the exercise. What was the problem? A lower load on the second system meant it wasn’t context switching as much, so the application could use more cache. Bummer, no free lunch. Also, BPF Performance Tools
  • Infrastructure privilege is the idea that commercial companies develop faster, more efficient hardware and software systems that others do not have access to. As a consequence, instead of making global computing more efficient, only local efficiencies are achieved, which is not great for the world. 
    • The term infrastructure privilege was coined by efficiency advocate Amir Michael, founder of the Open Compute Project. Not coincidentally, OCP is an initiative started by Facebook that aims to accelerate data center and server innovation while increasing computing efficiency. 
    • Learn more about IP from a couple of On the Metal podcasts (wrap up, Amir Michael). On the Metal is a cool podcast focusing on the hardware/software interface. This is their first year and they already have a lot of interesting content. 
    • This is the first year for On the Metal because it’s the first year for the Oxide Computer Company who has some founders you may recognize: Steve Tuck, Bryan Cantrill and Jessie Frazelle. What are they up to? They are building a new kind of server. What does that mean? That would be telling and they aren’t telling. Maybe because they don’t know yet? But Bryan drops some hints in a talk he gave: Stanford Seminar – The Soul of a New Machine: Rethinking the Computer.
    • Bryan’s talk starts with a trip through server history, beginning with the IBM 709 in 1961, through Sun in 1999, hyperscale purpose-fit machines in 2009, and hyperscale 2020 using compute sleds that plug into a DC bus bar. But what’s next?
    • Bryan believes that there are a lot of people who still want to own their computers, and these hyperscaler computers are not a good fit for them. If you take a full stack view and redesign the server from scratch you can do a lot better. But you need a new approach.
    • For a new approach we need:
      • a real hardware root-of-trust.
      • a fit-to-purpose BMC (baseboard management controller).
      • host firmware confined to booting a host operating system
      • a true rack-scale design in which a top-of-rack switch is co-designed with compute and storage.
      • a dense form factor that allows for efficient operation.
      • integrated software: hypervisor, control plane, storage, ToR + API endpoints for both operator and developer.
      • fully open software at every level of the stack. 
    • So this is an on-prem vision with the hope that if you build it they will come. Typically people are satisfied with worse-is-better solutions that cloud providers are happy to offer. Bryan makes the case that the cloud is expensive and a lot of people would like an alternative. In fact, on-prem customers are angry and are demanding an alternative! At least that’s the pitch you give VCs. 
    • Could a new compute platform make inroads? Even this toned-down version of Bryan can sell ice to Eskimos, so I hope so. Competition is good for all of us, and the cloud right now is evolving along a very predictable price curve that isn’t getting lower. It’s time for a disruption, and those usually come from small passionate teams doing the impossible. Yet it’s hard to envision how a custom platform can succeed. History is littered with 20% better systems that get boat raced by commodity components within a few years. Is there a key product differentiator they can build that can’t be duplicated or marginalized by cloud providers? That is not as clear.
  • Need to scale up to 23000 Kafka events per second? Here’s how using 11 Kubernetes nodes and 280 pods. And it’s on Azure. Scalable Microservice Demo K8s Istio Kafka. Most interesting is the use of the SAGA pattern to implement transactions across microservices (which I thought wasn’t supposed to be a thing). Also, Could Microsoft Azure Actually Win the Cloud?
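The saga pattern mentioned above, in miniature: each local transaction pairs with a compensating action, and a failure part-way through runs the compensations in reverse. The step names here are invented for illustration.

```python
# Minimal saga sketch: a sequence of (action, compensation) pairs. On
# failure, every already-committed step is undone in reverse order,
# giving eventual consistency across services without a distributed
# transaction. Step names are hypothetical.

def run_saga(steps):
    """steps: list of (action, compensation) pairs. True on success."""
    done = []
    try:
        for action, compensation in steps:
            action()
            done.append(compensation)
        return True
    except Exception:
        for compensation in reversed(done):  # undo committed steps
            compensation()
        return False

def fail_shipping():
    raise RuntimeError("ship failed")

log = []
ok = run_saga([
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"),   lambda: log.append("refund card")),
    (fail_shipping,                       lambda: None),
])
# ok is False; log is ['reserve stock', 'charge card',
#                      'refund card', 'release stock']
```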
  • AI ain’t all that. The Great AI Fallacy
    • The idea that we can deploy optimized entities amongst humans and we won’t find flaws in their objective functions is nonsense—that’s the Great AI Fallacy.
    • We’re still far away from building flexible, intelligent AI systems. All we’re really building are neural networks that map inputs to outputs. These perform well on perceptual tasks, but it isn’t much different from what we’ve done before. Compared to natural intelligences they aren’t flexible, they can’t adapt in a task-specific way according to context, and they can’t decide how to respond more sensibly than their inputs allow. 
    • We’re building systems with embedded objective functions that often perform better than a human at their specific tasks. Humans are entities that have continued to exist through selection over time. It’s remarkable we do so well at tasks we were never optimized to perform. 
  • Good idea. Using AWS Lambda Layers to implement sub-millisecond static caches. Layers aren’t necessary of course, but they allow easy standardization across functions.
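The static-cache trick relies on a Lambda detail: anything assigned at module scope survives across warm invocations of the same container, so an expensive lookup can be paid once. A sketch, with `load_config` as an invented stand-in for a real fetch (SSM, S3, the Layer's bundled data, etc.):

```python
import time

# Sketch of a sub-millisecond static cache in a Lambda handler. Module
# scope is initialized on cold start only; warm invocations reuse it.
# load_config is a hypothetical stand-in for a slow remote fetch.

_CACHE = {}

def load_config():
    time.sleep(0.05)                  # pretend this is a network call
    return {"feature_flags": {"new_ui": True}}

def handler(event, context=None):
    if "config" not in _CACHE:        # cold start: pay the cost once
        _CACHE["config"] = load_config()
    return _CACHE["config"]["feature_flags"]

# First call pays ~50 ms; subsequent warm calls are dict lookups.
```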
  • ServerlessDays Nashville 2020 videos are now available
  • Netflix vs the environment. Remember that report that said watching 30 minutes of Netflix has the same carbon footprint as driving a car 4 miles? Sounds wrong, doesn’t it? Aren’t computers always getting more efficient?
    • It was wrong. Way wrong. The real number is less than 20 grams of CO2, which is like driving a car 160 meters. They were off by at least 2x on the emissions intensity and by 40x on the electricity used for video downloads. 
    • Then there’s Recalibrating global data center energy-use estimates, which we of course can not read, so here’s a summary Cloud Computing Is Not the Energy Hog That Had Been Feared: A new study of data centers globally found that while their computing output jumped sixfold from 2010 to 2018, their energy consumption rose only 6 percent. The scientists’ findings suggest concerns that the rise of mammoth data centers would generate a surge in electricity demand and pollution have been greatly overstated.
    • Move bits, not atoms—it’s still the future.
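The corrected figure passes a quick sanity check: dividing the streaming emissions by the equivalent driving distance should give a believable per-kilometer car emission rate.

```python
# Sanity check of the corrected numbers above: under 20 g of CO2 for
# 30 minutes of streaming, said to equal driving 160 meters. The
# implied car emission rate should match typical fleet averages
# (roughly 120-130 g CO2/km).

netflix_g = 20       # grams CO2 per 30 minutes of streaming (upper bound)
drive_km = 0.160     # equivalent driving distance in km

implied_car_g_per_km = netflix_g / drive_km
print(round(implied_car_g_per_km))   # ≈ 125 g/km, a plausible average car
```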
  • Jeff Lawson – How to Build a Platform
    • We’re used to thinking of products in the meat world as having a supply chain; it’s interesting to think of platforms/services/frameworks/libraries as part of a digital supply chain. Maybe software teams need someone in the Digital Supply Manager role? 
    • Old: Build vs. buy. New: Build or die. The rise of platforms has inverted the build vs. buy decision. In the past “buy” was chosen because software was seen as a cost center to be minimized. Now with the rich service ecosystem you can build software for much less money and by building custom software you can create a competitive advantage. Software is a way to differentiate yourself from everyone else who has outsourced their product.
    • Pull developers into solving problems for customers. Platforms let you observe and work with customers to better solve their problems.
  • DevOps Life: In our example, using the free tier of Amazon Lambda and Amazon SQS, generating 2,000 PDFs is costing us ~$1.63 ($1.62 for SQS, $0.01 for Lambda).
  • Alex Blewitt on Understanding CPU Microarchitecture for Performance. You’ll learn what happens in a CPU and how to make programs vroom.
    • Use cacheline-aligned or cacheline-aware data structures.
    • Compress data in memory and decompress on the fly.
    • Avoid random memory access when possible.
    • Configure huge pages and use madvise & defer
    • Partition memory with libnuma for data locality
    • Each CPU is its own networked mesh cluster
    • Branch speculation and memory/TLB misses are costly
    • Use branch free and lock free algorithms when possible.
    • Analyse perf counters with top down architecture analysis
      • Use (auto)vectorisation and use XMM/YMM/ZMM when sensible.
  • Media is changing. Michael Mulvihill, FOX Sports EVP/Head of Strategy 
    • The fundamental premise that guides everything they are doing is that the entire video industry is reorganizing into two marketplaces: live content and on-demand content. Live includes sports, cable news, etc. On-demand content is everything that is scripted and used to run in a prime time network schedule at certain times. More and more it’s watched on an on-demand platform like Netflix. The on-demand world is largely about advertising avoidance. In the live world, because you are watching in real time, you can’t avoid the commercials, which adds more value to the content. 
    • Even marginal content like college basketball is lifted by the rising tide of live content value as scripted content moves to new on-demand platforms. And that’s why new live properties like the XFL have value. 
    • 93 of the 100 most watched shows in 2019 were live sports events. Twenty years ago that number was more like 20-25. There’s an enormous shift away from scripted TV content at the highest end of the TV market toward live content. That benefits the NFL, NBA, college basketball, horse racing, etc. Everything that’s live sports on a commercially sponsored platform is benefiting from the evolution of the business. 
    • The understanding of how people use Netflix is limited because Netflix doesn’t give out that data. The suspicion is Netflix has less of an impact on day time consumption and more of an impact on night time viewing, so some content is being moved to the day.
  • We thought the internet would make a lot of things better. Boy were we wrong. TKC 605 Steven Levy author of Facebook: The Inside Story
    • Interesting bit where Facebook was driven by an end-of-history-style narrative. That’s where you convince yourself communism will eventually win, so anything you do to bring communism about quicker is justified. *anything*
    • Only for Facebook the story they told themselves was about growth. Anything you could do to grow Facebook is justified. Why? Connecting the world is good. Obvi. So growth is good. Growth means more people are being connected. More growth is better because that means more people are being connected. So anything you do to make money is good because that means you can grow even faster which connects even more people. 
    • Unfortunately connecting the whole world proved not to be the intuitively obvious good that people thought. Much like communism. When you grow so fast you can’t control all the things you break along the way, bad things happen to the world. 
    • Zuck said he didn’t realize how people would abuse the platform, that when many people are connected they don’t act better, they behave worse. Facebook made it very easy to behave very badly. Growth demanded such sacrifices—of other people.
    • Notice how end-of-history arguments are always so self serving? That should be your first clue.
  • Innovation Through Software Development and IT. Research shows…
    • High performing companies don’t need to make a tradeoff between moving fast and breaking things. You can move fast without breaking things. In fact, you make the system more stable over time. High performers do better at everything. The capabilities that enable high performance in one area enable high performance in other areas. Using version control in development works just as well in production.
    • Toyota did not win by making shitty cars faster. They won by making high quality cars faster and having a shorter time to market.
    • Speed and stability go together through DevOps. There isn’t a tradeoff. You don’t need to slow down to be more stable. High performers can have it all. 
    • Failure is inevitable; what matters is your time to recovery (MTTR). If you only go down once a year but stay down for three days, that’s not good. If you go down for a short period of time and come back almost immediately, with a small blast radius, and customers barely notice, that’s OK. 
    • In a high quality process, when you release, it doesn’t break.
    • Developing and delivering software with speed and stability drives organizational goals like profitability, productivity, and market share. At one time it was thought this wasn’t true: that IT Doesn’t Matter. Back then software was an on-prem sales model. Everyone bought the same software, so software was not a way to differentiate and bring value to customers. Software was a point of parity, not a point of distinction. In the software-is-eating-the-world era we live in today, in-house software is a way to differentiate and bring value to the business. That’s something people can’t copy.
    • Any company can be a high performer if they want to change. Every company these days is a technology company. China did not go through the IT-doesn’t-matter phase, so technology is built into their DNA.
    • DevOps is the capability that underlies your ability to practice lean startup and rapid iterative processes. 
    • One of the biggest predictors of IT performance is the ability of teams to get stuff done on their own without dependencies on other teams. Microservices are just one way. You can achieve results on mainframes, for example. There is no statistical correlation with performance whether you’re on a mainframe, green field, or brown field system. 
    • Maturity models are not a good way to think of technology transformation. Once you’ve arrived then what happens? Resources are gone. The best most innovative companies continue to push. It’s better to think of a capability model. What capabilities are necessary to develop and deliver software with speed and stability? Capabilities drive outcomes, maturity models don’t. 
    • Work in small batches so you can reason about cause and effect. This is what DevOps enables.
    • Pathological organizations are characterized by low cooperation between different departments and up and down the organization. How do you deal with people who bring bad news? Do you focus on finding fault? How do you deal with failure? Is novelty crushed?
    • Google found the greatest teams encouraged psychological safety. Did the team feel like it is safe to take risks? When things go wrong are they supported? If not people are not going to take risks which means you will not get any novelty. Create teams where it’s safe to take risks, where failures are thought of as learning experiences. 
    • Rules aren’t bad, they allow you to scale an organization. But have a mission orientation. If something is important to the mission then the rules can be broken.
    • Also, The System of Profound Knowledge
  • If we are living in a simulation that must be one fast computer. Simulating Computer Architecture with “Mechanistic” Models – No more 100k Slowdown?
  • It seems AWS Graviton2 has lived up to its claims when put to the test. With competitive pricing it looks like Graviton2 will be making its mark in the cloud business. Benchmarking the AWS Graviton2 with KeyDB – M6g up to 65% faster: M6g instances are 20% cheaper on a per GB scale; M6g instances can be over 2X cheaper when looking at computing cost / performance; m6g.large is 1.65X faster than m5.large.
  • The 5 Ages of Burning Man: Exploration (1986-1990); Rapid Growth (1990-1996); Protectionism (1997-2000); Outreach (2001-2010); Scarcity (2011-present).
  • One reason it’s so hard to think about protocols in a distributed system. Balancing Loop with Delay: “Balancing loops coupled with delays can create complex behavior because there are so many different sources and sizes of delays. ” Interestingly, lack of timely feedback is a key difference between kind and wicked problems. Also, How to Avoid Cascading Failures in Distributed Systems
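The balancing-loop-with-delay dynamic is easy to see in a toy simulation: a controller pushes a stock toward a goal, but acts on a reading that is several steps stale. The parameters below are invented for illustration; the same dynamic underlies retry storms and cascading failures.

```python
# Toy balancing loop: each step, adjust the stock toward a goal in
# proportion to the observed gap. With delay=0 the observation is
# current and the stock converges smoothly; with a stale observation
# the controller keeps pushing after the goal is reached, so the
# system overshoots and rings.

def simulate(delay, steps=40, gain=0.5, goal=100.0):
    history = [0.0]
    for _ in range(steps):
        observed = history[max(0, len(history) - 1 - delay)]  # stale read
        history.append(history[-1] + gain * (goal - observed))
    return history

no_delay = simulate(delay=0)   # approaches 100 monotonically
delayed = simulate(delay=2)    # overshoots to ~175, then oscillates
```

Push the gain or the delay a little higher and the oscillation stops damping at all, which is why "so many different sources and sizes of delays" makes distributed protocols hard to reason about.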
  • This is exactly backwards. It’s a map vs. territory conflation. Code is a map. The territory is encoded in the people and structure of your organization. Your Developer is not a Single Point of Failure: After all, developers encode business processes and keep the business from being held hostage to arcane workflows and procedures known only to “special” individuals. Instead, processes are written down in code. The business owns the source code and can hire other developers to work on it if they so choose. Developers are not single points of failure. Developers prevent single points of failure.
  • Now this is a distributed system. Did you know that spacecraft do not navigate all by themselves? Voyager is still sent navigation commands from earth. Why? Atomic clocks are huge. How a miniaturized atomic clock could revolutionize space exploration.
    • We are sending more spacecraft further into deep space than ever before. But every one of those spacecraft out there depends on its navigation being performed right here at Earth to tell it where it is and, far more importantly, where it is going. And we have to do that navigation here on Earth for one simple reason: spacecraft are really bad at telling the time. But if we can change that, we can revolutionize the way we explore deep space. 
    • So we measure that signal time very, very accurately here on Earth, down to better than one-billionth of a second. But it has to be measured here on Earth. There’s this great imbalance of scale when it comes to deep space exploration. Historically, we have been able to send smallish things extremely far away, thanks to very large things here on our home planet. As an example, this is the size of a satellite dish that we use to talk to these spacecraft in deep space. And the atomic clocks that we use for navigation are also large. The clocks and all of their supporting hardware can be up to the size of a refrigerator. Now, if we even want to talk about sending that capability into deep space, that refrigerator needs to shrink down into something that can fit inside the produce drawer.
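The ranging arithmetic behind that passage is simple: distance is measured by timing a radio signal, so clock error converts directly into position error at the speed of light.

```python
# Why "better than one-billionth of a second" matters: at the speed of
# light, every nanosecond of timing error is about 30 cm of range error.

C = 299_792_458                 # speed of light, m/s

def range_error(clock_error_s):
    return C * clock_error_s

print(range_error(1e-9))        # 1 ns of clock error ≈ 0.3 m of range error
```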
  • 10 Breakthrough Technologies: Unhackable internet; Hyper-personalized medicine; Digital money; Anti-aging drugs; AI-discovered molecules; Satellite mega-constellations; Quantum supremacy; Tiny AI; Differential privacy; Climate change attribution
  • Don’t believe it when people say scaling problems are good problems to have. Robinhood Maxed Out Credit Line Last Month Amid Market Tumult.
  • An app modernization story — Part 4 (Serverless Microservices): The final architecture solved pretty much all the issues I outlined earlier. There are 3 focused services doing 1 thing at a time well. Each service can be independently reasoned about, updated and scaled. Updating is not scary anymore. It’s just an update to one of the services and the revision feature of Cloud Run makes reverting back to previous code very easy. There’s loose dependency between services via Pub/Sub and Firestore. No issue with cold starts as the Web frontend is backed with Firestore and it’s pretty quick to start up. On the flip side, the architecture is considerably more complex than before. There are a few moving parts and overall setup is not as easy as a single Cloud Run service with no external dependencies. However, that’s the price we were willing to pay to make the architecture more resilient and easier to update.
  • Handling 350k Requests for $3 using Lambda
    • merty: This isn’t actually one of those solutions where Lambda shines, pricing wise. I would simply trigger a Lambda function once a minute (or every X minutes) using CloudWatch to fetch the latest articles and save them to an S3 bucket which I would expose and cache using CloudFront or any other CDN service. This would lead to: – No Lambda costs as it would be covered by the monthly free tier of 1M requests. – No storage costs as the size of the stored data would be extremely small. – Really fast responses as the “response” would actually be a static file cached at the CDN. – The only parameter defining your cost would be your CDN of choice, which would cost somewhere between free and as low as $10/TB. For a project like the one in the article, that’s hundreds of millions of requests for just $10.
    • NathanKP: Yep, that is exactly the architecture that I use to watch over 600k Github repo changelogs for https://changelogs.md. Lambda generates static HTML in the background, puts it in S3, and the static HTML get served via CloudFront. The Lambda costs are a whopping 26 cents per month, for over 2 million Lambda invocations per month.
    • rubyn00bie: My Raspberry Pi Model 3 would do roughly[1] ~250 req/s with an Elixir server I wrote. 350k requests @ $3 seems kind of expensive to me to be honest.
    • blantonl: I run 8 different production API endpoints, all on Lambda, directly through Application Load Balancer instead of API Gateway, and they served 510 million requests last month for about $200. Those API endpoints took me about a day to deploy using serverless deployment tools. They economically scale between extremely varied workload requirements. There are no complaints on my end.
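The architecture merty and NathanKP describe, render once on a schedule, serve static files from a CDN, is mostly a pure function plus an upload. A sketch under stated assumptions: the article feed and bucket name are invented, and the boto3 upload is shown commented out since it needs AWS credentials to run.

```python
# Sketch of the scheduled-render pattern from the comments above: a
# cron-triggered function builds static HTML and pushes it behind a
# CDN, so reads never invoke Lambda. fetch logic and bucket names are
# hypothetical placeholders.

def render_page(articles):
    """Pure renderer: testable with no AWS dependency at all."""
    items = "\n".join(f"<li>{a['title']}</li>" for a in articles)
    return f"<ul>\n{items}\n</ul>"

def handler(event, context=None):
    articles = [{"title": "Post one"}, {"title": "Post two"}]  # stand-in fetch
    html = render_page(articles)
    # import boto3
    # boto3.client("s3").put_object(
    #     Bucket="my-static-site", Key="index.html",
    #     Body=html, ContentType="text/html",
    #     CacheControl="max-age=60")  # let the CDN absorb the read traffic
    return {"statusCode": 200, "body": html}
```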
  • AMD FTW. Why Cloudflare Chose AMD EPYC for Gen X Servers
    • Compared with our prior server (Gen 9), our Gen X server processes as much as 36% more requests while costing substantially less. Additionally, it enables a ~50% decrease in L3 cache miss rate and up to 50% decrease in NGINX p99 latency, powered by a CPU rated at 25% lower TDP (thermal design power) per core.
    • Based on our findings, we selected the single socket 48-core AMD 2nd Gen EPYC 7642. Key factors we noticed were its large L3 cache, which led to a low L3 cache miss rate, as well as a higher sustained operating frequency.
    • Partnering with AMD, we tuned the 2nd Gen EPYC 7642 processor to achieve additional 6% performance. We achieved this by using power determinism and configuring the CPU’s Thermal Design Power (TDP).
  • To Microservices and Back Again: From the start, Segment embraced a microservice architecture in our control plane and data plane. Microservices have many benefits: improved modularity, reduced testing burden, better functional composition, environmental isolation, development team autonomy, etc., but when implemented wrong their benefits can quickly become burdens. After years of continuing to add to our microservice architecture we found ourselves in a spot where our developer velocity was quickly declining and we were constantly tripping over our microservice architecture and its complexity. In this session you’ll learn what microservice antipatterns to avoid, the trade-offs between microservices and a monolith, how to identify when it’s time to take a step back and make a big change, and how moving to a monolith was the solution that worked for us.
  • Good discussion. Simple Systems Have Less Downtime. But discussions on simplicity are all pretty much the same. Everyone likes simple but nobody can say how to make something simple or what simple is. Simple is usually what they do.
  • Great example of using low-level profiling techniques to figure out what’s really happening. Cloudflare on When Bloom filters don’t bloom
    • Modern CPUs are really good at sequential memory access when it’s possible to predict memory fetch patterns (see Cache prefetching). Random memory access on the other hand is very costly. 
    • Advanced data structures are very interesting, but beware. Modern computers require cache-optimized algorithms. When working with large datasets that don’t fit in L3, prefer optimizing for a reduced number of loads over optimizing the amount of memory used.
    • I guess it’s fair to say that Bloom filters are great, as long as they fit into the L3 cache. The moment this assumption is broken, they are terrible.
    • One colleague often says: “You can assume modern CPUs are infinitely fast. They run at infinite speed until they hit the memory wall.”
    • Finally, don’t follow my mistakes – everyone should start profiling with ‘perf stat -d’ and look at the “Instructions per cycle” (IPC) counter. If it’s below 1, it generally means the program is stuck on waiting for memory. Values above 2 would be great, it would mean the workload is mostly CPU-bound. 
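To make the discussion concrete, here is a minimal Bloom filter: k hash functions set k bits per key, lookups can yield false positives but never false negatives. The cache point above falls out of the structure: once the bit array outgrows L3, each of the k probes becomes a random DRAM access.

```python
import hashlib

# Minimal Bloom filter sketch. Each add/lookup touches k pseudo-random
# bit positions; when `bits` fits in cache this is cheap, and when it
# doesn't, every probe is a costly random memory access.

class BloomFilter:
    def __init__(self, m_bits=8192, k=4):
        self.m, self.k = m_bits, k
        self.bits = bytearray(m_bits // 8)

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.blake2b(key.encode(), salt=i.to_bytes(8, "little"))
            yield int.from_bytes(h.digest()[:8], "little") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):      # k scattered memory probes
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

bf = BloomFilter()
bf.add("example.com")
assert bf.might_contain("example.com")   # never a false negative
```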
  • HelloFresh on Everything We Learned Running Istio In Production
    • we run hundreds of microservices that do everything from supply chain management and handling payments to saving customer preferences.
    • In Istio, this meant choosing the core set of features that enabled observability and reliability. For resilience we enabled Circuit Breaking, Outlier Detection, Retries (where necessary), and Timeouts. We enabled them in a way that developers could choose which features they want to implement but with a company standard interface (more on that below). In practical terms we added sidecar proxies (Envoy), Gateways, Virtual Services and Destination Rules for most services.
    • We deploy all services using Helm charts.
    • At HelloFresh we organize our teams into squads and tribes. Each tribe has their own Kubernetes namespace. As mentioned above, we enabled sidecar injection namespace by namespace then application by application. Before enabling applications for Istio we held workshops so that squads understood the changes happening to their application. Since we employ the model of “you build it, you own it”, this allows teams to understand traffic flows when troubleshooting. 
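The resilience features HelloFresh enabled (circuit breaking, outlier detection, retries, timeouts) are mesh-level implementations of classic patterns. As a rough sketch of the state machine a circuit breaker applies per upstream host: closed, then open after N consecutive failures, then half-open after a cooldown. This is an illustration of the pattern, not Istio's or Envoy's actual implementation.

```python
import time

# Toy circuit breaker: fail fast while open, allow one probe request
# after the cooldown, and close again on success. An injectable clock
# makes it testable.

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None        # half-open: allow one probe
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0                # success closes the circuit
        return result
```

Doing this in the sidecar instead of in each service is exactly the "company standard interface" point: every team gets the same behavior without writing this class themselves.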
  • Should storage be modeled as a NoSQL database? Computational Storage Winds Its Way Towards The Mainstream
    • There is both a cost and a time factor involved in moving data from where it is generated to where it is processed, and with certain classes of computational problems being very I/O bound, it makes more sense to perform the processing as close as possible to where the data is stored, rather than shuffling gigabytes or terabytes of information around.
    • Scott Shadley: “If you have a large amount of stored data that you need to do something with, our tagline for moving forward is – we store massive amounts of data, and we return just the value of that data to you”
    • This reminds me of data modeling for a NoSQL database. To see what I mean take a look at Rick Houlihan in AWS re:Invent 2018: Amazon DynamoDB Deep Dive: Advanced Design Patterns for DynamoDB. The idea is to model your data and create queries such that data is processed sequentially and only the data of interest is returned. A good interface to these devices would look something like the DynamoDB API.
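Reduced to code, the computational-storage idea is predicate pushdown: ship the query to where the data lives and return only the matching values. `SmartDrive` below is an invented stand-in for a computational storage device; the point is the interface shape, which does look a lot like a NoSQL query API.

```python
# Sketch of "we store massive amounts of data, and we return just the
# value of that data to you": filter and project near the data so only
# results cross the interconnect. SmartDrive is hypothetical.

class SmartDrive:
    def __init__(self, records):
        self._records = records        # data stays on the device

    def query(self, predicate, project):
        """Runs near the data; only results leave the device."""
        return [project(r) for r in self._records if predicate(r)]

drive = SmartDrive([
    {"sensor": "a", "temp": 21.5},
    {"sensor": "b", "temp": 88.0},
    {"sensor": "c", "temp": 19.0},
])

# The host receives one float, not three records:
hot = drive.query(lambda r: r["temp"] > 80, lambda r: r["temp"])
# hot == [88.0]
```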
  • Impressive visualization of diffusion across networks. Going Critical. Without immunity it clearly shows how network density impacts diffusion rates. Containment sucks, but how else can you reduce the number of potential edges in the network?
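The density effect is easy to reproduce with a toy deterministic model (everything here is invented for illustration, not taken from the Going Critical essay): put nodes on a ring, give each node k neighbors per side, and let every infected node infect all of its neighbors each step.

```python
# Toy deterministic diffusion on a ring lattice. Each step, every
# infected node passes the contagion to all of its neighbors, so
# higher density (more edges per node) means faster spread.

def spread(n_nodes, k_neighbors, steps):
    """Ring of n_nodes; each node links to k_neighbors on each side."""
    neighbors = {
        i: [(i + d) % n_nodes for d in range(-k_neighbors, k_neighbors + 1) if d]
        for i in range(n_nodes)
    }
    infected = {0}                      # single seed node
    for _ in range(steps):
        infected |= {m for i in infected for m in neighbors[i]}
    return len(infected)

sparse = spread(n_nodes=1000, k_neighbors=1, steps=10)  # 2 edges/node
dense = spread(n_nodes=1000, k_neighbors=5, steps=10)   # 10 edges/node
assert dense > sparse   # cutting edges (containment) slows diffusion
```

With these parameters the sparse ring infects 21 nodes in 10 steps while the dense ring infects 101, which is the whole argument for containment: fewer potential edges per node directly caps the diffusion rate.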

Soft Stuff:

  • firecracker-microvm/firecracker: Firecracker is an open source virtualization technology that is purpose-built for creating and managing secure, multi-tenant container and function-based services that provide serverless operational models. Firecracker runs workloads in lightweight virtual machines, called microVMs, which combine the security and isolation properties provided by hardware virtualization technology with the speed and flexibility of containers.
  • earthspecies/roadmaps: An open-source collaborative and nonprofit dedicated to decoding animal communication.
  • bottlerocket-os/bottlerocket: Bottlerocket is a free and open-source Linux-based operating system meant for hosting containers. Bottlerocket is currently in a developer preview phase and we’re looking for your feedback. If you’re ready to jump right in, read our QUICKSTART to try Bottlerocket in an Amazon EKS cluster.
  • matrix.org: A new basis for open, interoperable, decentralised real-time communication

Pub Stuff: 

  • In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems: We propose SLIDE (Sub-LInear Deep learning Engine) that uniquely blends smart randomized algorithms, with multi-core parallelism and workload optimization. Using just a CPU, SLIDE drastically reduces the computations during both training and inference outperforming an optimized implementation of Tensorflow (TF) on the best available GPU. Our evaluations on industry-scale recommendation datasets, with large fully connected architectures, show that training with SLIDE on a 44 core CPU is more than 3.5 times (1 hour vs. 3.5 hours) faster than the same network trained using TF on Tesla V100 at any given accuracy level. On the same CPU hardware, SLIDE is over 10x faster than TF. We provide codes and scripts for reproducibility.
  • Discovering the Brain’s Nightly “Rinse Cycle”: The findings, published recently in the journal Science, are the first to suggest that the brain’s well-known ebb and flow of blood and electrical activity during sleep may also trigger cleansing waves of blood and CSF. While the experiments were conducted in healthy adults, further study of this phenomenon may help explain why poor sleep or loss of sleep has previously been associated with the spread of toxic proteins and worsening memory loss in people with Alzheimer’s disease.
  • fastai/fastbook: These draft notebooks cover an introduction to deep learning, fastai, and PyTorch. fastai is a layered API for deep learning
  • Out of the Tar Pit: We believe that – despite the existence of required accidental complexity – it is possible to retain most of the simplicity of the ideal world in the real one. We now look at how this might be achievable. Our recommendations for dealing with complexity (as exemplified by both state and control) can be summed up as avoid, and separate.
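A tiny sketch of the "avoid and separate" recommendation (my example, not the paper's): keep only essential state, and derive everything else as pure logic over it instead of caching accidental state that can drift out of sync.

```python
# "Avoid and separate" in miniature: orders are essential state
# (what actually happened); revenue is derived on demand, so there
# is no cached total to keep synchronized.

orders = []   # essential state

def place_order(item, price):
    orders.append({"item": item, "price": price})

def revenue():
    # Pure logic over essential state; accidental state avoided.
    return sum(o["price"] for o in orders)

place_order("widget", 3.0)
place_order("gadget", 4.5)
assert revenue() == 7.5
```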
  • Memory-Ecient Search Trees for Database Management Systems: Th‘e growing cost gap between DRAM and storage together with increasing database sizes means that database management systems (DBMSs) now operate with a lower memory to storage size ratio than before. On the other hand, modern DBMSs rely on in-memory search trees (e.g., indexes and €lters) to achieve high throughput and low latency. ‘ese search trees, however, consume a large portion of the total memory available to the DBMS ‘is dissertation seeks to address the challenge of building compact yet fast in-memory search trees to allow more ecient use of memory in data processing systems. We €rst present techniques to obtain maximum compression on fast read-optimized search trees. We identi€ed sources of memory waste in existing trees and designed new succinct data structures to reduce the memory to the theoretical limit. 

from High Scalability: http://feedproxy.google.com/~r/HighScalability/~3/fjQxyqsaCwg/stuff-the-internet-says-on-scalability-for-march-13th-2020.html