Radar Trends to Watch: December 2022 – O’Reilly
This month’s news has been overshadowed by the implosion of SBF’s FTX and the possible implosion of Elon Musk’s Twitter. All the noise doesn’t mean that important things aren’t happening. Many companies, organizations, and individuals are wrestling with the copyright implications of generative AI. Google is playing a long game: they believe that the goal isn’t to imitate art works, but to build better user interfaces for humans to collaborate with AI so they can create something new. Facebook’s AI for playing Diplomacy is an exciting new development. Diplomacy requires players to negotiate with other players, assess their mental state, and decide whether or not to honor their commitments. None of these are easy tasks for an AI. And IBM now has a 433-qubit quantum chip, an important step towards making a useful quantum processor.
- Facebook has developed an AI system that plays Diplomacy. Diplomacy is a board game that includes periods for non-binding negotiations between players, leading to collaborations and betrayals. It requires extensive use of natural language, in addition to the ability to understand and maintain relationships with other players.
- Shutterstock will be collaborating with OpenAI to build a model based on DALL-E that has been trained only on art that Shutterstock has licensed. They will also put in place a plan for compensating artists whose work was used to train the model.
- Facebook’s large language model for scientific research, Galactica, only survived online for three days. It produced scientific papers that sounded reasonable, but the content was often factually incorrect, including “fake research” attributed to real scientists. It was prone to generating hate research directed against almost any minority.
- Google has put a Switch Transformers model on HuggingFace. This is a very large Mixture of Experts model (1.6 trillion parameters) that uses many sub-models, routing different tokens to different models. Despite the size, Switch Transformers are relatively fast and efficient.
- OneAI has launched a Natural Language Processing-as-a-Service offering based on OpenAI’s Whisper model. Whisper is relatively small, impressively accurate, and supports multiple languages.
- AI governance–including the ability to explain and audit results–is a necessity if AI is going to thrive in an era of declining public trust and increasing regulation.
- Researchers have developed an AI system that learns to identify objects by using a natural language interface to ask humans what they’re seeing. This could be a route towards AI that learns more effectively.
- Google is developing a human-in-the-loop tool for their large language model LaMDA, designed to help writers interact with AI to create a story. The Wordcraft Writers Workshop is another project about collaborating with LaMDA. “Using LaMDA to write full stories is a dead end.”
- You didn’t really want a never-ending AI-generated discussion between Werner Herzog and Slavoj Žižek, did you? Welcome to the Infinite Conversation.
- Code as Policies extends AI code generation to robotics: it uses a large language model to generate Python code for robotic tasks from verbal descriptions. The result is a robot that can perform tasks that it has not been explicitly trained to do. Code is available on GitHub.
- AskEdith is a natural language interface for databases that converts English into SQL. Copilot for DBAs.
- Facebook has used AI to build an audio CODEC that is 10 times more efficient than MP3.
- SetFit is a much smaller language model (1/1600th the size of GPT-3) that allows smaller organizations to build specialized natural language systems with minimal training data.
- Wide transformer models with fewer attention layers may be able to reduce the size (and power requirements) of large language models while increasing their performance and interpretability.
- Semi-supervised learning is a partially automated process for labeling large datasets. Starting with a small amount of hand-labeled data, you train a model to label data; use that model; check results for accuracy; and retrain.
- DuckDB is a very fast database designed for online analytic processing (OLAP) of small to medium datasets. It runs easily on a laptop and integrates very well with Python.
- How do you manage SBOM drift? Building a software bill of materials is one thing; keeping it accurate as a project goes through development and deployment is another.
- Who is using Rust? Time for a study. Nearly 200 companies, including Microsoft and Amazon; Azure’s CTO strongly suggests that developers avoid C or C++ in favor of Rust.
- What comes after Copilot? GitHub is looking at voice-to-code: programming without a keyboard.
- genv is a tool for managing GPU use, an often neglected part of MLOps. Unlike CPUs, GPUs are usually allocated statically, and can’t be reallocated if they’re underused or unused.
- Multidomain service orchestration could be the next step beyond Kubernetes: orchestration between software components that are running in completely different environments.
- Rewind, an unreleased product for Macs, claims to record everything you do, see, or hear, so you can look it up later. There are obvious ramifications for privacy and security, though users can start and stop recording. The key technology seems to be extremely effective compression.
- Progressive delivery for databases? As James Governor points out, database schemas have been left behind by CI/CD. That may be changing.
- Turbopack, a new Rust-based bundler for Next.js, promises greatly improved performance. Unlike Webpack, Turbopack does incremental builds, and is designed for use in both development and production.
- Shell scripting never goes out of date. Here are some best practices, starting with “always use bash.”
- The US Department of Defense has released their road map towards implementing zero trust by 2027.
- A new ransomware attack steals the victim’s Discord account in addition to encrypting files. It’s theorized that the Discord account may be used to launch cryptocurrency and NFT scams. In any case, it’s a sure sign of where cyber criminals see value: not in Facebook or Twitter.
- 95% of all web applications have security holes. And that’s an improvement over last year. 77% had a vulnerability listed in OWASP’s top 10: misconfiguration, broken access control, and other basic stuff. The biggest problem in infosec is (still) getting the basics right.
- The popularity of cryptojacking (mining cryptocurrency with malware planted in someone else’s applications) continues to rise, as the collapse in cryptocurrency prices makes legitimate mining unprofitable.
- A threat group named Worok is using steganography to hide malware within PNG images.
- All of the major browsers (Chrome, Firefox, Safari) trust certificates that allow a number of untrustworthy companies to act as certificate authorities. These companies are involved in activities like planting spyware on web sites to collect users’ personal data.
- A massive SEO-poisoning campaign has compromised 15,000 WordPress sites, with the aim of causing Google searches to send people to fake Q&A sites. This may be a precursor to using the fake sites for phishing or installing malware.
- The British Government has started a scan of all Internet devices located in the UK. Its intent is to detect vulnerabilities.
- Cyberattacks are increasingly targeted at small to medium businesses, the vast majority of which don’t have plans for defense or disaster recovery.
- Multifactor Fatigue is a new kind of attack against multifactor authentication: bombarding a user with authentication requests, hoping that they will accidentally approve one.
- Scott Aaronson has posted an “extremely compressed” (3-hour) version of his undergraduate course in Quantum Computing on YouTube. It’s an excellent way to get started.
- Horizon Quantum Computing is launching a development platform that will let programmers write code in a language like C or C++, and then compile and optimize it for a quantum computer.
- IBM has created a 433-qubit quantum chip, and updated the Qiskit runtime with improved error correction. This represents a big step forward, though we are still far from usable quantum computing.
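The semi-supervised labeling loop described earlier in this list (start from a few hand-labeled examples, train, label, check, retrain) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular tool does it: the nearest-centroid "model," the function names, the `min_margin` threshold, and the sample data are all hypothetical, and the threshold stands in for the "check results for accuracy" step that a human reviewer would normally perform.

```python
from statistics import mean

def train(labeled):
    """Fit a toy nearest-centroid model: one mean value per label."""
    by_label = {}
    for x, y in labeled:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}

def predict(model, x):
    """Return (label, margin); margin is the gap between the two closest centroids.
    Assumes the model has at least two labels."""
    dists = sorted((abs(x - c), y) for y, c in model.items())
    (d0, label), (d1, _) = dists[0], dists[1]
    return label, d1 - d0

def self_train(labeled, unlabeled, min_margin=2.0, max_rounds=10):
    """Repeatedly label the unlabeled pool, keep confident labels, and retrain."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(max_rounds):
        accepted, remaining = [], []
        for x in pool:
            label, margin = predict(model, x)
            # Stand-in for the accuracy check: keep only confident labels.
            if margin >= min_margin:
                accepted.append((x, label))
            else:
                remaining.append(x)
        if not accepted:
            break  # nothing new was labeled; stop early
        labeled.extend(accepted)
        pool = remaining
        model = train(labeled)  # retrain on the enlarged labeled set
    return model, labeled, pool

# Hypothetical data: two hand-labeled points and seven unlabeled ones.
seed = [(1.0, "low"), (9.0, "high")]
unlabeled = [0.5, 2.0, 8.0, 10.0, 5.2, 3.5, 6.5]
model, labeled, still_unlabeled = self_train(seed, unlabeled)
```

Anything left in `still_unlabeled` is what the model could not label confidently; in practice those are the items you would route to a human.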
Cryptocurrency and Blockchains
- The Australian Stock Exchange has canceled its 6-year-old blockchain experiment, which would have put most of its work onto a blockchain-like shared distributed ledger.
- Vitalik Buterin responds to the FTX failure by hypothesizing about a “proof of solvency” that would be independent of audits and other “fiat” methods. The theme is familiar: can cryptocurrency move closer to trustlessness?
- One “selling point” of NFTs has been that royalties can be passed to creators on resale of the NFT. However, many marketplaces do not enforce royalty payments, and building royalties into the smart contracts underlying NFTs is close to impossible. Some marketplaces, including Magic Eden and OpenSea, have developed tools for enforcing royalty payments.
- Infrastructure for renewable energy is bound to be less centralized. Is it an application for a blockchain? Or is a blockchain just a tool for recentralization? Is it creepy when Shell is arguing for decentralization?
- Can a nation upload itself to the metaverse? At the COP27 climate summit, Tuvalu’s foreign minister proposed, bitterly, that this may be their only solution to global warming, which will put their entire nation underwater. Their geography, culture, and national sovereignty could be preserved in a virtual world.
- The Dark Forest is a massively multiplayer online game that is based on a blockchain. It is almost certainly the most complex game based on blockchain technology. There is no central server; it may show a way into building a Metaverse that is truly decentralized.
- When is VR too connected to the real world? Palmer Luckey, founder of Oculus, has built a VR headset that will kill you if you die in the game. While he says this is just “office art,” he seems to believe that devices like this will eventually become real products.
- The internet developed organically, in ways nobody could have predicted. Ben Evans argues that if the Metaverse happens, it will also develop organically. That isn’t an excuse not to experiment. But it is a reason not to invest too much in conflicting definitions.
- The flow of users from Twitter to Mastodon means that the ActivityPub protocol (the protocol behind Mastodon’s federated design) is worth understanding. Mastodon won’t (can’t) make the mistake of disenfranchising developers of new clients and other applications.
- Google is imposing a penalty on AI-generated content in its rankings. While a reduction of 20% seems small, that penalty causes a significant reduction in traffic.