DALL-E


description: AI program that generates images from text descriptions

9 results

The Singularity Is Nearer: When We Merge with AI
by Ray Kurzweil
Published 25 Jun 2024

103. Jeff Dean, “Google Research: Themes from 2021 and Beyond,” Google Research, January 11, 2022, https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html.
104. For examples of DALL-E’s remarkably creative images, see Aditya Ramesh et al., “DALL-E: Creating Images from Text,” OpenAI, January 5, 2021, https://openai.com/research/dall-e.
105. “DALL-E 2,” OpenAI, accessed June 30, 2022, https://openai.com/dall-e-2.
106. Chitwan Saharia et al., “Imagen,” Google Research, Brain Team, Google, accessed June 30, 2022, https://imagen.research.google.

This is the equivalent of showing an image-focused AI only five images of something unfamiliar, like unicorns (instead of five thousand or five million, as previous methods required), and getting it to recognize new unicorn images, or even create unicorn images of its own. But DALL-E and Imagen took this a dramatic step further by excelling at “zero-shot learning”: they could combine concepts they’d learned to create new images wildly different from anything they had ever seen in their training data. Prompted by the text “an illustration of a baby daikon radish in a tutu walking a dog,” DALL-E spat out adorable cartoon images of exactly that. Likewise for “a snail with the texture of a harp.” It even created “a professional high quality emoji of a lovestruck cup of boba”—complete with heart eyes beaming above the floating tapioca balls.

Previous AI systems had generally been limited to inputting and outputting one kind of data—some AI focused on recognizing images, other systems analyzed audio, and LLMs conversed in natural language. The next step was connecting multiple forms of data in a single model. So OpenAI introduced DALL-E (a pun on surrealist painter Salvador Dalí and the Pixar movie WALL-E),[105] a transformer trained to understand the relationship between words and images. From this it could create illustrations of totally novel concepts (e.g., “an armchair in the shape of an avocado”) based on text descriptions alone. In 2022 came its successor, DALL-E 2,[106] along with Google’s Imagen and a flowering of other models like Midjourney and Stable Diffusion, which quickly extended these capabilities to essentially photorealistic images.[107] Using a simple text input like “a photo of a fuzzy panda wearing a cowboy hat and black leather jacket riding a bike on top of a mountain,” the AI can conjure up a whole lifelike scene.[108] This capability will transform creative fields that recently seemed strictly in the human realm.
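To make the text-to-image workflow concrete, here is a minimal sketch of generating an image from a prompt like the one above, using the openly released Stable Diffusion model through Hugging Face’s diffusers library; the library choice and model ID are illustrative assumptions, not details given in the excerpt.

```python
# Minimal text-to-image sketch using Stable Diffusion via the
# Hugging Face diffusers library. The library and model ID are
# assumptions for illustration; the excerpt names the model family
# but not any particular API.
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained text-to-image diffusion pipeline.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")  # use "cpu" (and the default dtype) without a GPU

# A plain-English description is the only input the model needs.
prompt = ("a photo of a fuzzy panda wearing a cowboy hat and black "
          "leather jacket riding a bike on top of a mountain")
image = pipe(prompt).images[0]
image.save("panda.png")
```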

pages: 194 words: 57,434

The Age of AI: And Our Human Future
by Henry A. Kissinger, Eric Schmidt and Daniel Huttenlocher
Published 2 Nov 2021

Will Douglas Heaven, “DeepMind Says Its New Language Model Can Beat Others 25 Times Its Size,” MIT Technology Review, December 8, 2021, https://www.technologyreview.com/2021/12/08/1041557/deepmind-language-model-beat-others-25-times-size-gpt-3-megatron/.
5. Ilya Sutskever, “Fusion of Language and Vision,” The Batch, December 20, 2020, https://read.deeplearning.ai/the-batch/issue-72/.
6. “DALL·E 2,” OpenAI.com, https://openai.com/dall-e-2/.
7. Cade Metz, “Meet DALL-E, the A.I. That Draws Anything at Your Command,” New York Times, April 6, 2022, https://www.nytimes.com/2022/04/06/technology/openai-images-dall-e.html.
8. Robert Service, “Protein Structures for All,” Science, December 16, 2021, https://www.science.org/content/article/breakthrough-2021.
9. David F. Carr, “Hungarian Gov Teams Up with Eastern European Bank to Develop AI Supercomputer,” VentureBeat, December 9, 2021, https://venturebeat.com/2021/12/09/hungarian-gov-teams-up-with-eastern-european-bank-to-develop-ai-supercomputer/.
10.

As OpenAI’s chief scientist predicted at the end of 2020, language models have “start[ed] to become aware of the visual world.”5 Multimodality, the ability of text-trained language models to process and generate audio and visual media, is a burgeoning field of exploration. The most prominent current example is DALL·E 2 from OpenAI, announced in early 2022.6 DALL·E 2 can create photographic images or professional-quality artwork based on arbitrary text descriptions. For example, it has generated realistic images of “cats playing chess” and “a living room filled with sand, sand on the floor, with a piano in the room.”7 These are original images, although originality does not always mean creativity.

pages: 321 words: 113,564

AI in Museums: Reflections, Perspectives and Applications
by Sonja Thiel and Johannes C. Bernhardt
Published 31 Dec 2023

Arguing from phenomenological and analytical positions (especially that of John McDowell), the author holds that one can speak of a thinking being only if representations of the world are not merely processed but also understood; his central thesis is that a thinking being must be a bearer of some form of life. In his paper ‘AI and Art: Arguments for Practice’, Arno Schubbach takes up the question, rekindled by image generators such as Dall-E, of whether AI can be creative and produce art. In a fascinating recourse to the experimental works of Michael Noll from the 1960s, he argues that humans and their input are still the decisive factor in the production of art—the discussion should thus rather be about how AI can be productively integrated into creative practices.

At the same time, they need other input to guide the process of de-noising: this guidance is provided by a text prompt describing the target picture. This approach also created such a buzz in 2022 because the technology became available very quickly and is relatively easy to use, from Midjourney by the company of the same name, which was used by Jason M. Allen, to OpenAI’s DALL-E 2 and Stability AI’s Stable Diffusion, which has become the most popular model: it requires less computing power and was made openly available, so that it was used so frequently that it nearly gave rise to its own flood of images.8 Given these unquestionable advances in computer-assisted picture generation over all previous approaches, and especially over the simple programs from the 1960s, it may be tempting to take up Noll’s question of ‘man or machine’ and answer it now

Footnote: For GANs used in the field of AI-generated art, see the seminal paper Goodfellow/Pouget-Abadie/Mirza et al. 2014 and the overview by Maerten/Soydaner 2023, 14–17.
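As a rough illustration of the guided de-noising loop described above, here is a self-contained toy sketch; there is no real image model here, and the “text guidance” is a stand-in vector, so every name and number is a hypothetical for illustration only.

```python
# Toy sketch of prompt-guided de-noising. Purely illustrative:
# a real diffusion model predicts and removes learned noise, while
# this stub simply nudges a noisy vector toward a fixed target.
import numpy as np

rng = np.random.default_rng(0)
SIZE = 64  # the toy "image" is just a 64-dimensional vector


def text_guidance(prompt: str, size: int) -> np.ndarray:
    """Stand-in for a text encoder mapping a prompt to a target
    direction (real systems use a learned encoder)."""
    seed = abs(hash(prompt)) % (2**32)
    return np.random.default_rng(seed).normal(size=size)


def denoise_step(x: np.ndarray, target: np.ndarray, strength: float) -> np.ndarray:
    """One de-noising step: move the noisy sample toward the target."""
    return x + strength * (target - x)


# Start from pure noise, as diffusion models do ...
x = rng.normal(size=SIZE)
target = text_guidance("a snail with the texture of a harp", SIZE)

# ... then repeatedly remove noise under the prompt's guidance.
for _ in range(50):
    x = denoise_step(x, target, strength=0.1)

print("distance to target:", np.linalg.norm(x - target))  # shrinks each step
```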

The authors, researchers, and developers, based at universities and research institutions, come from a wide range of disciplines such as computer science, programming, behavioural and cognitive science, linguistics and literary studies, neuroscience, physics, mathematics, industrial and civil engineering, industrial design, art history, cultural studies, educational and social sciences, media studies, and archaeology.

Footnotes: 2. https://themuseumsai.network (all URLs here accessed in June 2023). 3. https://livia-ai.github.io. 4. https://www.artsmetrics.com/en/list-of-artificial-intelligence-ai-initiatives-in-museums/.

Figure 1: DALL·E / OpenAI; prompt by Claudia Larcher, 2023.

Topics include the use of AI for exhibition scheduling (Lee/Lin 2010), camera placement (Li 2013), security systems (Garzia 2022), conservation concepts (La Russa/Santagati 2021), acoustic comfort in exhibition spaces (D’Orazio/Montoschi et al. 2020), visitor tracking (Onofri/Corbetta 2022), visitor flow management (Centorrino/Corbetta et al. 2021), predictive analysis of tourist flows (Gao 2021), routing (Hsieh 2017) and route planning (Xu/Guo et al. 2021) for visitors, and even the creation of attractive branding (Chiou/Wang 2018) and deepfake presentations (Mihailova 2021).

Four Battlegrounds
by Paul Scharre
Published 18 Jan 2023

It allows for models that connect concepts across different types of data, such as explaining the content of images in words or creating new images based on text descriptions. Text-to-image models, such as OpenAI’s DALL·E and DALL·E 2, Google Brain’s Imagen, and Stability AI’s Stable Diffusion, can create new AI-generated images based on a text description. Multimodal training data can lead to models that have a richer understanding of the world. DALL·E’s concept of a “cat,” for example, encompasses pictures of actual cats, cartoon sketches of cats, and the word “cat.” Researchers have discovered that multimodal models actually have artificial neurons tied to underlying concepts.
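As a concrete sketch of the shared text-image representation described here, the CLIP model cited in the notes below can score how well captions match an image; the use of the Hugging Face transformers library, the model ID, and the example image URL are assumptions for illustration.

```python
# Sketch: scoring image-text similarity with CLIP, one of the
# multimodal models cited in the notes below. Library, model ID,
# and image URL are illustrative assumptions.
import requests
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Any image works; this is a commonly used example photo of cats.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# CLIP embeds captions and the image into one shared space ...
texts = ["a photo of a cat", "a cartoon sketch of a cat", "a photo of a dog"]
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# ... so one similarity score can compare concepts across modalities.
probs = outputs.logits_per_image.softmax(dim=1)
for text, p in zip(texts, probs[0]):
    print(f"{p:.3f}  {text}")
```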

For an analysis of overall trends in model size, see Pablo Villalobos et al., Machine Learning Model Sizes and the Parameter Gap (arXiv.org, July 5, 2022), https://arxiv.org/pdf/2207.02852.pdf.
295 multimodal models: Ilya Sutskever, “Multimodal,” OpenAI Blog, January 2021, https://openai.com/blog/tags/multimodal/; Aditya Ramesh et al., “DALL·E: Creating Images from Text,” OpenAI Blog, January 5, 2021, https://openai.com/blog/dall-e/; Aditya Ramesh et al., Zero-Shot Text-to-Image Generation (arXiv.org, February 26, 2021), https://arxiv.org/pdf/2102.12092.pdf; Alec Radford et al., “CLIP: Connecting Text and Images,” OpenAI Blog, January 5, 2021, https://openai.com/blog/clip/; Alec Radford et al., Learning Transferable Visual Models From Natural Language Supervision (arXiv.org, February 26, 2021), https://arxiv.org/pdf/2103.00020.pdf; Gabriel Goh et al., “Multimodal Neurons in Artificial Neural Networks,” OpenAI Blog, March 4, 2021, https://openai.com/blog/multimodal-neurons/; Romero, “GPT-3 Scared You?”
295 Text-to-image models: Ramesh et al., “DALL·E”; Ramesh et al., Zero-Shot Text-to-Image Generation; Aditya Ramesh et al., “DALL·E 2,” OpenAI Blog, n.d., https://openai.com/dall-e-2/; Aditya Ramesh et al., Hierarchical Text-Conditional Image Generation with CLIP Latents (arXiv.org, April 13, 2022), https://arxiv.org/pdf/2204.06125.pdf; Chitwan Saharia et al., “Imagen,” Google Research, n.d., https://imagen.research.google/; Chitwan Saharia et al., Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding (arXiv.org, May 23, 2022), https://arxiv.org/pdf/2205.11487.pdf; Emad Mostaque, “Stable Diffusion Public Release,” Stability AI blog, August 22, 2022, https://stability.ai/blog/stable-diffusion-public-release; Emad Mostaque, “Stable Diffusion Launch Announcement,” Stability AI blog, August 10, 2022, https://stability.ai/blog/stable-diffusion-announcement.
295 artificial neurons tied to underlying concepts: Goh et al., “Multimodal Neurons in Artificial Neural Networks” (OpenAI Blog); Gabriel Goh et al., “Multimodal Neurons in Artificial Neural Networks” (full paper), distill.pub, March 4, 2021, https://distill.pub/2021/multimodal-neurons/; “Unit 550,” OpenAI Microscope, n.d., https://microscope.openai.com/models/contrastive_4x/image_block_4_5_Add_6_0/550.
295 larger, more diverse datasets: Radford et al., Learning Transferable Visual Models From Natural Language Supervision.
295 Gato: Scott Reed et al., “A Generalist Agent,” DeepMind blog, May 12, 2022, https://www.deepmind.com/publications/a-generalist-agent; Scott Reed et al., A Generalist Agent (arXiv.org, May 19, 2022), https://arxiv.org/pdf/2205.06175.pdf.
295 interrogate the inner workings of multimodal models: Goh et al., “Multimodal Neurons in Artificial Neural Networks” (full paper).
295 new ways of attacking models: Goh et al., “Multimodal Neurons in Artificial Neural Networks” (full paper).
296 trend toward ever-larger AI models: Jared Kaplan et al., Scaling Laws for Neural Language Models (arXiv.org, January 23, 2020), https://arxiv.org/pdf/2001.08361.pdf.


pages: 336 words: 91,806

Code Dependent: Living in the Shadow of AI
by Madhumita Murgia
Published 20 Mar 2024

GANs are not the only tool available to make deepfakes, as new AI techniques have grown in sophistication. In the past two years, a new technology known as the transformer has spurred on advances in generative AI, software that can create entirely new images, text and videos simply from a typed description in plain English. AI tools like Midjourney, Dall-E and ChatGPT that are built on these systems are now part of our everyday lexicon. They allow the glimmer of an idea, articulated in a few choice words, to take detailed visual form in a visceral way via simple apps and websites. AI image tools are also being co-opted as weapons of misogyny. According to Sensity AI, one of the few research firms tracking deepfakes, in 2019, roughly 95 per cent of online deepfake videos were non-consensual pornography, almost all of which featured women.2 The study’s author, Henry Ajder, told me that deepfakes had become so ubiquitous in the years since his study that writing a report like that now would be a near-impossible task.

Amber Yu, a freelance illustrator, told the website Rest of the World that the video game posters she designed earned her between $400 and $1,000 a pop.16 She’d spend weeks perfecting each one, a job requiring artistry and digital skills. But in February 2023, a few months after AI image-makers such as Dall-E and Midjourney were launched, the jobs she relied on began to disappear. Instead, she was asked to tweak and correct AI-generated images. She was paid about a tenth of her previous rate. A Guangdong-based artist who worked at one of China’s leading video game companies said she wishes she ‘could just shoot down these programs.’

Endnotes
INTRODUCTION
1. M. Murgia, ‘My Identity for Sale’, Wired UK, October 30, 2014, https://www.wired.co.uk/article/my-identity-for-sale.
2. J. Bridle, ‘The Stupidity of AI’, The Guardian, March 16, 2023, https://www.theguardian.com/technology/2023/mar/16/the-stupidity-of-ai-artificial-intelligence-dall-e-chatgpt.
3. Hanchen Wang et al., ‘Scientific Discovery in the Age of Artificial Intelligence’, Nature 620, no. 7972 (August 3, 2023): 47–60, https://doi.org/10.1038/s41586-023-06221-2.
4. Meredith Whittaker, ‘The Steep Cost of Capture’, Interactions 28, no. 6 (November 10, 2021): 50–55, https://doi.org/10.1145/3488666.
5. V.

pages: 524 words: 154,652

Blood in the Machine: The Origins of the Rebellion Against Big Tech
by Brian Merchant
Published 25 Sep 2023

Ambitious start-ups like Midjourney, and well-positioned Silicon Valley companies like OpenAI, are already offering on-demand AI image and prose generation. DALL-E spurred a backlash when it was unveiled in 2022, especially among artists and illustrators, who worry that such generators will take away work and degrade wages. If history is any guide, they’re almost certainly right. DALL-E’s output certainly isn’t as high in quality as a skilled human artist’s, and likely won’t be for some time, if ever—but as with the skilled cloth workers of the 1800s, that ultimately doesn’t matter. DALL-E is cheaper and can pump out knockoff images in a heartbeat; companies will deem them good enough, and will turn to the program to save costs.

With app-based work, jobs are precarious, are subject to sudden changes in workload and pay rates, come with few to no benefits and protections, place the worker under intense, nonstop surveillance, and are highly volatile and unpredictable. And the boss is the algorithm; HR consists of a text box that workers can log complaints into, and which may or may not generate a response. The modern worker can sense the implications of this trend. It’s not just ride-hailing either—AI image-generators like DALL-E and neural net–based writing tools like ChatGPT threaten the livelihoods of illustrators, graphic designers, copywriters, and editorial assistants. Streaming platforms like Spotify have already radically degraded wages for musicians, who lost album sales as an income stream years ago. Much about Andrew Yang may be suspect, but he did predict correctly that anger would again spread like wildfire as skilled workers watched algorithms, AI, and tech platforms erode their earnings and status.

pages: 338 words: 104,815

Nobody's Fool: Why We Get Taken in and What We Can Do About It
by Daniel Simons and Christopher Chabris
Published 10 Jul 2023

Their expertise in developing sophisticated computational models is genuine, but it is not the expertise necessary to evaluate whether a model’s output constitutes generally intelligent behavior. People who make these predictions appear to be swayed by the most impressive examples of how well new machine learning models like ChatGPT and DALL-E do in producing realistic language and generating beautiful pictures. But these systems tend to work best only when given just the right prompts, and their boosters downplay or ignore the cases where similar prompts make them fail miserably. What seems like intelligent conversation often turns out to be a bull session with a bot whose cleverness comes from ingesting huge volumes of text and responding by accessing the statistically most relevant stuff in its dataset.

AI 2041: Ten Visions for Our Future
by Kai-Fu Lee and Qiufan Chen
Published 13 Sep 2021

Consider that just months after its release, people had built applications on top of GPT-3 that included a chatbot that lets you talk to historical figures, a music composition tool that finishes guitar tabs that you start, an app capable of taking half an image and completing the full image, and an app called DALL·E that can draw a figure based on a natural language description (such as “a baby daikon radish in a tutu walking a dog”). While these apps are mere curiosities at present, if the flaws above are fixed, such a platform could evolve into a virtuous cycle in which tens of thousands of smart developers create amazing apps that improve the platform while drawing more users, just like what happened with Windows and Android.

pages: 221 words: 70,413

American Ground: Unbuilding the World Trade Center
by William Langewiesche
Published 1 Jan 2002

Among the first embarrassing claims were those of Van Romero, an explosives expert and vice president of the New Mexico Institute of Mining and Technology, who publicly stated that in his view the collapses should be attributed to explosive charges planted in advance, adding that the planes were nothing more than bait to draw in the rescue teams. He was immediately flooded with emails from conspiracy theorists. After barely a week he tried to take it all back, publicly disavowing himself and endorsing the increasingly widespread view that the towers had collapsed from the combined effect of the fires and the damage caused by the impact. “I am very troubled...