With the launch of ChatGPT in November 2022, Generative AI quickly became one of the biggest technology trends of recent years. The focus and surrounding excitement were mainly on its ability to generate text and carry on a fluent, human-like conversation.
However, months before the launch of ChatGPT, major strides had already been made in image-generating AI.
The Emergence of AI Image Generators
AI image generators play a key role in today's media generation technologies. These tools synthesize imagery by learning from large collections of images. To do so, the model must learn the patterns, styles, and features of the images of interest. Given a user input, it then creates new images whose visual attributes match those of the images it learned from. In the world of AI image generation, the two standout methods for building these models are generative adversarial networks (GANs) and diffusion models, the family to which Stable Diffusion belongs.
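For a more concrete picture, the training objectives of the two families can be written compactly. The first line is the GAN minimax game of Goodfellow et al., in which a generator G maps random noise z to images and a discriminator D tries to tell real images x from generated ones. The second is the simplified denoising objective of Ho et al. (2020), in which a network ε_θ learns to predict the noise ε that was mixed into a clean image x_0 at step t, with ᾱ_t the cumulative noise-schedule coefficient. These are the standard textbook forms, shown for orientation only; production systems such as Stable Diffusion add refinements, for example running the diffusion process in a learned latent space (Rombach et al.).

$$
\begin{aligned}
\text{GAN:}\quad & \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big] \\
\text{Diffusion:}\quad & \min_\theta \; \mathbb{E}_{t,\, x_0,\, \epsilon \sim \mathcal{N}(0, I)}\Big[\big\lVert \epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon,\; t\big)\big\rVert^2\Big]
\end{aligned}
$$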
Diffusion models generally surpass GANs in image quality because they capture complex data distributions more effectively. Like GANs, they generate images stochastically, starting from random noise. Unlike GANs, which produce an image in a single forward pass, diffusion models refine the image over many successive denoising steps, so generation demands substantial computational resources.
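The iterative nature of diffusion sampling can be seen in a minimal toy sketch of DDPM ancestral sampling (Ho et al., 2020), written in Python. It makes strong simplifying assumptions: each "image" is a single number drawn from a Gaussian with the illustrative parameters DATA_MEAN and DATA_STD, and because the ideal noise predictor for Gaussian data has a closed form, the hypothetical function optimal_eps stands in for the trained neural network. The linear noise schedule and T = 1000 steps follow Ho et al.; everything else is illustrative.

```python
import math
import random

# Illustrative toy data: each "image" is one number drawn from N(DATA_MEAN, DATA_STD^2).
DATA_MEAN, DATA_STD = 3.0, 0.5
T = 1000  # number of diffusion steps, as in Ho et al. (2020)

# Linear noise schedule beta_1..beta_T from 1e-4 to 0.02 (Ho et al., 2020).
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas = [1.0 - b for b in betas]
alpha_bars = []  # cumulative products alpha_1 * ... * alpha_t
running = 1.0
for a in alphas:
    running *= a
    alpha_bars.append(running)


def optimal_eps(x_t, t):
    """Hypothetical stand-in for the trained noise predictor eps_theta(x_t, t).

    For Gaussian data, the best possible prediction E[eps | x_t] has a closed
    form, so this toy needs no training.
    """
    a = math.sqrt(alpha_bars[t])
    b = math.sqrt(1.0 - alpha_bars[t])
    var_x_t = a * a * DATA_STD ** 2 + b * b  # variance of the noised data
    return b * (x_t - a * DATA_MEAN) / var_x_t


def sample(rng):
    """DDPM ancestral sampling: start from pure noise and denoise step by step."""
    x = rng.gauss(0.0, 1.0)  # x_T ~ N(0, 1)
    calls = 0
    for t in reversed(range(T)):
        eps = optimal_eps(x, t)  # one "network" evaluation per step
        calls += 1
        mu = (x - betas[t] / math.sqrt(1.0 - alpha_bars[t]) * eps) / math.sqrt(alphas[t])
        z = rng.gauss(0.0, 1.0) if t > 0 else 0.0  # no noise on the final step
        x = mu + math.sqrt(betas[t]) * z  # reverse variance sigma_t^2 = beta_t
    return x, calls


rng = random.Random(0)
results = [sample(rng) for _ in range(500)]
samples = [x for x, _ in results]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((x - mean) ** 2 for x in samples) / len(samples))
print(f"mean={mean:.2f} std={std:.2f} predictor calls per sample={results[0][1]}")
```

Every generated sample costs T = 1000 calls to the noise predictor, whereas a GAN generator needs a single call. In a real image model each of those calls is a large neural-network evaluation, which is why samplers that need fewer steps are an active research topic.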
Researchers continue to investigate ways to reduce training cost and time and to improve the resolution of generated images. The versatility of these models has also grown: they can accept inputs ranging from text to other images. For example, given a descriptive sentence, a diffusion model can generate an image that roughly matches the description. Given an image together with a mask, the model can regenerate only the masked areas (a technique known as inpainting), which opens exciting possibilities for image customization.
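One simple way to restrict generation to a masked region, shown here as an illustrative sketch rather than the method any particular product uses, is to blend at every denoising step. Pixels inside the mask m (1 = regenerate, 0 = keep) come from the model's reverse step conditioned on the text prompt c, while pixels outside it come from the original image x_0^orig, noised to the matching level (with the convention ᾱ_0 = 1, so the final step keeps the original pixels exactly). This replacement scheme is used, for example, by the RePaint method; Stable Diffusion's dedicated inpainting checkpoints instead receive the mask and the masked image as additional model inputs.

$$
\begin{aligned}
x_{t-1}^{\text{known}} &= \sqrt{\bar\alpha_{t-1}}\, x_0^{\text{orig}} + \sqrt{1 - \bar\alpha_{t-1}}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
x_{t-1}^{\text{gen}} &\sim p_\theta\big(x_{t-1} \mid x_t, c\big) \\
x_{t-1} &= m \odot x_{t-1}^{\text{gen}} + (1 - m) \odot x_{t-1}^{\text{known}}
\end{aligned}
$$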
As the landscape of AI image generation evolves, it is evident that these technologies hold substantial promise for businesses: they can improve the quality of visual content, increase productivity, and enable personalization.
Application Within Visual Commerce
With 88% of consumers considering clear product images the key element of their shopping experience, e-commerce cannot exist without visual aids that showcase the products on offer. These visual aids range in complexity from simple product images to infographic overlays and now, with the emergence of new image-generation technologies, to visualizing a product in the customer's home or on their body, as if they were wearing it. As online shopping continues to grow, visual aids need to bridge the gap between trying products on in store or at home and having to decide on a product online.
The following three examples, presented from the perspective of The Home Depot, the world's largest home improvement retailer, show how AI image generators can provide a better customer experience. Besides well-known products such as building materials and garden supplies, The Home Depot also offers a wide variety of home décor products, for which visual aids play a significant role in the online shopping experience.
Virtual wall customization
As mentioned earlier, image customization is an exciting frontier in AI image generation. One promising application is virtually repainting walls or applying wallpaper in a customer's home. This technology lets customers see how different color options might look in their own space, accounting for factors such as lighting conditions and room layout. Traditional painting can be costly and time-consuming, often requiring multiple attempts to find the right color. By contrast, image editing capabilities let customers quickly generate visuals of their space in their chosen colors or patterns, streamlining the decision-making process.
Compatible product visualization
When shopping for multiple products, especially those in home décor, customers want to be assured that these items are visually compatible. AI image generators offer a rapid solution for assembling these product images and presenting them in an aesthetically pleasing format. This helps customers gain a clear sense of how these items complement each other, thereby increasing their confidence in making individual purchases or exploring additional products within the same product line.
Creative content generation
Producing creative online and marketing content demands substantial manual work from artists and designers to achieve visually appealing quality. AI image generators could enhance and streamline this process by augmenting or even replacing manual design tasks. Additionally, their stochastic nature can increase the diversity of creative content, yielding a broader range of visual materials.
Current Challenges and Outlook
Clearly, there are many areas of application for AI image generators in e-commerce. However, technology (and government regulation, more on that later) still has to bridge some existing gaps:
- Firstly, generated images are not yet fully accurate. Even though the latest machine learning models have achieved significant improvements in this regard, objects are still not always represented realistically: the shape or the color of an object can be significantly off compared with reality. This mismatch arises because generative models approximate the true distribution of all kinds of images, and an approximation is never perfect. As a result, the need to manually review the quality of each generated image might undermine the scalability of these tools.
- Secondly, AI image generators must not only create realistic images but also follow enterprise rules when creating them. Image generators currently rely on a few keywords to create an image, but more detailed instructions might be necessary to ensure that the image contains no content that conflicts with the brand's values or positioning.
- Finally, AI image generators carry potential ethical risks around security, data privacy, and copyright. There are currently no clear regulations on who owns the output of an AI image generator, or of any other generative AI model for that matter. These considerations might slow the deployment of this technology until mature ethics review processes are in place.
There are two major directions to watch for overcoming the limitations and challenges discussed above: technology research and enterprise adoption. With a strong push from the large cloud providers, a great deal of ongoing research aims to further unlock the power of diffusion models by improving image quality and lowering training costs.
In terms of enterprise adoption, we already see a few advances in visual aids appearing across different e-commerce platforms. However, because of the technological hurdles above, use cases need to be tested rigorously and rolled out carefully to avoid harming the customer experience. Technological improvements and enterprise adoption will go hand in hand, and we are excited to participate in both developments!
Further reading and sources:
D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.
I. Goodfellow et al., "Generative adversarial networks," Communications of the ACM, vol. 63, no. 11, pp. 139-144, 2020.
J. Ho, A. Jain, and P. Abbeel, "Denoising diffusion probabilistic models," Advances in Neural Information Processing Systems, vol. 33, pp. 6840-6851, 2020.
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10684-10695.