AI-powered personalization in the E-commerce and Fashion industries: use cases and enabling technologies
Nov 7, 2022


#AI #FashionDataset #personalization #EcommerceDataset #Dataset

Imagine you are scrolling through Instagram and suddenly stop at a group photo of your friends. What stopped you was a delightful blue velvet dress on a woman you don’t know in that photo.

You love it and want to know more about the dress ASAP. You could ask your friend to ask that woman for details. Or you could simply take a screenshot and head to an online shopping platform.

You upload the screenshot to the platform and, with just one click, all the visually similar dresses pop up.

Or if you are looking for more inspiration from this dress, online shopping platforms can also help you to find matching shoes, bags and accessories.

That’s exactly what online shopping should be — fast, easy, and personalized.

Not to mention, online shopping platforms nowadays are becoming more like personal assistants to each customer.

Thanks to artificial intelligence, commonly known as AI, brands and shopping platforms can now turn every customer’s engagement and interaction into meaningful and rewarding experiences.

AI-powered technologies empower brands to step up their game by offering what customers demand: personalization.


In this article, we will talk about:

1. What is personalization in E-commerce & fashion?

2. How does AI boost personalization in the E-commerce and Fashion industries?

3. The enabling technologies that make things happen

4. The role of data in E-commerce personalization

What is personalization in E-commerce & fashion?

Well, if we had to explain personalization in E-commerce and Fashion in one short sentence, it could be this: customers expect brands and retailers to recognize them and treat each of them as a VIP while shopping online.

These days, customers switch loyalties based on the experiences brands and online shopping platforms offer, such as the convenience and accuracy of product discovery, the understanding of personal interests, the intelligence of search optimization, and so on.

With the power of artificial intelligence (AI), brands and online platforms are finally able to offer tailored services and serve relevant content to customers at scale. In return, they enjoy more profitable business relationships and, most importantly, higher conversion rates and product sales.

Photo by S O C I A L . C U T on Unsplash

How does AI boost personalization in the E-commerce and Fashion industries?

From text search to image search, interest discovery, then content-based recommendations and virtual fitting, AI technologies play key roles in personalization in E-commerce and Fashion.

Several fundamental technologies make the online shopping experience more “personal”.

Text search

Text search is one of the earliest of these technologies. When people shop in a browser or a shopping app, they often do not know exactly which item of clothing they want, so they type in text that describes what they need and retrieve matching products.

However, many product titles and descriptions on the Internet are misleading and do not match the image contents. To help customers get results that match their requests more accurately, this technology is used to pair text with clothing images.

How does it work? The technology enriches product tags with AI-powered deep tagging: for instance, tags derived from the product image, based on vertical-specific attributes and their synonyms.
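To make the idea concrete, here is a minimal sketch of tag matching with synonyms. The synonym table, the catalog, and the function names are all hypothetical, and the tags are simply assumed to come from an image tagger; a production system would be far richer.

```python
# Hypothetical synonym table; a real system would curate or learn a much larger one.
SYNONYMS = {
    "dress": {"gown", "frock"},
    "sneakers": {"trainers"},
}

# Hypothetical catalog: product id -> tags assumed to come from an image tagger.
CATALOG = {
    "sku-1": ["dress", "blue", "velvet"],
    "sku-2": ["sneakers", "white"],
}


def expand_tags(tags):
    """Return the lowercased tags plus any known synonyms."""
    expanded = set()
    for tag in tags:
        tag = tag.lower()
        expanded.add(tag)
        expanded |= SYNONYMS.get(tag, set())
    return expanded


def search(query, catalog):
    """Return ids of products whose expanded tags contain every query word."""
    terms = query.lower().split()
    return [
        product_id
        for product_id, tags in catalog.items()
        if all(term in expand_tags(tags) for term in terms)
    ]


print(search("blue gown", CATALOG))  # ['sku-1']
```

The shopper typed “gown”, the product was tagged “dress”, and the synonym expansion bridges the two.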

Visual search

As we mentioned earlier, people can purchase a product based on a real-life photo or screenshot. Visual search is one of the simplest ways for shoppers to find what they’re looking for and shorten their path to purchase.

That is how visual AI works.

It also solves a problem that is often overlooked: shoppers’ knowledge of product terminology limits their ability to search.

So, that is probably the reason why “image search” for online shopping has become a popular product discovery tool for people, especially after nearly three years of the COVID-19 pandemic.

Shoppers also use visual search to find additional and visually similar items based on a product they spot.

For instance, the website of the watch brand H. Samuels offers a “Shop similar” link under each product image. After clicking the link, a new page shows all the watches similar to the one you originally searched for.

However, “image search” technology has its limitations. It might not focus on the clothing in an uploaded image, especially when the background is complex, so the results can be incorrect. That is why high-quality data is key for brands and online shopping platforms. We will discuss this at the end of this article.
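Under the hood, a “similar items” feature is often a nearest-neighbor lookup over image embeddings. The sketch below assumes each product image has already been turned into a feature vector by some vision model; the three-dimensional vectors, product names, and function names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vec, catalog_vecs, top_k=2):
    """Return the top_k product ids ranked by similarity to the query vector."""
    ranked = sorted(
        catalog_vecs,
        key=lambda pid: cosine_similarity(query_vec, catalog_vecs[pid]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical embeddings; real ones usually have hundreds of dimensions.
CATALOG_VECS = {
    "blue-velvet-dress": [0.9, 0.1, 0.0],
    "navy-satin-dress": [0.8, 0.3, 0.1],
    "white-sneakers": [0.0, 0.2, 0.9],
}

screenshot_vec = [1.0, 0.1, 0.0]  # assumed embedding of the uploaded screenshot
print(most_similar(screenshot_vec, CATALOG_VECS))
```

With these toy vectors, the two dresses rank ahead of the sneakers. At catalog scale, the exhaustive sort would normally be replaced by an approximate nearest-neighbor index.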

Fashion Image Caption

Because there are so many clothing pictures on the Internet, labeling their characteristics manually, one by one, would be a huge project. If we can automatically generate descriptions of pictures through deep learning, efficiency improves greatly.

Content-based Recommendation

Shopping apps often need to infer users’ preferences: they automatically classify the pictures users browse and then recommend similar clothes to encourage purchases.

For example, once a shopper has purchased a dress, the system records the data for that dress and can use it to recommend similar items.

We are not just living in a world with AI. The world has entered the era of advanced AI.

AI also plays important roles in fields like virtual fitting and fake detection.
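One simple way to use the recorded data is to compare product attributes. The sketch below scores catalog items by Jaccard similarity against the attributes of the purchased dress. The attribute sets, product ids, and function names are hypothetical, and real recommenders typically combine many more signals.

```python
def jaccard(a, b):
    """Share of attributes two items have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend(purchased_attrs, catalog, top_k=2):
    """Return up to top_k product ids most similar to the purchased item."""
    scored = [(jaccard(purchased_attrs, attrs), pid) for pid, attrs in catalog.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Items with no attribute in common are never recommended.
    return [pid for score, pid in scored[:top_k] if score > 0]


# Hypothetical catalog: product id -> attribute set.
CATALOG_ATTRS = {
    "d2": {"dress", "velvet", "green"},
    "d3": {"dress", "cotton", "red"},
    "s1": {"sneakers", "white"},
}

purchased = {"dress", "velvet", "blue"}  # attributes noted for the purchased dress
print(recommend(purchased, CATALOG_ATTRS))  # ['d2', 'd3']
```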

The enabling technologies that make things happen

1. Image Classification

Image classification technology automatically sorts clothing into categories such as skirts, short sleeves, hats, and shorts. Its application scenarios include automatic clothing sorting and attribute classification for content recommendation. Classification can be further divided into classification based on clothing style and classification based on clothing attributes.


Style-based classification often targets categories of specific clothing such as shoes, tops, and pants. A common pattern for recognizing these categories combines a feature extraction network, a fully connected network, and a multi-class classification loss. Feature extraction networks include the VGG, ResNet, and Inception series. The loss function is either multi-class cross-entropy loss or Focal Loss, depending on how balanced the dataset is; Focal Loss is the usual choice when some classes are much rarer than others.
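The two losses are closely related, which a few lines of code make visible. This is a minimal single-sample sketch in plain Python; the function names are hypothetical, `gamma=2.0` is a commonly used default, and the class-weighting term of Focal Loss is omitted for brevity.

```python
import math


def softmax(logits):
    """Turn raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Multi-class cross-entropy for one sample with integer label `target`."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0):
    """Cross-entropy scaled by (1 - p_t)^gamma, so easy samples count less."""
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


logits = [2.0, 0.5, -1.0]  # assumed scores for skirt, hat, shorts
print(cross_entropy(logits, 0), focal_loss(logits, 0))
```

With `gamma=0` the focal loss reduces to plain cross-entropy. Larger values shrink the loss on confidently classified samples, so rare or hard categories contribute more to training.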

2. Image segmentation

For a fashion picture, image segmentation technology can distinguish the tops, pants, and various accessories of the person in the picture at the pixel level. In the virtual fitting scene, it is often necessary to identify and segment the clothes worn by the user, so that they can be replaced with virtual clothes more accurately.

Mask R-CNN, from [1]

Mask R-CNN [1] is often used as the baseline network structure for image segmentation tasks (strictly speaking, instance segmentation). It can be decomposed into ResNet-FPN, RPN, the Faster R-CNN head, and a mask branch. ResNet-FPN performs convolution operations on the input image to extract features and then produces feature maps at different scales through the FPN layers. RPN proposes candidate regions, and each candidate region is pooled from the feature map whose scale suits it best. Finally, the network splits into two branches: the traditional Faster R-CNN detection and classification branch, and the mask branch, which predicts a pixel-level mask for each detected object.
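In FPN-based detectors such as Mask R-CNN, a candidate region is usually pooled from the pyramid level that matches its size. Here is a minimal sketch of that choice, assuming the level-assignment rule and default constants from the FPN paper (base level 4, canonical size 224, levels 2 to 5). The function name is hypothetical, and individual implementations may differ in detail.

```python
import math


def fpn_level(width, height, k0=4, canonical_size=224, k_min=2, k_max=5):
    """Pick the pyramid level for a candidate region of the given pixel size.

    Larger regions map to coarser (higher) levels, smaller regions to finer ones.
    """
    k = math.floor(k0 + math.log2(math.sqrt(width * height) / canonical_size))
    return max(k_min, min(k_max, k))


print(fpn_level(224, 224))  # 4
print(fpn_level(112, 112))  # 3
print(fpn_level(32, 32))    # 2 (clamped to the finest level)
```

A small accessory such as an earring would therefore be read from a fine, high-resolution feature map, while a full-length coat would be read from a coarser one.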

3. Cross-Modal Retrieval And Generation

Cross-modal technology in the Fashion field can be divided into two aspects: generating text descriptions from images (captioning), and retrieving relevant clothing pictures from massive data through descriptive text (cross-modal retrieval). Cross-modal techniques often require massive numbers of paired images and text descriptions, as well as large-scale training. The common framework for this kind of task extracts image-side and text-side features, projects them into the same subspace, and optimizes the distances there: features of related image-text pairs are gradually pulled closer together, while features of unrelated pairs are gradually pushed apart.

W2VV++ [2] is a representative work of cross-modal retrieval. At the text end, it extracts text information with BoW, LSTM, and W2V encoders, concatenates the results, and reduces the dimension through a fully connected layer so that it matches the visual features. At the visual end, features are extracted by ResNet and ResNeXt, concatenated, and reduced in dimension through a fully connected layer. Finally, the similarity between the two modal features is computed and optimized to narrow the distance between matching pairs, which makes retrieval across modalities possible. Captioning, in turn, is the inverse of retrieval: it generates text from an image.
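A common way to express “pull related pairs together, push unrelated pairs apart” is a triplet ranking loss over cosine similarity. The sketch below is one such formulation, not necessarily the one used by any particular system. The margin of 0.2, the toy vectors, and the function names are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two vectors in the shared subspace."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def triplet_loss(image_vec, matching_text_vec, unrelated_text_vec, margin=0.2):
    """Zero once the matching text beats the unrelated text by at least `margin`."""
    pos = cosine(image_vec, matching_text_vec)
    neg = cosine(image_vec, unrelated_text_vec)
    return max(0.0, margin - pos + neg)


image = [1.0, 0.0]          # assumed embedding of a dress photo
good_caption = [0.9, 0.1]   # assumed embedding of "blue velvet dress"
bad_caption = [0.0, 1.0]    # assumed embedding of "white sneakers"

print(triplet_loss(image, good_caption, bad_caption))  # 0.0, already well separated
print(triplet_loss(image, bad_caption, good_caption))  # positive, needs training
```

Minimizing this loss over many triplets is what gradually moves related features closer and unrelated features further apart.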
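The “concatenate, then reduce with a fully connected layer” step can be sketched in a few lines. This is only an illustration of the data flow, not the W2VV++ implementation. The feature values, the tiny weight matrix, and the function names are made up, and a real model would learn its weights and use far larger dimensions.

```python
def concat(*feature_vectors):
    """Join the outputs of several encoders into one long vector."""
    out = []
    for vec in feature_vectors:
        out.extend(vec)
    return out


def linear(vec, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


# Assumed outputs of three text encoders for one query sentence.
bow_feat = [1.0, 0.0]
rnn_feat = [0.5]
w2v_feat = [0.2, 0.4]

text_feat = concat(bow_feat, rnn_feat, w2v_feat)  # 5 dimensions

# Hypothetical weights projecting 5 dimensions down to a 2-dimensional common space.
WEIGHTS = [
    [1.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0, 1.0],
]
BIAS = [0.0, 0.1]

print(linear(text_feat, WEIGHTS, BIAS))
```

The visual side follows the same pattern with image features, after which the two projected vectors can be compared with a similarity measure such as cosine similarity.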


4. Image Generation

Image generation can be divided into conditional and unconditional generation. Conditional generation takes example clothing images as input and generates images of similar styles; pix2pix is a representative model. Unconditional generation takes random noise as input to generate more diverse images, as in DCGAN and ProGAN.

Most generation tasks use an adversarial loss as the loss function, and some conditional generation tasks add a reconstruction loss. The adversarial loss requires a generator and a discriminator: the generator tries to produce pictures realistic enough to mislead the discriminator, while the discriminator tries to tell real pictures from generated ones. When the two reach a Nash equilibrium, a realistic generation effect can be achieved. Since a conditional generation task usually has a reference image for the target, the L1 loss between the pixels of the generated image and the reference image is also computed to complete the optimization.
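The interplay of the two losses can be written down compactly. The sketch below works on single discriminator scores (probabilities that an image is real) and flat pixel lists. The function names are hypothetical, the generator uses the widely used non-saturating form of the adversarial loss, and the L1 weight of 100 is an assumption borrowed from the pix2pix setup.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: real images should score 1, generated ones 0."""
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake, generated=None, reference=None, l1_weight=100.0):
    """Adversarial term, plus an optional L1 reconstruction term.

    The L1 term only applies in the conditional case, where a reference
    image for the target exists.
    """
    loss = -math.log(d_fake)
    if generated is not None and reference is not None:
        l1 = sum(abs(g - r) for g, r in zip(generated, reference)) / len(reference)
        loss += l1_weight * l1
    return loss


# The discriminator is fooled more often -> generator loss goes down.
print(generator_loss(0.1), generator_loss(0.9))

# Conditional case: pixel differences against the reference add a penalty.
print(generator_loss(0.5, generated=[0.2, 0.8], reference=[0.0, 1.0]))
```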

5. Detection

Detection can be divided into target (object) detection and key point detection. Both regress coordinates for the clothing in a picture: bounding boxes in the first case, landmark points in the second. Detection has great technical value in clothing search and recognition.

YOLO [3] is a typical algorithm for target detection. It first divides the input image into a grid; if the center of an object falls in a grid cell, that cell is responsible for predicting the object. Each grid cell predicts the location and confidence of bounding boxes (BBoxes), and each BBox corresponds to four location values and one confidence value. The confidence reflects both how likely the predicted box is to contain an object and how accurate the box is. From these predictions, target windows with low confidence are removed according to a threshold, and finally non-maximum suppression (NMS) removes the redundant windows.
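The two post-processing ideas, finding the responsible grid cell and filtering boxes by threshold and NMS, fit in a short sketch. It is a simplified illustration rather than the YOLO implementation: function names are hypothetical, boxes are `(x1, y1, x2, y2)` tuples, the 7x7 grid follows the original YOLO paper, and both thresholds are assumed values.

```python
def responsible_cell(center_x, center_y, image_w, image_h, grid_size=7):
    """Return the (row, col) of the grid cell containing an object's center."""
    col = min(int(center_x / image_w * grid_size), grid_size - 1)
    row = min(int(center_y / image_h * grid_size), grid_size - 1)
    return row, col


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_threshold=0.5, iou_threshold=0.5):
    """Drop low-confidence boxes, then drop boxes overlapping a stronger one."""
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),    # a dress
    ((1, 1, 10, 10), 0.8),    # the same dress, slightly shifted box
    ((20, 20, 30, 30), 0.7),  # a handbag elsewhere in the image
    ((0, 0, 5, 5), 0.2),      # low-confidence noise
]
print(responsible_cell(224, 224, 448, 448))  # (3, 3)
print(nms(detections))
```

Here the shifted duplicate and the low-confidence box are discarded, leaving one window per item.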


The Role of Data in E-commerce and Fashion Personalization

What’s at the core of these complex AI technologies?


Most deep learning models are trained in a data-driven, supervised way, so providing diverse and accurate datasets for the models is the cornerstone of research in this area.

Therefore, we have collected several open datasets and commercial datasets for you and your AI models. We hope these datasets help.

Check this article out to read more.

👉 AI-powered personalization of E-commerce and Fashion: open and commercial datasets

Reference List

  1. He, Kaiming, et al. “Mask R-CNN.” Proceedings of the IEEE international conference on computer vision. 2017.
  2. Li, Xirong, et al. “W2VV++: Fully deep learning for ad-hoc video search.” Proceedings of the 27th ACM International Conference on Multimedia. 2019.
  3. Redmon, Joseph, et al. “You only look once: Unified, real-time object detection.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  4. He, Kaiming, et al. “Deep residual learning for image recognition.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.