Images and videos have become the favorite content types in this digital world. Most of the data created by users are images, videos, and audio produced from mobile phones and shared via social media.
Operating systems, on the other hand, have not improved to handle image, video, and audio content natively. Native actions/commands are part of the OS and do not require any additional software to be installed (e.g., basic cut, copy, and paste actions are native to operating systems).
What is covered in this article?
This article focuses on envisioning native image processing capabilities in Windows OS. I will look at image files, but the same ideas apply to video content. I will cover the current state, user needs, possible solutions, and pitfalls.
What is not covered in this article?
This article doesn’t cover the user journey of bringing/uploading photos and videos to a Windows PC or laptop. This is an important callout, as most audio/video/image content is generated on smartphones that run Android (Google) or iOS (Apple), which are not controlled by Microsoft.
Current State
All the image-processing workflows in Windows are designed to view and edit photos sequentially (i.e., one by one). This quickly becomes non-scalable and inefficient, as users now have 100 times the volume of images to handle. Working with these archaic workflows is a pain.
I generally get two arguments against changing this:
- There is software (like Adobe Photoshop) to handle these complex image-processing workflows/tasks. Users who need these complex workflows can use advanced software to get their work done.
- Let us keep the OS simple. Putting these complex options natively in Windows OS will confuse normal users.
My response:
- I love the advanced photo editing tools, but they target a different user segment (i.e., advanced users with specific needs and complex workflows). The idea is not to replace or compete with advanced photo editing tools. My concern is how Windows OS handles the tectonic changes in the image/photo market over the last ten years.
- The fact that users cannot do much with their images (which run into the thousands) on Windows OS today means that the current workflows are either too complex or too cumbersome. My proposed solution helps keep the OS simple and brings attention to the basic needs of the users. Microsoft should avoid repeating historical mistakes: it has already lost a few battles in the browser and music player domains because it didn’t anticipate changing user needs.
Users can be segmented into two buckets:
- Normal users – ~80% of users who are not experts in image processing. They love to work with the default settings/workflows. They do not want to install any software or spend time doing any complex image editing tasks but want to see the OS do the job without their input. They are not looking for perfect outcomes but are happy if they get ~95% accurate results. They love to see the magic (aka default/automatic processing). I want to focus on these users as they are finding themselves in this crazy situation where they are getting hundreds of photos every day but do not know how to handle them.
- Advanced users – ~20% of users who are experts/professionals in image processing. They need impeccable results, have access to the best image processing software, and want to spend time editing their images. I do not want to focus on this user group as their needs are too specific to be a native feature of the OS.
Needs and the pain points of “Normal users”:
They are very clear about the problem they are facing. They have a lot of images and videos that they are not able to manage with the current features in Windows OS (example screenshots below).
It is visible from the above screenshots that Windows OS keeps a “single image” (rather than a group of images) as the central entity. It was not envisioned that normal users could have thousands of photos to go through.
For example, it takes ~10 seconds to apply the ‘Enhance your Photo’ feature to a photo. If a user has 200 photos, then the user has to spend ~2,000 seconds (i.e., about 33 minutes) just for this trivial task. The users are right to complain about it.
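As a quick check of that arithmetic, here is a minimal Python sketch. The function name `sequential_minutes` is made up for illustration, and the 10 seconds and 200 photos are the rough estimates from the example above, not measurements.

```python
def sequential_minutes(photo_count: int, seconds_per_photo: float) -> float:
    """Total time, in minutes, to process photos one at a time."""
    return photo_count * seconds_per_photo / 60


# Rough figures from the example: 200 photos at ~10 seconds each.
minutes = sequential_minutes(200, 10)
print(f"{minutes:.1f} minutes")  # prints "33.3 minutes"
```

The cost grows linearly with the number of photos, which is why a one-by-one workflow stops being practical once a folder holds thousands of images.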
Proposed Solution(s)
One can segment the user actions with the image content into six buckets (as showcased in the image lifecycle). The solution will revolve around solving user needs in these buckets.
Any solution designed to solve this problem should have the following traits:
- Ability to perform actions on a “group of images” rather than a “single image”.
- It should have default options/workflows that take care of ~95% of user needs. Users should not be required to modify any settings.
- It should learn from user behavior (i.e., AI-powered).
- It should be able to connect to the internet to get the latest AI model(s) and social media trends.
- It should be able to connect to social media platforms (I will not put much focus on it as it requires a separate blog post).
Let’s look at individual buckets of the image lifecycle now:
- Load:
- Definition: Task to load (copy) images to a folder in Windows OS.
- Native actions required: Windows OS already has good native actions for this, so there is no need to redesign it.
- View:
- Definition: Bucket of tasks to view images once they are loaded on Windows OS.
- Native actions required: Once the photos are loaded, the Windows OS magic should begin automatically (without any user action). These native magic actions should create “projections” that do not replicate images but create “views” for users.
- Group photos into albums based on metadata and computer vision models. e.g., club photos of a birthday party in one album, combine photos of a trip/location, and group photos of an individual together. This grouping should keep improving as the OS gets more images from users.
- Create an age graph of an individual, for example, how an individual has changed in the last 5 to 7 years. These types of groupings are dependent on social media trends. The system should be able to update itself based on the key social media trends.
- Auto-create short videos from the photo(s)/videos by combining them and adding music to them. Make funny videos and memes as per the latest social media trends.
- I can list multiple other views, but the idea is to show magic to users by bringing their images to life. It will let users spend more time on Windows OS.
- Edit:
- Definition: Bucket of tasks for editing the photos.
- Native actions required: In a good solution, there should not be a need for users to edit the photos. “Views” should be able to do the job for all the user needs. The increase in the use of “views” and the decrease in the need for “edits” is the true test of the proposed approach.
- Windows OS should learn from the user behavior and put the most used options for image editing in the right-click menu.
- Ability to edit the photos in bulk (e.g., putting filters on photos in one go, auto-enhancing all the photos in one go).
- Making videos from a selection of photos by just inputting the type of video required. For example, creating a “birthday wishes” video or a “Happy holidays” video from a selection of photos. It will rely on AI models learning from user behavior and data.
- Remove duplicate photos, reduce file size, add a logo, and perform other similar tasks.
- Both the “View” and the “Edit” buckets rely heavily on AI capabilities.
- Share:
- Definition: Bucket of tasks to share photos from Windows OS.
- Native actions required: Users want to share their photos with their network. Sharing images should become a native action.
- Share via a social platform, share privately, or share via the cloud, among other options. I will not dive deeper into it as it is a complete topic in itself.
- Store:
- Definition: Bucket of tasks to store the photos on Windows OS PC/laptop.
- Native actions required: Compared to other platforms, Windows OS has an edge here, as PCs have a large amount of (free) storage compared to mobile phones and cloud platforms.
- Windows OS should store the “raw” images and the metadata associated with them so that changes made by the users are not lost (one can think of it like a Git repo for photos).
- Storing images in this form is space-efficient and helps create “infinite content” with limited storage. This approach is CPU-hungry, however, and needs to be balanced/optimized based on user behavior.
- Delete:
- Definition: Bucket of tasks to delete photos.
- Native actions required: In a way, it is a sub-part of the “edit” bucket. So most of the native tasks for deletion should follow the principles of the edit bucket.
- Users delete photos when storage space runs low, when content is unnecessary (duplicate or wrong photos), or when some memory/event needs to be erased.
- In case of storage issues, the OS should recommend size reduction (with quality reduction) before a memory is deleted completely. Photos that are not used should be the first candidates for size reduction. It massively improves storage space utilization.
- In case of duplicate photos/junk content, the system should offer recommendations for bulk deletion to let the user get done with it quickly.
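To make the “views” and bulk-deletion ideas above a little more concrete, here is a minimal Python sketch under loud assumptions. It is not how Windows works, and every name in it (`list_images`, `group_into_views`, `find_duplicates`, `IMAGE_SUFFIXES`) is hypothetical. It uses the file modification time as a crude stand-in for the metadata and computer vision models described in the View bucket, and it only detects byte-identical duplicates, not near-duplicates.

```python
import hashlib
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path

# Assumption: a small, illustrative set of image extensions.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def list_images(folder: Path) -> list[Path]:
    """Return image files under `folder`, sorted for stable output."""
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def group_into_views(paths: list[Path]) -> dict[str, list[Path]]:
    """Group photos by month ("YYYY-MM") of their modification time.

    A "view" here is only a list of references to existing files;
    nothing is copied or moved on disk.
    """
    views: dict[str, list[Path]] = defaultdict(list)
    for path in paths:
        modified = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
        views[modified.strftime("%Y-%m")].append(path)
    return dict(views)


def find_duplicates(paths: list[Path]) -> list[list[Path]]:
    """Return groups of files whose contents are byte-for-byte identical."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        by_hash[digest].append(path)
    # Only groups with more than one file are bulk-deletion candidates.
    return [group for group in by_hash.values() if len(group) > 1]
```

A caller could run `list_images(Path("Pictures"))` once and pass the result to both functions: the first produces album-like groupings without replicating images, and the second produces the groups the OS could offer for bulk deletion. The recommendation would still need the user's confirmation before anything is removed.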
Pitfalls and Risks
Like with any new project, there are a few pitfalls to consider here (being an optimist, I would say that the benefits outweigh the risks).
- Imitation – Avoid copying features of other platforms.
- Too late to the game – The objection is that Microsoft is too late to the game and other tech companies have already solved the problem. I would argue that Google still struggles with a 15GB storage limit for free accounts, Apple has built a paid ecosystem around it, and Facebook has not been successful in this area.
- Why should users put the images on a Windows OS PC/laptop? – I decided not to discuss it in this article (it requires another blog post). It is a significant risk, as content is generated on smartphones that run Android (Google) or iOS (Apple), which are not controlled by Microsoft.
- It can get complex for users – Handling hundreds of images and processing them with AI to generate ‘magical results’ can confuse normal users.
- Difficult problem to modify the OS – It is not simple to make changes to an OS. This is not a matter of adding a patch; it means tinkering with the core of the OS.
Benefits to Microsoft
- 80% use case handling – Microsoft has basic tools like Paint, Notepad, Calculator, and many others to provide fundamental “80% use case” capability. The image workflows discussed in this article are in line with that, i.e., they let 80% of users solve their base case with image handling.
- Achieve more – Microsoft wants to put the power of technology in the hands of users to help them achieve more. Ideas discussed in this blog post align with the core values of Microsoft. Users can feel the magic of Microsoft technology using these native actions.
- Cost efficient – It is super cost-efficient for Microsoft, as all the computing and storage will happen on the user’s machine.
- Platform Usage – Windows OS usage statistics would improve, as users would rely on Windows OS for their image and video needs. It also helps in promoting the Microsoft brand.
- AI Capabilities – It gives Microsoft access to large amounts of real-world image and video data to train AI models for Enterprise use (e.g., computer vision services on the Azure platform).
Disclaimer: https://vinaysachdeva.com/disclaimer/. The opinions expressed in the blog post are my own and do not reflect the view(s) of my employer.