Microsoft’s computer vision model will generate alt text for Reddit images

Two years ago, Microsoft announced Florence, an AI system that it pitched as a “complete rethinking” of modern computer vision models. Unlike most vision models at the time, Florence was both “unified” and “multimodal,” meaning it could (1) understand language as well as images and (2) handle a range of tasks rather than being limited to specific applications, like generating captions.

Now, as a part of Microsoft’s broader, ongoing effort to commercialize its AI research, Florence is arriving as a part of an update to the Vision APIs in Azure Cognitive Services. The Florence-powered Microsoft Vision Services launches today in preview for existing Azure customers, with capabilities ranging from automatic captioning, background removal and video summarization to image retrieval.

“Florence is trained on billions of image-text pairs. As a result, it’s incredibly versatile,” John Montgomery, CVP of Azure AI, told TechCrunch in an email interview. “Ask Florence to find a ...

Read Entire Article