AI software can identify objects in photos and videos at near-human levels

A new AI software program developed by researchers at Google and Stanford University can recognise objects in photos and videos at near-human levels of understanding.

It was only recently that computer systems became smart enough to identify unknown objects in photographs. Even then, recognition has generally been limited to individual objects. Now, two separate teams of researchers at Google and Stanford University have created software able to describe entire scenes. This could lead to much better and more intelligent algorithms in the future.
Stanford's work, entitled "Deep Visual-Semantic Alignments for Generating Image Descriptions", explains how specific details found in photographs and videos can be translated into written text. Google's version of the technology, in a study titled "Show and Tell: A Neural Image Caption Generator", produced similar results.
While each team used a slightly different approach, both combined deep convolutional neural networks with recurrent neural networks that excel at text analysis and natural language processing. The programs were able to "learn" from each new interaction. Their algorithms improve accuracy by scanning scene after scene, looking for patterns, and then using the accumulated descriptions of earlier scenes to extrapolate what is depicted in the next unknown image.
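The pairing described above can be sketched in a few lines. This is a toy illustration only, not either team's actual code: `encode_image` is a hypothetical stand-in for the convolutional network, the weights are random rather than learned, and the vocabulary is invented for the example, so the words it produces are meaningless. It shows only the flow of data: image features seed the recurrent state, and the recurrent network then emits one word at a time.

```python
import math
import random

# Invented toy vocabulary; real systems use thousands of words.
VOCAB = ["<start>", "<end>", "a", "group", "of", "people", "playing", "frisbee"]
HIDDEN = 4  # size of the recurrent state (arbitrary for this toy)


def rand_matrix(rows, cols, rng):
    """Random weights; a real system would learn these from captioned images."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def encode_image(pixels):
    """Hypothetical stand-in for the convolutional network.

    Maps a flat list of pixel values (at least HIDDEN of them) to a
    fixed-length feature vector by averaging HIDDEN equal slices.
    """
    step = len(pixels) // HIDDEN
    return [sum(pixels[i * step:(i + 1) * step]) / step for i in range(HIDDEN)]


def generate_caption(pixels, max_len=10, seed=0):
    """Greedy word-by-word decoding with a simple (untrained) recurrent network."""
    rng = random.Random(seed)
    w_embed = rand_matrix(HIDDEN, len(VOCAB), rng)   # word -> vector
    w_hidden = rand_matrix(HIDDEN, HIDDEN, rng)      # state -> state
    w_out = rand_matrix(len(VOCAB), HIDDEN, rng)     # state -> word scores

    state = encode_image(pixels)  # image features seed the recurrent state
    word = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        one_hot = [1.0 if i == word else 0.0 for i in range(len(VOCAB))]
        embedded = matvec(w_embed, one_hot)
        recurrent = matvec(w_hidden, state)
        state = [math.tanh(e + r) for e, r in zip(embedded, recurrent)]
        scores = matvec(w_out, state)
        word = max(range(len(VOCAB)), key=lambda i: scores[i])  # greedy choice
        if VOCAB[word] == "<end>":
            break
        caption.append(VOCAB[word])
    return caption
```

Calling `generate_caption([0.1] * 16)` returns a list of words from the toy vocabulary. A trained system would instead learn the three weight matrices from many captioned images, which is what turns this loop into a sensible description.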

"The system can analyse an unknown image and explain it in words and phrases that make sense," says Fei-Fei Li, a professor of computer science and director of the Stanford Artificial Intelligence Lab. "This is an important milestone. It's the first time we've had a computer vision system that could tell a basic story about an unknown image by identifying discrete objects and also putting them into some context."
These latest algorithms are being trained on a visual dictionary – the ImageNet project – with a database of more than 14 million images. Each object is described by a mathematical term, or vector, that enables the machine to recognise the shape the next time it is encountered. Those mathematical definitions are linked to the words humans would use to describe the objects.
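As a rough illustration of how a vector can be linked to a word, the sketch below picks the labelled vector closest to a new one using cosine similarity. The three-dimensional vectors and the labels are invented for the example; real systems learn vectors with far more dimensions, and this is not necessarily the matching method either team used.

```python
import math

# Hypothetical vectors for three known objects (values invented for illustration).
LABELLED_VECTORS = {
    "frisbee": [0.9, 0.1, 0.0],
    "motorcycle": [0.1, 0.9, 0.2],
    "pizza": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Similarity of direction between two vectors, from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_word(vector):
    """Return the label whose stored vector points most nearly the same way."""
    return max(
        LABELLED_VECTORS,
        key=lambda word: cosine_similarity(vector, LABELLED_VECTORS[word]),
    )
```

For example, `closest_word([0.8, 0.2, 0.1])` returns `"frisbee"`, because that vector points in nearly the same direction as the stored frisbee vector.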
"I was amazed that even with the small amount of training data that we were able to do so well," said Oriol Vinyals, a Google computer scientist who worked with members of the Google Brain project. "The field is just starting, and we will see a lot of increases."
In the near term, computer vision systems that can discern the story in a picture will enable people to search photo or video archives and find highly specific images. Eventually, these advances will lead to robotic systems able to navigate unknown situations, and could make driverless cars safer. However, the technology also raises the prospect of even greater levels of government surveillance.

"A group of young people playing a game of Frisbee."

"A person riding a motorcycle on a dirt road."

"A pizza sitting on top of a pan on top of a stove."
