Device Detection with User Agent Strings

Device detection is hard but sometimes necessary – user agent strings are unreliable and difficult to parse. There's more rant than solution in this post.

A first look at this problem brought me to a question: how come (nearly) all user agent strings start with Mozilla???

Funnily enough, many of my colleagues don't know the answer, but Wikipedia does:

During the first browser war, many web servers were configured to send web pages that required advanced features, including frames, to clients that were identified as some version of Mozilla only. Other browsers were considered to be older products such as Mosaic, Cello, or Samba, and would be sent a bare bones HTML document.

My childish brain will render this as

server-side sniffing and client-side spoofing on the battle of user agent strings

In fact, the whole business of user agent strings is...

a complicated dance between server-side sniffing to provide the right experience for the right devices on the one hand, and client-side spoofing in order to bypass incorrect or inconvenient sniffing on the other

Detecting devices with the user agent is sometimes unavoidable

The MDN guide on device detection using user agent strings also advises avoiding user agent strings when possible.

Unfortunately, detecting the device from the UA string is sometimes unavoidable. For instance, many apps (e.g., Amazon) ship a separate site for cellphones vs. desktops and/or tablets. This is mostly done on the server side to avoid a nasty redirect on the client side. Most likely, that means writing an nginx configuration block that inspects the user agent string to work out the user's device. In that case, we don't even have browser APIs at hand.

Existing solutions

There are many existing solutions for detecting devices from user agent strings. There are even paid APIs. If you are in a browser environment, ua-parser-js is a good option.

Even if you only need a very basic fact, say, whether it's a mobile device or not, there is no quick solution. The MDN guide recommends testing for Mobi to detect mobile devices:

common browser user agent strings
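As a minimal sketch, the MDN suggestion boils down to a single substring test. The function name isProbablyMobile and the two sample strings are my own illustrations, not something taken from the MDN guide:

```typescript
// Sketch of the "look for Mobi" test; isProbablyMobile is a hypothetical name.
function isProbablyMobile(userAgent: string): boolean {
  // "Mobi" is a substring of "Mobile", so tokens like "Mobile/15E148" match.
  return /Mobi/.test(userAgent);
}

// Illustrative sample strings, not an exhaustive test set.
const phoneUA =
  "Mozilla/5.0 (iPhone; CPU iPhone OS 14_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Mobile/15E148 Safari/604.1";
const desktopUA =
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36";

console.log(isProbablyMobile(phoneUA)); // true
console.log(isProbablyMobile(desktopUA)); // false
```

The test is cheap, but it only answers "does the string claim to be mobile", which is not the same as "is this a phone".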

I'm not sure how many people actually use this check in production. As of Dec 2020, it runs into trouble because newer iPads complicate the situation.

iPadOS has two different flavors of UA strings, one says it's mobile, the other pretends to be a Macintosh

After iPadOS, you will likely encounter two different flavors of user agent strings, both from iPads:

On newer iPads running iPadOS, browsers like Safari and Firefox for iPad send a user agent string that looks like this:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15

This is nearly inseparable from a macOS desktop. The Version/x.y token does set it apart from, say, Chrome on macOS, but as far as I know Safari on a Mac sends that token too, so it doesn't settle the question. The rationale behind this, I guess, is that Apple intends iPads to replace computers, so they split iPadOS off as its own operating system and made it act more like a computer. This made many developers unhappy.

For older iPads before iPadOS, and Chrome browser on newer iPads, you will likely see a user agent string that looks like this:

Mozilla/5.0 (iPad; CPU OS 14_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1

This is a situation that really makes me feel I'm not very smart. As in:

To differentiate iPad from Macintosh: first you test for iPad, which gives you older models and some browsers on newer models. But this is not enough, because for the other newer models the only hint is the Version/ token, and Safari on a Mac appears to send that as well.

Now what about iPad vs. mobile? We can test for Mobi, as suggested by MDN. But this also catches older iPads, whose user agent strings match both iPad and Mobi.

🤷🏻‍♀️
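To make the puzzle concrete, here is a rough sketch of the checks described above, run against the two iPad strings quoted earlier. The function name guessAppleDevice and the result labels are made up for illustration. A Macintosh-looking string is reported as ambiguous rather than guessed at:

```typescript
type DeviceGuess = "tablet" | "mobile" | "mac-or-ipad" | "unknown";

// Hypothetical helper: applies the tests in the order discussed in this post.
function guessAppleDevice(userAgent: string): DeviceGuess {
  // Test for iPad first: older iPads also contain "Mobile", so order matters.
  if (/iPad/.test(userAgent)) return "tablet";
  if (/Mobi/.test(userAgent)) return "mobile";
  // iPadOS can claim to be a Mac; this sketch does not try to tell them apart.
  if (/Macintosh/.test(userAgent)) return "mac-or-ipad";
  return "unknown";
}

// The two strings quoted in this post.
const ipadAsMacUA =
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.1 Safari/605.1.15";
const ipadChromeUA =
  "Mozilla/5.0 (iPad; CPU OS 14_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/87.0.4280.77 Mobile/15E148 Safari/604.1";

console.log(guessAppleDevice(ipadAsMacUA)); // mac-or-ipad
console.log(guessAppleDevice(ipadChromeUA)); // tablet
```

Swapping the first two tests would send the Chrome-on-iPad string down the "mobile" branch, which is exactly the trap described above.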

What about other tablets

After learning from ua-parser-js, I conclude that the most accurate strategy for this kind of detection is to maintain a comprehensive mapping of UA string detectors, where each detector tries to be as tight as possible. For example, the detectors for Samsung tablets look like:

/android.+((sch-i[89]0\d|shw-m380s|gt-p\d{4}|gt-n\d+|sgh-t8[56]9|nexus 10))/i,
/((SM-T\w+))/i

Here's the source code for the full list. I think it does a great job: nearly all device manufacturers are included, and it parses the type of device as well. Their issue tracker also highlights the current problems.
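A much-reduced sketch of the detector-list idea, using only the two Samsung patterns quoted above. The Detector shape, the detect function, and the sample user agent strings are my own assumptions for illustration. This is not how ua-parser-js structures its data or exposes its API:

```typescript
// Hypothetical shape for one detector entry.
interface Detector {
  pattern: RegExp;
  vendor: string;
  type: "tablet" | "mobile";
}

// Only the two patterns quoted above; a real list has many more entries.
const detectors: Detector[] = [
  {
    pattern: /android.+((sch-i[89]0\d|shw-m380s|gt-p\d{4}|gt-n\d+|sgh-t8[56]9|nexus 10))/i,
    vendor: "Samsung",
    type: "tablet",
  },
  { pattern: /((SM-T\w+))/i, vendor: "Samsung", type: "tablet" },
];

interface Detection {
  vendor: string;
  type: string;
  model: string;
}

// Returns the first detector that matches, or null when none does.
function detect(userAgent: string): Detection | null {
  for (const { pattern, vendor, type } of detectors) {
    const match = pattern.exec(userAgent);
    if (match) {
      // Capture group 1 holds the model token, e.g. "SM-T510".
      return { vendor, type, model: match[1] ?? "" };
    }
  }
  return null;
}

// Illustrative sample string for a Samsung tablet.
const galaxyTabUA =
  "Mozilla/5.0 (Linux; Android 9; SM-T510) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.101 Safari/537.36";

console.log(detect(galaxyTabUA)?.model); // SM-T510
```

Because the first match wins, tighter patterns belong earlier in the list and catch-all patterns at the end.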

But, erm, I can't use this directly in my Nginx configuration. To get there I'm learning Lua now, and it looks like there's no simple command for case-insensitive regex matching... Good luck to myself.

User agent client hints?

Recently, Google has been pushing a proposal for User-Agent Client Hints, which is gaining support from other browser vendors as well. The idea is that, instead of a single unformatted user agent string, the server may ask for, and the client may send, separate hints such as platform, CPU architecture, browser and version, whether it's mobile or tablet or desktop, etc.
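Here is a small sketch of what reading such hints on the server might look like. It assumes the header names from the proposal as I understand it: the server opts in with an Accept-CH response header, and the client answers with headers such as Sec-CH-UA-Mobile and Sec-CH-UA-Platform. The readClientHints function and the ClientHints shape are hypothetical, and the header details may change while the proposal is still in flux:

```typescript
// Hypothetical result shape; fields stay undefined when a hint is absent.
interface ClientHints {
  isMobile?: boolean;
  platform?: string;
}

// Assumes header names are already lower-cased, as many server frameworks do.
function readClientHints(headers: Record<string, string | undefined>): ClientHints {
  const mobile = headers["sec-ch-ua-mobile"];
  const platform = headers["sec-ch-ua-platform"];
  return {
    // Assumed format: a structured-header boolean, "?1" for true, "?0" for false.
    isMobile: mobile === "?1" ? true : mobile === "?0" ? false : undefined,
    // Assumed format: a quoted string such as "\"Android\""; strip the quotes.
    platform: platform?.replace(/^"|"$/g, ""),
  };
}

// Illustrative request headers, not captured from a real browser.
const sampleHints = readClientHints({
  "sec-ch-ua-mobile": "?1",
  "sec-ch-ua-platform": '"Android"',
});

console.log(sampleHints.isMobile, sampleHints.platform); // true Android
```

Compared with regex matching on one opaque string, each answer comes from its own field, and a missing hint is visible as undefined instead of a silent mismatch.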

You can try their demo on Chrome Beta >84. It looks like this:

User-Agent Client Hints demo

Although they won't arrive in time to solve today's problem, I still hope they ship this.

Links