Internationalized Domain Names (IDN) in Google Chrome

 

Background

Many years ago, domains could only consist of the Latin letters A to Z, digits, and a few other characters. Internationalized Domain Names (IDNs) were created to better support non-Latin alphabets for web users around the globe.

Different characters from different (or even the same!) languages can look very similar. We’ve seen reports of proof-of-concept attacks that exploit this; these are called homograph attacks. For example, the Latin “a” looks a lot like the Cyrillic “а”, so someone could register http://ebаy.com (using Cyrillic “а”), which could be confused for http://ebay.com. This is a limitation of how URLs are displayed in browsers in general, not a specific bug in Chrome.
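To make the example concrete, the short Python sketch below compares the two spellings. It uses only the standard library; the variable and function names are illustrative, and the output is left for the reader to inspect.

```python
import unicodedata

latin = "ebay.com"
# Same visible text, but the third character is U+0430 (Cyrillic) instead of U+0061 (Latin).
spoof = "eb\u0430y.com"


def describe(text):
    """Return (character, code point, Unicode name) for each character of text."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


# The strings render almost identically but are not equal.
print(latin == spoof)

# Print only the positions where the two hostnames differ.
for a, b in zip(describe(latin), describe(spoof)):
    if a != b:
        print(a, "vs", b)
```

A byte-for-byte comparison separates the two names immediately; the confusion exists only in how they are rendered to a human reader.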

In a perfect world, domain registrars would not allow these confusable domain names to be registered. Some domain registrars do exactly that, mostly by restricting the characters allowed, but many do not. To better protect against these attacks, browsers display some domains in punycode (looks like xn--...) instead of the original IDN, according to their own IDN policies.

This is a challenging problem space. Chrome has a global user base of billions of people around the world, many of whom are not viewing URLs with Latin letters. We want to prevent confusion, while ensuring that users across languages have a great experience in Chrome. Displaying either punycode or a visible security warning on too wide a set of URLs would hurt web usability for people around the world.

Chrome and other browsers try to balance these needs by implementing IDN policies in a way that allows IDN to be shown for valid domains, but protects against confusable homograph attacks.

Chrome's IDN policy is one of several tools that aim to protect users. Google Safe Browsing continues to help protect over two billion devices every day by showing warnings to users when they attempt to navigate to dangerous or deceptive sites or download dangerous files. Password managers continue to remember which domain password logins are for, and won’t automatically fill a password into a domain that is not exactly the correct one.

How IDN works

IDNs were devised to support arbitrary Unicode characters in hostnames in a backward-compatible way. This works by having user agents transform hostnames containing non-ASCII Unicode characters into an ASCII-only hostname, which can then be sent on to DNS servers. This is done by encoding each domain label into its punycode representation. This representation includes a four-character prefix (xn--) followed by the Unicode label translated to ASCII Compatible Encoding (ACE). For example, http://öbb.at is transformed to http://xn--bb-eka.at.
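The transformation can be reproduced with Python's built-in `idna` codec, as a minimal sketch. Note the assumption: that codec implements the older IDNA 2003 rules rather than the UTS 46 processing Chrome uses, so results can differ for some characters. The helper names `to_ace` and `to_unicode` are illustrative, not part of any Chrome API.

```python
def to_ace(hostname):
    """Encode each label of a Unicode hostname into its ASCII-compatible form."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname):
    """Decode an ASCII hostname with xn-- labels back into Unicode."""
    return hostname.encode("ascii").decode("idna")


print(to_ace("öbb.at"))            # the example from the text: xn--bb-eka.at
print(to_unicode("xn--bb-eka.at"))  # round-trips back to öbb.at
print(to_ace("example.com"))       # pure-ASCII hostnames pass through unchanged
```

Only the label containing non-ASCII characters gains the xn-- prefix; the `at` label is left as-is because each label is encoded separately.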

 

Google Chrome's IDN policy

Since Chrome 51, Chrome uses an IDN display policy that does not take into account the language settings (the Accept-Language list) of the browser. A similar strategy is used by Firefox.

Google Chrome decides if it should show Unicode or punycode for each domain label (component) of a hostname separately. To decide if a component should be shown in Unicode, Google Chrome uses the following algorithm:

1.    Convert each component stored in the ACE to Unicode per UTS 46 transitional processing (ToUnicode).

2.    If there is an error in ToUnicode conversion (e.g. contains disallowed characters, starts with a combining mark, or violates BiDi rules), show punycode.

3.    If a label contains a character that does not belong to the characters allowed in identifiers per Unicode Technical Standard 39 (UTS 39), show punycode.

4.    If any character in a label belongs to the disallowed list, show punycode.

5.    If the component uses characters drawn from multiple scripts, it is subject to a script mixing check based on the “Highly Restrictive” profile of UTS 39 with an additional restriction on Latin. If the component fails the check, show the component in punycode.

·      Latin, Cyrillic or Greek characters cannot be mixed with each other

·      Latin characters in the ASCII range can be mixed ONLY with Chinese (Han, Bopomofo), Japanese (Kanji, Katakana, Hiragana), or Korean (Hangul, Hanja)

·      Han (CJK Ideographs) can be mixed with Bopomofo

·      Han can be mixed with Hiragana and Katakana

·      Han can be mixed with Korean Hangul

6.    If two or more numbering systems (e.g. European digits + Bengali digits) are mixed, show punycode.

7.    If there are any invisible characters (e.g. a sequence of the same combining mark or a sequence of Kana combining marks), show punycode.

8.    If there are any characters used in an unusual way, show punycode. E.g. LATIN MIDDLE DOT (·) used outside ela geminada.

9.    Test the label for mixed-script confusables per UTS 39. If a mixed-script confusable is detected, show punycode.

10. Test the label for whole-script confusables: If all the letters in a given label belong to a set of whole-script-confusable letters in one of the whole-script-confusable scripts and if the hostname doesn't have a corresponding allowed top-level domain for that script, show punycode. Example for Cyrillic: The first label in hostname аррӏе.com (xn--80ak6aa92e.com) is all Cyrillic letters that look like Latin letters AND the TLD (com) is not Cyrillic AND the TLD is not one of the TLDs known to host a large number of Cyrillic domains (e.g. ru, su, pyc, ua). Show it in punycode.

11. If the label contains only digits and digit spoofs, show punycode.

12. If the label matches a dangerous pattern, show punycode.

13. If the skeleton of the registrable part of a hostname is identical to one of the top domains after removing diacritic marks and mapping each character to its spoofing skeleton (e.g. www.googlé.com with é in place of e), show punycode.

 

Otherwise, show Unicode.

 

This is implemented by IDNToUnicodeOneComponent() and IsIDNComponentSafe() in components/url_formatter/url_formatter.cc and the IDNSpoofChecker class in components/url_formatter/spoof_checks/idn_spoof_checker.cc.
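As a rough illustration only, a few of the checks in the algorithm (the Latin/Cyrillic/Greek rule of step 5, the digit mixing of step 6, and the diacritic part of step 13) can be approximated in Python. This is not Chrome's code. It assumes a character's script can be guessed from the first word of its Unicode name, it strips diacritics instead of applying the full UTS 39 skeleton mapping, and `TOP_DOMAINS` is a hypothetical stand-in for the real top-domain list.

```python
import unicodedata

# Scripts that this sketch refuses to see together in one label (step 5, first rule).
EXCLUSIVE_SCRIPTS = {"LATIN", "CYRILLIC", "GREEK"}

# Hypothetical stand-in for the list of top domains used in step 13.
TOP_DOMAINS = {"google.com"}


def scripts_in(label):
    """Approximate each letter's script by the first word of its Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split(" ")[0]
            for ch in label if ch.isalpha()}


def mixes_exclusive_scripts(label):
    """True if the label mixes at least two of Latin, Cyrillic and Greek."""
    return len(scripts_in(label) & EXCLUSIVE_SCRIPTS) > 1


def mixes_digit_systems(label):
    """True if decimal digits from more than one numbering system appear."""
    systems = {unicodedata.name(ch, "UNKNOWN").split(" ")[0]
               for ch in label if ch.isdecimal()}
    return len(systems) > 1


def strip_diacritics(text):
    """Decompose characters (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def show_punycode(label, registrable_domain):
    """Return True if any of the three approximated checks fires."""
    if mixes_exclusive_scripts(label) or mixes_digit_systems(label):
        return True
    skeleton = strip_diacritics(registrable_domain)
    return skeleton != registrable_domain and skeleton in TOP_DOMAINS


print(show_punycode("eb\u0430y", "eb\u0430y.com"))    # Latin mixed with Cyrillic
print(show_punycode("googl\u00e9", "googl\u00e9.com"))  # diacritic lookalike of a top domain
print(show_punycode("google", "google.com"))          # plain ASCII label
```

The all-Cyrillic label аррӏе passes the script-mixing check in this sketch, which is why the algorithm needs the separate whole-script-confusable test of step 10.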

 

Additional Protections

In addition to the spoof checks above, Chrome also implements a full page security warning to protect against lookalike URLs. You can find an example of this warning at chrome://interstitials/lookalike. This warning blocks main frame navigations that involve lookalike URLs, either as a direct navigation or as part of a redirect.

The algorithm to show this warning is as follows:

1.    If the scheme of the navigation is not http or https, allow the navigation.

2.    If the navigation is a redirect, check the redirect chain. If the redirect chain is safe, allow the navigation. (See the Defensive Registrations section for details.)

3.    If the hostname of the navigation has at least a medium site engagement score, allow the navigation. Site engagement score is assigned to sites by the Site Engagement Service.

4.    If the hostname of the navigation is in domains.list, allow the navigation.

5.    If the user previously allowed the hostname of the navigation by clicking “Ignore” in the warning, allow the navigation. Currently, user decisions are stored per tab, so navigating to the same site in a new tab may show the warning.

6.    If the hostname has the same skeleton as a recently engaged site or a top 500 domain, block the navigation and show the warning.

 

All of these checks are done locally on the client side.
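The six steps read naturally as a chain of early exits. The Python sketch below mirrors that order under stated assumptions: the `Navigation` fields, the engagement level names, and the toy `skeleton` function (which only strips diacritics) are all hypothetical and are not Chrome's actual data model.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class Navigation:
    scheme: str
    hostname: str
    is_redirect: bool = False
    redirect_chain_safe: bool = False
    engagement: str = "none"  # hypothetical levels: none, low, medium, high


def skeleton(hostname):
    """Toy skeleton: strip diacritics only (a real one also maps confusable characters)."""
    decomposed = unicodedata.normalize("NFD", hostname)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def should_show_warning(nav, allowlist, user_allowed, protected_skeletons):
    """Return True if the lookalike warning should block the navigation."""
    if nav.scheme not in ("http", "https"):          # step 1
        return False
    if nav.is_redirect and nav.redirect_chain_safe:  # step 2
        return False
    if nav.engagement in ("medium", "high"):         # step 3
        return False
    if nav.hostname in allowlist:                    # step 4 (domains.list)
        return False
    if nav.hostname in user_allowed:                 # step 5 (per-tab decisions)
        return False
    # Step 6: compare against skeletons of engaged sites and top domains.
    return skeleton(nav.hostname) in protected_skeletons


protected = {"google.com"}  # hypothetical: skeletons of top or recently engaged sites
nav = Navigation(scheme="https", hostname="googl\u00e9.com")
print(should_show_warning(nav, allowlist=set(), user_allowed=set(),
                          protected_skeletons=protected))
```

Because every allow rule is checked before the skeleton comparison, a site the user already engages with is never blocked, even if it resembles a top domain.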

 

Defensive Registrations

Domain owners can sometimes register multiple versions of their domains, such as the ASCII and IDN versions, to improve user experience and prevent potential spoofs. We call these supplementary domains defensive registrations.

In some cases, Chrome's lookalike warning may flag and block navigations to these domains:

·      If one of the sites is in domains.list but the other isn't, the latter will be blocked.

·      If the user engaged with one of the sites but not the other, the latter will be blocked.

Avoiding a lookalike warning on your site

Domain owners can avoid the “Did you mean” warning by redirecting their defensive registrations to their canonical domain.

Example: If you own both example.com and éxample.com and the majority of your traffic is to example.com, you can fix the warning by redirecting éxample.com to example.com. The lookalike warning logic considers this a safe redirect and allows the navigation. If you must also redirect http navigations to https, do this in a single redirect such as http://éxample.com -> https://example.com. Use HTTP 301 or HTTP 302 redirects; the lookalike warning ignores meta redirects.
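One way to issue such a single-hop redirect is sketched below as a minimal WSGI application in Python. It is an illustration under assumptions, not a recommended deployment: the `CANONICAL` and `DEFENSIVE` names are made up for this example, and most sites would configure the redirect in their web server or CDN instead. The defensive hostname is matched in its ASCII (xn--) form because that is how it arrives in the Host header.

```python
CANONICAL = "example.com"

# Defensive registrations, stored in ASCII-compatible form.
DEFENSIVE = {"\u00e9xample.com".encode("idna").decode("ascii")}


def app(environ, start_response):
    """Redirect defensive hostnames straight to https on the canonical domain."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    if host in DEFENSIVE:
        path = environ.get("PATH_INFO", "/")
        # One 301 covers both the hostname change and the http -> https upgrade.
        start_response("301 Moved Permanently",
                       [("Location", f"https://{CANONICAL}{path}")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


if __name__ == "__main__":
    # Serve locally for manual testing with the standard library's reference server.
    from wsgiref.simple_server import make_server
    make_server("127.0.0.1", 8000, app).serve_forever()
```

Sending the browser directly to https://example.com avoids an intermediate hop through https://éxample.com or http://example.com.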

 

Reporting Security Bugs

We reward certain cases of IDN spoofs according to Chrome's Vulnerability Reward Program policies. Please see this document before reporting a security bug.

 


 
