
Amazon AWS outage takes down Alexa, Snapchat, Fortnite, Venmo and many other apps.

Posted by qimuai · First-hand translation



Source: https://www.engadget.com/big-tech/amazons-aws-outage-has-knocked-services-like-alexa-snapchat-fortnite-venmo-and-more-offline-142935812.html?src=rss

Summary:

Amazon's cloud suffers a sudden large-scale failure, knocking internet services offline worldwide

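On Monday, October 20, Amazon's cloud computing platform AWS suffered a sudden, large-scale service outage. Many websites, apps and online platforms around the world that depend on its cloud went down. The disruption was so broad that one expert likened it to large portions of the internet suffering temporary amnesia.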

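According to the AWS service health page, AWS's main service region in Northern Virginia (US-EAST-1) saw a surge in API error rates and latency starting at 3:11 AM Eastern time on Monday, October 20. The root cause was identified as a DNS resolution failure for the regional endpoints of the DynamoDB database service. Popular services were hit with failed logins, interrupted transactions or slow responses, including Amazon's Alexa assistant, the payment app Venmo, the social app Snapchat and the game Fortnite.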

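AWS announced at 6:35 AM on October 20 that it had fully mitigated the DNS issue, but the knock-on effects kept spreading. Into the afternoon, several core services, including Elastic Compute Cloud (EC2), were still affected. Amazon had to rate-limit new instance launches to aid recovery. Only at 6:53 PM ET did AWS announce that all services had returned to normal operation.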

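The incident has renewed concern about the risks of concentrating internet infrastructure. AWS holds an estimated 30% of the global cloud infrastructure market, and its US-EAST-1 region hosts core workloads for a large number of internet companies. Mike Chapple, a teaching professor of IT, analytics and operations at the University of Notre Dame, told CNN: "Amazon had the data safely stored, but nobody else could find it for several hours. It's as if large portions of the internet suffered temporary amnesia."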

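As digitalization accelerates, this large-scale outage is a wake-up call for the global tech industry. When the internet's lifelines rest with a handful of cloud giants, building more resilient digital infrastructure becomes an urgent problem.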

Chinese translation:

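Amazon's cloud services suffered a sudden, massive outage that took down Alexa, Snapchat, Fortnite, Venmo and many other platforms. The severe failure shows that building much of the internet on the cloud infrastructure of a few tech giants carries real risk.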

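From the early hours of Monday, October 20, into that evening, the internet felt half-paralyzed. A severe Amazon Web Services (AWS) outage knocked out huge numbers of websites, apps, games and other services that rely on Amazon's cloud platform. Affected apps included Venmo, Snapchat, Canva, Fortnite and many other popular programs, and even Amazon's own voice assistant Alexa stuttered. If you felt like the whole internet was working against you, you weren't imagining it. The good news is that at 6:53 PM ET on October 20, Amazon announced it had resolved the "increased error rates and latencies for AWS Services."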

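Amazon said it "identified the trigger of the event as DNS resolution issues for the regional DynamoDB service endpoints." Its engineers ran into further cascading problems during the repair, but eventually restored everything. "By 3:01 PM, all AWS services returned to normal operations."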

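By around 4:30 PM ET on October 20, services were gradually returning to normal. Apps such as Venmo and Lyft, which had been slow or completely unresponsive, were running smoothly again. At 1:15 PM that day, several services were still down. Asking Alexa for the weather or issuing smart-home commands failed, the Lyft app was noticeably slow, and Venmo transactions would not complete.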

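According to the AWS service health page, the US-EAST-1 region (data centers in Northern Virginia) saw "increased error rates and latencies for multiple AWS services" starting at 3:11 AM ET on Monday. By 5:01 AM, AWS had traced the fault to a DNS resolution issue with the DynamoDB API. DynamoDB is a core database that stores data for AWS customers.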

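At 12:08 PM ET, Amazon posted a short statement repeating this conclusion. It added that the "underlying DNS issue was fully mitigated at 2:24 AM PDT." The notice also said that some customers in US-EAST-1 were still seeing elevated AWS error rates because launching new EC2 instances remained impaired. Amazon said Amazon.com, its subsidiaries and AWS customer support operations had all been affected.

"Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data," Mike Chapple, a teaching professor of IT, analytics and operations at the University of Notre Dame, told CNN. "It's as if large portions of the internet suffered temporary amnesia."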

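At 6:35 AM, AWS said it had fully mitigated the DNS issue and that "most AWS Service operations are succeeding normally now." However, the outage's knock-on effects continued to hit key services. These included EC2, the virtual machine platform on which countless companies run their online applications, which had not yet fully recovered.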

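At 8:48 AM, AWS said it was "making progress on resolving the issue with new EC2 instance launches in the US-EAST-1 Region." It advised customers not to tie new deployments to specific Availability Zones, so that EC2 keeps the flexibility to pick a better zone. A 9:42 AM update said that "multiple mitigations" had been applied across several Availability Zones in US-EAST-1. Even so, AWS was "still experiencing elevated errors for new EC2 instance launches" and was therefore "rate limiting new instance launches to aid recovery." A 10:14 AM notice reported "significant API errors and connectivity issues across multiple services in the US-EAST-1 Region." Even once every fault is fixed, it will take time to work through the backlog of pending requests and fully restore the systems.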

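Because so many companies deploy their workloads in AWS US-EAST-1, Monday morning's failure left half the internet limping. Through mid-morning, many websites and services were still sluggish or returning errors. Outage reports for a wide range of services spiked on the outage-monitoring site Down Detector. Besides Amazon's own services, users reported problems with banks, airlines, Disney+, Snapchat, Reddit, Lyft, Apple Music, Pinterest, Fortnite, Roblox and The New York Times, which may have put some Wordle streaks at risk.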

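Platforms such as Reddit posted their own status updates without mentioning AWS directly. Still, their disruptions may be connected to AWS somewhere along the cloud service pipeline.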

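AWS has long attracted businesses worldwide with two main features: compute and server capacity that scales elastically, and data centers distributed around the globe. These are especially appealing to companies that must serve international users around the clock. As of mid-2025, AWS held an estimated 30% of the global cloud infrastructure market. But this incident is a warning that resting the internet's lifelines on a handful of providers carries significant risk.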

[Update log]

English source:

Amazon's AWS outage knocked services like Alexa, Snapchat, Fortnite, Venmo and more offline
A massive outage highlights why relying on a few companies to power much of the internet is far from ideal.
It felt like half of the internet was dealing with a hangover from the early hours of Monday, October 20, into that evening. A severe Amazon Web Services outage took out many, many websites, apps, games and other services that rely on Amazon's cloud division to stay up and running. That included a long list of popular software like Venmo, Snapchat, Canva and Fortnite. Even Amazon's own assistant Alexa stuttered, and if you were wondering why the internet seemed to be against you, you weren't imagining it. The good news is that Amazon announced at 6:53PM Eastern time on October 20 that it had resolved the "increased error rates and latencies for AWS Services."
The company said it "identified the trigger of the event as DNS resolution issues for the regional DynamoDB service endpoints." It ran into more problems as it tried to solve the outage, but it was eventually able to fix everything. "By 3:01 PM, all AWS services returned to normal operations," it said.
At about 4:30PM ET on October 20, things seemed to be returning to normal. Apps like Venmo and Lyft, which had been either slow to respond or completely nonresponsive, appeared to be behaving smoothly.
As of 1:15PM ET on October 20, multiple services were unavailable, including asking Alexa for the weather or to turn off lights in your home. The Lyft app was also slower to respond than usual, and Venmo transactions were not completing.
According to the AWS service health page at the time, Amazon was looking into "increased error rates and latencies for multiple AWS services" in the US-EAST-1 region (i.e. data centers in Northern Virginia) as of 3:11AM ET on Monday. By 5:01AM, AWS had determined that a DNS resolution issue with its DynamoDB API was the cause of the outage. DynamoDB is AWS's managed NoSQL database service, which many AWS clients use to store their data.
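A DNS resolution failure means a client cannot turn an endpoint hostname such as `dynamodb.us-east-1.amazonaws.com` into an IP address, so its requests fail before they ever reach a server. The sketch below shows how a client might detect that failure and retry with capped exponential backoff plus jitter. It is a minimal Python example built on the standard-library resolver. The `resolve_with_backoff` helper and its parameters are hypothetical illustrations; they are not part of any AWS SDK and do not describe how AWS clients actually behaved during the outage.

```python
import random
import socket
import time
from typing import Callable, List

# Regional DynamoDB endpoint hostname for US-EAST-1.
ENDPOINT = "dynamodb.us-east-1.amazonaws.com"


def resolve_with_backoff(
    host: str,
    resolver: Callable[[str], List[str]],
    max_attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Resolve `host`, retrying with capped exponential backoff and full jitter.

    Hypothetical helper for illustration; re-raises the last DNS error
    if every attempt fails.
    """
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(max_attempts):
        try:
            return resolver(host)
        except socket.gaierror as err:  # raised when a hostname cannot be resolved
            last_error = err
            if attempt < max_attempts - 1:
                # Full jitter: wait a random time up to the capped exponential delay,
                # so many clients do not all retry at the same instant.
                cap = min(max_delay, base_delay * 2 ** attempt)
                sleep(random.uniform(0, cap))
    raise last_error


def system_resolver(host: str) -> List[str]:
    """Return the distinct IP addresses the OS resolver reports for `host` on port 443."""
    infos = socket.getaddrinfo(host, 443, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})
```

A call such as `resolve_with_backoff(ENDPOINT, system_resolver)` keeps trying for a short while before giving up. The jitter matters at scale: when DNS recovers, randomized waits stop millions of clients from retrying in lockstep and flooding the endpoint.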
At about 12:08PM ET, the company posted a small statement that reiterated the above and added that the "underlying DNS issue was fully mitigated at 2:24 AM PDT." According to the notice, some Amazon "customers still continue to experience increased error rates with AWS services in the N. Virginia (us-east-1) Region due to issues with launching new EC2 instances." Amazon also said Amazon.com and Amazon subsidiaries, as well as AWS customer service support operations have been impacted.
“Amazon had the data safely stored, but nobody else could find it for several hours, leaving apps temporarily separated from their data,” Mike Chapple, a teaching professor of IT, analytics and operations at the University of Notre Dame, told CNN. “It’s as if large portions of the internet suffered temporary amnesia.”
At 6:35AM, AWS said it had fully mitigated the DNS issue and that "most AWS Service operations are succeeding normally now." However, the knock-on effect caused issues with other AWS services, including EC2, a virtual machine service on which many companies build online applications.
At 8:48AM, AWS said it was "making progress on resolving the issue with new EC2 instance launches in the US-EAST-1 Region." It recommended that clients not tie new deployments to specific Availability Zones (i.e. one or more data centers in a given region) "so that EC2 has flexibility" in picking a zone that may be a better option.
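The reasoning behind AWS's advice is that a deployment pinned to one Availability Zone can only launch there, even while that zone is impaired. An unpinned deployment lets the placement logic route around the trouble. The minimal Python sketch below illustrates the idea. It assumes a hypothetical per-zone launch-success metric, and `choose_zone` is an illustrative name only; it is not how EC2 actually places instances.

```python
from typing import Dict, Optional


def choose_zone(health: Dict[str, float], pinned: Optional[str] = None) -> Optional[str]:
    """Pick an Availability Zone for a new launch.

    `health` maps zone name -> recent launch success rate (0.0 to 1.0),
    a hypothetical metric used only for this illustration.
    A pinned deployment can only use its one zone, however degraded it is;
    an unpinned one can use whichever zone currently looks healthiest.
    """
    if pinned is not None:
        return pinned if pinned in health else None
    if not health:
        return None
    # Highest success rate wins; iterating in sorted order breaks ties by zone name.
    return max(sorted(health), key=lambda zone: health[zone])
```

Consider made-up success rates of `{"us-east-1a": 0.2, "us-east-1b": 0.95, "us-east-1c": 0.9}`. A deployment pinned to `us-east-1a` keeps hitting the degraded zone, while the unpinned call picks `us-east-1b`.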
At 9:42AM, Amazon noted on the status page that although it had applied "multiple mitigations" across several Availability Zones in US-EAST-1, it was "still experiencing elevated errors for new EC2 instance launches." As such, AWS was "rate limiting new instance launches to aid recovery." The company added at 10:14AM that it was seeing "significant API errors and connectivity issues across multiple services in the US-EAST-1 Region." Even once all the issues are resolved, AWS will have a significant backlog of requests and other factors to process, so it'll take some time for everything to recover.
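Rate limiting caps how fast new requests hit a recovering system. AWS has not said which algorithm it used. A token bucket is one common approach, sketched below in Python with hypothetical names and an injectable clock so the behavior is testable. The `drain_seconds` helper adds the backlog arithmetic from the same paragraph. A queue of B pending requests shrinks only at the spare rate (service rate minus the rate of new arrivals), so clearing it takes B divided by that spare rate.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow `rate` launches per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start with a full bucket
        self.last = clock()

    def try_acquire(self) -> bool:
        """Consume one token if available; return False if the launch must be throttled."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def drain_seconds(backlog: float, service_rate: float, arrival_rate: float) -> float:
    """Seconds to clear `backlog` requests while serving `service_rate`/s
    and receiving `arrival_rate`/s of new work."""
    spare = service_rate - arrival_rate
    if spare <= 0:
        return float("inf")  # the queue never shrinks
    return backlog / spare
```

As an illustration with made-up numbers, a backlog of 1,000 requests served at 50 per second while 30 new ones arrive each second takes `drain_seconds(1000, 50, 30)`, or 50 seconds, to clear. This is why recovery lags behind the fix itself.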
Many, many, many companies use US-EAST-1 for their AWS deployments, which is why it felt like half of the internet was knocked offline on Monday morning. As of mid-morning, tons of websites and other services were sluggish or offering up error messages. Outage reports for a broad swathe of services spiked on Down Detector. Along with Amazon's own services, users reported issues with the likes of banks, airlines, Disney+, Snapchat, Reddit, Lyft, Apple Music, Pinterest, Fortnite, Roblox and The New York Times — sorry to anyone whose Wordle streaks may be at risk.
Sites like Reddit have posted their own status updates, and though they don't explicitly mention AWS, it's possible that the services' paths cross somewhere in the pipelines.
AWS offers a lot of useful features to clients, such as the ability for websites and apps to automatically scale compute and server capacity up and down as needed to handle ebbs and flows in traffic. It also has data centers around the world. That kind of infrastructure is attractive to companies that serve a global audience and need to stay online around the clock. As of mid-2025, it was estimated that AWS' share of the worldwide cloud infrastructure market was 30 percent. But incidents such as this highlight that relying on just a few providers to be the backbone of much of the internet is a bit of a problem.
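Automatic scaling typically adjusts capacity in proportion to how far a load metric sits from its target. The minimal Python sketch below shows such a proportional target-tracking rule, assuming average CPU utilization as the metric. The formula, desired = ceil(current × metric / target), is the one the Kubernetes Horizontal Pod Autoscaler documents. It is not necessarily what AWS Auto Scaling uses internally, and `desired_capacity` is a hypothetical name.

```python
import math


def desired_capacity(current: int, metric: float, target: float,
                     min_size: int = 1, max_size: int = 100) -> int:
    """Scale `current` instances so the per-instance `metric` moves toward `target`.

    Proportional rule: desired = ceil(current * metric / target),
    clamped to the range [min_size, max_size].
    """
    if target <= 0:
        raise ValueError("target must be positive")
    desired = math.ceil(current * metric / target)
    return max(min_size, min(max_size, desired))
```

For example, 4 instances running at 90% CPU against a 60% target scale out to 6. At 30% CPU, 10 instances scale in to 5.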
Update, Oct 20 2025, 9:21PM ET: This story has been updated with Amazon's latest update that says the issue has been resolved.
Update, Oct 20 2025, 10:57AM ET: This story has been updated to include a short list of services affected in the intro.
Update, Oct 20 2025, 11:17AM ET: This story has been updated to include a reference to Reddit's own status update website.
Update, Oct 20 2025, 1:15PM ET: This story has been updated to include a paragraph reflecting the status of popular services like Lyft, Venmo and Alexa, based on our editors' personal experiences as of this time.
Update, Oct 20 2025, 3:15PM ET: This story has been updated to include a short statement from Amazon describing a timeline of events, when the underlying issue was mitigated and what parts of Amazon have been impacted.
Update, Oct 20 2025, 4:30PM ET: This story has been updated to reflect the status of services like Venmo and Lyft as of Monday afternoon.

Engadget


