Moving Average Kernel

KVS for tube sites

Optimized for heavy server loads and protected against overloads. You don't need to spend money on expensive server hardware or server farms. KVS is ready to handle tons of traffic right away. Over 10 million visits per day is a daily average for KVS sites. It will never be the critical limit after which your site stops growing and slows down. Overload protection lets you identify potentially problematic page blocks and helps you optimize them before they cause problems.

100% open PHP code can be purchased. Your site is 100% backdoor-free and you are the only one who controls it. You have never had reason to doubt that. You are in full control of everything, able to make unlimited modifications whenever you need.

Extensive, easy-to-understand categorization. Create new sections quickly and easily without any PHP development skills. Categorize your content by structuring your data exactly as you need.

Multi-language support for site and content. Don't limit your own growth. Create several versions of the site in different languages and get more international traffic. Manage the language versions in the admin panel, which lets you easily control your translators and update your copy.

Multi-server content conversion and storage. Time is money. Don't waste so much of it processing large volumes of content. KVS supports parallel content processing by several conversion servers. You will be impressed by how fast KVS can be. Use several servers to store your content. Found a cheaper server? Perfect: with KVS, you can move your content there easily, without losses or limits.

Build site networks using the same database. Launch more sites while spending less money and time. Get more SE traffic by using different copy for different sites.
Build niche sites using your primary multi-niche site. All content is managed in your admin panel. Try it, it's very easy.

Embed codes are supported, offering flexible ad configuration settings and detailed statistics. Grow your site and your user base by allowing visitors to distribute your embed codes. Add an embed code for ads and configure its settings. The code can carry a lower-quality video or even a trailer. So many settings to choose from.

Oriented toward social and community features. Make your users spend more time on your site. Let them interact, exchange information, and join communities of like-minded people. KVS can help you create a great combination of community features and subscription-based services. The more motivated they are, the more they do on your site. Encourage your users to participate by offering tokens as incentives. These can be spent on bonuses, including access to premium sections.

Professional support in English, first-hand from the developers. We never provide customer service through third-party operators. All your questions and issues are handled by the developers themselves, who obviously know their products inside and out. All problems are solved immediately, without being bounced endlessly between support and tech. We understand how important this is, and customer feedback proves we are doing it right.

Incredibly flexible page builder with simple, easy-to-use design templates. Create new pages and modify the ones you already have without having to pay any developer. The modular pages created by our site builder do not require editing PHP code. You only need basic HTML and Smarty skills. Lower your development costs and keep your code bug-free.
We never stop adding up-to-date features to new KVS updates.

Video player supporting HTML5, multiple quality modes, and virtually unlimited monetization options. Monetize your site without any hassle and make all the adjustments a specific campaign requires. Our player was built for tube sites and meets all current market requirements.

Absolute security and 100% site control. Your KVS site is always fully protected by our multi-layer protection system. You can use anti-hacking protection with permanent analysis of software and hardware performance, system file change auditing, an alert system, and detailed statistics. Stay up to date on your site's performance at all times.

Multi-format content support and format creation as you build and grow your site. Make your site easier to use. Offer videos in more resolutions, formats, and quality settings. Monetize your site using free video trailers and full movies for subscribed users. Manage your video formats as you build and expand your site based on your source files. You simply choose the parameters, and KVS does the rest.

Video and video screenshot rotation. The behavior-based rotator selects the best videos and the best screenshots for videos. This way, your most accessible and exciting content is always on top.

Import and export feeds. Publisher-defined content is always easily available to the webmasters who use that content to promote publisher sites. In addition, you can automate adding large volumes of content from virtually any source.

Multiple paid access options. Monetize your site using our advanced billing system. KVS supports all popular payment processors, including CCBill, Epoch, SegPay, NATS, Zombaio, Vendo, SMSCiuk, SMSDostup, xBill, etc.
Choose between duration-based memberships (monthly or other periods) and pay-per-view using tokens. New billing features are always on the way.

User playlists, public and private. Another amazing feature that lets users easily organize site content into their own custom private playlists and find the content they need in public playlists. This can increase user loyalty dramatically and make your site more SE-friendly.

Updates and improvements never stop. KVS means 9 years of non-stop development and improvement. Although the product became popular and stable very quickly, we never stopped perfecting it. Most of our customers own top-tier sites, which is why we always need to offer up-to-date features and capabilities to match. KVS is all about cutting-edge features that fully meet the current needs of site owners around the world. Check out all the Kernel Video Sharing features now. Take a look at our demo site and order the new version now.

Work with the best, become the best.

Bravo Media Group, Alex: We have been using KVS for some time now and haven't looked back since. Our experience has been nothing but excellent from day one. KVS meets all modern requirements of today's dynamic market. It will suit any business model, as new versions are released frequently with new features added all the time. Besides, the script is incredibly flexible and lets you reconfigure it as.

We have been using this script for a few years. We were so happy that we are gradually moving it to our main sites. KVS offers a great combination of features, content processing, and site administration options, plus a very easy-to-use template engine.
Add the highly professional support team and you get one of the top products on the market today.

Our recommendations: We use KVS on several successful sites with over 1 million page views per day, and it has always been a stable and efficient script. It handles big sites with ease. Their support team is fast and exceptionally helpful, which is one of the many reasons we keep coming back to KVS for our new sites. I give it my highest recommendation for building your tube site.

We have been testing and using this script for a few years. I believe no script can compete with KVS on today's market. An impressive balance between ease of use, customization options, and a promising set of powerful features.

RC Support, Royal Cash: We have been using this product to create and run some of our sites. It is incredibly easy to use when managing and configuring a site, the server load is minimal, and pages load very fast. We appreciate all the help the Kernel Team offered while we were setting up and customizing Kernel Video Sharing for these sites. We wish them nothing but success in taking their product even further.

I really like Kernel Video Sharing. I am always looking for a new script to help me do my job faster and, naturally, more easily. I am always wary of problems when adding a new script. Many of these things simply don't work, and it takes days to fix the errors they cause. So I was pleasantly surprised when this script simply showed no problems. And then the ease of use really blew me away.

Tay, Tony Bucks: Let me join everyone here and say this script rocks. I can't even begin to list all the features. It's a real Rubik's cube: you just combine the parts the way you want.
Our team has been cutting templates for KVS for several years. We love how dynamic this script is, always moving forward. In our experience, every issue was handled by the support department immediately. If you need something that does not exist yet, you can be sure it will be there very soon after you tell them about it. Having created dozens of sites with KVS, we are now sure of it.

We have been building and supporting adult sites for a good while now. These include CJ sites, tubes, TGPs, paysites, AVS sites, blogs, and much more. For most video-based sites, we use Kernel Video Sharing. We currently believe KVS is the best choice for tube sites. There are so many ways to customize the templates to suit your needs. We like that you can use basic features to.

Ready to start doing business with professionals and build a successful site on KVS?

Updated: June 14, 2010

This article is part of my Linux Kernel Crash Book. It is available for free download in PDF format.

Finally, the big moment has arrived: reading the information displayed by the crash utility, understanding what those curious lines mean, and hacking your way through the problem to the other side. We have learned how to configure our systems for kernel crash dumping, using LKCD and Kdump, both locally and across the network. We have learned how to set up the crash mechanism on CentOS and openSUSE, and we reviewed the subtle differences between the two operating systems. Next, we mastered the basic usage of the crash utility, using it to open the dumped memory core and process the information it contains. But we have not yet learned to interpret the output.

Pre-introduction

Today, we will focus on just that.
We will read the vmcore analysis, understand what the entries mean, perform a basic investigation of the problem, examine the source code, and derive an efficient methodology for handling kernel crash problems in the future. So, if you are in the mood for some super-serious work, follow me.

Necessary reading

You MUST read the other articles in order to fully understand how crash works. You can find the detailed list of references below. Without mastering the basics, including Kdump and crash functionality, you will not be able to follow this tutorial efficiently.

Analyzing the crash report - First steps

Once you launch crash, you will get the initial report information printed to the console. This is where the crash analysis begins.

crash 4.0-8.9.1.el5.centos
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.

NOTE: stdin: not a tty

GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
bt: cannot transition from exception stack to current process stack:
exception stack pointer: ffff810107132f20
process stack pointer: ffff81010712bef0
current_stack_base: ffff8101b509c000

KERNEL: /usr/lib/debug/lib/modules/2.6.18-164.10.1.el5.centos.plus/vmlinux
DUMPFILE: vmcore
CPUS: 2
DATE: Tue Jan 19 20:21:19 2010
UPTIME: 00:00:00
LOAD AVERAGE: 0.00, 0.04, 0.07
TASKS: 134
NODENAME: testhost2localdomain
RELEASE: 2.6.18-164.10.1.el5
VERSION: #1 SMP Thu Jan 7 19:54:26 EST 2010
MACHINE: x86_64 (3000 Mhz)
MEMORY: 7.5 GB
PANIC: "SysRq : Trigger a crashdump"
PID: 0
COMMAND: "swapper"
TASK: ffffffff80300ae0 (1 of 2) [THREAD_INFO: ffffffff803f2000]
CPU: 0
STATE: TASK_RUNNING (ACTIVE)

Let's walk through the report. The first thing you see is some kind of error:

bt: cannot transition from exception stack to current process stack:
exception stack pointer: ffff810107132f20
process stack pointer: ffff81010712bef0
current_stack_base: ffff8101b509c000

The technical explanation for this error is a little tricky. Quoting the crash utility mailing list thread about the changes in release 4.0-8.11 of crash, we learn the following:

If a kdump NMI issued to a non-crashing x86_64 cpu was received while running in schedule(), after having set the next task as "current" in the cpu's runqueue, but prior to changing the kernel stack to that of the next task, then a backtrace would fail to make the transition from the NMI exception stack back to the process stack, with the error message "bt: cannot transition from exception stack to current process stack". This patch will report inconsistencies found between a task marked as the current task in a cpu's runqueue, and the task found in the per-cpu x8664_pda "pcurrent" field (2.6.29 and earlier) or the per-cpu "current_task" variable (2.6.30 and later).
If it can be safely determined that the runqueue setting (used by default) is premature, then the crash utility's internal per-cpu active task will be changed to be the task indicated by the appropriate architecture-specific value.

What this means is that it is a warning you should heed when analyzing the crash report. It will help us determine which task structure we need to look at to troubleshoot the crash. For now, ignore this error. It is not important for understanding what the crash report contains. You may or may not see it.

Now, let's examine the output below this error.

KERNEL: specifies the kernel running at the time of the crash.
DUMPFILE: is the name of the dumped memory core.
CPUS: is the number of CPUs on your machine.
DATE: specifies the time of the crash.
TASKS: indicates the number of tasks in memory at the time of the crash. A task is a set of program instructions loaded into memory.
NODENAME: is the name of the crashed host.
RELEASE: and VERSION: specify the kernel release and version.
MACHINE: specifies the CPU architecture.
MEMORY: is the size of physical memory on the crashed machine.

And now come the interesting bits:

PANIC: specifies what kind of crash occurred on the machine. There are several types that you can see.

SysRq (System Request) refers to Magic Keys, which allow you to send instructions directly to the kernel. They can be invoked using a keyboard sequence or by echoing letter commands to /proc/sysrq-trigger, provided the functionality is enabled. We discussed this in the Kdump tutorial.

Oops is a deviation from the expected, correct behavior of the kernel. Usually, an oops results in the offending process being killed. The system may or may not resume its normal behavior. Most likely, the system will enter an unpredictable, unstable state, which could lead to a kernel panic if some of the buggy, killed resources are requested later.
For example, in my Ubuntu Karmic and Fedora Constantine reviews, we saw evidence of kernel crashes. However, the system continued working. These crashes were in fact oopses. We will discuss the Fedora case later.

Panic is a state in which the system has encountered a fatal error and cannot recover. A panic can be caused by trying to access disallowed addresses, by forced loading or unloading of kernel modules, or by hardware problems.

In our first, most benign example, the PANIC: string refers to the use of Magic Keys. We deliberately triggered a crash.

PANIC: "SysRq : Trigger a crashdump"

PID: is the process ID of the process that caused the crash.

COMMAND: is the name of the process, in this case, swapper.

swapper, or PID 0, is closely tied to the scheduler: it is the task that takes control of a CPU when there are no other runnable processes in the runqueue. You may want to refer to swapper as the idle task, so to speak. There is one swapper per CPU, which you will see soon when we start exploring the crash in greater depth. But this is not really important. We will encounter many processes with different names.

TASK: is the address in memory of the offending process. We will use this information later. There is a difference in memory addressing between 32-bit and 64-bit architectures.

CPU: is the number of the CPU (relevant if there is more than one) on which the offending process was running at the time of the crash. CPU refers to CPU cores and not just physical CPUs. If you are running Linux with hyperthreading enabled, then you will also be counting separate threads as CPUs. This is important to remember, because recurring crashes on just one specific CPU might indicate a CPU problem.
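The header fields described above follow a simple KEY: value layout. If you ever need to compare headers across many dumps, a few lines of Python can pull them out. This is a minimal sketch that assumes the header was saved as plain text in the layout shown earlier. parse_crash_header is a hypothetical helper, not part of crash.

```python
import re

# Matches "KEY: value" header lines. Keys are upper-case words and may
# contain a space (e.g. "LOAD AVERAGE"); lower-case lines such as the
# "bt: cannot transition ..." warning are therefore skipped.
HEADER_LINE = re.compile(r"^\s*([A-Z][A-Z ]*[A-Z]):\s+(.*\S)\s*$")


def parse_crash_header(text: str) -> dict:
    """Return a dict of header fields from the crash startup report.

    Lines that do not look like "KEY: value" are ignored. Surrounding
    double quotes (as in COMMAND: "swapper") are stripped from values.
    """
    fields = {}
    for line in text.splitlines():
        match = HEADER_LINE.match(line)
        if not match:
            continue
        key, value = match.group(1), match.group(2)
        if len(value) >= 2 and value[0] == value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields
```

Fed the header from our first example, it would map "PANIC" to SysRq : Trigger a crashdump and "COMMAND" to swapper. Everything else, including the TASK line with its THREAD_INFO suffix, is kept as raw text.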
If you are running your processes with affinity set to certain CPUs (taskset), then you might have more difficulty pinpointing CPU-related problems when analyzing crash reports.

You can check the number of your CPUs by running cat /proc/cpuinfo.

STATE: indicates the process state at the time of the crash. TASK_RUNNING refers to runnable processes, i.e., processes that can continue their execution. Again, we will talk more about this later.

Getting warmer

We have seen one benign example so far. Just an introduction. We will take a look at several more examples, including real cases. For now, we know little about the crash, except for the process that caused it. Let's now examine several more examples and try to understand what we see there.

Fedora example

Let's go back to the Fedora case. Take a look at the screenshot below. Although the information is arranged somewhat differently from what we saw earlier, it is essentially the same thing. But there is a new piece of information:

Pid: 0, comm: swapper Not tainted

Let's focus on the Not tainted string for a moment. What does it mean? It means that the kernel is not tainted; among other things, it is not running any module that has been forcefully loaded. In other words, we are probably facing a code bug somewhere rather than a violation of the kernel. You can examine your running kernel by running:

So far, we have learned another bit of information. We will talk about this later.

Another example, from the White Paper

Take a look:

MEMORY: 128MB
PANIC: "Oops: 0002" (check log for details)
PID: 1696
COMMAND: "insmod"

What do we have here? A new piece of information: Oops: 0002. What does this mean?

Kernel page error

The four digits are a hexadecimal code of the kernel page error.
Reading O'Reilly's Understanding the Linux Kernel, Chapter 9: Process Address Space, Page Fault Exception Handler, pages 376-382, we learn the following:

If the first bit is clear (0), the exception was caused by an access to a page that is not present; if the bit is set (1), it means an invalid access right.
If the second bit is clear (0), the exception was caused by a read or execute access; if it is set (1), the exception was caused by a write access.
If the third bit is clear (0), the exception was caused while the processor was in Kernel mode; otherwise, it occurred in User mode.
A further bit tells us whether the fault was an instruction fetch. This is only valid for 64-bit architecture. Since our machine is 64-bit, the bit has meaning here. In the kernel definitions below, this is bit 4, while bit 3 flags the use of a reserved bit.

This is rather interesting. The seemingly incomprehensible information starts to feel really logical.

Oh, you may also see the kernel page errors in the following format, as a table:

Sometimes, invalid access is also referred to as a protection fault:

Therefore, to understand what happened, we need to translate the hexadecimal code into binary and then examine the bits, from right to left. You can find this information in arch/<arch>/mm/fault.c in the kernel source tree:

/* Page fault error code bits */
#define PF_PROT   (1<<0)  /* or no page found */
#define PF_WRITE  (1<<1)
#define PF_USER   (1<<2)
#define PF_RSVD   (1<<3)
#define PF_INSTR  (1<<4)

In our case, hexadecimal 0002 is binary 10. Counting from zero and reading from right to left, bit 0 is zero, bit 1 is set, and bits 2, 3, and 4 are zero.

0002 (hex) --> 00010 (binary) --> not an instruction fetch | no reserved bit | Kernel mode | Write | page not present

Therefore, we have a page not found during a write operation in Kernel mode, and the fault was not an instruction fetch.
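The bit test described above can be written out directly. The following Python sketch mirrors the PF_* masks quoted from fault.c. decode_oops_code is a hypothetical helper. It assumes the oops code is hexadecimal, which is how the x86 kernel prints it; for 0002 the base makes no difference.

```python
# Page fault error code bits, mirroring the PF_* definitions in the
# kernel's arch-specific mm/fault.c.
PF_PROT = 1 << 0   # 0: page not present, 1: protection (access rights) fault
PF_WRITE = 1 << 1  # 0: read/execute access, 1: write access
PF_USER = 1 << 2   # 0: kernel mode, 1: user mode
PF_RSVD = 1 << 3   # 1: use of a reserved bit detected
PF_INSTR = 1 << 4  # 1: fault was an instruction fetch


def decode_oops_code(code: str) -> list:
    """Decode an oops error code such as "0002" into readable flags.

    The string is parsed as hexadecimal. The result lists bits 0..4 in order.
    """
    value = int(code, 16)
    return [
        "protection fault" if value & PF_PROT else "page not present",
        "write" if value & PF_WRITE else "read/execute",
        "user mode" if value & PF_USER else "kernel mode",
        "reserved bit set" if value & PF_RSVD else "no reserved bit",
        "instruction fetch" if value & PF_INSTR else "not an instruction fetch",
    ]
```

For our insmod example, decode_oops_code("0002") yields page not present, write, kernel mode, no reserved bit, and not an instruction fetch. This matches the manual reading.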
Of course, it is a little more complicated than that, but we are still getting a good idea of what is happening. Well, it is starting to get interesting, isn't it?

Looking at the offending process, insmod, tells us quite a bit. We tried to load a kernel module. It tried to write to a page it did not find, which means a page-not-present fault rather than a protection fault, and this caused our system to crash. This might be badly written code.

Status check

OK, so far, we have seen quite a bit of useful information. We learned about the basic identifier fields in the crash report. We learned about the different types of panics. We learned how to identify the offending process, how to decide whether the kernel is tainted, and what kind of problem occurred at the time of the crash.

But we have only just started our analysis. Let's take this to a new level.

Getting hot

In the first article on crash, we learned about some basic commands. It is time to put them to use. The first command we want is bt, backtrace. We want to see the execution history of the offending process, i.e., the backtrace.

PID: 0 TASK: ffffffff80300ae0 CPU: 0 COMMAND: "swapper"
#0 [ffffffff80440f20] crash_nmi_callback at ffffffff8007a68e
#1 [ffffffff80440f40] do_nmi at ffffffff8006585a
#2 [ffffffff80440f50] nmi at ffffffff80064ebf
[exception RIP: default_idle+61]
RIP: ffffffff8006b301 RSP: ffffffff803f3f90 RFLAGS: 00000246
RAX: 0000000000000000 RBX: ffffffff8006b2d8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffffffff80302698
RBP: 0000000000090000 R8: ffffffff803f2000 R9: 000000000000003e
R10: ffff810107154038 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <exception stack> ---
#3 [ffffffff803f3f90] default_idle at ffffffff8006b301
#4 [ffffffff803f3f90] cpu_idle at ffffffff8004943c

We have a lot of data here; let's start digesting it slowly.
Call trace

The sequence of numbered lines, starting with the hash sign (#), is the call trace. It is a list of the kernel functions executed just prior to the crash. This gives us a good indication of what happened before the system went down.

#0 [ffffffff80440f20] crash_nmi_callback at ffffffff8007a68e
#1 [ffffffff80440f40] do_nmi at ffffffff8006585a
#2 [ffffffff80440f50] nmi at ffffffff80064ebf
--- <exception stack> ---
#3 [ffffffff803f3f90] default_idle at ffffffff8006b301
#4 [ffffffff803f3f90] cpu_idle at ffffffff8004943c

We will discuss this later.

Instruction pointer

The first really interesting line is this one:

[exception RIP: default_idle+61]

We have exception RIP: default_idle+61. What does this mean?

First, let's discuss RIP. RIP is the instruction pointer. It points to a memory address, indicating the progress of program execution in memory. In our case, you can see the exact address in the line just below the bracketed exception line:

[exception RIP: default_idle+61]
RIP: ffffffff8006b301 RSP: ffffffff803f3f90

For now, the address itself is not important.

Note: On 32-bit architecture, the instruction pointer is called EIP.

The second part of the information is far more useful to us. default_idle is the name of the kernel function in which the RIP lies. +61 is the offset, in decimal format, inside that function where the exception occurred. This is the really important bit that we will use later in our analysis.
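The frame lines and the exception RIP line in the bt output above have a regular shape. As a result, the call chain and the function+offset pair can be extracted mechanically. This is a minimal sketch that assumes bt output saved as text in the layout shown earlier. parse_frames and parse_exception_rip are hypothetical helper names. Frames are returned innermost first (#0 is the most recent), which is the order used when reading the chain outwards from crash_nmi_callback.

```python
import re
from typing import List, Optional, Tuple

# A backtrace frame line, e.g. "#1 [ffffffff80440f40] do_nmi at ffffffff8006585a":
# frame number, stack address, function name, text address.
FRAME = re.compile(r"#(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)")
# The exception RIP line, e.g. "[exception RIP: default_idle+61]".
EXCEPTION_RIP = re.compile(r"\[exception RIP:\s+([^+\s]+)\+(\d+)\]")


def parse_frames(bt_text: str) -> List[Tuple[int, str, int]]:
    """Return (frame number, function, text address) tuples, innermost first."""
    return [
        (int(m.group(1)), m.group(3), int(m.group(4), 16))
        for m in FRAME.finditer(bt_text)
    ]


def parse_exception_rip(bt_text: str) -> Optional[Tuple[str, int]]:
    """Return (function, offset) from the exception RIP line, or None.

    crash prints the offset after '+' in decimal, so it is parsed base 10.
    """
    m = EXCEPTION_RIP.search(bt_text)
    if m is None:
        return None
    return m.group(1), int(m.group(2))
```

Applied to our backtrace, parse_exception_rip returns ("default_idle", 61). Frame #3 carries the same address as the RIP register line, ffffffff8006b301.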
Code Segment (CS) register

The code between the bracketed string and --- <exception stack> --- is the register dump. Most of it is not useful to us, except the CS (Code Segment) register. Again, we encounter a four-digit combination: CS: 0010. In order to explain this concept, I need to digress a little and talk about privilege levels.

Privilege levels

Privilege level is the concept of protecting resources on a CPU. Different execution threads can have different privilege levels, which grant access to system resources like memory regions, I/O ports, etc. There are four levels, ranging from 0 to 3. Level 0 is the most privileged, known as Kernel mode. Level 3 is the least privileged, known as User mode.

Most modern operating systems, including Linux, ignore the two intermediate levels, using only 0 and 3. The levels are also known as Rings. A notable exception in the use of levels was IBM's OS/2 system.

Current Privilege Level (CPL)

The Code Segment (CS) register is the one that points to the segment where program instructions are set. The two least significant bits of this register specify the Current Privilege Level (CPL) of the CPU. Two bits, meaning numbers between 0 and 3.

Descriptor Privilege Level (DPL) & Requested Privilege Level (RPL)

Descriptor Privilege Level (DPL) is the numerically highest (that is, least privileged) level that is allowed to access the resource. This value is defined in the Segment Descriptor. Requested Privilege Level (RPL) is defined in the Segment Selector, in its last two bits. Mathematically, access to a data segment is allowed only if MAX(CPL, RPL) does not exceed DPL; if it does, this causes a general protection fault.

Now, why is all this important, you ask? Well, for instance, if you encounter a case where the system crashed while the CPL was 3, this could indicate faulty hardware, because the system should not crash due to a problem in User mode.
Alternatively, there might be a problem with a buggy system call. These are just some rough examples.

For more information, consider referring to O'Reilly's Understanding the Linux Kernel, Chapter 2: Memory Addressing, pages 36-39. You will find useful information about Segment Selectors, Segment Descriptors, Table Index, Global and Local Descriptor Tables, and, of course, the Current Privilege Level (CPL).

Back to our crash log, where CS is 0010. As we know, the two least significant bits specify the CPL. Two bits mean four levels; however, levels 1 and 2 are ignored. This leaves us with 0 and 3, Kernel mode and User mode, respectively. Translated into binary format, we have 00 and 11.

The format used to present the descriptor data can be confusing, but it is very simple. If the right-most figure is even, then we were in Kernel mode; if the last figure is odd, then we were in User mode. Hence, we see that the CPL is 0, and the offending task that led to the crash was running in Kernel mode. This is important to know. It may help us understand the nature of our problem. Just for reference, here is an example where the crash occurred in User mode, collected on a SUSE machine:

But this is just geeky talk. Back to our example: we have learned many useful, important details. We know the exact memory address where the instruction pointer was at the time of the crash. We know the privilege level. More importantly, we know the name of the kernel function and the offset where the RIP was pointing at the time of the crash. For all practical purposes, we just need to find the source file and examine the code. Of course, this may not always be possible, for various reasons, but we will do it nevertheless, as an exercise.

So, we know that the crash_nmi_callback() function was called by do_nmi(), do_nmi() was called by nmi(), and nmi() was entered while the CPU was executing default_idle(), which is where the exception RIP points.
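The even/odd rule is a shortcut for masking the two low bits of CS. Here is a small sketch. It assumes the CS value is given in hexadecimal, as crash prints it. The helper names are hypothetical, and values such as "0033" below are only illustrative inputs whose low bits equal 3.

```python
def current_privilege_level(cs: str) -> int:
    """Return the CPL: the two least significant bits of the CS selector."""
    return int(cs, 16) & 0b11


def privilege_mode(cs: str) -> str:
    """Map the CPL to the two rings Linux normally uses: 0 and 3."""
    cpl = current_privilege_level(cs)
    if cpl == 0:
        return "kernel"
    if cpl == 3:
        return "user"
    return "ring %d (unusual on Linux)" % cpl
```

With our value, privilege_mode("0010") returns "kernel". The parity shortcut agrees with the mask as long as only rings 0 and 3 are in use: an odd last hex digit means the low bit is set, and for CPL 0 or 3 that happens only in User mode.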
We can examine these functions and try to understand more deeply what they do. We will do that soon. But first, let's revisit our Fedora example once more.

Fedora example, again

Now that we understand what's wrong, we can take another look at the Fedora example and try to understand the problem. We have a crash in a non-tainted kernel, caused by the swapper process. The crash report points to the native_apic_write_dummy function.

Then, there is also a very long call trace. That is quite a bit of useful information that should help us solve the problem. We will see how we can use crash reports to help developers fix bugs and produce better, more stable software. For now, let's focus some more on crash and the basic commands.

Backtrace for all tasks

By default, crash will display the backtrace for the active task. But you may also want to see the backtrace of all tasks. In this case, you will want to run foreach.

Dump system message buffer

log - dump system message buffer

This command dumps the contents of the kernel log_buf in chronological order. The kernel log buffer (log_buf) may contain useful clues preceding the crash, which might help us pinpoint the problem more easily and understand why our system went down. The log command may not be very useful if you have intermittent hardware problems or purely software bugs, but it is definitely worth trying. Here are the last few lines of our crash log:

ide: failed opcode was: 0xec
mtrr: type mismatch for f8000000,400000 old: uncachable new: write-combining
ISO 9660 Extensions: Microsoft Joliet Level 3
ISO 9660 Extensions: RRIP_1991A
SysRq : Trigger a crashdump

And there's the SysRq message. Useful to know. In real cases, there might be something much more interesting.
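When the log output is long, it can help to save it to a file and skim only the suspicious lines near the end. Here is a minimal sketch. The keyword list is my own assumption, not something crash or the kernel defines, so adjust it to your own cases. interesting_log_lines is a hypothetical helper.

```python
import re

# Assumed keywords worth flagging; word boundaries keep "BUG" from
# matching inside words like "debug".
SUSPICIOUS = re.compile(r"\b(?:Oops|BUG|panic|SysRq|Call Trace|failed)\b",
                        re.IGNORECASE)


def interesting_log_lines(log_text: str, last: int = 50) -> list:
    """Return suspicious lines among the last `last` lines of a kernel log."""
    if last <= 0:
        return []
    tail = log_text.splitlines()[-last:]
    return [line for line in tail if SUSPICIOUS.search(line)]
```

On the five lines above, it would keep the failed IDE opcode and the SysRq trigger and skip the mtrr and ISO 9660 messages.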
Display process status information

ps - display process status information

This command displays process status for selected, or all, processes in the system. If no arguments are entered, the process data is displayed for all processes. Take a look at the example below. We have two swapper processes. As I said earlier, each CPU has its own scheduler, and each one has its own swapper (idle) task. The active task is marked with >.

The crash utility may load pointing to a task that did not cause the panic, or it may fail to find the panic task altogether. There are no guarantees. If you're using virtual machines, including VMware or Xen, things can get even more complicated. In this case, the pointer in the ps output marks the wrong process:

Using backtrace for all processes (with foreach) and running the ps command, you should be able to locate the offending process and examine its task. Other useful information you may need: bracketed items are kernel threads; for example, init and udevd are not. Then, there's memory usage information, VSZ and RSS, process state, and more.

Super geeky stuff

Note: This section is impossibly hard. Too hard for most people. Very few people are skilled enough to dabble in kernel code and really know what's going on in there. Trying to be brave and tackle the possible bugs hidden in crash cores is a noble attempt, but you should not take this lightly. I have to admit that although I can peruse crash reports and accompanying sources, I still have a great deal to learn about the little things and bits. Don't expect any miracles. There's no silver-bullet solution to crash analysis.

Time to get ultra-serious. Let's say you may even want to analyze the C code for the offending function. Needless to say, you should have the C sources available and be able to read them. This is not something everyone should do, but it's an interesting mental exercise.

Source code

All right, you want to examine the code.
First, you will have to obtain the sources. Some distributions make the sources readily available. For example, in openSUSE, you just have to download the kernel-source package. With CentOS, it is a little more difficult, but doable. You can also visit the Linux Kernel Archive and download the kernel matching your own, although some sources may differ from the ones used on your system, since some vendors make their own custom changes. Once you have the sources, it's time to examine them. Example, on openSUSE:

You could browse the sources using standard tools like find and grep, but this can be rather tedious. Instead, why not let the system do all the hard work for you? A very neat utility for browsing C code is called cscope. The tool runs from the command line and uses a vi-like interface. By default, it will search for sources in the current directory, but you can configure it any which way. cscope is available in the repositories:

Now, in the directory containing the sources (by default, /usr/src/linux), run cscope:

This will recursively search all sub-directories, index the sources and display the main interface. There are other uses as well; try the man page or the --help flag. Now, it's time to put the tool to good use and search for the desired functions. We will begin with Find this C symbol. Use the cursor keys to get down to this line, then type the desired function name and press Enter. The results will be displayed:

Depending on what happened, you may get many results or none. It is quite possible that there is no source code containing the function seen in the crash report. If there are too many results, then you might want to search for the next function in the call trace by using the Find functions called by this function option. Use Tab to jump between the input and output sections. If you have official vendor support, this is a good moment to turn the command over and let them drive.
If you stick with the investigation, looking for other functions listed in the call trace can help you narrow down the C file you require. But there's no guarantee, and this can be a long, tedious process. Furthermore, any time you need help, just press ? and you will get a basic usage guide:

In the kernel source directory, you can also create the cscope indexes, for faster searches in the future, by running make cscope.

Disassemble the object

Assuming you have found the source, it's time to disassemble the object compiled from this source. First, if you're running a debug kernel, then all the objects have been compiled with debug symbols. You're lucky. You just need to dump the object and burrow into the intermixed assembly-C code. If not, you will have to recompile the source with debug symbols and then reverse-engineer it. This is neither a simple nor a trivial task. Moreover, if you use a compiler that is different from the one used to compile the original, your object will be different from the one in the crash report, rendering your efforts difficult if not impossible.

Trivial example

I call this example trivial because it has nothing to do with the kernel. It merely demonstrates how to compile objects and then disassemble them. Any source will do. In our case, we'll use MPlayer, a popular open-source media player, as our scapegoat. Download the MPlayer source code, run ./configure, then make. After the objects are created, delete one of them, then recompile it. Run make <object name>, for instance:

Please note that make has no meaning without a Makefile, which specifies what needs to be done. But we have a Makefile. It was created after we ran ./configure. Otherwise, all this would not really work. The Makefile is very important. We will see a less trivial example soon. If you do not remove the existing object, then you probably won't be able to make it. Make compares the timestamps of the sources and the object, so unless you change the sources, make considers the object up to date and will not recompile it.
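The timestamp rule above can be sketched in C with POSIX stat(). This is a simplified model of make's up-to-date check, assuming a single "object: source" rule and whole-second modification times (GNU make can use finer-grained timestamps); needs_rebuild() is a hypothetical name, not part of make.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical, simplified model of make's decision for one
 * "object: source" rule.
 * Returns 1 if the object must be rebuilt, 0 if it is up to date,
 * and -1 if the source itself cannot be stat()ed. */
int needs_rebuild(const char *source, const char *object)
{
    struct stat src, obj;

    if (stat(source, &src) != 0)
        return -1;              /* no source: make would report an error */
    if (stat(object, &obj) != 0)
        return 1;               /* object missing (e.g. deleted): rebuild */

    /* Rebuild only when the source is newer than the object. */
    return src.st_mtime > obj.st_mtime ? 1 : 0;
}
```

Deleting the object makes the second stat() fail, which is exactly the case relied on here to force a recompile; touching the source works too, because it makes the source newer than the object.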
Now, here's another simple example; note the difference in the size of the created object, once with debug symbols and once without:

If you don't have a Makefile, you can invoke gcc manually using all sorts of flags. You will need kernel headers that match the architecture and the kernel version that was used to build the kernel where the crash occurred; otherwise, your freshly compiled objects will be completely different from the ones you may wish to analyze, including functions and offsets.

A utility you will want to use for disassembly is objdump. You will probably want to use it with the -S flag, which means display source code intermixed with assembly instructions. You may also want the -s flag, which displays the full contents of all non-empty sections. -S implies -d, which displays the assembler mnemonics for the machine instructions from objfile; this option only disassembles those sections which are expected to contain instructions. Alternatively, use -D for all sections. Thus, the most inclusive objdump would be:

objdump -D -S <compiled object with debug symbols> > <output file>

It will look something like this:

And an even better example, the memhog dump:

Moving on to kernel sources

Warming up. Once you're confident practicing with trivial code, it's time to move to the kernel. Make sure you do not just delete any important file. For the sake of the exercise, move or rename any existing kernel objects you may find lurking about. Then, recompile them. You will require the .config file used to compile the kernel. It should be included with the sources. Alternatively, you can dump it from /proc/config.gz:

zcat /proc/config.gz > .config

On RedHat machines, you will also find the configuration files under /boot. Make sure you use the one that matches the crashed kernel and copy it over into the source directory. If needed, edit some of the options, like CONFIG_DEBUG_INFO. More about that later. Without the .config file, you won't be able to compile kernel sources:

You may also encounter an error where the Makefile is supposedly missing, but it's there. In this case, you may be facing a relatively simple problem: the wrong ARCH environment variable is set. For example, i586 versus i686 and x86-64 versus x86_64. Pay attention to the error and compare the architecture to the ARCH variable. In the worst case, you may need to export it correctly. For example:

As a long-term solution, you could also create symbolic links under /usr/src/linux from the would-be bad architecture to the right one. This is not strictly related to the analysis of kernel crashes, but if and when you compile kernel sources, you may encounter this issue.

Now, regarding the CONFIG_DEBUG_INFO variable. It should be enabled (CONFIG_DEBUG_INFO=y) in your .config file. If you recall the Kdump tutorial, this was a prerequisite we asked for, in order to be able to successfully troubleshoot kernel crashes. This tells the compiler to create objects with debug symbols. Alternatively, export the variable in the shell, as CONFIG_DEBUG_INFO=1. Then, take a look at the Makefile. You should see that if this variable is set, the object will be compiled with debug symbols (-g). This is what we need. After that, once again, we will use objdump.

Now, the Makefile might really be missing. In this case, you will get a whole bunch of errors related to the compilation process. But with the Makefile in place, it should all work smoothly. And then, there's the object up to date example again. If you do not remove an existing object, you won't be able to compile a new one, especially if you need debug symbols for later disassembly.

Finally, the disassembled object:

What do we do now? Well, you look for the function listed in the exception RIP and mark its starting address. Then add the offset to this number, doing the arithmetic in hexadecimal. Then, go to the line at the resulting address. All that is left is to try to understand what really happened.
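The "starting address plus offset" step can be sketched in C. The sketch assumes the symbol text has already been isolated in the function+0xOFFSET/0xSIZE form printed in the RIP line (for example init_module+0x19/0x22 in the null-pointer example) and that you have read the function's start address off the objdump listing; parse_sym_ref() and listing_address() are hypothetical helpers, not part of crash or objdump.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of a symbol reference such as
 * "init_module+0x19/0x22": the function name, the offset of the
 * instruction pointer inside the function, and the function's size. */
struct sym_ref {
    char func[64];
    unsigned long offset;
    unsigned long size;
};

/* Parse "func+0xOFF/0xSIZE". Returns 0 on success, -1 if the text is
 * malformed or the offset does not fall inside the function. */
int parse_sym_ref(const char *s, struct sym_ref *out)
{
    const char *plus = strchr(s, '+');
    const char *slash = plus ? strchr(plus, '/') : NULL;
    char *end;
    size_t len;

    if (plus == NULL || slash == NULL)
        return -1;

    len = (size_t)(plus - s);
    if (len == 0 || len >= sizeof(out->func))
        return -1;
    memcpy(out->func, s, len);
    out->func[len] = '\0';

    /* Base 16 lets strtoul() accept the optional "0x" prefix. */
    out->offset = strtoul(plus + 1, &end, 16);
    if (end == plus + 1 || end != slash)
        return -1;
    out->size = strtoul(slash + 1, &end, 16);
    if (end == slash + 1 || *end != '\0')
        return -1;

    return out->offset < out->size ? 0 : -1;
}

/* Address of the faulting instruction in the objdump listing:
 * the function's start address plus the offset from the report. */
unsigned long listing_address(unsigned long func_start,
                              const struct sym_ref *ref)
{
    return func_start + ref->offset;
}
```

The arithmetic also runs backwards: in the null-pointer example, where the marked listing line is 27 for init_module+0x19, init_module must start at 0x27 - 0x19 = 0xe in that object. Checking the offset against the size is a cheap sanity test that you are looking at the right build of the function.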
At that line, you'll have an assembly instruction listed and possibly some C code, telling us what might have gone wrong. It's not easy. In fact, it's very difficult. But it's exciting, and you may yet succeed in finding bugs in the operating system. What's more fun than that?

Above, we learned about the compilation and disassembly procedures, without really doing anything specific. Now that we know how to go about compiling kernel objects and dissecting them into little bits, let's do some real work.

Intermediate example

We will now try something more serious. Grab a proof-of-concept code that crashes the kernel, compile it, examine the crash report, then look for the right sources, do the whole process we mentioned above, and try to read the alien intermixed assembly and C code. Of course, we will be cheating, because we will know what we're looking for, but still, it's a good exercise. The most basic non-trivial example is to create a kernel module that causes a panic. Before we panic our kernel, let's do a brief overview of kernel module programming basics.

Create problematic kernel module

This exercise forces us to deviate from the crash analysis flow and take a brief look at the C programming language from the kernel perspective. We want to crash our kernel, so we need kernel code. While we're going to use C, it's a little different from everyday stuff. The kernel has its own rules. We will have a sampling of kernel module programming. We'll write our own module and Makefile, compile the module and then insert it into the kernel. Since our module is going to be written badly, it will crash the kernel. Then, we will analyze the crash report. Using the information obtained in the report, we will try to figure out what's wrong with our sources.

Step 1: Kernel module

We first need to write some C code. Let's begin with hello.c. Without getting too technical, here's the most basic of modules, with the init and cleanup functions.
The module does nothing special except print messages to the kernel logging facility.

/*
 * hello.c - The simplest kernel module.
 */
#include <linux/module.h>	/* Needed by all modules */
#include <linux/kernel.h>	/* Needed for KERN_INFO */

int init_module(void)
{
	printk(KERN_INFO "Hello world.\n");

	/* A non 0 return means init_module failed; module can't be loaded. */
	return 0;
}

void cleanup_module(void)
{
	printk(KERN_INFO "Goodbye world.\n");
}

We need to compile this module, so we need a Makefile (the obj-m line tells the kernel build system which object to turn into a module):

obj-m += hello.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

Now, we need to make the module. In the directory containing your hello.c program and the Makefile, just run make. You will see something like this:

Our module has been compiled. Let's insert it into the kernel. This is done using the insmod command. However, a second before we do that, we can examine our module and see what it does. Maybe the module advertises certain bits of information that we might find of value. Use the modinfo command for that. In this case, nothing special. Now, insert it:

If the module loads properly into the kernel, you will be able to see it with the lsmod command:

/sbin/lsmod | grep hello

Notice that the use count for our module is 0. This means that we can unload it from the kernel without causing a problem. Normally, kernel modules are used for various purposes, like communicating with system devices. Finally, to remove the module, use the rmmod command:

If you take a look at /var/log/messages, you will notice the Hello and Goodbye messages, belonging to the init_module and cleanup_module functions:

That was our most trivial example. No crash yet. But we have a mechanism for inserting code into the kernel. If the code is bad, we will have an oops or a panic.

Step 2: Kernel panic

We'll now create a new C program that calls the kernel's panic() function on initialization. Not very useful, but good enough for demonstrating the power of crash analysis.
Here's the code; we call it kill-kernel.c.

/*
 * kill-kernel.c - The simplest kernel module to crash kernel.
 */
#include <linux/module.h>	/* Needed by all modules */
#include <linux/kernel.h>	/* Needed for KERN_INFO */

int init_module(void)
{
	printk(KERN_INFO "Hello world. Now we crash.\n");
	panic("Down we go, panic called");	/* never returns */
}

void cleanup_module(void)
{
	printk(KERN_INFO "Goodbye world.\n");
}

When inserted, this module will write a message to /var/log/messages and then panic. Indeed, this is what happens. Once you execute the insmod command, the machine will freeze, reboot, dump the kernel memory and then reboot back into the production kernel.

Step 3: Analysis

Let's take a look at the vmcore. And the backtrace:

What do we have here? First, the interesting bit, the PANIC string:

Kernel panic - not syncing: Down we go, panic called

That bit looks familiar. Indeed, this is our own message we used on panic. Very informative, as we know what happened. We might use something like this if we encountered an error in the code, to let the user know what the problem is.

Another interesting piece is the dumping of the CS register - CS: 0033. Seemingly, we crashed the kernel in user mode. As I've mentioned before, this can happen if you have hardware problems or if there's a problem with a system call. In our case, it's the latter: the panic was triggered while insmod was loading our module through a system call.

Well, that was easy - and self-explanatory. So, let's try a more difficult example. For more information about writing kernel modules, including for benevolent purposes, please consult the Linux Kernel Module Programming Guide.

Difficult example

Now another, more difficult example. We panicked our kernel with panic(). Now, let's try some coding malpractice and create a NULL pointer testcase. We've seen earlier how to create a kernel module. Now, let's spice up our code. We will now create a classic NULL pointer example, the most typical problem with programs. NULL pointers can lead to all kinds of unexpected behavior, including kernel crashes. Our program, called null-pointer.c,
now looks like this:

/*
 * null-pointer.c - A not so simple kernel module to crash kernel.
 */
#include <linux/module.h>	/* Needed by all modules */
#include <linux/kernel.h>	/* Needed for KERN_INFO */

char *p = NULL;			/* a pointer to nothing */

int init_module(void)
{
	printk(KERN_INFO "We is gonna KABOOM now\n");

	*p = 1;			/* dereference it: write 1 to address 0 */
	return 0;
}

void cleanup_module(void)
{
	printk(KERN_INFO "Goodbye world.\n");
}

We declare a NULL pointer and then dereference it. Not a healthy practice. I guess programmers can explain this more eloquently than I, but you can't have something pointing to nothing suddenly get a valid address. In the kernel, this leads to a panic. Indeed, after making this module and trying to insert it, we get a panic. Now, the sweet part.

Step 1: Analysis

Looking at the crash report, we see a goldmine of information:

Let's digest the stuff:

PANIC: Oops: 0002 [#1] SMP (check log for details)

We have an Oops; [#1] is the Oops counter, meaning this is the first Oops since boot. 0002 translates to 0010 in binary: bit 0 is clear (no page was present), bit 1 is set (a write operation) and bit 2 is clear (kernel mode). In other words, no page was found during a write operation in kernel mode. Exactly what we're trying to achieve. We're also referred to the log. More about that soon.

WARNING: panic task not found

There was no task, because we were just trying to load the module, so it died before it could run. In this case, we will need to refer to the log for details. This is done by running log in the crash utility, just as we've learned. The log provides us with what we need:

The RIP says null_pointer:init_module+0x19/0x22. We're making progress here. We know there was a problem with a NULL pointer in the init_module function. Time to disassemble the object and see what went wrong. There's more useful information, including the fact that the kernel was Tainted by our module, the dumping of the CS register and more. We'll use this later. First, let's objdump our module:

objdump -d -S null-pointer.ko > /tmp/whatever

Looking at the file, we see the Rain Man code:

The first part, the cleanup, is not really interesting. We want init_module. The problematic line is even marked for us with a comment, 27 <init_module+0x19>:
27:	c6 00 01	movb   $0x1,(%rax)

What do we have here? We're trying to store (assembly movb) the byte value 1 ($0x1) into the memory location whose address is held in the RAX register ((%rax)). Now, why does it cause such a fuss? Let's go back to our log and check the value of the RAX register: RAX: 0000000000000000. In other words, zero. We're trying to write to memory address 0. This causes the page fault, resulting in a kernel panic. Problem solved.

Of course, in real life, nothing is going to be THAT easy, but it's a start. In real life, you will face tons of difficulties, including missing sources, wrong versions of GCC and all kinds of problems that will make crash analysis very, very difficult. Remember that. For more information, please take a look at the case study shown in the crash White Paper. Again, it's easier when you know what you're looking for. Any example you encounter online will be several orders of magnitude simpler than your real crashes, but it is really difficult to demonstrate an all-inclusive, abstract case. Still, I hope my two examples are thorough enough to get you started.

Alternative solution (debug kernel)

If you have time and space, you may want to download and install a debug kernel for your kernel release. Not for everyday use, of course, but it could come in handy when you're analyzing kernel crashes. While it is big and bloated, it may offer additional, useful information that can't be derived from standard kernels. Plus, the objects with debug symbols might be there, so you won't need to recompile them; just dump them and examine the code.

Next steps

So the big question is, what do crash reports tell us? Well, using the available information, we can try to understand what is happening on our troubled systems. First and foremost, we can compare different crashes and try to understand if there's any common element. Then, we can try to look for correlations between separate events, environment changes and system changes, trying to isolate possible culprits for our crashes.
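Returning briefly to the null-pointer Oops: the decoding of its 0002 error code can be written down as bit tests. This sketch covers only the three low bits of the x86 page-fault error code (bit 0: protection violation versus page not present, bit 1: write versus read, bit 2: user versus kernel mode); higher bits such as reserved-bit and instruction-fetch faults exist but are ignored here, and the macro, struct and function names are local to this hypothetical sketch.

```c
#include <stdio.h>

/* Low three bits of the x86 page-fault error code (hypothetical names). */
#define PF_PROT  0x1ul  /* set: protection violation; clear: page not present */
#define PF_WRITE 0x2ul  /* set: write access;         clear: read access       */
#define PF_USER  0x4ul  /* set: user mode;            clear: kernel mode       */

struct pf_info {
    int protection;     /* 1 if the page was present but access was denied */
    int write;          /* 1 for a write, 0 for a read */
    int user;           /* 1 if the fault happened in user mode */
};

struct pf_info decode_page_fault(unsigned long code)
{
    struct pf_info info = {
        .protection = (code & PF_PROT) != 0,
        .write      = (code & PF_WRITE) != 0,
        .user       = (code & PF_USER) != 0,
    };
    return info;
}

/* Print a human-readable summary of an "Oops: XXXX" error code. */
void print_page_fault(unsigned long code)
{
    struct pf_info i = decode_page_fault(code);

    printf("error code %#04lx: %s during %s in %s mode\n", code,
           i.protection ? "protection violation" : "page not present",
           i.write ? "write" : "read",
           i.user ? "user" : "kernel");
}
```

decode_page_fault(0x2) reports a write to a non-present page in kernel mode; together with RAX = 0 from the log, that matches the write to address 0 described above.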
Combined with submitting crash reports to vendors and developers, plus ample use of Google and other resources like mailing lists and forums, this kind of analysis may help us narrow down our search and greatly simplify the resolution of problems.

Kernel crash bug reporting

When your kernel crashes, you may want to take the initiative and submit the report to the vendor, so that they may examine it and possibly fix a bug. This is a very important thing. You will not only be helping yourself but possibly everyone using Linux anywhere. What's more, kernel crashes are valuable. If there's a bug somewhere, the developers will find it and fix it.

Kerneloops.org is a website dedicated to collecting and listing kernel crashes across the various kernel releases and crash reasons, allowing kernel developers to work on identifying the most critical bugs and solving them, as well as providing system administrators, engineers and enthusiasts with a rich database of crucial information. Remember the Fedora 12 kernel crash report? We had that native_apic_write_dummy. Well, let's see what kerneloops.org has to say about it. As you can see, quite a lot. Not only do you have all sorts of useful statistics, you can actually click on the exception link and go directly to the source, to the problematic bit of code, and see what gives. This is truly priceless information.

As we mentioned earlier, some modern Linux distributions have an automated mechanism for kernel crash submission, both anonymously and using a Bugzilla account. For example, Fedora 12 uses the Automatic Bug Reporting Tool (ABRT), which collects crash data, runs a report and then sends it to the developers for analysis. For more details, you may want to read the Wiki. Before that, Fedora 11 used the kerneloops utility, which sent reports to, yes, you guessed it right, kerneloops.org. Some screenshots. Here's an example of live submission in Fedora 11. And more recently in Fedora 12.
Hopefully, all these submissions help make future releases of the Linux kernel and the specific distributions smarter, faster, safer, and more stable.

Google for information

Sounds trivial, but it is not. If you're having a kernel crash, there's a fair chance someone else saw it too. While environments differ from one another, there still might be some commonality for them all. Then again, there might not. A site with 10 database machines and local logins will probably experience different kinds of problems than a 10,000-machine site with heavy use of autofs and NFS. Similarly, companies working with this or that hardware vendor are more likely to run into platform-specific issues that can't easily be found elsewhere. The simplest way to search for data is to paste the exception RIP into the search box and look for mailing list threads and forum posts discussing the same or similar items. Once again, using the Fedora case as an example:

Crash analysis results

And after you have exhausted all the available channels, it's time to go through the information and data collected and try to reach a decision or resolution about the problem at hand. We started with the situation where our kernel is experiencing instability and is crashing. To solve the problem, we set up a robust infrastructure that includes a mechanism for kernel crash collection and tools for the analysis of dumped memory cores. We now understand what the seemingly cryptic reports mean. The combination of all the lessons learned during our long journey allows us to decide what should be done next. How do we treat our crashing machines? Are they in for a hardware inspection, reinstallation, something else? Maybe there's a bug in the kernel internals? Whatever the reason, we have the tools to handle the problems quickly and efficiently. Finally, some last-minute tips, very generic, very generalized, about what to do next:

Single crash

A single crash may seem like too little information to work with. Don't be discouraged.
If you can, analyze the core yourself or send the core to your vendor support. There's a fair chance you will find something wrong, either with the software at hand, the kernel or the hardware underneath.

Hardware inspection

Speaking of hardware, kernel crashes can be caused by faulty hardware. Such crashes usually seem sporadic and random in reason. If you encounter a host that is experiencing many crashes, all of which have different panic tasks, you may want to consider scheduling some downtime and running a hardware check on the host, including memtest, CPU stress, disk checks, and more. Beyond the scope of this article, I'm afraid. The exact definition of what is considered many crashes, how critical the machine is, how much downtime you can afford, and what you intend to do with the situation at hand is individual and will vary from one admin to another.

Reinstallation & software changes

Did the software setup change in any way that correlates with the kernel crashes? If so, do you know what the change is? Can you reproduce the change and the subsequent crashes on other hosts? Sometimes, it can be very simple; sometimes, you may not be able to easily separate software from the kernel or the underlying hardware. If you can, try to isolate the changes and see how the system responds with or without them. If there's a software bug, then you might be just lucky enough to deal with a reproducible error. Kernel crashes due to a certain bug in software should look pretty much the same. But there's no guarantee you'll have it that easy. Now, if your system is a generic machine that does not keep any critical data on local disks, you may want to consider wiping the slate clean: start over, with a fresh installation that you know is stable. It's worth a try.

Submit to developer/vendor

Regardless of what you discovered or what you think the problem is, you should send the kernel crash report to the relevant developer and/or vendor.
Even if you're absolutely sure you know what the problem is and you've found the cure, you should still leave the official fix in the hands of people who do this kind of work for a living. I have emphasized this several times throughout the article, because I truly believe this is important, valuable and effective. You can easily contribute to the quality of Linux kernel code by submitting a few short text reports. It's as simple and powerful as that.

And that would be all for now, I think. I'm spent. I still owe you some information, but I can't possibly include everything in a single article. We will revisit some of the stuff when we discuss gdb.

Official documentation

Here's a selection of highly useful articles and tutorials:
