| author | Linus Torvalds <torvalds@linux-foundation.org> | 2012-12-11 19:56:33 -0800 | 
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2012-12-11 19:56:33 -0800 | 
| commit | 74b84233458e9db7c160cec67638efdbec748ca9 | |
| tree | 0d174c7386386dca17f494396d7febc300ffa3dd | |
| parent | 507447473756e316f3f182324071389a51736a83 | |
| parent | a71c8bc5dfefbbf80ef90739791554ef7ea4401b | |
Merge branch 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 BSP hotplug changes from Ingo Molnar:
 "This tree enables CPU#0 (the boot processor) to be onlined/offlined on
  x86, just like any other CPU.  Enabled on Intel CPUs for now.

  Allowing this required the identification and fixing of latent CPU#0
  assumptions (such as CPU#0 initializations, etc.) in the x86
  architecture code, plus the identification of barriers to
  BSP-offlining, such as active PIC interrupts which can only be
  serviced on the BSP.

  It's behind a default-off option, and there's a debug option that
  allows the automatic testing of this feature.

  The motivation of this feature is to allow and prepare for true
  CPU-hotplug hardware support: recent changes to MCE support enable us
  to detect a deteriorating but not yet hard-failing L1/L2 cache on a
  CPU that could be soft-unplugged - or a failing L3 cache on a
  multi-socket system.

  Note that true hardware hot-plug is not yet fully enabled by this,
  because that requires a special platform wakeup sequence to be sent to
  the freshly powered up CPU#0.  Future patches for this are planned,
  once such a platform exists.  Chicken and egg"
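
The message names three gates on offlining CPU#0: a default-off switch, an Intel-only restriction, and active PIC interrupts that only the BSP can service. As a rough user-space model (not kernel code), the decision that `arch_register_cpu()` makes for CPU 0 in the `topology.c` hunk can be sketched as follows. The `enum cpu_vendor`, `struct irq_info` and `cpu0_hotpluggable_model()` names are hypothetical stand-ins for `cpu_data()`, `for_each_active_irq()`, `IO_APIC_IRQ()` and `irq_has_action()`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's x86 vendor constants. */
enum cpu_vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

/* Hypothetical summary of one active IRQ, standing in for the kernel's
 * IO_APIC_IRQ() and irq_has_action() checks. */
struct irq_info {
	bool ioapic;      /* IO-APIC routed, i.e. not a legacy PIC IRQ */
	bool has_action;  /* a handler is installed */
};

/*
 * Model of the CPU0 check in arch_register_cpu():
 *  - start from the boot-time default (CONFIG_BOOTPARAM_HOTPLUG_CPU0=y
 *    or the cpu0_hotplug kernel parameter),
 *  - turn it off on non-Intel CPUs,
 *  - turn it off if any active legacy PIC IRQ has a handler.
 */
static bool cpu0_hotpluggable_model(bool enabled_at_boot,
				    enum cpu_vendor vendor,
				    const struct irq_info *irqs, size_t nr_irqs)
{
	bool hotpluggable = enabled_at_boot;

	/* Only Intel platforms are supported for now. */
	if (vendor != VENDOR_INTEL)
		hotpluggable = false;

	/* Any active PIC interrupt keeps CPU0 non-removable. */
	for (size_t i = 0; hotpluggable && i < nr_irqs; i++) {
		if (!irqs[i].ioapic && irqs[i].has_action)
			hotpluggable = false;
	}
	return hotpluggable;
}
```

In the kernel this scan runs only for `num == 0`; every other CPU is registered with `hotpluggable = 1` unconditionally, and the result only decides whether CPU0 gets an `online` control file.
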
* 'x86-bsp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, topology: Debug CPU0 hotplug
  x86/i387.c: Initialize thread xstate only on CPU0 only once
  x86, hotplug: Handle retrigger irq by the first available CPU
  x86, hotplug: The first online processor saves the MTRR state
  x86, hotplug: During CPU0 online, enable x2apic, set_numa_node.
  x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI
  x86-32, hotplug: Add start_cpu0() entry point to head_32.S
  x86-64, hotplug: Add start_cpu0() entry point to head_64.S
  kernel/cpu.c: Add comment for priority in cpu_hotplug_pm_callback
  x86, hotplug, suspend: Online CPU0 for suspend or hibernate
  x86, hotplug: Support functions for CPU0 online/offline
  x86, topology: Don't offline CPU0 if any PIC irq can not be migrated out of it
  x86, Kconfig: Add config switch for CPU0 hotplug
  doc: Add x86 CPU0 online/offline feature
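
The "Wake up CPU0 via NMI instead of INIT, SIPI, SIPI" change reduces to a small choice in `do_boot_cpu()` and `wakeup_cpu_via_init_nmi()`. A minimal user-space model follows; `enum wake_method`, `struct wake_plan` and `plan_wakeup()` are hypothetical names, and the APIC-driver hook, destination mode and saved `cpu0_logical_apicid` are passed in as plain parameters instead of being read from `apic`.

```c
#include <stdbool.h>

enum wake_method {
	WAKE_APIC_DRIVER,	/* apic->wakeup_secondary_cpu() is defined */
	WAKE_INIT_STARTUP,	/* INIT/STARTUP sequence for application processors */
	WAKE_NMI,		/* NMI for a soft-offlined CPU0 */
};

struct wake_plan {
	enum wake_method method;
	int target_id;		/* APIC ID the wake-up IPI is addressed to */
};

/*
 * Model of the selection logic: prefer the APIC driver's hook; otherwise
 * use INIT/STARTUP for any CPU other than 0 and an NMI for CPU0, because
 * INIT/STARTUP would make the BSP run the BIOS boot-strap code.  In logical
 * destination mode the NMI is addressed with CPU0's logical APIC ID, which
 * the kernel saves at boot in cpu0_logical_apicid.
 */
static struct wake_plan plan_wakeup(int cpu, int apicid, bool has_driver_hook,
				    bool dest_logical, int cpu0_logical_apicid)
{
	struct wake_plan plan = { WAKE_INIT_STARTUP, apicid };

	if (has_driver_hook) {
		plan.method = WAKE_APIC_DRIVER;
		return plan;
	}
	if (cpu == 0) {
		plan.method = WAKE_NMI;
		if (dest_logical)
			plan.target_id = cpu0_logical_apicid;
	}
	return plan;
}
```

The model leaves out the side effects of the real NMI path: registering the temporary `wake_cpu0` NMI handler, setting `enable_start_cpu0` so that `play_dead()` jumps to `start_cpu0()`, and unregistering the handler after the callin/callout sync.
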
| Mode | File | Lines changed |
|---|---|---|
| -rw-r--r-- | Documentation/cpu-hotplug.txt | 24 |
| -rw-r--r-- | Documentation/kernel-parameters.txt | 14 |
| -rw-r--r-- | arch/x86/Kconfig | 44 |
| -rw-r--r-- | arch/x86/include/asm/cpu.h | 4 |
| -rw-r--r-- | arch/x86/include/asm/smp.h | 1 |
| -rw-r--r-- | arch/x86/kernel/apic/io_apic.c | 4 |
| -rw-r--r-- | arch/x86/kernel/cpu/common.c | 5 |
| -rw-r--r-- | arch/x86/kernel/cpu/mtrr/main.c | 9 |
| -rw-r--r-- | arch/x86/kernel/head_32.S | 13 |
| -rw-r--r-- | arch/x86/kernel/head_64.S | 16 |
| -rw-r--r-- | arch/x86/kernel/i387.c | 6 |
| -rw-r--r-- | arch/x86/kernel/smpboot.c | 149 |
| -rw-r--r-- | arch/x86/kernel/topology.c | 101 |
| -rw-r--r-- | arch/x86/power/cpu.c | 82 |
| -rw-r--r-- | kernel/cpu.c | 5 |

15 files changed, 436 insertions, 41 deletions
```diff
diff --git a/Documentation/cpu-hotplug.txt b/Documentation/cpu-hotplug.txt
index 66ef8f35613..9f401350f50 100644
--- a/Documentation/cpu-hotplug.txt
+++ b/Documentation/cpu-hotplug.txt
@@ -207,6 +207,30 @@ by making it not-removable.
 
 In such cases you will also notice that the online file is missing under cpu0.
 
+Q: Is CPU0 removable on X86?
+A: Yes. If kernel is compiled with CONFIG_BOOTPARAM_HOTPLUG_CPU0=y, CPU0 is
+removable by default. Otherwise, CPU0 is also removable by kernel option
+cpu0_hotplug.
+
+But some features depend on CPU0. Two known dependencies are:
+
+1. Resume from hibernate/suspend depends on CPU0. Hibernate/suspend will fail if
+CPU0 is offline and you need to online CPU0 before hibernate/suspend can
+continue.
+2. PIC interrupts also depend on CPU0. CPU0 can't be removed if a PIC interrupt
+is detected.
+
+It's said poweroff/reboot may depend on CPU0 on some machines although I haven't
+seen any poweroff/reboot failure so far after CPU0 is offline on a few tested
+machines.
+
+Please let me know if you know or see any other dependencies of CPU0.
+
+If the dependencies are under your control, you can turn on CPU0 hotplug feature
+either by CONFIG_BOOTPARAM_HOTPLUG_CPU0 or by kernel parameter cpu0_hotplug.
+
+--Fenghua Yu <fenghua.yu@intel.com>
+
 Q: How do i find out if a particular CPU is not removable?
 A: Depending on the implementation, some architectures may show this by the
 absence of the "online" file. This is done if it can be determined ahead of
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 03d1251a915..5190f170641 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1984,6 +1984,20 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
 
 	nox2apic	[X86-64,APIC] Do not enable x2APIC mode.
 
+	cpu0_hotplug	[X86] Turn on CPU0 hotplug feature when
+			CONFIG_BOOTPARAM_HOTPLUG_CPU0 is off.
+			Some features depend on CPU0. Known dependencies are:
+			1. Resume from suspend/hibernate depends on CPU0.
+			Suspend/hibernate will fail if CPU0 is offline and you
+			need to online CPU0 before suspend/hibernate.
+			2. PIC interrupts also depend on CPU0. CPU0 can't be
+			removed if a PIC interrupt is detected.
+			It's said poweroff/reboot may depend on CPU0 on some
+			machines although I haven't seen such issues so far
+			after CPU0 is offline on a few tested machines.
+			If the dependencies are under your control, you can
+			turn on cpu0_hotplug.
+
 	nptcg=		[IA-64] Override max number of concurrent global TLB
 			purges which is reported from either PAL_VM_SUMMARY or
 			SAL PALO.
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6c304438b50..2d643255c40 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1698,6 +1698,50 @@ config HOTPLUG_CPU
 	    automatically on SMP systems. )
 	  Say N if you want to disable CPU hotplug.
 
+config BOOTPARAM_HOTPLUG_CPU0
+	bool "Set default setting of cpu0_hotpluggable"
+	default n
+	depends on HOTPLUG_CPU && EXPERIMENTAL
+	---help---
+	  Set whether default state of cpu0_hotpluggable is on or off.
+
+	  Say Y here to enable CPU0 hotplug by default. If this switch
+	  is turned on, there is no need to give cpu0_hotplug kernel
+	  parameter and the CPU0 hotplug feature is enabled by default.
+
+	  Please note: there are two known CPU0 dependencies if you want
+	  to enable the CPU0 hotplug feature either by this switch or by
+	  cpu0_hotplug kernel parameter.
+
+	  First, resume from hibernate or suspend always starts from CPU0.
+	  So hibernate and suspend are prevented if CPU0 is offline.
+
+	  Second dependency is PIC interrupts always go to CPU0. CPU0 can not
+	  offline if any interrupt can not migrate out of CPU0. There may
+	  be other CPU0 dependencies.
+
+	  Please make sure the dependencies are under your control before
+	  you enable this feature.
+
+	  Say N if you don't want to enable CPU0 hotplug feature by default.
+	  You still can enable the CPU0 hotplug feature at boot by kernel
+	  parameter cpu0_hotplug.
+
+config DEBUG_HOTPLUG_CPU0
+	def_bool n
+	prompt "Debug CPU0 hotplug"
+	depends on HOTPLUG_CPU && EXPERIMENTAL
+	---help---
+	  Enabling this option offlines CPU0 (if CPU0 can be offlined) as
+	  soon as possible and boots up userspace with CPU0 offlined. User
+	  can online CPU0 back after boot time.
+
+	  To debug CPU0 hotplug, you need to enable CPU0 offline/online
+	  feature by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during
+	  compilation or giving cpu0_hotplug kernel parameter at boot.
+
+	  If unsure, say N.
+
 config COMPAT_VDSO
 	def_bool y
 	prompt "Compat VDSO support"
diff --git a/arch/x86/include/asm/cpu.h b/arch/x86/include/asm/cpu.h
index 4564c8e28a3..5f9a1243190 100644
--- a/arch/x86/include/asm/cpu.h
+++ b/arch/x86/include/asm/cpu.h
@@ -28,6 +28,10 @@ struct x86_cpu {
 #ifdef CONFIG_HOTPLUG_CPU
 extern int arch_register_cpu(int num);
 extern void arch_unregister_cpu(int);
+extern void __cpuinit start_cpu0(void);
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+extern int _debug_hotplug_cpu(int cpu, int action);
+#endif
 #endif
 
 DECLARE_PER_CPU(int, cpu_state);
diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h
index 4f19a152603..b073aaea747 100644
--- a/arch/x86/include/asm/smp.h
+++ b/arch/x86/include/asm/smp.h
@@ -166,6 +166,7 @@ void native_send_call_func_ipi(const struct cpumask *mask);
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
+void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
 #define cpu_physical_id(cpu)	per_cpu(x86_cpu_to_apicid, cpu)
 
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 1817fa91102..f78fc2b4deb 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2199,9 +2199,11 @@ static int ioapic_retrigger_irq(struct irq_data *data)
 {
 	struct irq_cfg *cfg = data->chip_data;
 	unsigned long flags;
+	int cpu;
 
 	raw_spin_lock_irqsave(&vector_lock, flags);
-	apic->send_IPI_mask(cpumask_of(cpumask_first(cfg->domain)), cfg->vector);
+	cpu = cpumask_first_and(cfg->domain, cpu_online_mask);
+	apic->send_IPI_mask(cpumask_of(cpu), cfg->vector);
 	raw_spin_unlock_irqrestore(&vector_lock, flags);
 
 	return 1;
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 7505f7b13e7..ca165ac6793 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1237,7 +1237,7 @@ void __cpuinit cpu_init(void)
 	oist = &per_cpu(orig_ist, cpu);
 
 #ifdef CONFIG_NUMA
-	if (cpu != 0 && this_cpu_read(numa_node) == 0 &&
+	if (this_cpu_read(numa_node) == 0 &&
 	    early_cpu_to_node(cpu) != NUMA_NO_NODE)
 		set_numa_node(early_cpu_to_node(cpu));
 #endif
@@ -1269,8 +1269,7 @@ void __cpuinit cpu_init(void)
 	barrier();
 
 	x86_configure_nx();
-	if (cpu != 0)
-		enable_x2apic();
+	enable_x2apic();
 
 	/*
 	 * set up and load the per-CPU TSS
diff --git a/arch/x86/kernel/cpu/mtrr/main.c b/arch/x86/kernel/cpu/mtrr/main.c
index 6b96110bb0c..e4c1a418453 100644
--- a/arch/x86/kernel/cpu/mtrr/main.c
+++ b/arch/x86/kernel/cpu/mtrr/main.c
@@ -695,11 +695,16 @@ void mtrr_ap_init(void)
 }
 
 /**
- * Save current fixed-range MTRR state of the BSP
+ * Save current fixed-range MTRR state of the first cpu in cpu_online_mask.
  */
 void mtrr_save_state(void)
 {
-	smp_call_function_single(0, mtrr_save_fixed_ranges, NULL, 1);
+	int first_cpu;
+
+	get_online_cpus();
+	first_cpu = cpumask_first(cpu_online_mask);
+	smp_call_function_single(first_cpu, mtrr_save_fixed_ranges, NULL, 1);
+	put_online_cpus();
 }
 
 void set_mtrr_aps_delayed_init(void)
diff --git a/arch/x86/kernel/head_32.S b/arch/x86/kernel/head_32.S
index 4dac2f68ed4..8e7f6556028 100644
--- a/arch/x86/kernel/head_32.S
+++ b/arch/x86/kernel/head_32.S
@@ -266,6 +266,19 @@ num_subarch_entries = (. - subarch_entries) / 4
 	jmp default_entry
 #endif /* CONFIG_PARAVIRT */
 
+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
+ * up already except stack. We just set up stack here. Then call
+ * start_secondary().
+ */
+ENTRY(start_cpu0)
+	movl stack_start, %ecx
+	movl %ecx, %esp
+	jmp  *(initial_code)
+ENDPROC(start_cpu0)
+#endif
+
 /*
  * Non-boot CPU entry point; entered from trampoline.S
  * We can't lgdt here, because lgdt itself uses a data segment, but
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 94bf9cc2c7e..980053c4b9c 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -252,6 +252,22 @@ ENTRY(secondary_startup_64)
 	pushq	%rax		# target address in negative space
 	lretq
 
+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * Boot CPU0 entry point. It's called from play_dead(). Everything has been set
+ * up already except stack. We just set up stack here. Then call
+ * start_secondary().
+ */
+ENTRY(start_cpu0)
+	movq stack_start(%rip),%rsp
+	movq	initial_code(%rip),%rax
+	pushq	$0		# fake return address to stop unwinder
+	pushq	$__KERNEL_CS	# set correct cs
+	pushq	%rax		# target address in negative space
+	lretq
+ENDPROC(start_cpu0)
+#endif
+
 	/* SMP bootup changes these two */
 	__REFDATA
 	.align	8
diff --git a/arch/x86/kernel/i387.c b/arch/x86/kernel/i387.c
index 675a0501244..245a71db401 100644
--- a/arch/x86/kernel/i387.c
+++ b/arch/x86/kernel/i387.c
@@ -175,7 +175,11 @@ void __cpuinit fpu_init(void)
 		cr0 |= X86_CR0_EM;
 	write_cr0(cr0);
 
-	if (!smp_processor_id())
+	/*
+	 * init_thread_xstate is only called once to avoid overriding
+	 * xstate_size during boot time or during CPU hotplug.
+	 */
+	if (xstate_size == 0)
 		init_thread_xstate();
 
 	mxcsr_feature_mask_init();
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index f3e2ec878b8..c635663b20d 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -127,8 +127,8 @@ EXPORT_PER_CPU_SYMBOL(cpu_info);
 atomic_t init_deasserted;
 
 /*
- * Report back to the Boot Processor.
- * Running on AP.
+ * Report back to the Boot Processor during boot time or to the caller processor
+ * during CPU online.
  */
 static void __cpuinit smp_callin(void)
 {
@@ -140,15 +140,17 @@ static void __cpuinit smp_callin(void)
 	 * we may get here before an INIT-deassert IPI reaches
 	 * our local APIC.  We have to wait for the IPI or we'll
 	 * lock up on an APIC access.
+	 *
+	 * Since CPU0 is not wakened up by INIT, it doesn't wait for the IPI.
 	 */
-	if (apic->wait_for_init_deassert)
+	cpuid = smp_processor_id();
+	if (apic->wait_for_init_deassert && cpuid != 0)
 		apic->wait_for_init_deassert(&init_deasserted);
 
 	/*
 	 * (This works even if the APIC is not enabled.)
 	 */
 	phys_id = read_apic_id();
-	cpuid = smp_processor_id();
 	if (cpumask_test_cpu(cpuid, cpu_callin_mask)) {
 		panic("%s: phys CPU#%d, CPU#%d already present??\n", __func__,
 					phys_id, cpuid);
@@ -230,6 +232,8 @@ static void __cpuinit smp_callin(void)
 	cpumask_set_cpu(cpuid, cpu_callin_mask);
 }
 
+static int cpu0_logical_apicid;
+static int enable_start_cpu0;
 /*
  * Activate a secondary processor.
  */
@@ -245,6 +249,8 @@ notrace static void __cpuinit start_secondary(void *unused)
 	preempt_disable();
 	smp_callin();
 
+	enable_start_cpu0 = 0;
+
 #ifdef CONFIG_X86_32
 	/* switch away from the initial page table */
 	load_cr3(swapper_pg_dir);
@@ -281,19 +287,30 @@ notrace static void __cpuinit start_secondary(void *unused)
 	cpu_idle();
 }
 
+void __init smp_store_boot_cpu_info(void)
+{
+	int id = 0; /* CPU 0 */
+	struct cpuinfo_x86 *c = &cpu_data(id);
+
+	*c = boot_cpu_data;
+	c->cpu_index = id;
+}
+
 /*
  * The bootstrap kernel entry code has set these up. Save them for
  * a given CPU
  */
-
 void __cpuinit smp_store_cpu_info(int id)
 {
 	struct cpuinfo_x86 *c = &cpu_data(id);
 
 	*c = boot_cpu_data;
 	c->cpu_index = id;
-	if (id != 0)
-		identify_secondary_cpu(c);
+	/*
+	 * During boot time, CPU0 has this setup already. Save the info when
+	 * bringing up AP or offlined CPU0.
+	 */
+	identify_secondary_cpu(c);
 }
 
 static bool __cpuinit
@@ -483,7 +500,7 @@ void __inquire_remote_apic(int apicid)
  * won't ... remember to clear down the APIC, etc later.
  */
 int __cpuinit
-wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
+wakeup_secondary_cpu_via_nmi(int apicid, unsigned long start_eip)
 {
 	unsigned long send_status, accept_status = 0;
 	int maxlvt;
@@ -491,7 +508,7 @@ wakeup_secondary_cpu_via_nmi(int logical_apicid, unsigned long start_eip)
 	/* Target chip */
 	/* Boot on the stack */
 	/* Kick the second */
-	apic_icr_write(APIC_DM_NMI | apic->dest_logical, logical_apicid);
+	apic_icr_write(APIC_DM_NMI | apic->dest_logical, apicid);
 
 	pr_debug("Waiting for send to finish...\n");
 	send_status = safe_apic_wait_icr_idle();
@@ -651,6 +668,63 @@ static void __cpuinit announce_cpu(int cpu, int apicid)
 			node, cpu, apicid);
 }
 
+static int wakeup_cpu0_nmi(unsigned int cmd, struct pt_regs *regs)
+{
+	int cpu;
+
+	cpu = smp_processor_id();
+	if (cpu == 0 && !cpu_online(cpu) && enable_start_cpu0)
+		return NMI_HANDLED;
+
+	return NMI_DONE;
+}
+
+/*
+ * Wake up AP by INIT, INIT, STARTUP sequence.
+ *
+ * Instead of waiting for STARTUP after INITs, BSP will execute the BIOS
+ * boot-strap code which is not a desired behavior for waking up BSP. To
+ * void the boot-strap code, wake up CPU0 by NMI instead.
+ *
+ * This works to wake up soft offlined CPU0 only. If CPU0 is hard offlined
+ * (i.e. physically hot removed and then hot added), NMI won't wake it up.
+ * We'll change this code in the future to wake up hard offlined CPU0 if
+ * real platform and request are available.
+ */
+static int __cpuinit
+wakeup_cpu_via_init_nmi(int cpu, unsigned long start_ip, int apicid,
+	       int *cpu0_nmi_registered)
+{
+	int id;
+	int boot_error;
+
+	/*
+	 * Wake up AP by INIT, INIT, STARTUP sequence.
+	 */
+	if (cpu)
+		return wakeup_secondary_cpu_via_init(apicid, start_ip);
+
+	/*
+	 * Wake up BSP by nmi.
+	 *
+	 * Register a NMI handler to help wake up CPU0.
+	 */
+	boot_error = register_nmi_handler(NMI_LOCAL,
+					  wakeup_cpu0_nmi, 0, "wake_cpu0");
+
+	if (!boot_error) {
+		enable_start_cpu0 = 1;
+		*cpu0_nmi_registered = 1;
+		if (apic->dest_logical == APIC_DEST_LOGICAL)
+			id = cpu0_logical_apicid;
+		else
+			id = apicid;
+		boot_error = wakeup_secondary_cpu_via_nmi(id, start_ip);
+	}
+
+	return boot_error;
+}
+
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -666,6 +740,7 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 
 	unsigned long boot_error = 0;
 	int timeout;
+	int cpu0_nmi_registered = 0;
 
 	/* Just in case we booted with a single CPU. */
 	alternatives_enable_smp();
@@ -713,13 +788,16 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 	}
 
 	/*
-	 * Kick the secondary CPU. Use the method in the APIC driver
-	 * if it's defined - or use an INIT boot APIC message otherwise:
+	 * Wake up a CPU in difference cases:
+	 * - Use the method in the APIC driver if it's defined
+	 * Otherwise,
+	 * - Use an INIT boot APIC message for APs or NMI for BSP.
 	 */
 	if (apic->wakeup_secondary_cpu)
 		boot_error = apic->wakeup_secondary_cpu(apicid, start_ip);
 	else
-		boot_error = wakeup_secondary_cpu_via_init(apicid, start_ip);
+		boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
+						     &cpu0_nmi_registered);
 
 	if (!boot_error) {
 		/*
@@ -784,6 +862,13 @@ static int __cpuinit do_boot_cpu(int apicid, int cpu, struct task_struct *idle)
 		 */
 		smpboot_restore_warm_reset_vector();
 	}
+	/*
+	 * Clean up the nmi handler. Do this after the callin and callout sync
+	 * to avoid impact of possible long unregister time.
+	 */
+	if (cpu0_nmi_registered)
+		unregister_nmi_handler(NMI_LOCAL, "wake_cpu0");
+
 	return boot_error;
 }
 
@@ -797,7 +882,7 @@ int __cpuinit native_cpu_up(unsigned int cpu, struct task_struct *tidle)
 
 	pr_debug("++++++++++++++++++++=_---CPU UP  %u\n", cpu);
 
-	if (apicid == BAD_APICID || apicid == boot_cpu_physical_apicid ||
+	if (apicid == BAD_APICID ||
 	    !physid_isset(apicid, phys_cpu_present_map) ||
 	    !apic->apic_id_valid(apicid)) {
 		pr_err("%s: bad cpu %d\n", __func__, cpu);
@@ -995,7 +1080,7 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	/*
 	 * Setup boot CPU information
 	 */
-	smp_store_cpu_info(0); /* Final full version of the data */
+	smp_store_boot_cpu_info(); /* Final full version of the data */
 	cpumask_copy(cpu_callin_mask, cpumask_of(0));
 	mb();
 
@@ -1031,6 +1116,11 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus)
 	 */
 	setup_local_APIC();
 
+	if (x2apic_mode)
+		cpu0_logical_apicid = apic_read(APIC_LDR);
+	else
+		cpu0_logical_apicid = GET_APIC_LOGICAL_ID(apic_read(APIC_LDR));
+
 	/*
 	 * Enable IO APIC before setting up error vector
 	 */
@@ -1219,19 +1309,6 @@ void cpu_disable_common(void)
 
 int native_cpu_disable(void)
 {
-	int cpu = smp_processor_id();
-
-	/*
-	 * Perhaps use cpufreq to drop frequency, but that could go
-	 * into generic code.
-	 *
-	 * We won't take down the boot processor on i386 due to some
-	 * interrupts only being able to be serviced by the BSP.
-	 * Especially so if we're not using an IOAPIC	-zwane
-	 */
-	if (cpu == 0)
-		return -EBUSY;
-
 	clear_local_APIC();
 
 	cpu_disable_common();
@@ -1271,6 +1348,14 @@ void play_dead_common(void)
 	local_irq_disable();
 }
 
+static bool wakeup_cpu0(void)
+{
+	if (smp_processor_id() == 0 && enable_start_cpu0)
+		return true;
+
+	return false;
+}
+
 /*
  * We need to flush the caches before going to sleep, lest we have
  * dirty data in our caches when we come back up.
@@ -1334,6 +1419,11 @@ static inline void mwait_play_dead(void)
 		__monitor(mwait_ptr, 0, 0);
 		mb();
 		__mwait(eax, 0);
+		/*
+		 * If NMI wants to wake up CPU0, start CPU0.
+		 */
+		if (wakeup_cpu0())
+			start_cpu0();
 	}
 }
 
@@ -1344,6 +1434,11 @@ static inline void hlt_play_dead(void)
 
 	while (1) {
 		native_halt();
+		/*
+		 * If NMI wants to wake up CPU0, start CPU0.
+		 */
+		if (wakeup_cpu0())
+			start_cpu0();
 	}
 }
 
diff --git a/arch/x86/kernel/topology.c b/arch/x86/kernel/topology.c
index 76ee97709a0..6e60b5fe224 100644
--- a/arch/x86/kernel/topology.c
+++ b/arch/x86/kernel/topology.c
@@ -30,23 +30,110 @@
 #include <linux/mmzone.h>
 #include <linux/init.h>
 #include <linux/smp.h>
+#include <linux/irq.h>
 #include <asm/cpu.h>
 
 static DEFINE_PER_CPU(struct x86_cpu, cpu_devices);
 
 #ifdef CONFIG_HOTPLUG_CPU
+
+#ifdef CONFIG_BOOTPARAM_HOTPLUG_CPU0
+static int cpu0_hotpluggable = 1;
+#else
+static int cpu0_hotpluggable;
+static int __init enable_cpu0_hotplug(char *str)
+{
+	cpu0_hotpluggable = 1;
+	return 1;
+}
+
+__setup("cpu0_hotplug", enable_cpu0_hotplug);
+#endif
+
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+/*
+ * This function offlines a CPU as early as possible and allows userspace to
+ * boot up without the CPU. The CPU can be onlined back by user after boot.
+ *
+ * This is only called for debugging CPU offline/online feature.
+ */
+int __ref _debug_hotplug_cpu(int cpu, int action)
+{
+	struct device *dev = get_cpu_device(cpu);
+	int ret;
+
+	if (!cpu_is_hotpluggable(cpu))
+		return -EINVAL;
+
+	cpu_hotplug_driver_lock();
+
+	switch (action) {
+	case 0:
+		ret = cpu_down(cpu);
+		if (!ret) {
+			pr_info("CPU %u is now offline\n", cpu);
+			kobject_uevent(&dev->kobj, KOBJ_OFFLINE);
+		} else
+			pr_debug("Can't offline CPU%d.\n", cpu);
+		break;
+	case 1:
+		ret = cpu_up(cpu);
+		if (!ret)
+			kobject_uevent(&dev->kobj, KOBJ_ONLINE);
+		else
+			pr_debug("Can't online CPU%d.\n", cpu);
+		break;
+	default:
+		ret = -EINVAL;
+	}
+
+	cpu_hotplug_driver_unlock();
+
+	return ret;
+}
+
+static int __init debug_hotplug_cpu(void)
+{
+	_debug_hotplug_cpu(0, 0);
+	return 0;
+}
+
+late_initcall_sync(debug_hotplug_cpu);
+#endif /* CONFIG_DEBUG_HOTPLUG_CPU0 */
+
 int __ref arch_register_cpu(int num)
 {
+	struct cpuinfo_x86 *c = &cpu_data(num);
+
+	/*
+	 * Currently CPU0 is only hotpluggable on Intel platforms. Other
+	 * vendors can add hotplug support later.
+	 */
+	if (c->x86_vendor != X86_VENDOR_INTEL)
+		cpu0_hotpluggable = 0;
+
 	/*
-	 * CPU0 cannot be offlined due to several
-	 * restrictions and assumptions in kernel. This basically
-	 * doesn't add a control file, one cannot attempt to offline
-	 * BSP.
+	 * Two known BSP/CPU0 dependencies: Resume from suspend/hibernate
+	 * depends on BSP. PIC interrupts depend on BSP.
 	 *
-	 * Also certain PCI quirks require not to enable hotplug control
-	 * for all CPU's.
+	 * If the BSP depencies are under control, one can tell kernel to
+	 * enable BSP hotplug. This basically adds a control file and
+	 * one can attempt to offline BSP.
 	 */
-	if (num)
+	if (num == 0 && cpu0_hotpluggable) {
+		unsigned int irq;
+		/*
+		 * We won't take down the boot processor on i386 if some
+		 * interrupts only are able to be serviced by the BSP in PIC.
+		 */
+		for_each_active_irq(irq) {
+			if (!IO_APIC_IRQ(irq) && irq_has_action(irq)) {
+				cpu0_hotpluggable = 0;
+				break;
+			}
+		}
+	}
+	if (num || cpu0_hotpluggable)
 		per_cpu(cpu_devices, num).cpu.hotpluggable = 1;
 
 	return register_cpu(&per_cpu(cpu_devices, num).cpu, num);
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 218cdb16163..120cee1c3f8 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -21,6 +21,7 @@
 #include <asm/suspend.h>
 #include <asm/debugreg.h>
 #include <asm/fpu-internal.h> /* pcntxt_mask */
+#include <asm/cpu.h>
 
 #ifdef CONFIG_X86_32
 static struct saved_context saved_context;
@@ -237,3 +238,84 @@ void restore_processor_state(void)
 #ifdef CONFIG_X86_32
 EXPORT_SYMBOL(restore_processor_state);
 #endif
+
+/*
+ * When bsp_check() is called in hibernate and suspend, cpu hotplug
+ * is disabled already. So it's unnessary to handle race condition between
+ * cpumask query and cpu hotplug.
+ */
+static int bsp_check(void)
+{
+	if (cpumask_first(cpu_online_mask) != 0) {
+		pr_warn("CPU0 is offline.\n");
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static int bsp_pm_callback(struct notifier_block *nb, unsigned long action,
+			   void *ptr)
+{
+	int ret = 0;
+
+	switch (action) {
+	case PM_SUSPEND_PREPARE:
+	case PM_HIBERNATION_PREPARE:
+		ret = bsp_check();
+		break;
+#ifdef CONFIG_DEBUG_HOTPLUG_CPU0
+	case PM_RESTORE_PREPARE:
+		/*
+		 * When system resumes from hibernation, online CPU0 because
+		 * 1. it's required for resume and
+		 * 2. the CPU was online before hibernation
+		 */
+		if (!cpu_online(0))
+			_debug_hotplug_cpu(0, 1);
+		break;
+	case PM_POST_RESTORE:
+		/*
+		 * When a resume really happens, this code won't be called.
+		 *
+		 * This code is called only when user space hibernation software
+		 * prepares for snapshot device during boot time. So we just
+		 * call _debug_hotplug_cpu() to restore to CPU0's state prior to
+		 * preparing the snapshot device.
+		 *
+		 * This works for normal boot case in our CPU0 hotplug debug
+		 * mode, i.e. CPU0 is offline and user mode hibernation
+		 * software initializes during boot time.
+		 *
+		 * If CPU0 is online and user application accesses snapshot
+		 * device after boot time, this will offline CPU0 and user may
+		 * see different CPU0 state before and after accessing
+		 * the snapshot device. But hopefully this is not a case when
+		 * user debugging CPU0 hotplug. Even if users hit this case,
+		 * they can easily online CPU0 back.
+		 *
+		 * To simplify this debug code, we only consider normal boot
+		 * case. Otherwise we need to remember CPU0's state and restore
+		 * to that state and resolve racy conditions etc.
+		 */
+		_debug_hotplug_cpu(0, 0);
+		break;
+#endif
+	default:
+		break;
+	}
+	return notifier_from_errno(ret);
+}
+
+static int __init bsp_pm_check_init(void)
+{
+	/*
+	 * Set this bsp_pm_callback as lower priority than
+	 * cpu_hotplug_pm_callback. So cpu_hotplug_pm_callback will be called
+	 * earlier to disable cpu hotplug before bsp online check.
+	 */
+	pm_notifier(bsp_pm_callback, -INT_MAX);
+	return 0;
+}
+
+core_initcall(bsp_pm_check_init);
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f45657f1eb8..3046a503242 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -603,6 +603,11 @@ cpu_hotplug_pm_callback(struct notifier_block *nb,
 
 static int __init cpu_hotplug_pm_sync_init(void)
 {
+	/*
+	 * cpu_hotplug_pm_callback has higher priority than x86
+	 * bsp_pm_callback which depends on cpu_hotplug_pm_callback
+	 * to disable cpu hotplug to avoid cpu hotplug race.
+	 */
 	pm_notifier(cpu_hotplug_pm_callback, 0);
 	return 0;
 }
```
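
Several hunks in this merge replace a hard-coded CPU 0 with "the first online CPU": `ioapic_retrigger_irq()` now targets `cpumask_first_and(cfg->domain, cpu_online_mask)`, `mtrr_save_state()` uses `cpumask_first(cpu_online_mask)`, and `bsp_check()` refuses suspend/hibernate when that first online CPU is not 0. The sketch below is a user-space model, not kernel code: it assumes a cpumask fits in one 64-bit word (the kernel's `struct cpumask` is a variable-size bitmap), and `mask_first()`, `mask_first_and()`, `bsp_check_model()` and `MODEL_NR_CPUS` are hypothetical names.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64	/* hypothetical: a cpumask modeled as one 64-bit word */

/* Lowest set bit, or MODEL_NR_CPUS if the mask is empty
 * (the kernel's cpumask_first() likewise returns >= nr_cpu_ids). */
static unsigned int mask_first(uint64_t mask)
{
	for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
		if (mask & ((uint64_t)1 << cpu))
			return cpu;
	return MODEL_NR_CPUS;
}

/* Model of cpumask_first_and(): first CPU set in both masks. */
static unsigned int mask_first_and(uint64_t a, uint64_t b)
{
	return mask_first(a & b);
}

/* Model of bsp_check(): suspend/hibernate need CPU0 to be online. */
static int bsp_check_model(uint64_t online)
{
	return mask_first(online) != 0 ? -ENODEV : 0;
}
```

With CPUs 1 and 2 online (`0x6`) and an IRQ vector domain of CPUs 0 and 2 (`0x5`), the retrigger IPI goes to CPU 2, whereas the removed `cpumask_first(cfg->domain)` would have picked the offline CPU 0. In the real `mtrr_save_state()`, the `get_online_cpus()`/`put_online_cpus()` pair keeps the online mask stable between the query and the cross-call, which this single-threaded model does not need.
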