Nvidia drivers on Debian and Ubuntu
Official Documentation:
- Debian: https://wiki.debian.org/NvidiaGraphicsDrivers
- Ubuntu: https://help.ubuntu.com/community/NvidiaDriversInstallation
Don't forget to enable the GPU in UEFI/BIOS!
Debian
- Enable non-free and contrib for all repos. E.g., update every entry in /etc/apt/sources.list:
# old
deb http://deb.debian.org/debian bookworm main non-free-firmware
# change to
deb http://deb.debian.org/debian bookworm main non-free-firmware non-free contrib
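If you'd rather not edit each line by hand, the change above can be scripted. This is a minimal sketch that assumes the classic one-line sources.list format. Newer installs may use deb822 `.sources` files under /etc/apt/sources.list.d/ instead, and this does not handle those. The function name is hypothetical, made up for illustration.

```shell
# Hypothetical helper: append "non-free" and "contrib" to every one-line
# deb/deb-src entry that lacks them, keeping a .bak copy of the original.
# Assumes GNU sed and the classic one-line sources.list format.
enable_nonfree_contrib() {
    local file="${1:-/etc/apt/sources.list}"
    # " non-free( |$)" deliberately does not match "non-free-firmware"
    sed -i.bak -E '/^deb(-src)?[[:space:]]/ {
        / non-free( |$)/!s/$/ non-free/
        / contrib( |$)/!s/$/ contrib/
    }' "$file"
}
```

Run it as root (`enable_nonfree_contrib`), then run `apt update`. It is safe to run twice because components that are already present are not appended again.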
apt update
apt install nvidia-driver firmware-misc-nonfree
- For Secure Boot:
mokutil --import /var/lib/dkms/mok.pub
Xorg -configure
cp /root/xorg.conf.new /etc/X11/xorg.conf
echo -e 'Section "Device"\n\tIdentifier "My GPU"\n\tDriver "nvidia"\nEndSection' > /etc/X11/xorg.conf.d/20-nvidia.conf
- Settings take effect after reboot
Ubuntu
Ubuntu does most of the setup for you via the Ubuntu Drivers click-through GUI install tool, and most of the time this is fine. It's still possible to mess things up so badly your laptop can't even get to a terminal, though! This is a collection of fixes I've had to do in the past, obtained from root's ~/.bash_history. You can see the driver version change as I'm forced to do an emergency upgrade.
My latest Nvidia disaster seemed to have been caused by most of the Nvidia packages being randomly removed from the system. Not sure if this was an errant corporate compliance script, bad packaging or sheer bad luck. It was so bad I even had to reset my XFCE session as well for some reason.
Usually some combination of rebooting, updating the UEFI/BIOS, updating device firmware with fwupd and these tricks will get you up and running again:
Laptop won't boot!
Eliminate hardware problem
UEFI/BIOS vendor splash screen/messages appear and the screen looks OK? The hardware is probably fine. If not (black screen), there are very limited options these days:
- Remove the battery (if not glued… sigh…) wait… replace… reboot
- Reset via the tiny hole if you have one - look carefully, I found one on my laptop in a different location
- Perhaps the laptop screen died? Boot with an external monitor
- Wiggle the screen. Flashing or distorted graphics mean the screen is not completely broken but has a physical problem (bad/loose/kinked cable, cracked, etc.)
- Take the laptop apart and remove any disk drive. Not joking, this has happened to me before: a 2.5” SSD failed completely at random after a reboot. Whatever happened was so bad that the drive was instantly toast, and it also prevented the laptop from booting at all. If the laptop now magically boots, replace the drive and reinstall the OS.
- No idea? Anything spilt inside the computer? At this point you've confirmed bad hardware.
Linux problem
Black screen/crashed AFTER the UEFI/BIOS vendor splash screen/messages AND the grub menu (assuming your system is configured to show one at all…)? Could it be the Nvidia driver? Maybe.
If you already configured grub to boot Linux into text mode, you should see some messages showing where the bootstrap is failing, which should be helpful. If not, you may be stuck with a completely black screen. Pressing Esc should show the boot messages, but it's possible your laptop has completely frozen. Toggling Caps Lock will often prove this (the LED won't change), as will the magic SysRq keys:
- Alt+SysRq/Print Screen+S - emergency sync
- Alt+SysRq/Print Screen+U - emergency remount of all filesystems read-only
- Alt+SysRq/Print Screen+B - emergency reboot
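The SysRq keys only work if the kernel allows them, and distributions often restrict them through the kernel.sysrq bitmask. According to the kernel's sysrq documentation, 0 disables everything and 1 enables everything. Any other value is a bitmask where 16 is sync, 32 is remount read-only and 128 is reboot/poweroff. The function below is a hypothetical helper that decodes the value for the three keys above. Check it while the system still works.

```shell
# Hypothetical helper: report whether the S, U and B SysRq functions are enabled.
# Reads /proc/sys/kernel/sysrq unless a value is passed as $1.
sysrq_report() {
    local value="${1:-$(cat /proc/sys/kernel/sysrq)}" name bit
    for name in sync:16 remount-ro:32 reboot:128; do
        bit="${name#*:}"
        # 1 means "all functions enabled"; otherwise test the individual bit
        if [ "$value" -eq 1 ] || [ $(( value & bit )) -ne 0 ]; then
            echo "${name%%:*} allowed"
        else
            echo "${name%%:*} blocked"
        fi
    done
}
```

If anything you need is blocked, `sysctl -w kernel.sysrq=1` (as root) enables all functions until the next reboot.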
If nothing happens, it's safe to say your laptop OS is toast for the moment. Hold the power button for 10 seconds and go find an Ubuntu Live bootable USB matching the laptop OS.
Disconnect all external monitors, boot the USB and mount the host filesystems - see notes in fix/setup grub for how to do this including if your system is using LUKS.
With a chroot into your laptop, you can try the steps below:
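Once the host filesystems are mounted, the chroot itself needs the live system's pseudo-filesystems bind-mounted inside it. Without them, apt, DKMS and update-initramfs will fail. This is a minimal sketch that assumes the laptop's root filesystem, including /boot, is already mounted at /mnt, which is a hypothetical mount point. Both function names are illustrative, and setting DRY_RUN=1 only prints the commands.

```shell
# Hypothetical helper: run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Bind-mount /dev, /dev/pts, /proc, /sys and /run into the mounted root,
# then start a root shell inside it. Must be run as root on the Live USB.
enter_chroot() {
    local root="${1:-/mnt}" fs
    for fs in dev dev/pts proc sys run; do
        run mount --bind "/$fs" "$root/$fs"
    done
    run chroot "$root" /bin/bash
}
```

For example, `DRY_RUN=1 enter_chroot /mnt` shows what would happen, and `enter_chroot /mnt` does it. Remember that `uname -r` inside the chroot reports the Live USB's kernel, not the laptop's.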
ubuntu-drivers CLI
If you're lucky, you may still have access to a terminal somewhere. In this case you can try to use ubuntu-drivers to install the Nvidia drivers:
ubuntu-drivers list
# eg:
ubuntu-drivers install nvidia-driver-525-server
If this looks like it did something, reboot and hope for the best. Sometimes, though, this command will tell you the drivers are already installed when they aren't, or at least aren't working. Use nvidia-smi to test (not in the Live USB).
Try to fix nvidia packages by remove/reinstall
Remove the nvidia drivers and then try to reinstall them yourself using apt, like this:
# use dpkg and grep to find nvidia related packages, eg:
apt remove xserver-xorg-video-nvidia-525 nvidia-prime nvidia-settings screen-resolution-extra
# remove random crap/free up space
apt autoremove
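The "use dpkg and grep" step can be narrowed down a little. The hypothetical filter below keeps only packages that are actually installed (status `ii` in `dpkg -l`) and have "nvidia" in their name. Packages like screen-resolution-extra won't match, so review the list before feeding it to apt.

```shell
# Hypothetical helper: read `dpkg -l` output on stdin and print installed
# (status "ii") packages whose name contains "nvidia", case-insensitively.
filter_nvidia_pkgs() {
    awk '$1 == "ii" && tolower($2) ~ /nvidia/ { print $2 }'
}
```

Usage: `dpkg -l | filter_nvidia_pkgs`. Then pass the names you really want removed to `apt remove`.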
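Make sure all available firmware is installed. Package names vary between releases, so if apt reports it is unable to locate one, drop it from the list: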
apt install linux-firmware linux-firmware-nonfree firmware-linux-misc
Then try to reinstall the drivers:
apt install nvidia-driver-530 nvidia-dkms-530
DKMS should rebuild the nvidia modules for you and update the initramfs. Verify the module files exist for your kernel (pick the right version yourself):
# eg:
find /lib/modules/5.19.0-45-generic -iname "*nvidia*"
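To check every installed kernel at once, a loop over /lib/modules works. This sketch looks specifically for the main `nvidia.ko` module, possibly compressed (e.g. `nvidia.ko.zst`), rather than any file containing "nvidia". Stock kernels also ship unrelated in-tree modules such as nvidiafb, which would otherwise give a false positive. The function name is hypothetical.

```shell
# Hypothetical helper: for each kernel directory under /lib/modules (or $1),
# print "<version>: ok" if an nvidia.ko* module exists, else "<version>: MISSING".
# Uses GNU find's -quit to stop at the first match.
check_nvidia_modules() {
    local base="${1:-/lib/modules}" dir
    for dir in "$base"/*/; do
        [ -d "$dir" ] || continue
        if [ -n "$(find "$dir" -name 'nvidia.ko*' -print -quit)" ]; then
            echo "$(basename "$dir"): ok"
        else
            echo "$(basename "$dir"): MISSING"
        fi
    done
}
```

Running `dkms status` alongside this shows which module versions DKMS thinks it has built for which kernels.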
Missing files? Try to force a recompile (DKMS is supposed to do this though):
dpkg-reconfigure nvidia-kernel-source-525
dpkg-reconfigure nvidia-dkms-530
Force rebuilding initramfs:
update-initramfs -u
Nvidia driver fails to build - not enough free space in /boot
Old-school install guides say to use just a few hundred MB for /boot, which is barely enough for one jumbo Ubuntu kernel, let alone a handful. A typical 700M /boot can only hold about 2 kernels, so upgrades are risky.
Long term, plan to grow /boot to about 2G. That lets you keep a few known-good kernels on hand and lets routine upgrades succeed. To do this, shrink the main (LUKS?) partition, move it left/right and grow the /boot partition.
The quick fix here is to free up space in /boot by removing old/unused kernels.
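Before and after removing kernels, it helps to see how much space is actually free. This hypothetical helper reads the POSIX-format output of `df` (`-P`, with 1024-byte blocks via `-k`) and prints the available space in MiB for the filesystem holding a directory, /boot by default.

```shell
# Hypothetical helper: print free space in MiB on the filesystem holding $1 (default /boot).
boot_free_mb() {
    local dir="${1:-/boot}"
    # line 2 of `df -P` is the filesystem; column 4 is available 1K blocks
    df -Pk "$dir" | awk 'NR == 2 { print int($4 / 1024) }'
}
```

Usage: `boot_free_mb`. In a Live USB chroot, make sure the laptop's /boot partition is mounted first, or you'll measure the wrong filesystem.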
Find installed kernels:
dpkg -l |grep linux-image
Kernel running now (irrelevant for a Live USB). Don't remove this kernel unless you know what you're doing:
uname -a
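Combining the two commands above gives a list of removal candidates. This sketch filters `dpkg -l` output for installed, versioned `linux-image-<version>` packages. It skips meta-packages such as linux-image-generic and the kernel you name. By default that is `uname -r`, which in a Live USB chroot is the wrong kernel, so pass the laptop's version explicitly there. The function name is hypothetical.

```shell
# Hypothetical helper: read `dpkg -l` on stdin and print installed versioned
# kernel image packages other than linux-image-<running version>.
old_kernels() {
    local running="${1:-$(uname -r)}"
    awk -v keep="linux-image-$running" \
        '$1 == "ii" && $2 ~ /^linux-image-[0-9]/ && $2 != keep { print $2 }'
}
```

Usage: `dpkg -l | old_kernels 5.19.0-46-generic`. When there is enough space for it to work, `apt autoremove --purge` is the gentler way to clean out kernels apt no longer considers needed.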
Remove an old kernel and its modules. Removal may fail due to lack of space. In that case, keep removing more old kernels until there is enough space for the scripts to run:
apt remove --purge linux-image-5.19.0-46-generic linux-modules-5.19.0-46-generic
Reinstall the running kernel. This is for experts only, but as you might have gathered from this post, so is the whole procedure. There is a good chance of breaking your system badly enough to need a rescue USB here. Still, if enough space is now available on /boot, this is a good way to re-run the current kernel's post-install scripts that previously failed for lack of free space:
dpkg --force-all --purge linux-image-5.19.0-46-generic linux-modules-5.19.0-46-generic
apt install linux-image-5.19.0-46-generic linux-modules-5.19.0-46-generic
# You can also reinstall using apt, from my history I did this too. Pretty sure
# this was deadlocked due to lack of space in `/boot` which meant only `dpkg`
# worked
apt install --reinstall linux-image-5.19.0-46-generic linux-modules-5.19.0-46-generic
Nvidia driver built but refuses to modprobe/errors in dmesg
Kernel module files were generated and included in the initramfs? The Nvidia GPU is definitely enabled in UEFI/BIOS? You double-checked that Nvidia GPU hardware is really present in your laptop SKU?
Could be something to do with Secure Boot… For you to find out. The phrase to google is “enrolling a Machine Owner Key” or MOK.
Disabling Secure Boot is a quick workaround, but it's a pain to turn on and off. Business users should probably leave it on too…
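One way to tell whether Secure Boot is the culprit is to look for signature-rejection errors when loading the module. This hypothetical helper scans text on stdin for two common messages: "Key was rejected by service" from modprobe, and the kernel lockdown message about unsigned module loading. It is a heuristic, not an exhaustive list of Secure Boot failures.

```shell
# Hypothetical helper: scan modprobe/dmesg text on stdin for common
# Secure Boot module-signature errors. Returns 0 if one is found.
secure_boot_hint() {
    if grep -qiE 'key was rejected by service|unsigned module loading is restricted'; then
        echo "module signature rejected: enroll the MOK or disable Secure Boot"
        return 0
    fi
    echo "no Secure Boot signature errors found"
    return 1
}
```

Usage, as root: `modprobe nvidia 2>&1 | secure_boot_hint` or `dmesg | secure_boot_hint`. Run `mokutil --sb-state` to confirm whether Secure Boot is actually enabled.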
Reboot
That's about all of my ideas for fixing Nvidia drivers. All that's left to do is reboot and hope for the best.
Give up/use Nouveau
nouveau sadly does not drive my external display, but if it's stable and works for you, more power to you.
Something like this should switch drivers, or at least get you back to a command prompt:
# remove all nvidia packages
apt install xserver-xorg-video-nouveau
rm /etc/modprobe.d/blacklist-nvidia-nouveau.conf
update-initramfs -u
reboot
Laptop suspend/resume crashes
Before suspecting Nvidia driver:
- Ensure you're running the latest UEFI/BIOS
- Ensure “Windows/Linux” sleep mode is set in UEFI/BIOS if available. If this doesn't work, try Linux/S3
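After changing the UEFI/BIOS sleep setting, you can check which suspend mode the kernel will actually use. /sys/power/mem_sleep lists the supported modes with the active one in brackets, e.g. `s2idle [deep]`, where `deep` is S3. The helper name below is hypothetical.

```shell
# Hypothetical helper: print the active (bracketed) mode from /sys/power/mem_sleep or $1.
current_mem_sleep() {
    local file="${1:-/sys/power/mem_sleep}"
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$file"
}
```

If `deep` doesn't appear in the file at all, the firmware isn't offering S3 to Linux, so the Linux/S3 setting probably didn't take effect.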
Black/frozen screen on resume? Not sleeping (also check you configured power management to suspend in the first place…)? Completely dead system? Kernel command line options seem to be able to fix this for some models of ThinkPad at least:
Edit /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="init_on_alloc=0 intel_iommu=off enable_mtrr_cleanup mtrr_spare_reg_nr=4 text"
This only changes the default Linux menu entries. GRUB_CMDLINE_LINUX_DEFAULT is not applied to the recovery mode entries, so if you still have problems you can easily skip these options by selecting recovery mode instead.
Don't forget to run update-grub after changing /etc/default/grub.
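The edit can also be scripted, e.g. from a chroot. This sketch replaces an existing GRUB_CMDLINE_LINUX_DEFAULT line and keeps a .bak copy. It assumes the line already exists and that the new value contains no double quotes. The function name is hypothetical, and update-grub still has to be run afterwards.

```shell
# Hypothetical helper: set GRUB_CMDLINE_LINUX_DEFAULT="$1" in $2 (default /etc/default/grub).
set_grub_cmdline_default() {
    local value="$1" file="${2:-/etc/default/grub}" esc
    # escape &, | and \ so they are literal in the sed replacement below
    esc=$(printf '%s' "$value" | sed 's/[&|\\]/\\&/g')
    sed -i.bak "s|^GRUB_CMDLINE_LINUX_DEFAULT=.*|GRUB_CMDLINE_LINUX_DEFAULT=\"$esc\"|" "$file"
}
```

Usage: `set_grub_cmdline_default "init_on_alloc=0 intel_iommu=off enable_mtrr_cleanup mtrr_spare_reg_nr=4 text" && update-grub`.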
Testing the Nvidia driver
- External display working? (some laptops only support external display with the proprietary Nvidia driver)
- nvidia-smi should output something like:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06 Driver Version: 525.125.06 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... On | 00000000:02:00.0 Off | N/A |
| N/A 53C P8 N/A / N/A | 0MiB / 2048MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
- glxgears loads and displays some gears
- glxinfo | grep render shows an Nvidia renderer. In my case it's listed as Mesa, so the system is not fully accelerated. That's fine for me because I have very limited requirements and I'm not doing 3D work on this machine. This may be something to do with Nvidia Optimus (PRIME).
- Test if suspend/resume works
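To make the glxinfo check unambiguous, you can look at the "OpenGL vendor string" line, which names the GL vendor directly (e.g. "NVIDIA Corporation"). The hypothetical helper below classifies that line from glxinfo output on stdin.

```shell
# Hypothetical helper: read glxinfo output on stdin and print "nvidia",
# "other" (e.g. Intel/Mesa) or "unknown" if no vendor line is present.
gl_vendor() {
    local line
    line=$(grep -i 'OpenGL vendor string' | head -n1)
    case "$line" in
        *[Nn][Vv][Ii][Dd][Ii][Aa]*) echo nvidia ;;
        "") echo unknown ;;
        *) echo other ;;
    esac
}
```

Usage: `glxinfo | gl_vendor`. On Optimus laptops the desktop may be rendered by the integrated GPU by default. In that case, NVIDIA's PRIME render offload variables can test the Nvidia GPU for a single program: `__NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia glxinfo | gl_vendor`.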