0xDEADBEEF links RSS | ⇉deadbeef blog
05/2022 Branch/cmove and compiler optimizations
04/2022 Changing std::sort at Google’s Scale and Beyond
12/2021 Persistence for the Masses: RRB-Vectors in a Systems Language
12/2021 Streaming Graph Partitioning for Large Distributed Graphs
12/2021 Near linear time algorithm to detect community structures in large-scale networks
12/2021 A Simple and Efficient Implementation for Small Databases
12/2021 Popping the Hood on Golden Cove
11/2021 AMX coprocessor
11/2021 Evaluating the Cost of Atomic Operations on Modern Architectures
11/2021 Indirection Stream Semantic Register Architecture for Efficient Sparse-Dense Linear Algebra
11/2021 Snitch: A tiny Pseudo Dual-Issue Processor for Area and Energy Efficient Execution of Floating-Point Intensive Workloads
10/2021 Fibonacci Hashing: The Optimization that the World Forgot (or: a Better Alternative to Integer Modulo)
10/2021 How Zen 2’s Op Cache Affects Performance
10/2021 Better Bit Mixing - Improving on MurmurHash3's 64-bit Finalizer
10/2021 Implementing Hash Tables in C
09/2021 Auto-Predication of Critical Branches
09/2021 Evolution of the Samsung Exynos CPU Microarchitecture
09/2021 Data Compression Accelerator on IBM POWER9 and z15 Processors
09/2021 Xuantie-910: A Commercial Multi-Core 12-Stage Pipeline Out-of-Order 64-bit High Performance RISC-V Processor with Vector Extension
09/2021 What Is Macroscalar?
09/2021 The Weird and Wacky World of VIA, the 3rd player in the “Modern” x86 market
08/2021 Accelerating ML Recommendation with over a Thousand RISC-V/Tensor Processors on Esperanto’s ET-SoC-1 Chip
08/2021 ARM or x86? ISA Doesn’t Matter
08/2021 What scientists must know about hardware to write fast code
08/2021 B-Trees: More Than I Thought I'd Want to Know
08/2021 Don't Throw Out Your Algorithms Book Just Yet: Classical Data Structures That Can Outperform Learned Indexes
08/2021 A fast alternative to the modulo reduction
08/2021 Simple and Fast BlockQuicksort using Lomuto’s Partitioning Scheme
08/2021 Accurate Throughput Prediction of Basic Blocks on Recent Intel Microarchitectures
07/2021 Do Low-level Optimizations Matter?
07/2021 shift_dfa.md
07/2021 Beating the L1 cache with value speculation
07/2021 Reverse-engineering the Mali G78
07/2021 BlockQuicksort: How Branch Mispredictions don’t affect Quicksort
07/2021 DeepWalk: Online Learning of Social Representations
06/2021 Scaling Up All Pairs Similarity Search
06/2021 LeapIO: Efficient and Portable Virtual NVMe Storage on ARM SoCs
06/2021 ZoneFS - Zone filesystem for Zoned block devices
06/2021 Don’t Be a Blockhead: Zoned Namespaces Make Work on Conventional SSDs Obsolete
06/2021 Cores that don’t count
05/2021 Computing the number of digits of an integer quickly
05/2021 Hash, displace, and compress (perfect hashing)
05/2021 External Memory Based Algorithm (perfect hashing)
05/2021 I See Dead μops: Leaking Secrets via Intel/AMD Micro-Op Caches
04/2021 Inheritance was invented as a performance hack
04/2021 Apple M1: Load and Store Queue Measurements
04/2021 FITing-Tree: A Data-aware Index Structure
03/2021 An Efficient Algorithm for Exploiting Multiple Arithmetic Units
03/2021 IBM POWER9 processor core
03/2021 Speculating the entire x86-64 Instruction Set In Seconds with This One Weird Trick
03/2021 Apple M1 Microarchitecture Research (instruction latency and throughput)
03/2021 WarpCore: A Library for fast Hash Tables on GPUs
03/2021 A Variable Vector Length SIMD Architecture for HW/SW Co-designed Processors
03/2021 KUTrace: Where have all the nanoseconds gone?
03/2021 Benchmarking "Hello, World!"
03/2021 Learned Garbage Collection
03/2021 SLAP: A split latency adaptive VLIW pipeline architecture which enables on-the-fly variable SIMD vector-length
03/2021 C-for-Metal: High Performance SIMD Programming on Intel GPUs
03/2021 End-of-buffer checks in decompressors
03/2021 NoFTL-KV: Tackling Write-Amplification on KV-Stores with Native Storage Management
03/2021 Open-Channel SSD (What is it Good For)
03/2021 Evolution of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience
03/2021 Zone Append: A New Way of Writing to Zoned Storage
02/2021 The LibreSOC Project: Simple-V Vectorisation
02/2021 Building Faster AMD64 Memset Routines
01/2021 "RDNA 2" Instruction Set Architecture
01/2021 But how, exactly, databases use mmap?
01/2021 Inlining and Compiler Optimizations
01/2021 That XOR Trick
01/2021 Is this a branch?
01/2021 Inline caching
01/2021 ARM Cortex-A72 execution and load/store
01/2021 On GPUs, ranges, latency, and superoptimisers
01/2021 Parsing: a timeline
01/2021 Generational References
01/2021 A Comprehensive (and Animated) Guide to InnoDB Locking
01/2021 RecSSD: Near Data Processing for Solid State Drive Based Recommendation Inference Extended Abstract
01/2021 NOREBA: A Compiler-Informed Non-Speculative Out-of-Order Commit Processor Extended Abstract
01/2021 SIMDRAM: A Framework for Bit-Serial SIMD Processing Using DRAM Extended Abstract
01/2021 VEGEN: A Vectorizer Generator for SIMD and Beyond
01/2021 Fast Local Page-Tables for Virtualized NUMA Servers with vMitosis Extended Abstract
12/2020 AIR-FI: Generating Covert Wi-Fi Signals from Air-Gapped Computers
12/2020 Converting floating-point numbers to integers while preserving order
12/2020 Regex literals optimization
12/2020 D's Auto Decoding and You
12/2020 A Rule-Based Style and Grammar Checker
11/2020 How fast does interpolation search converge?
11/2020 Transport triggered architecture
11/2020 HiPEAC 2020 keynote 1: James Mickens on software-defined microarchitecture
11/2020 Achieving 100Gbps intrusion prevention on a single server
11/2020 PopCount on ARM64 in Go Assembler
11/2020 Engineering In-place (Shared-memory) Sorting Algorithms
11/2020 JIT Compiler of PCRE2
11/2020 Producing Wrong Data Without Doing Anything Obviously Wrong!
10/2020 An Empirical Evaluation of Set Similarity Join Techniques
10/2020 An empirical evaluation of exact set similarity join techniques using GPUs
10/2020 PM-LSH: A Fast and Accurate LSH Framework for High-Dimensional Approximate NN Search
10/2020 Scalable Blocking for Very Large Databases
10/2020 Finding Bytes in Arrays
10/2020 A fast k-means implementation using coresets
10/2020 Ridiculously fast unicode (UTF-8) validation
10/2020 The Arm64 memory tagging extension in Linux
10/2020 k-means++: The Advantages of Careful Seeding
10/2020 A Fast Exact k-Nearest Neighbors Algorithm for High Dimensional Search Using k-Means Clustering and Triangle Inequality
10/2020 Storage strategies for collections in dynamically typed languages
10/2020 Why Aren’t More Users More Happy With Our VMs? Part 2
10/2020 Why Aren’t More Users More Happy With Our VMs? Part 1
10/2020 Custom Allocators Demystified
10/2020 Loading CSV File at the Speed Limit of the NVMe Storage
10/2020 When Network is Faster than Cache
10/2020 Understand std::atomic::compare_exchange_weak() in C++11
09/2020 SIMD transposes 1
09/2020 D Slices
09/2020 Static Analysis of Java Enterprise Applications: Frameworks and Caches, the Elephants in the Room
09/2020 Hoare’s Rebuttal and Bubble Sort’s Comeback
09/2020 HW and SW rules of thumb.
09/2020 Faster intersections between sorted arrays with shotgun
09/2020 Virtual Memory Tricks
09/2020 CARAT: A Case for Virtual Memory through Compiler- and Runtime-Based Address Translation
09/2020 The Cost of Software-Based Memory Management Without Virtual Memory
09/2020 Go Your Own Way (Part Two: The Heap)
09/2020 Go Your Own Way (Part One: The Stack)
09/2020 Life in the Fast Lane
09/2020 Don’t Fear the Reaper
09/2020 Fast random pair divisive construction of kNN graph using generic distance measures
09/2020 NN-Descent on High-Dimensional Data
09/2020 Efficient K-Nearest Neighbor Graph Construction for Generic Similarity Measures
09/2020 Improving Locality Sensitive Hashing by Efficiently Finding Projected Nearest Neighbors
09/2020 On Modern Hardware the Min-Max Heap beats a Binary Heap
09/2020 Sentinels can be faster
08/2020 Performance Impact of Parallel Disk Access
08/2020 split-forwarding.md
08/2020 Zen 2 - Microarchitectures - AMD
08/2020 Intelligent Probing for Locality Sensitive Hashing: Multi-Probe LSH and Beyond
08/2020 Query-Aware Locality-Sensitive Hashing for Approximate Nearest Neighbor Search
08/2020 Locality-Sensitive Hashing Scheme Based on Dynamic Collision Counting
08/2020 Fast Search of Binary Codes with Distinctive Bits
08/2020 Ultra Fast Medoid Identification via Correlated Sequential Halving
08/2020 Medoids in almost linear time via multi-armed bandits
08/2020 Fast Approximation of Centrality
08/2020 fuzzy jaccard
08/2020 The Power of Comparative Reasoning (WTA hash, winner take all)
08/2020 What Is The Minimal Set Of Optimizations Needed For Zero-Cost Abstraction?
08/2020 4K Aliasing
08/2020 Garbage Collector Code Artifacts: Card Marking
08/2020 Hardware Store Elimination
08/2020 When Escape Analysis fails you?
08/2020 Speculation in JavaScriptCore
08/2020 The ABC’s of Templates in D
07/2020 Coding for Random Projections
07/2020 Min-Max Hash for Jaccard Similarity
07/2020 Efficient nearest neighbors inspired by the fruit fly brain
07/2020 Programmers Need To Learn Statistics Or I Will Kill Them All
07/2020 SonicBOOM: The 3rd Generation Berkeley Out-of-Order Machine
07/2020 LSH Forest: Self-Tuning Indexes for Similarity Search
07/2020 Fast Intersection of Sorted Lists Using SSE Instructions
07/2020 Latency implications of virtual memory
07/2020 Why Java's TLABs are so important and why write contention is a performance killer in multicore environments
07/2020 A Concurrency Cost Hierarchy
07/2020 How JIT Compilers are Implemented and Fast: Julia, Pypy, LuaJIT, Graal and More
07/2020 A Deep Introduction to JIT Compilers: JITs are not very Just-in-time
07/2020 Improved Densification of One Permutation Hashing
07/2020 Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search
07/2020 Rapid Similarity Search with Weighted Min-Hash
07/2020 'Fastware' - Andrei Alexandrescu
07/2020 Moving Garbage Collection with Low-Variation Memory Overhead and Deterministic Concurrent Relocation
07/2020 How do 'hot and cold' objects behave?
06/2020 Advanced Matrix Extension (AMX) - x86
06/2020 Asymmetric Minwise Hashing
06/2020 Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS)
06/2020 The rapid growth of io_uring
06/2020 Radix sort: sorting integers (often) faster than std::sort.
06/2020 Faster than radix sort: Kirkpatrick-Reisch sorting
06/2020 The Cache Replacement Problem
06/2020 The CHERI CPU Hardware software co design for security
05/2020 Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections
05/2020 AVX-512 Mask Registers, Again
05/2020 Benchmarking for Good with Aleksey Shipilev
05/2020 Ice Lake Store Elimination
05/2020 Lomuto’s Comeback
05/2020 Unikernels: The Next Stage of Linux’s Dominance
05/2020 How the Go runtime implements maps efficiently (without generics)
05/2020 A history of NVidia Stream Multiprocessor
05/2020 Graph-of-word and TW-IDF: New Approach to Ad Hoc IR
04/2020 MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams
04/2020 Java Objects Inside Out
04/2020 A Text Network Representation Model
03/2020 Undocumented CPU Behavior: Analyzing Undocumented Opcodes on Intel x86-64
03/2020 Understanding CPU Microarchitecture to Increase Performance
03/2020 Measuring Time: From Java to Kernel and back
03/2020 An empirical guide to the behavior and use of scalable persistent memory
03/2020 Avoiding cache line overlap by replacing one 256-bit store with two 128-bit stores
03/2020 Writing a full-text search engine using Bloom filters
01/2020 Modern B-Tree Techniques
01/2020 Everything I know about SSDs
01/2020 A Look At Celerity’s Second-Gen 496-Core RISC-V Mesh NoC
01/2020 Random Indexing Explained with High Probability
01/2020 ComputeDRAM: In-Memory Compute Using Off-the-Shelf DRAMs
01/2020 Performance variation in 2,386 ‘identical’ processors
12/2019 A Position-Biased PageRank Algorithm for Keyphrase Extraction
12/2019 TextRank: Bringing Order into Texts
12/2019 Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters
12/2019 Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology
12/2019 Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives
11/2019 Fast Bulk Bitwise AND and OR in DRAM
11/2019 Computer Architecture - Lecture 6b: Computation in Memory I - Onur Mutlu
11/2019 Base64 encoding and decoding at almost the speed of a memory copy (Lemire)
10/2019 Could a Neuroscientist Understand a Microprocessor?
09/2019 Hierarchical PLABs, CLABs, TLABs in Hotspot
08/2019 Faster threshold queries with cache-sensitive scancount
07/2019 Revec: Program Rejuvenation through Revectorization
07/2019 Hiding Data in Hard-Drive’s Service Areas
07/2019 Hard Drive of Hearing: Disks that Eavesdrop with a Synthesized Microphone
07/2019 Future Directions for Optimizing Compilers
05/2019 Multilayer ROP Protection via Microarchitectural Units Available in Commodity Hardware
05/2019 x86-64 Instruction Usage among C/C++ Applications
05/2019 Basic Performance Measurements of the Intel Optane DC Persistent Memory Module
05/2019 Cheap transistors, expensive wires
05/2019 A Hardware Accelerator for Tracing Garbage Collection
05/2019 Word Hy-phen-a-tion by Com-put-er
05/2019 I/O Is Faster Than the CPU – Let’s Partition Resources and Eliminate (Most) OS Abstractions
04/2019 Design of the RISC-V Instruction Set Architecture
04/2019 DIMMer: A case of turning off DIMMs in clouds.
04/2019 And Then There Were None: A Stall-Free Real-Time Garbage Collector for Reconfigurable Hardware
03/2019 Lost in translation: Exposing hidden compiler optimization opportunities
03/2019 Accelerators for Data Processing
02/2019 Array Bounds Check Elimination for the Java HotSpot™ Client Compiler
02/2019 Mesh: Compacting Memory Management for C/C++ Applications
02/2019 Large-Scale Reconfigurable Computing in a Microsoft Datacenter (FPGA)
01/2019 The Case for Network-Accelerated Query Processing
01/2019 Faster intersections between sorted arrays with shotgun
12/2018 MinHashing
12/2018 Why Systolic Architectures
11/2018 BOLT: A Practical Binary Optimizer for Data Centers and Beyond
11/2018 WaveFunctionCollapse (Bitmap & tilemap generation from a single example with the help of ideas from quantum mechanics.)
11/2018 Beating hash tables with trees? The ART-ful radix trie
11/2018 ISA Semantics for ARMv8-A, RISC-V, and CHERI-MIPS
11/2018 Measuring the memory-level parallelism of a system using a small C++ program?
11/2018 How to implement strings
09/2018 How to Architect a Query Compiler, Revisited
09/2018 Shenandoah GC: The Garbage Collector That Could : Aleksey Shipilev
08/2018 Getting 4 bytes or a full cache line: same speed or not?
07/2018 Dynamic Vectorization in the E2 Dynamic Multicore Architecture
07/2018 An Evaluation of the TRIPS Computer System
07/2018 Exploiting Superword Level Parallelism with Multimedia Instruction Sets
05/2018 ispc: A SPMD Compiler for High-Performance CPU Programming
05/2018 Lecture 7: The Programmable GPU Core
05/2018 Mison: A Fast JSON Parser for Data Analytics
04/2018 Active Pages: A Computation Model for Intelligent Memory
04/2018 Radix Sort for Vector Multiprocessors
04/2018 Scalable Processors in the Billion-Transistor Era: IRAM
04/2018 Iterating in batches over data structures can be much faster…
04/2018 ROLP: Runtime Object Lifetime Profiling for Big Data Memory Management
04/2018 The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities
03/2018 Fast, Flexible, Polyglot Instrumentation Support for Debuggers and other Tools
03/2018 The surprising creativity of digital evolution
02/2018 Parallel generational-copying garbage collection with a block-structured heap
01/2018 JOP: A Java Optimized Processor
01/2018 Using a Java Optimized Processor in a Real World Application
01/2018 High-performance throughput computing
01/2018 Fantastic Timers and Where to Find Them: High-Resolution Microarchitectural Attacks in JavaScript
01/2018 What a difference a JVM makes?
12/2017 JVM Anatomy Park #18: Scalar Replacement
12/2017 Aleksandar Prokopec - Making Collection Operations Optimal with Aggressive JIT Compilation
11/2017 KV-Direct: High-performance in-memory key-value store with programmable NIC
11/2017 A fast alternative to the modulo reduction
11/2017 Fast exact integer divisions using floating-point operations
11/2017 UnifiedMap: How it works?
11/2017 A Branchless UTF-8 Decoder
11/2017 What every systems programmer should know about lockless concurrency
10/2017 Zebras All the Way Down - Bryan Cantrill, Uptime 2017
10/2017 Virtual Machine Warmup Blows Hot and Cold
10/2017 The Renewed Case for the Reduced Instruction Set Computer: Avoiding ISA Bloat with Macro-Op Fusion for RISC-V
09/2017 STREAM VBYTE: Faster Byte-Oriented Integer Compression
09/2017 SFS: Random Write Considered Harmful in Solid State Drives
09/2017 Energy Efficiency across Programming Languages
09/2017 B-trees, Shadowing, and Clones
08/2017 Strategies for Branch Target Buffers
08/2017 The YAGS Branch Prediction Scheme
08/2017 Dynamic Branch Prediction with Perceptrons
08/2017 JavaScript for extending low-latency in-memory key-value stores
08/2017 Efficient Immutable Collections
08/2017 One-pass Code Generation in V8
08/2017 Nom, a byte oriented, streaming, zero copy, parser combinators library in Rust
08/2017 NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories
08/2017 Breaking the x86 ISA
07/2017 Pruning spaces from strings quickly on ARM processors
06/2017 Top speed for top-k queries
06/2017 Practical Partial Evaluation for High-Performance Dynamic Language Runtimes
06/2017 NG2C: Pretenuring N-Generational GC for HotSpot Big Data Applications
06/2017 QuickSelect versus binary heap for top-k queries
06/2017 Quickly returning the top-k elements: computer science vs. the real world
05/2017 Counting exactly the number of distinct elements: sorted arrays vs. hash sets?
05/2017 Typed Architectures: architectural support for lightweight scripting
05/2017 Adaptive Cuckoo Filters
04/2017 Improving user perceived page load time using gaze
04/2017 Glob Matching Can Be Simple And Fast Too
04/2017 JIT and instanceof
04/2017 Towards Efficient Dynamic Integer Overflow Detection on ARM Processors
04/2017 Java Microbenchmark Harness: The Lesser of Two Evils
04/2017 Beauty and the Burst: Remote Identification of Encrypted Video Streams
04/2017 Vectorization in HotSpot JVM
04/2017 Dynamic Coarse Grained Reconfigurable Architectures
03/2017 The Structure and Performance of Efficient Interpreters
03/2017 Strided Sampling Hashed Perceptron Predictor
02/2017 Beyond the words: predicting user personality from heterogeneous information
02/2017 Programming Languages: History and Future
01/2017 Devirtualization
01/2017 The Pauseless GC Algorithm
01/2017 Grail Quest: A New Proposal for Hardware-assisted Garbage Collection
01/2017 Tightly Packed Tries: How to Fit Large Models into Memory, and Make them Load Fast, Too
01/2017 Multiprocessors Should Support Simple Memory Consistency Models
01/2017 The Silently Shifting Semicolon (memory models)
01/2017 WeeFence: Toward Making Fences Free in TSO
01/2017 Dynamo: A Transparent Dynamic Optimization System
01/2017 Fast Haskell: Competing with C at parsing XML
01/2017 Runtime Pointer Disambiguation
01/2017 Be nice to your cache
01/2017 DawnCC: a Source-to-Source Automatic Parallelizer of C and C++ Programs
12/2016 Sorting improves word-aligned bitmap indexes
12/2016 Faster Population Counts using AVX2 Instructions
12/2016 Myths and Realities: The Performance Impact of Garbage Collection
12/2016 The LuaJIT Wiki Garbage Collector
12/2016 Transforming static data structures to dynamic structures
12/2016 The Bloomier Filter: An Efficient Data Structure for Static Support Lookup Tables
12/2016 Garbage Collection Algorithms
12/2016 Scapegoat tree
11/2016 Optimizing Data Structures in High-Level Programs - New Directions for Extensible Compilers based on Staging
11/2016 Functional Pearl: The Zipper
11/2016 Cache Conscious Indexing for Decision-Support in Main Memory
11/2016 Permutation Search Methods are Efficient, Yet Faster Search is Possible
11/2016 Engineering Efficient and Effective Non-Metric Space Library
11/2016 Succinct Nearest Neighbor Search (NAPP, Neighborhood approximation inverted index)
11/2016 Learning to Prune in Metric and Non-Metric Spaces
11/2016 Hamming Compatible Quantization for Hashing
11/2016 Off the Beaten Path: Let’s Replace Term-Based Retrieval with k-NN Search
11/2016 Large-Scale Distributed Locality-Sensitive Hashing for General Metric Data (DFLSH, Voronoi LSH)
11/2016 Effective Proximity Retrieval by Ordering Permutations
11/2016 Speeding Up Permutation Based Indexing with Indexing
11/2016 Practical and Optimal LSH for Angular Distance (cross-polytope LSH)
11/2016 Large-scale similarity data management with distributed Metric Index (data space mapping)
11/2016 Metric Space Searching Based on Random Bisectors and Binary Fingerprints
11/2016 A Brief Index for Proximity Searching (Brief Permutation Index)
11/2016 On Locality Sensitive Hashing in metric spaces (Brief Permutation Index)
11/2016 LSH forest: self-tuning indexes for similarity search.
11/2016 Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search
11/2016 K-medoids LSH: a new locality sensitive hashing in general metric space
11/2016 Comparative Analysis of Data Structures for Approximate Nearest Neighbor Search (small world graphs are the fastest)
11/2016 Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs
11/2016 Accelerating Search and Recognition Workloads with SSE 4.2 String and Text Processing Instructions
11/2016 Fast Sorted-Set Intersection using SIMD Instructions
10/2016 ELB trees - efficient lock-free B+trees
10/2016 Optimal Incremental Sorting
10/2016 A Fast Algorithm for Computing Longest Common Subsequences
10/2016 Algorithm for Computing Maximal Common Subsequence (longest common subsequence)
10/2016 Ordered hash table
10/2016 A Fast Write Barrier for Generational Garbage Collectors
10/2016 SparseDTW: A Novel Approach to Speed up Dynamic Time Warping
10/2016 FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space
09/2016 Efficient and Thread-Safe Objects for Dynamically-Typed Languages
09/2016 How the JVM compares your strings using the craziest x86 instruction you've never heard of
08/2016 Improving the Energy Efficiency of Big Cores
04/2016 Why Nothing Matters: The Impact of Zeroing
04/2016 Fast Algorithms for Sorting and Searching Strings (radix quicksort, multi-key quicksort)
04/2016 Efficient Trie-Based Sorting of Large Sets of Strings
04/2016 Burst Tries: A Fast, Efficient Data Structure for String Keys
04/2016 Cache-Conscious Sorting of Large Sets of Strings with Dynamic Tries
04/2016 The Average Case Analysis of Partition Sorts
03/2016 Min-Max Heaps and Generalized Priority Queues
03/2016 Partial Escape Analysis and Scalar Replacement for Java
03/2016 Generic Multiset Programming with Discrimination-based Joins and Symbolic Cartesian Products
03/2016 Generic Top-down Discrimination for Sorting and Partitioning in Linear Time
03/2016 Bloofi: Multidimensional Bloom Filters
03/2016 Lock Holder Preemption Avoidance via Transactional Lock Elision
03/2016 Optimal sorting algorithms for parallel computers
02/2016 Concurrent Search Tree by Lazy Splaying
02/2016 Implementing sets efficiently in a functional language (trees of bounded balance) http://groups.csail.mit.edu/mac/users/adams/BB/
01/2016 Efficient Set Intersection for Inverted Indexing
01/2016 Better bitmap performance with Roaring bitmaps
01/2016 The Technology Behind Crusoe™ Processors
01/2016 Simple, proven approaches to text retrieval
12/2015 Virtually free - JVM callsite optimization by example
12/2015 Programming Interfaces to Non-Volatile Memory
11/2015 FlexSC: Flexible System Call Scheduling with Exception-Less System Calls
11/2015 Fun C Micro-optimizations - restrict
11/2015 Interlude: Numerical experiments in hashing
11/2015 More numerical experiments in hashing: a conclusion (Robin Hood hashing)
11/2015 Robin Hood Hashing should be your default Hash Table implementation
11/2015 Robin Hood hashing
11/2015 R-trees Have Grown Everywhere
11/2015 Elastic Binary Trees - ebtree
11/2015 Code Specialization for Memory Efficient Hash Tries (Short Paper)
11/2015 Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections
11/2015 The Cache Performance and Optimization of Blocked Algorithms
10/2015 Nearest neighbors and vector models – epilogue – curse of dimensionality
10/2015 How does Java Both Optimize Hot Loops and Allow Debugging
10/2015 Mobile Processors for Energy-Efficient Web Search
10/2015 Finger Trees Custom Persistent Collections - Chris Houser
10/2015 Evaluating HTM for pauseless garbage collectors in Java
09/2015 Array layouts for comparison-based searching
09/2015 Conc-Trees for Functional and Parallel Programming
09/2015 High Performance and Scalable Radix Sorting: A case study of implementing dynamic parallelism for GPU computing
09/2015 Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort
09/2015 Generic Top-down Discrimination for Sorting and Partitioning in Linear Time
09/2015 Adaptive Lock-Free Maps: Purely-Functional to Scalable
09/2015 Making Data Structures Persistent
09/2015 Cache-oblivious data structures
09/2015 Memory Coherence in Shared Virtual Memory Systems
09/2015 Unsupervised Feature Selection on Data Streams / Streaming Anomaly Detection Using Randomized Matrix Sketching
09/2015 ZuriHac 2015 - Discrimination is Wrong: Improving Productivity (Kmett)
09/2015 IFL 2012. Fritz Henglein: Generic sorting and partitioning in linear time and fully abstractly
08/2015 How Branch Mispredictions Affect Quicksort
08/2015 PARADIS: An Efficient Parallel Algorithm for In-place Radix Sort
08/2015 Multi-Core, Main-Memory Joins: Sort vs. Hash Revisited
08/2015 Functional Pearl: A SQL to C Compiler in 500 Lines of Code
08/2015 Fenwick Trees (prefix sums)
08/2015 Branch Prediction and the Performance of Interpreters - Don't Trust Folklore
08/2015 A Primer on Memory Consistency and Cache Coherence
07/2015 Jump Threading
07/2015 Faster Cover Trees
07/2015 Carnegie Mellon - Parallel Computer Architecture 2012-Onur Mutlu - Lec 22 - Dataflow I
07/2015 Adaptive Just-in-time Value Class Optimization
07/2015 Trace-based Just-in-time Compilation for Lazy Functional Programming Languages
07/2015 Adaptive Range Filters for Cold Data: Avoiding Trips to Siberia
07/2015 Diff-Index: Differentiated Index in Distributed Log-Structured Data Stores
07/2015 Stratified B-trees and versioning dictionaries
07/2015 Log Structured Merge Trees (LSMT)
07/2015 A Relational Model of Data for Large Shared Data Banks
07/2015 Monet - A Next-Generation DBMS Kernel For Query-Intensive Applications
07/2015 Your computer is already a distributed system. Why isn't your OS?
06/2015 MonetDB/X100: Hyper-Pipelining Query Execution
06/2015 MonetDB: Two Decades of Research in Column-oriented Database Architectures
06/2015 mov is Turing-complete
06/2015 SnapQueue: Lock-Free Queue with Constant Time Snapshots
06/2015 Scalable Bloom Filters
06/2015 Zero-Overhead Metaprogramming: Reflection and Metaobject Protocols Fast and without Compromises (GraalVM)
06/2015 Early Experience with a Commercial Hardware Transactional Memory Implementation
06/2015 Accelerating Native Calls using Transactional Memory
06/2015 Architecture of a Database System
06/2015 Purely Functional Data Structures (Okasaki)
06/2015 CS 61B Lecture 34: Splay Trees
06/2015 CS 61B Lecture 31: Disjoint Sets
06/2015 CS 61B Lecture 36: Randomized Analysis
06/2015 CS 61B Lecture 35: Amortized Analysis
06/2015 Scalability! But at what COST? (paper)
06/2015 Scalability! But at what COST?
06/2015 IA Memory Ordering (x86 memory model)
05/2015 Module 2.4 - Cache Coherence - 740: Computer Architecture 2013 - Carnegie Mellon - Onur Mutlu
05/2015 Stable Distributions, Pseudorandom Generators, Embeddings, and Data Stream Computation
05/2015 Hashing, sketching, and other approximate algorithms for high-dimensional data
05/2015 Locality-Sensitive Hashing Scheme Based on p-Stable Distributions
05/2015 Beyond the PDP-11: Architectural support for a memory-safe C abstract machine
05/2015 Pycket: A Tracing JIT For a Functional Language
05/2015 Collaborative Filtering Recommender Systems
05/2015 Scaling Concurrent Log-Structured Data Stores
04/2015 Asynchronized Concurrency: The Secret to Scaling Concurrent Search Data Structures
04/2015 A Study of CRDTs that do Computations
04/2015 Advanced Data Structures: Session 10: Dictionaries
04/2015 The Design and Implementation of Modern Column-Oriented Database Systems
03/2015 x86 is a high-level language
03/2015 Introduction to HAMT
03/2015 Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers
03/2015 Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores
03/2015 Paper: Staring into the Abyss: An Evaluation of Concurrency Control with One Thousand Cores
02/2015 Compressed Text Indexes: From Theory to Practice!
02/2015 Command-line tools can be 235x faster than your Hadoop cluster
02/2015 Predecessor search for Big Data: x-fast tries, locality of reference and all that
02/2015 Understanding and Expressing scalable Concurrency
02/2015 The Art of Approximating Distributions: Histograms and Quantiles at Scale
02/2015 q-digest - Medians and Beyond: New Aggregation Techniques for Sensor Networks
02/2015 q-digest : an algorithm for computing approximate quantiles on a collection of integers
02/2015 References for Data Stream Algorithms
02/2015 A closer Look at GPUs
02/2015 Lecture 14 - Out-of-Order Execution - Carnegie Mellon - Computer Architecture 2013 - Onur Mutlu
01/2015 Operation fusion and deforestation for Scala
01/2015 Specialized Evolution of the General-Purpose CPU
01/2015 Immutability Changes Everything
01/2015 The Design of Approximation Algorithms
01/2015 Scalability! But at what COST?
01/2015 Programming on Parallel Machines - GPU, Multicore, Clusters and More
01/2015 Mergeable persistent data structures
01/2015 High Performance Hardware-Accelerated Flash Key-Value Store
12/2014 SipHash: a fast short-input PRF
12/2014 Analysis of Pivot Sampling in Dual-Pivot Quicksort, A Holistic Analysis of Yaroslavskiy’s Partitioning Scheme
11/2014 Suffix Trees and their Applications in String Algorithms
11/2014 Borislav Petkov: x86 instruction encoding and the nasty hacks we do in the kernel
11/2014 Don’t Thrash: How to Cache Your Hash on Flash
10/2014 Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming
10/2014 Cuckoo Filter: Practically Better than Bloom
10/2014 Resizable, Scalable, Concurrent Hash Tables via Relativistic Programming
10/2014 What's the deal with Hardware Transactional Memory!?! (linux.conf.au 2014)
10/2014 SimHash or the way to compare quickly two datasets
09/2014 An Overview of Kernel Lock Improvements (on huge NUMA systems)
09/2014 Invertible Bloom Lookup Tables
09/2014 Instruction Sets Should Be Free: The Case For RISC-V
08/2014 Accurate Methods for the Statistics of Surprise and Coincidence
08/2014 Epiphany Architecture Reference
08/2014 Perceptual Hashing (searching similar images, reverse image search)
08/2014 The Bw-Tree: A B-tree for New Hardware Platforms
07/2014 Philip Wadler: Why no one uses functional languages
07/2014 Rank and select for succinct data structures
07/2014 Bridging Islands of Specialized Code using Macros and Reified Types
06/2014 Data types à la carte
06/2014 Throw away the keys: Easy, Minimal Perfect Hashing
06/2014 MICA: A Holistic Approach to Fast In-Memory Key-Value Storage
05/2014 SQL versus coSQL — a compendium to Erik Meijer’s paper
04/2014 Mark Hill CPU, TLB
03/2014 Similarity Measurement on Leaf-labelled Trees
03/2014 Modern Microprocessors - A 90 Minute Guide!
03/2014 Is Parallel Programming Hard, And, If So, What Can You Do About It?, Hardware and its Habits
03/2014 Why Functional Programming Matters
02/2014 Fast Computation of min-Hash Signatures for Image Collections
02/2014 Scalable, Example-Based Refactorings with Refaster
01/2014 Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture
01/2014 AA-Sort: A New Parallel Sorting Algorithm for Multi-Core SIMD Processors
01/2014 An Experimental Study of Sorting and Branch Prediction
01/2014 Data Structures and Algorithms for Nearest Neighbor Search in General Metric Spaces (VP-tree)
01/2014 Duplicate News Story Detection Revisited
12/2013 Monoids: Theme and Variations (Functional Pearl)
12/2013 Amazon.com Recommendations - Item-to-Item Collaborative Filtering
12/2013 HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm
12/2013 Phil Bagwell, Ideal Hash Trees, 2001
12/2013 Phil Bagwell, Fast And Space Efficient Trie Searches, 2000
11/2013 HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm
11/2013 Memory system compression and its benefits
11/2013 The Night Watch https://research.microsoft.com/en-us/people/mickens/thenightwatch.pdf
10/2013 Miniboxing: Improving the Speed to Code Size Tradeoff in Parametric Polymorphism Translations
10/2013 Iterative Ranking from Pair-wise Comparisons
09/2013 Fast Mergeable Integer Maps
09/2013 Sketch of the Day: Frugal Streaming (median, rank)
09/2013 TRASH A dynamic LC-trie and hash data structure
09/2013 RAY: Integrating Rx and Async for Direct-Style Reactive Streams
09/2013 Instruction tables Lists of instruction latencies, throughputs and micro-operation break-downs for Intel, AMD and VIA CPUs
07/2013 Memory Barriers: a Hardware View for Software Hackers
07/2013 Monads for functional programming
01/2013 RRB-Trees: Efficient Immutable Vectors
10/2012 Five Myths about Hash Tables
01/2012 dablooms - an open source, scalable, counting bloom filter library